TeamCity 4.x: Exception in thread "Lucene Merge Thread #0", Negative seek offset?

Hello,

My customer (eBay Motors) is currently having the following problems with TeamCity 4.x:

Symptoms:
- no build triggering seems to work
- after the "java.io.IOException: background merge hit exception" first occurred, 10 hours of TC log output are missing, and the server was most likely not working at all during that time. However, TC did NOT go down right after the first such exception, and I am not sure when the problem actually started
- we now get this exception every 10-15 minutes, plus regular OutOfMemory errors in TC plugins

TeamCity server info:
- the system runs on Debian Lenny Linux with 3 GB of memory
- the TeamCity Tomcat process takes 1.5 GB of memory (and I don't know whether that is normal for our server)
- only about 150 MB of free memory is left
- there are also two apache2 processes running, each taking about 220 MB of memory. As far as I know they are also part of the TeamCity runtime, but I may be wrong about that

TC Tomcat log:

java.io.IOException: background merge hit exception: _1e4g:C529808 _3dv7:C190561 _3dwf:C303 _3dwg:C42 _3dwh:C16->_3dwh into _3dwi [optimize]
        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2258)
        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2203)
        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2183)
        at jetbrains.buildServer.serverSide.search.SearchService.optimizeIndex(SearchService.java:237)
        at jetbrains.buildServer.serverSide.search.BackgroundIndexer$1.run(BackgroundIndexer.java:55)
        at jetbrains.buildServer.serverSide.impl.cleanup.ServerCleanupManagerImpl.executeWithInactiveCleanup(ServerCleanupManagerImpl.java:24)
        at jetbrains.buildServer.serverSide.impl.cleanup.ServerCleanupManagerImpl$$FastClassByCGLIB$$ba2c8525.invoke(<generated>)
        at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:149)
        at org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:700)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
        at jetbrains.buildServer.serverSide.impl.auth.TeamCityMethodSecurityInterceptor.invoke(TeamCityMethodSecurityInterceptor.java:33)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.springframework.aop.framework.Cglib2AopProxy$FixedChainStaticTargetInterceptor.intercept(Cglib2AopProxy.java:582)
        at jetbrains.buildServer.serverSide.impl.cleanup.ServerCleanupManagerImpl$$EnhancerByCGLIB$$3a4e3dbb.executeWithInactiveCleanup(<generated>)
        at jetbrains.buildServer.serverSide.search.BackgroundIndexer.run(BackgroundIndexer.java:44)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Negative seek offset
        at java.io.RandomAccessFile.seek(Native Method)
        at org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:591)
        at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
        at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:116)
        at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:92)
        at org.apache.lucene.store.IndexOutput.copyBytes(IndexOutput.java:172)
        at org.apache.lucene.index.FieldsWriter.addRawDocuments(FieldsWriter.java:249)
        at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:350)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4226)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3877)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:205)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:260)
Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: Negative seek offset
        at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:309)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:286)
Caused by: java.io.IOException: Negative seek offset
        at java.io.RandomAccessFile.seek(Native Method)
        at org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:591)
        at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
        at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:116)
        at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:92)
        at org.apache.lucene.store.IndexOutput.copyBytes(IndexOutput.java:172)
        at org.apache.lucene.index.FieldsWriter.addRawDocuments(FieldsWriter.java:249)
        at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:350)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4226)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3877)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:205)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:260)

Questions:
- what does the exception mean?
- what is the best solution?
- what is normal memory usage for a TC server with a VERY large number of agents / projects?
- how can we prevent such problems?

Thank you // Dimitri Uwarov

8 comments

Hi Dimitri,

- what does the exception mean?
The indices seem to be corrupted. You can safely delete the search caches when the server is down.
Also, this exception affects only search; all the other problems are likely to be unrelated.
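
For what it's worth, the "Negative seek offset" text comes straight from java.io.RandomAccessFile.seek(), which rejects negative positions. Below is a minimal sketch, not taken from TeamCity or Lucene, that reproduces the same low-level failure Lucene runs into when a corrupted segment hands it a garbage file pointer (the temp file here is just a stand-in for an index file):

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class NegativeSeekDemo {
    public static void main(String[] args) throws IOException {
        // Stand-in for a corrupted Lucene segment file.
        File f = File.createTempFile("segment", ".demo");
        f.deleteOnExit();
        RandomAccessFile raf = new RandomAccessFile(f, "r");
        try {
            // A corrupted segment can yield a nonsensical (negative) offset;
            // seeking to it fails just like in the log above.
            raf.seek(-1); // throws java.io.IOException ("Negative seek offset")
        } finally {
            raf.close();
        }
    }
}

In other words, the stack trace points at bad data inside the index files rather than at the merge code itself, which matches the "corrupted indices" diagnosis.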

- what is the best solution?
We don't plan to integrate fixes into TeamCity 4.x, so I would suggest you upgrade to the latest TeamCity (5.1.3 as of now).

- what is normal memory usage for a TC server with a VERY large number of agents / projects?
Could you please specify how many agents / projects you have?

- how can we prevent such problems?
The most probable cause of the error is disk corruption. Can you check your hard disk?


Hello Maxim,

The indices seem to be corrupted. You can safely delete the search caches when the server is down.
Also, this exception affects only search; all the other problems are likely to be unrelated.
- How do I delete the corrupt indices, and where are they located?

We don't plan to integrate fixes into TeamCity 4.x, so I would suggest you upgrade to the latest TeamCity (5.1.3 as of now).
- we did, but the older servers are still heavily involved in the production process

Could you please specify how many agents / projects you have?
- 27 Agents, 52 projects and 692 build configurations.

The most probable cause of the error is disk corruption. Can you check your hard disk?
- do you mean the problem of the "corrupted indices"?

What about the failing triggers and the other misbehavior? We think it is just a memory problem (100-200 MB left) and that a restart will fix it (a restart is a big decision in our case, because of the scale of our integration).
Another question: do you recommend regular restarts of TC servers (maybe once every 10 days)? Or is huge (1.5 GB) memory usage always a sign of some bug (a memory leak)?


Hi Dimitri,

- How do I delete the corrupt indices, and where are they located?
Check your .BuildServer/system/caches/search directory.
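
Just to illustrate, here is a minimal sketch (not a TeamCity tool) for clearing that directory; an equivalent rm -rf over ssh does the same job. The default ~/.BuildServer location is an assumption here, so adjust the path to wherever your TeamCity data directory actually lives, and run it only while the server is stopped (the search index should then be rebuilt on the next start):

import java.io.File;

public class ClearSearchCache {

    // Delete children first, then the directory itself.
    static void deleteRecursively(File file) {
        File[] children = file.listFiles();
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        file.delete();
    }

    public static void main(String[] args) {
        // Assumed default data directory; substitute your actual path if it differs.
        File searchCache = new File(System.getProperty("user.home"),
                ".BuildServer/system/caches/search");
        if (searchCache.exists()) {
            deleteRecursively(searchCache);
            System.out.println("Removed " + searchCache);
        } else {
            System.out.println("Not found: " + searchCache);
        }
    }
}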

- we did, but the older servers are still heavily involved in the production process
Is the problem only with older versions of TeamCity?

- 27 Agents, 52 projects and 692 build configurations
It's not that big; I think the server should consume less memory (unless each agent is running a build that produces a 100 MB log every minute).
Usually 1.5 GB indicates a memory leak, so a restart should help for some time.
You can try to take a memory snapshot from the server and send it over, but again, the best solution would be to upgrade the servers.
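
As an aside, since the question of how to take such a snapshot comes up just below: with only ssh access, one common option is the JDK's jmap tool against the Tomcat process ID (e.g. jmap -dump:format=b,file=teamcity.hprof <pid>); another is the HotSpot diagnostic MBean, as in the sketch below, which is not part of TeamCity and has to run inside the JVM you want to dump (for example from a server-side plugin), otherwise it would only dump its own tiny heap:

import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    public static void main(String[] args) throws IOException {
        // Example output path; make sure there is enough free disk space for the dump.
        String outputFile = args.length > 0 ? args[0] : "/tmp/teamcity-heap.hprof";
        HotSpotDiagnosticMXBean diagnostic = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // true = dump only live (reachable) objects, which keeps the .hprof file smaller.
        diagnostic.dumpHeap(outputFile, true);
        System.out.println("Heap dump written to " + outputFile);
    }
}

For the recurring OutOfMemory errors, starting the server JVM with -XX:+HeapDumpOnOutOfMemoryError also produces an .hprof automatically at the moment of failure.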

As for the other problems, each of them requires additional investigation. Please send us the logs.


Is the problem only with older versions of TeamCity?
I don't know, I am not the one managing these TC servers.

You can try to take a memory snapshot from the server and send it over ...
Considering that I only have ssh access, which method of taking a memory snapshot would you suggest?

- Please send us the logs.
I may contact you later with the logs. For now we will go with a TC restart, and I will also make a snapshot; just tell me which method I should use.


Will do. I am currently upgrading to 5.1.3 and hope the issue will no longer exist.


Thanks Dimitri,

Please write back with the results.
I'll mention again: you can safely delete the search caches when the server is down.


---
Maxim


Hello Maxim,

Thank you for the info. I have now removed the caches on several occasions, which solved that and many other issues. So, it helped.

kind regards // dimitri

