100% cpu usage and really bad performance
Hi,
After upgrading TeamCity from 5.0.x to 5.1.1 we experience really bad performance. The teamcity "Tomcat" process keeps running at 100% cpu and has been doing so for the last week. Nobody is using the server and no build is in progress.
Any ideas how to find out what is causing it and what to do about it?
jstack dump attached.
Attachment(s):
jstack-teamcity-tomcat-2010-05-11.txt.zip
Please sign in to leave a comment.
Please try to take several thread dumps from Server Configuration -> Diagnostic page with 2-5 seconds interval and attach to this thread.
When trying to shutdown the server I got the following in the teamcity-server.log, might or might not be related:
[2010-05-11 15:35:07,116] ERROR - verSide.impl.BuildServerRunner - java.lang.OutOfMemoryError: Java heap space
[2010-05-11 15:35:19,662] INFO - verSide.impl.BuildServerRunner - Start shutdown Ping
[2010-05-11 15:35:22,134] INFO - verSide.impl.BuildServerRunner - Start shutdown Flush Queue
[2010-05-11 15:35:22,135] INFO - verSide.impl.BuildServerRunner - Start shutdown Vcs Monitor
[2010-05-11 15:35:31,279] WARN - jetbrains.buildServer.SERVER - Waiting for VCS changes loader to complete
[2010-05-11 15:35:59,436] INFO - verSide.impl.BuildServerRunner - Start shutdown Build Triggers Monitor
OutOfMemory seems a bit odd as TEAMCITY_SERVER_MEM_OPTS is set to "-Xms768m -Xmx768m -XX:MaxPermSize=128m"
Will try to get the thread dumps next.
OutOfMemory error can actually be the cause of 100% CPU usage (i.e. CPU is used by garbage collector). Please try to add the following parameter to MEM_OPTS env variable:
-XX:+HeapDumpOnOutOfMemoryError
In this case an .hprof file will be automatically generated in the TeamCity/bin directory. If you will have such a file, please upload it to our ftp (in compressed form):
http://confluence.jetbrains.net/display/TCD5/Reporting+Issues#ReportingIssues-sendingLargeFiles
Also please make sure you do not use x64 version of Java, because in this case you have to double memory settings:
http://confluence.jetbrains.net/display/TCD5/Installing+and+Configuring+the+TeamCity+Server#InstallingandConfiguringtheTeamCityServer-x64jvm
Attached are some thread dumps made from the Diagnostics view
Attachment(s):
threadDump-2010-05-11_16.11.01.txt.zip
threadDump-2010-05-11_16.10.58.txt.zip
threadDump-2010-05-11_16.10.55.txt.zip
threadDump-2010-05-11_16.10.53.txt.zip
threadDump-2010-05-11_16.10.51.txt.zip
Did you take them when CPU load was high?
Yes and there is not much going on, at least from what I can see. Still the process is running at 100% from the moment it starts
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12234 root 20 0 1081m 282m 9340 S 97.5 13.9 6:49.12 java
Could you please try to dump memory snapshot using Diagnostics tab? If it is big enough, i.e. close to the value specified in -Xmx parameter, then please upload it to our ftp.
Tried the memory dump and it produced only a 70MB file
Please try to take CPU snapshot, see more here:
http://confluence.jetbrains.net/display/TCD5/Reporting+Issues#ReportingIssues-serverperformance
strace suggests a thread is continously trying to read 0 bytes from the file buildServer/system/messages/CH27/23127.msg3
What are these files? Judging from the content it is somehow related to a very old build that has failed
The cpu profiling also shows that the most active thread is
jetbrains.buildServer.serverSide.statistics.impl.StatisticsPublisher$PublishChunkTask.run()
jetbrains.buildServer.serverSide.statistics.impl.StatisticsPublisher.publishHistoryChunk(ValueProvider)
jetbrains.buildServer.serverSide.statistics.impl.StatisticsPublisher.publishBuildForValueType(SBuild, ValueProvider)
jetbrains.buildServer.serverSide.statistics.BuildDurationTimes.buildFinished(SBuild)
jetbrains.buildServer.serverSide.buildLog.LogMessageFacade$LogMessageIterator.hasNext()
jetbrains.buildServer.serverSide.buildLog.LogMessageFacade$LogMessageIterator.readMessagesChunkIfBufferEmpty()
jetbrains.buildServer.serverSide.buildLog.LogMessageFacade.access$1400(LogMessageFacade, Runnable)
jetbrains.buildServer.serverSide.buildLog.LogMessageFacade.runExclusively(Runnable)
jetbrains.buildServer.serverSide.buildLog.LogMessageFacade$LogMessageIterator$2.run()
jetbrains.buildServer.serverSide.buildLog.AbstractLowLevelReader.next()
jetbrains.buildServer.serverSide.buildLog.AbstractLowLevelReader.readUTF()
java.io.RandomAccessFile.readUTF()
jetbrains.buildServer.util.BufferedRandomAccessFile.read(byte[], int, int)
jetbrains.buildServer.util.BufferedRandomAccessFile.fillBufferIfNeeded()
jetbrains.buildServer.util.BufferedRandomAccessFile.fillBuffer()
java.io.RandomAccessFile.read(byte[], int, int)
This file contains build log in TeamCity own format. Is it possible to send all files with name 23127.* to teamcity-feedback [at] jetbrains.com?
As a workaround for your performance problems you can try to rename this file.
I removed the 23127.* files and the CPU usage went down to 5-10%. Seems a lot more sane.
Thank you very much for your help! I will send the files to teamcity-feedback in case somebody wants to inspect them.
I would really appreciate if you send us these files. This can help if other users will see the same issue.