TeamCity Build Agent High CPU During Testing

We're having a minor issue with our build agents while building.

While they're running tests through Gallio using MbUnit and Selenium on a remote computer (in browser testing), the java.exe engine is thrashing my CPU on the TeamCity server, 90-100% CPU utilization the entire time, being as the TeamCity server is also the IIS test bed, this makes it so that the IIS service has to compete hard against the java application, and because of this testing occurs much slower, about 10x+ slower, this is causing some tests to fail.

It isn't even Gallio.echo.exe that is spinning, it is simply Java.exe which as far as I can tell is just the build agent. Typically I run 3 build agents all sharing the same java install, here is their config:

#******************************************************************** # Java Service Wrapper Properties for TeamCity Agent Launcher #******************************************************************** ##################################################################### ### ###   The path should point to 'java' program. ### ##################################################################### wrapper.java.command=C:\Program Files\Java\jre6\bin\java wrapper.java.mainclass=org.tanukisoftware.wrapper.WrapperSimpleApp # Java Classpath wrapper.java.classpath.1=../launcher/lib/wrapper.jar wrapper.java.classpath.2=../launcher/lib/launcher.jar # Java Library Path (location of Wrapper.DLL or libwrapper.so) wrapper.java.library.path.1=../launcher/lib wrapper.java.library.path.2=../launcher/bin # TeamCity agent launcher parameters #Preventing launcher exit on x64 when user logs off wrapper.java.additional.1=-Xrs # Initial Java Heap Size (in MB) wrapper.java.initmemory=128 # Maximum Java Heap Size (in MB) wrapper.java.maxmemory=1024 ########################################################### ### TeamCity agent JVM parameters ### ### NOTE: There should be no gaps in parameters numbers, if ### NOTE: you change parameters, you need to update numbering ### ########################################################## # Application parameters. wrapper.app.parameter.1=jetbrains.buildServer.agent.StandAloneLauncher wrapper.app.parameter.2=-ea wrapper.app.parameter.3=-Xmx512m # The next line can be removed (and the rest of the parameter names MUST BE renumbered) to prevent memory dumps on OutOfMemoryErrors wrapper.app.parameter.4=-XX:+HeapDumpOnOutOfMemoryError # Preventing process exiting on user log off wrapper.app.parameter.5=-Xrs # Uncomment the next line (insert the number instead of "N" and renumber the rest of the lines) to improve JVM performance wrapper.app.parameter.6=-server wrapper.app.parameter.7=-Dlog4j.configuration=file:../conf/teamcity-agent-log4j.xml wrapper.app.parameter.8=-Dteamcity_logs=../logs/ wrapper.app.parameter.9=jetbrains.buildServer.agent.AgentMain # TeamCity agent parameters wrapper.app.parameter.10=-file wrapper.app.parameter.11=../conf/buildAgent.properties wrapper.working.dir=../../bin wrapper.ping.timeout=300 #******************************************************************** # Wrapper Logging Properties #******************************************************************** # Format of output for the console.  (See Java Service Wrapper documentation for formats) wrapper.console.format=PM # Log Level for console output.  (See docs for log levels) wrapper.console.loglevel=INFO # Log file to use for wrapper output logging. wrapper.logfile=../logs/wrapper.log # Format of output for the log file.  (See docs for formats) wrapper.logfile.format=LPTM # Log Level for log file output.  (See docs for log levels) wrapper.logfile.loglevel=INFO # Maximum size that the log file will be allowed to grow to before #  the log is rolled. Size is specified in bytes.  The default value #  of 0, disables log rolling.  May abbreviate with the 'k' (kb) or #  'm' (mb) suffix.  For example: 10m = 10 megabytes. wrapper.logfile.maxsize=10m # Maximum number of rolled log files which will be allowed before old #  files are deleted.  The default value of 0 implies no limit. wrapper.logfile.maxfiles=10 # Log Level for sys/event log output.  (See docs for log levels) wrapper.syslog.loglevel=NONE #******************************************************************** # Wrapper Windows Properties #******************************************************************** # Title to use when running as a console wrapper.console.title=TeamCity Build Agent #******************************************************************** # Wrapper Windows NT/2000/XP Service Properties #******************************************************************** # WARNING - Do not modify any of these properties when an application #  using this configuration file has been installed as a service. #  Please uninstall the service before modifying this section.  The #  service can then be reinstalled. # Name of the service wrapper.ntservice.name=TCBuildAgent1 # Display name of the service wrapper.ntservice.displayname=TeamCity Build Agent Service 1 # Description of the service wrapper.ntservice.description=TeamCity Build Agent Service 1 # Service dependencies.  Add dependencies as needed starting from 1 wrapper.ntservice.dependency.1= # Mode in which the service is installed.  AUTO_START or DEMAND_START wrapper.ntservice.starttype=AUTO_START # Allow the service to interact with the desktop. wrapper.ntservice.interactive=true



I can explicitly say that this isn't the TeamCity Java, that is running 6.0.19 IIRC, while my Java install on the services are 6.0.23 (off the top of my head I'm sure there is a mismatch, so I can verify what is running what). The Java I am running on the build agents are 64bit, I'm not sure if that would be a problem, I ran into the same issue even worse when it was on the Java installed with TeamCity, so I moved it off to 64bit and tried to allocate more RAM to it to make sure it wasn't some configuration issue from tacking 3 build agents onto the side of a TeamCity server.

I explicitly set it to the JRE install for the system, not the TeamCity JRE (it was performing even worse there), it sits around 72MB memory usage even though I believe I've allocated it much more, system has 900MB of memory available during this so it isn't swapping itself todeath.

This is on a Windows Server 2008 R2 box.

As much as I can probably tweak my tests to require a little less speed out of the system, and I can drop another vCPU on this virtual machine, I want both of these to be last measures, and CPU thrashing hints at me misconfiguring something.

Currently I'm even running on one agent with no improvement.


Thanks guys. :) Other than this little issue I've had TeamCity has been amazing.


And of course, worse comes to worse, I can set up build agent only VMs, but I'd hate to allocate RAM and OSes to just doing that if I can fix the Java setup.

This is on TeamCity 6.0.2



Edit:

I forgot, we're also using NANT to script the steps of our build, if that makes a difference, though the NANT exe is not thrashing either.

12 comments
Comment actions Permalink

Hi William,

Could you grab and post several thread dumps please.

Michael

0
Comment actions Permalink

5 Thread dumps are attached, they were taken while the Java thread was thrashing during testing.

Removed our VSS paths, the VSS paths are valid (files do check out), but I marked the individual paths with IDs, especially being as I saw that one is locking another, but that shouldn't be a huge deal (thread is waiting, and I seriously doubt it's programmed to spinwait, plus it only is doing this in one dump).

So obviously did a little bit of digging, but I have no clue if this includes the build threads (would be awesome if it did), so I'll have to hand this off to someone else that understands TeamCity better than I do.

Thanks a lot for the help, sorry I didn't include the thread dumps in my main post, I should have started there. :)


Edit:

I have also manually set the CPU values of the individual build agents, being as they'd all scramble to startup at boot time, their CPU values were in the 40s-50s, I hard set them to 250... this is only for queue management and job distribution, not actually a way that the system will consume resources, right?



Attachment(s):
threadDump-2011-02-01_09.14.33.txt.zip
threadDump-2011-02-01_09.14.23.txt.zip
threadDump-2011-02-01_09.14.14.txt.zip
threadDump-2011-02-01_09.14.06.txt.zip
threadDump-2011-02-01_09.13.58.txt.zip
0
Comment actions Permalink

Also attached, a snapshot, I know this is more for the TeamCity dev team though, I haven't found a way to open it...



Attachment(s):
TeamCity-2011-02-01.snapshot.zip
0
Comment actions Permalink

Thanks William

I've passed it to a developer, please let us a few time to investigate

Michael

0
Comment actions Permalink

Bingo, got VisualVM to pick up on jetbrains.buildServer.agent.AgentMain.

Results are attached.

Of course the time isn't exactly accurate, VisualVM sucks down a lot of CPU, but it should give us an idea of what is going on, though I'm sure the developer's snapshot includes this too.



Attachment(s):
snapshot.png
0
Comment actions Permalink

Unfortunately CPU snapshot does not help much. The problem is not with server JVM as we thought initially, server JVM is mostly idle (at least this is what I see from the CPU snapshot and thread dumps).
So the java.exe process which eats CPU belongs to agent. Please try to take thread dumps from the agent processes: http://confluence.jetbrains.net/display/TCD6/Reporting+Issues#ReportingIssues-Agentthreaddump

0
Comment actions Permalink

1 and 2 are early in the build (running NANT/Gallio/etc).

3, 4 and 5 are during testing (CPU 90%+ solid).


Thanks, hope this helps.



Attachment(s):
stacktrace-run2-5.txt.zip
stacktrace-run2-4.txt.zip
stacktrace-run2-3.txt.zip
stacktrace-run2-2.txt.zip
stacktrace-run2-1.txt.zip
0
Comment actions Permalink

Greetings:

For extra info, I thought I would add that we found the same problem.

We run Java/ant builds and Maven2/Selenium.  When we upgraded to 6.0.2, our Maven/Selenium build pinned the CPU as well.

Today we ran our Maven2/Selenium build against TeamCity 6.0.1 without a problem.

I include this log snippet to illustrate the time where build seems to hang

[11:41:11]: [Surefire report watcher] C:\buildagent\work\438f3a0d3a34c26f\target\surefire-reports\TEST-*.xml
[12:11:35]: [Selenium:Selenium] Process exited with code 1


Thanks for looking into this.

------------------------------------------------------

[11:41:08]: [Step 1/2] Using runner: Maven2
[11:41:09]: [Step 1/2] ##teamcity[buildNumber '348']
[11:41:09]: [Step 1/2] Using watcher: C:\buildagent\plugins\mavenPlugin\maven-watcher\maven-watcher-agent.jar
[11:41:09]: [Step 1/2] MAVEN_OPTS =   -javaagent:C:\buildagent\plugins\mavenPlugin\maven-watcher\maven-watcher-agent.jar  -Dteamcity.agent.dotnet.build_id=3502  "-Dteamcity.buildConfName=Selenium Test Firefox"  -Dagent.home.dir=C:\buildagent  -Dteamcity.runner.properties.file=C:\buildagent\temp\buildTmp\teamcity.runner1236224463949656513.properties  -Dbuild.number=348  -Dteamcity.dotnet.nunitlauncher2.0=C:\buildagent\plugins\dotnetPlugin\bin\JetBrains.BuildServer.NUnitLauncher2.0.exe  -Duser.language=en -Dos.version=5.1 -Dfile.encoding=Cp1252  "-Dteamcity.projectName=CommSalesDev Branch"  -Dteamcity.dotnet.nunitlauncher2.0.vsts=C:\buildagent\plugins\dotnetPlugin\bin\JetBrains.BuildServer.NUnitLauncher2.0.VSTS.exe  -Djava.io.tmpdir=C:\buildagent\temp\buildTmp  -Dteamcity.dotnet.nunitlauncher=C:\buildagent\plugins\dotnetPlugin\bin\JetBrains.BuildServer.NUnitLauncher.exe  -Duser.name=autobuild "-Dteamcity.version=6.0.2 (build 15857)"  -Dteamcity.auth.password=6nNJVJsnKmYywrbCSA1m8Wuedx69Oc5r  -Dteamcity.build.checkoutDir=C:\buildagent\work\438f3a0d3a34c26f  -Dteamcity.idea.home=C:\buildagent/plugins/idea  -Dteamcity.buildType.id=bt2  -Dteamcity.dotnet.nunitlauncher1.1=C:\buildagent\plugins\dotnetPlugin\bin\JetBrains.BuildServer.NUnitLauncher1.1.exe  -Duser.variant=  -Dcom.jetbrains.maven.watcher.report.file=C:\buildagent\temp\buildTmp\maven-build-info.xml   -Dteamcity.dotnet.nunitlauncher.msbuild.task=C:\buildagent\plugins\dotnetPlugin\bin\JetBrains.BuildServer.MSBuildLoggers.dll  -Dpath.separator=;  -Dteamcity.build.workingDir=C:\buildagent\work\438f3a0d3a34c26f  -Dbuild.vcs.number.SeleniumDevelopment=92d0dae3d109196258001de654154c4366679e1b   -Dteamcity.build.properties.file=C:\buildagent\temp\buildTmp\teamcity.build2879762386073776952.properties  -Dagent.name=wpgteamcityagnt -Dagent.ownPort=9090  -Dteamcity.dotnet.platform=C:\buildagent\plugins\dotnetPlugin\bin\JetBrains.TeamCity.PlatformProcessRunner.1.1.exe  -Duser.country=US  -Dteamcity.configuration.properties.file=C:\buildagent\temp\buildTmp\teamcity.config7437392935778670586.properties  -Duser.timezone=America/Chicago -Dteamcity.build.id=3502  -Dagent.work.dir=C:\buildagent\work -Dfile.separator=\ -Dos.arch=x86  -Dteamcity.build.changedFiles.file=C:\buildagent\temp\buildTmp\changedFiles2744586089162822529.txt   -Dteamcity.tests.recentlyFailedTests.file=C:\buildagent\temp\buildTmp\testsToRunFirst7268417894676877823.txt  -Dteamcity.build.tempDir=C:\buildagent\temp\buildTmp  -Dbuild.vcs.number=92d0dae3d109196258001de654154c4366679e1b  -Dbuild.vcs.number.1=92d0dae3d109196258001de654154c4366679e1b  "-Dos.name=Windows XP"  -Dteamcity.dotnet.nunitaddin=C:\buildagent\plugins\dotnetPlugin\bin\JetBrains.TeamCity.NUnitAddin-NUnit  -Dteamcity.auth.userId=TeamCityBuildId=3502  -Dteamcity.agent.dotnet.agent_url=http://localhost:9090/RPC2  -Dteamcity.agent.cpuBenchmark=199 "-Duser.home=C:\Documents and  Settings\autobuild"
[11:41:09]: [Step 1/2] M2_HOME = C:\buildagent\plugins\maven
[11:41:09]: [Step 1/2] PATH =  C:\buildagent\plugins\maven\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Program  Files\Common Files\Lenovo;C:\Program  Files\Java\jdk1.5.0_22\bin;C:\apache-ant-1.8.1\bin
[11:41:09]: [Step 1/2] JAVA_HOME = C:\Program Files\Java\jdk1.5.0_22
[11:41:09]: [Step 1/2] Starting: C:\buildagent\plugins\maven\bin\mvn.bat -f C:\buildagent\work\438f3a0d3a34c26f\pom.xml -B clean install
[11:41:09]: [Step 1/2] in directory: C:\buildagent\work\438f3a0d3a34c26f
[11:41:11]: [Step 1/2] [INFO] Scanning for projects...
[11:41:11]: [Step 1/2] [INFO] ------------------------------------------------------------------------
[11:41:11]: [Step 1/2] [INFO] Building Selenium
[11:41:11]: [Step 1/2] [INFO]    task-segment: [clean, install]
[11:41:11]: [Step 1/2] [INFO] ------------------------------------------------------------------------
[11:41:11]: [Step 1/2] Selenium:Selenium (30m:39s)
[11:41:11]: [Selenium:Selenium] Importing data  from  'C:/buildagent/work/438f3a0d3a34c26f/target/surefire-reports/TEST-*.xml'  with 'surefire' processor
[11:41:11]: [Selenium:Selenium] Surefire report watcher
[11:41:11]: [Surefire report watcher] Watching paths:
[11:41:11]: [Surefire report watcher] C:\buildagent\work\438f3a0d3a34c26f\target\surefire-reports\TEST-*.xml
[12:11:35]: [Selenium:Selenium] Process exited with code 1
[12:11:50]: [Selenium:Selenium] Surefire report watcher
[12:11:50]: [Selenium:Selenium] Step failed
[12:11:50]: Killing 1 service processes
0
Comment actions Permalink

I've created an issue on our bug tracker http://youtrack.jetbrains.net/issue/TW-15335
In the issue you can find plugin patch with the corresponding fix

0
Comment actions Permalink

Thanks a lot! Fixed 100%, CPU idling at about 90-98% during testing, IIS compiles the site and has it ready in no time, tests running smoothly.

0
Comment actions Permalink

I installed the plugin and all is well. The selenium build works as expected.

I installed the plugin in the .BuildServer/plugins directory.  I assume that that this fix will be rolled into the TeamCity 6.0.3 release?

Would we be expected to remove the plugin after the next upgrade?

Thanks for the fix

0
Comment actions Permalink


I installed the plugin in the .BuildServer/plugins directory.  I assume that that this fix will be rolled into the TeamCity 6.0.3 release?

Would we be expected to remove the plugin after the next upgrade?

The fix will roll into 6.0.3 for sure.
The plugin will be automatically overriden by the one which comes with the 6.0.3 and higher release.

0

Please sign in to leave a comment.