Server web page response slows way down after long night of JUnit runs

We have a large series of JUnit runs that generates a couple of gigabytes of build logs and lasts about 12-13 hours, reporting results for roughly 80,000 tests.

When we come in the next morning, after the test runs have finished, the TeamCity server is very slow, sometimes taking longer than 2 minutes to load an individual project page. Restarting the TeamCity server makes the issue go away.

It also appears that the test runs start to get slower as the evening goes on.

I have uploaded a profile snapshot, taken when the server was running super slow, to ftp://ftp.intellij.net/.uploads/slow_after_night_of_junits.snapshot.

I figure part of the problem is all the data we dump to the console/logs, and I am working on tweaking that so we dump less.  But even then, it shouldn't slow down the webapp like this.


Hello James,

  Could you also please post the archived TeamCity/logs directory?

  Another question: were any builds with JUnit tests running when you took the snapshot?
  I'm asking because, according to the snapshot, most of the time was spent parsing incoming HTTP requests, as if there were a lot of incoming data.

  Regards,
  KIR


I believe there were 5 builds running when I turned on profiling. (Our system currently runs 33 build agents.) The nightly runs use 9 agents, and they all run back to back from about 8pm to 8am. Those runs were stopped when I came in.

During the day, we might have 15-20 builds running at any one time, and the server responds just fine. It is only after these long JUnit runs that it starts to hang.

The problem has gotten worse over the last few weeks. The server used to stay up for weeks at a time without any slowdown, but now it needs to be restarted every morning.

I uploaded our logs as long_junits_logs.zip. These are just our current logs; I didn't get a chance to grab the ones from the exact time the profile was taken (although most of them look like they stretch back that far).


OK, I disabled the nightly test run, and the server was still slow/unresponsive this morning. I made sure no builds were running and took another snapshot. I also took a memory snapshot and grabbed all the log files.

Uploaded as: idle_2009_06_19.zip


James,

   Thanks a lot for providing debug information for the problem. We'll try to research the problem ASAP and let you know about the results.
   I hope to provide some feedback after the weekend.

   Thanks again,
   KIR


Hello James,

   Unfortunately, I cannot find explicit and clear trouble points in the logs and materials you've sent.

   According to slow_after_night_of_junits.snapshot, TeamCity has trouble returning connections to the database pool. As a result, we have many blocked threads, slow access to the database, and overall slowness when accessing the TeamCity UI. The problem is, I cannot say for sure why there are problems accessing the database.

   I suppose that, because of the large number of test runs, you have a pretty large test_info table, and this may explain the slowness of some requests. Could you please take a look at the TeamCity database and tell me how many rows there are in the tables test_info and test_names?
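
   A minimal sketch of that row-count check, assuming a MySQL database and a session already connected to the TeamCity schema (the schema name depends on the installation and is not given in this thread). The information_schema query is an optional faster alternative; for InnoDB tables its table_rows value is only an estimate.

```sql
-- Exact row counts; these can take a while on very large tables.
SELECT COUNT(*) AS test_info_rows  FROM test_info;
SELECT COUNT(*) AS test_names_rows FROM test_names;

-- Approximate counts from table metadata (an estimate for InnoDB tables).
-- DATABASE() returns the schema currently selected in this session.
SELECT table_name, table_rows
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name IN ('test_info', 'test_names');
```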

   I've also noticed some relatively slow requests to the tables users and user_properties. How many users log in to your TeamCity installation?

   Another thing I'd like to ask is that you check the process load of the TeamCity server and of the database. I'm asking because, according to the snapshots, the CPU load of TeamCity is not really high. So my question is where the actual bottleneck is: on the server or on the database. The process load can be easily checked with the Process Explorer utility. Regarding the database, you can run several "show full processlist" queries (while TeamCity is slow), or check the MySQL slow query log, if you have it enabled.
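
   The database-side checks could look roughly like this. It is a sketch that assumes MySQL 5.1 or later, where the slow query log is controlled by the slow_query_log variables; as far as I know, older versions use the log_slow_queries option instead.

```sql
-- Run this a few times while TeamCity is slow; look for long-running
-- statements or many connections stuck in the same state.
SHOW FULL PROCESSLIST;

-- Check whether the slow query log is enabled and where it is written.
SHOW VARIABLES LIKE 'slow_query_log%';

-- Threshold in seconds above which a query counts as slow.
SHOW VARIABLES LIKE 'long_query_time';

-- Number of queries that have exceeded the threshold since server start.
SHOW GLOBAL STATUS LIKE 'Slow_queries';
```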

   I've also found some weak points in our code that perform badly. One of these is in the Problematic Tests tab, and it may affect the performance of a project page. You can try disabling this tab by altering the XML descriptor of TeamCity, but this doesn't explain why TeamCity works normally after a restart. To disable this tab, comment out the bean named ProjectProblematicTestsTab in TeamCity/webapps/ROOT/WEB-INF/buildServerSpringWeb.xml.

  Kind regards,
  KIR


I don't see a real solution in this thread, but because it is closely related to the problems we have, I am adding a description of our problem.

When we do a JUnit run that includes about 20,000 JUnit tests, it runs for at least 20 hours.

We experience the same problems. The server running TeamCity has very high CPU usage and is almost unreachable. We only run JUnit tests on one of our clients.

We use MSSQL as the database for TeamCity.

 


Please submit a bug report to our tracker: http://youtrack.jetbrains.net
Please also provide more details on your setup: the versions of TeamCity and MSSQL.
If possible, provide a CPU snapshot taken with the help of the profiling plugin: http://confluence.jetbrains.net/display/TCD5/Reporting+Issues#ReportingIssues-serverperformance
