Page Loading Extremely Slow, Intermittent Timeouts, Errors in logs

I'm not sure where to begin. I've been having lots of problems with extreme slowness ever since upgrading to 4.0. Loading any page takes a very long time. If a build is failing and I click on the link for that build, it can take minutes for the page to load. Builds sometimes time out after all of the tests have finished running. I have looked at the various logs and see errors, but I don't know what they all mean.


For one build agent that times out, I get the following in the build logs before it times out:

[12:03:12]: [junit] Tests run: 333, Failures: 0, Errors: 0, Time elapsed: 150.91 sec
[12:24:32]: The build 10001 has been running for more than 30 minutes. Terminating...
[12:24:32]: [Execution timeout] {build.status.text}
[13:08:17]: Process exit code: 0
[13:08:57]: Build finished
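
To make the gaps in the excerpt above easier to see, here is a minimal Python sketch that computes the time elapsed between consecutive log lines. It assumes the `[HH:MM:SS]:` prefix shown above and that all lines fall on the same day; the function name `gaps` is only illustrative and is not part of TeamCity.

```python
import re
from datetime import datetime

# Matches the "[HH:MM:SS]:" prefix used in the build log excerpt above.
STAMP = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]:")

def gaps(lines):
    """Return (seconds_since_previous_line, line) for each timestamped line."""
    result = []
    previous = None
    for line in lines:
        text = line.strip()
        match = STAMP.match(text)
        if not match:
            continue  # skip lines without a timestamp
        current = datetime.strptime(match.group(1), "%H:%M:%S")
        delta = 0 if previous is None else int((current - previous).total_seconds())
        result.append((delta, text))
        previous = current
    return result

log = [
    "[12:03:12]: [junit] Tests run: 333, Failures: 0, Errors: 0, Time elapsed: 150.91 sec",
    "[12:24:32]: The build 10001 has been running for more than 30 minutes. Terminating...",
    "[12:24:32]: [Execution timeout] {build.status.text}",
    "[13:08:17]: Process exit code: 0",
    "[13:08:57]: Build finished",
]

for seconds, line in gaps(log):
    print(f"+{seconds:>5}s  {line[:60]}")
```

On this excerpt, the sketch shows a gap of 1280 seconds (about 21 minutes) between the last JUnit line and the timeout, and a further 2625 seconds (almost 44 minutes) before the process exit code is logged.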


In the teamcity-server.log I am constantly getting the error:

jetbrains.buildServer.SERVER - Failed to load finished build instance: Cannot find build promotion with ID 8187

In the build agent's error.log, I see the following, repeatedly:
     [2009-01-07 15:54:47,158]   WARN -       org.apache.xmlrpc.XmlRpc - java.net.SocketTimeoutException: Read timed out

Haven't seen anything particularly interesting in the other logs.


I am currently using the internal database, have 3 build agents and the following environment variables set:

  • TEAMCITY_AGENT_MEM_OPTS=-Xms512m -Xmx1024m -XX:MaxPermSize=256m
  • TEAMCITY_SERVER_MEM_OPTS=-Xms512m -Xmx1024m -XX:MaxPermSize=256m


Help!

What is the size of your database file (.BuildServer/system/buildserver.data)? It looks like a lot of time is spent scanning this file. I would suggest switching to a standalone database such as MySQL (http://www.jetbrains.net/confluence/display/TCD4/Migrating+to+an+External+Database).
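
If it helps, here is a small Python sketch for checking that file's size. It assumes the data directory is `~/.BuildServer` (adjust the path if yours is elsewhere); the helper name `file_size_mb` is only illustrative.

```python
import os

def file_size_mb(path):
    """Return the size of the file at `path` in megabytes (1 MB = 1024 * 1024 bytes)."""
    return os.path.getsize(path) / (1024 * 1024)

if __name__ == "__main__":
    # Assumption: the TeamCity data directory is ~/.BuildServer; adjust if yours differs.
    data_file = os.path.expanduser("~/.BuildServer/system/buildserver.data")
    if os.path.exists(data_file):
        print(f"{data_file}: {file_size_mb(data_file):.1f} MB")
    else:
        print(f"{data_file} not found; check the data directory location")
```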

Permanently deleted user

I will upgrade to an external database as soon as I have the time. In the meantime, things are extremely slow again. I tried to do a thread dump and am getting this message:

Thread dumps of Production Branch-FN16 :: WSAPI Server Integration #552

Loading...
TeamCity was unable to locate any processes of this build.

Process tree:

Click on an item in the tree to view thread dump.


I wonder if this is related to the other errors I see in the logs.

If the server is slow, you should take a thread dump of the server process. See how to do this here: http://www.jetbrains.net/confluence/display/TCD4/Reporting+Issues#ReportingIssues-HangsandThreadDumps
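
As a rough illustration only (the linked page is the authoritative procedure), one common way to capture a dump is the JDK's `jstack` tool. The Python sketch below assumes `jstack` is on the PATH and that you already know the server's process ID; the helper names and the example PID are hypothetical.

```python
import subprocess

def jstack_command(pid):
    """Build the jstack command line; -l adds lock information to the dump."""
    return ["jstack", "-l", str(pid)]

def save_thread_dump(pid, out_path):
    """Run jstack against `pid` and write its output to `out_path`."""
    with open(out_path, "w") as out:
        subprocess.run(jstack_command(pid), stdout=out,
                       stderr=subprocess.STDOUT, check=True)

# Example call (12345 is a placeholder PID, not a real value):
# save_thread_dump(12345, "serverthreaddump.txt")
```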

Permanently deleted user

I just uploaded a file called serverthreaddump.txt. At the time the projects page wouldn't even load. Unfortunately there just hasn't been time to upgrade to external db but thought I'd send the thread dump along in case it's a different problem.

It still looks like the HSQLDB database is the main bottleneck. I asked some time ago but you did not answer: what is the size of your database file (.BuildServer/system/buildserver.data)?

It seems I've found the cause of the slowdown. It can appear when many tests are failing. We will try to fix this problem ASAP.

Permanently deleted user

Sorry. The size of my DB is 311M.

Permanently deleted user

Great. I think it is only doing this when there are a lot of failures, which is unfortunately when I most need to look at the pages to see what's going on. Looking forward to the update. Hopefully we'll be able to upgrade the DB next week.

We've made a fix which could help you. Could you please try to install this build: ftp://ftp.intellij.net/.1283517263/TeamCity-8197.war
This build has bugfixes only (it can be considered a 4.0.2 beta). Before installing, do not forget to back up your installation and the .BuildServer folder.

Permanently deleted user

great! downloading it now.

Have you tried this build? Can we consider the problem solved?

Permanently deleted user

I installed it Friday and yesterday was a holiday... Unfortunately, I'm still having problems. If a build is running with lots of failing tests it is still taking over 1 minute to load any build results page. I was hoping one of our unix guys would have time to install the database for me on Friday since I am on a tight deadline project and haven't had the time yet. Maybe I will be able to get to it today. I created another profile snapshot with the latest attempt to load the build results page while the build is running. I will upload now.

Permanently deleted user

I uploaded Bootstrap-2009-01-20.snapshot

Thank you for the snapshot. There was another related change which could help. The build containing this change is available on the FTP server too:
ftp://ftp.intellij.net/.1283517263/TeamCity-8198.war

If you have a chance, please try to install this build. Also, if possible, upgrade your profiling plugin and take a snapshot with the J2EE option enabled. Note that the profiling plugin installation procedure has changed a bit.

Permanently deleted user

I'll try it now.

One more thing: the server is usually slow for some time right after it starts, so a slowdown immediately after startup is expected. You should wait until all agents are connected and have finished their upgrade; after that TeamCity should respond faster.

Permanently deleted user

ok. I assume I should still keep the -Dtc.search.index.tests=false there?

Yes, if it helps you.

Permanently deleted user

Bootstrap-2009-01-20(1).snapshot uploaded. Still very slow to see what's going on while build is running with failing tests.

According to the uploaded snapshot, the build you installed does not contain the performance fix. Are you sure you installed build #8198? What build number is shown in the footer of the TeamCity UI? Anyway, please try to install this build:
ftp://ftp.intellij.net/.1283517263/TeamCity-8203.war

After the installation, please check that the build number in the footer has changed to 8203.

Permanently deleted user

Yes, I made sure the footer said 8198. I will try the new build.

Permanently deleted user

Still really slow. I verified I am running the 8203 build. I have uploaded 2 files. Both are from trying to open the build results page on a bad run, but the last snapshot was taken when basically nothing else was going on on the server, whereas for the first one I think other builds were running. I did get timed-out builds after I did the upgrade.

Could you please describe this build? How many tests failed? What is the total number of tests? How many builds ago was the last successful build in this configuration? What is the size of the history of this configuration?

I uploaded another build: ftp://ftp.intellij.net/.1283517263/TeamCity-8204.war. It has one more improvement. Thank you for your patience and for your help!

Permanently deleted user

The build I was using yesterday has been failing for a very long time, and I haven't been fixing it so that I would have a build to test against while we're trying to fix this problem. It has 33 passed tests and 99 failed tests. This build does have artifacts (the tests are server tests and the artifacts are server logs). I don't know how to tell the size of the configuration.


Throughout this process I have used various failing builds for profiling. I have uploaded a new file from a different build that has only been failing for 2 builds. I would have to do a little more work to fail one of the other builds. I did that because I don't see a relationship between how long the build has been failing and the slowness. The latest upload is from a build with 23 failed tests and 102 passed. I began trying to load the build results when there were 3 tests failing and it took forever to bring up the page.

I'm going to install the latest release now.

By history size I meant: how many builds are in the configuration with the "bad" build? If there is a successful (green) build, how many builds have run in this configuration since that successful build?

Permanently deleted user

I just uploaded several files from passing builds that loaded slowly (albeit not as slowly as the ones with failing tests). After that I did some delayed commits to break 2 of the builds, and the pages loaded super fast for that run, so I really don't understand what's going on. I do have more trouble with the server test builds, and they do break more often for all sorts of reasons. The build you were asking about has too many failed builds to count.

I hate to ask this, but is it possible for me to revert to version 3? Maybe I should start with a new database (internal) and see how it goes? I really don't have the time or the permission to take the time to do the external database migration.

--Melody

You can switch to 3.x if you have a backup of your data from before the upgrade to 4.0. But you will lose any new data created since the 4.0 installation.

However, I did not understand whether you have tried the last build I sent you. Does it help?

As for the database, I would still recommend switching to an external database, as it will scale better and there is a good chance that your server will work faster. This is especially true if you are going to start with a new database, because in that case there is no need to migrate your data.

Permanently deleted user

My previous comments and the latest uploads were regarding the latest build. I'm running my always-failing build now to see if that has improved. I'll post my results in a minute.

Permanently deleted user

I don't think it's improved. Trying to open the build results page for any build while those tests are failing is very slow, but pulling up the page for the failing build is even slower. To know whether things have improved overall, I'll probably have to wait and see how things go as people break builds.

All this is really strange. One of the SQL statements is too slow. It should not be this slow; moreover, while this statement is executing, 70% of the time is spent determining the length of the database file (this code is inside the HSQLDB engine). I suspect there is a lot of disk activity on your server. Dumb question: is it possible that your .BuildServer is on a network drive? I really hope this is not the case.

Anyway, for now I have no recommendation other than switching to an external database. It seems that HSQLDB has reached its limits in your case.
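
For the network-drive question, a rough way to check on Linux is to look up the file system type of the mount that holds the data directory. The Python sketch below is only an illustration: it assumes a Linux server with `/proc/mounts`, assumes the data directory is `~/.BuildServer`, and uses a short, non-exhaustive list of network file system types. The function names are hypothetical.

```python
import os

# File system types that usually indicate a network mount (not an exhaustive list).
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs", "afs", "ncpfs"}

def fs_type_for(path, mounts_text):
    """Return the file system type of the longest mount point containing `path`.

    `mounts_text` uses the /proc/mounts layout: device, mount point, type, options...
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # not a valid mount entry
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the most specific (longest) matching mount point.
            if len(mount_point) > len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type

def is_network_path(path, mounts_text):
    """True if `path` lives on one of the network file system types listed above."""
    return fs_type_for(path, mounts_text) in NETWORK_FS_TYPES

if __name__ == "__main__":
    data_dir = os.path.expanduser("~/.BuildServer")  # assumed location
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            mounts = f.read()
        print(data_dir, "->", fs_type_for(data_dir, mounts),
              "(network)" if is_network_path(data_dir, mounts) else "(local)")
```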
