Server stops collecting changes

Hi,

I am running TeamCity 5.0.1 Enterprise (build 10715).  After a few days of uptime, TeamCity seems to stop collecting changes.  When this happens, even forcing the server to check for pending changes does not find anything.  If you start a build manually, the pending changes do get collected and show up in the UI.  

Also, when the server is in this state, VCS triggers do not work obviously, but even time-based triggers do not work, so I think the two may be related.

The last occurrence of this was yesterday, Jan 13 at 17:19.  When I look through the teamcity-vcs.log file, I can see that there are constantly log entries for VCS change collections, until that time.  Then it only seems to collect changes when a build is run.  When I watch the log file and force collecting changes through the UI, nothing shows up in the log.  After bouncing the server, everything goes back to normal.

If there is any more data I can collect that might be useful, please let me know, otherwise any help with this issue would be great.  I've looked through all the other server logs around the time the collection stopped and did not see anything out of the ordinary, but if you'd like to see the logs I have saved them off.

I have attached the teamcity-vcs.log from this time, with the repository urls and project names removed.  Our VCS changes check interval is set to 300 seconds and we use subversion for our VCS.

Thanks,

--Steve



Attachment(s):
teamcity-vcs.log.zip
10 comments
Comment actions Permalink

Hello Steve,

  Looks you're right - changes stopped collecting near 17:19. It is not clear what caused the problem. The best way for that is to take a thread dump at the moment when changes stop collecting.
  Please see instructions in our docs.

  Kind regards,
  KIR

0
Comment actions Permalink

Hi Kirill,

Thanks for the reply.  I've attached a thread dump from our server.  It was taken about 40 mins after the server stopped collecting changes, as I wasn't able to catch the exact moment.  Taking the thread dump made me realize that the server had been running on JDK 1.5 instead of 1.6.  I forgot to update that after we had upgraded from 4.5 to 5.0, but it is fixed now.

It seems that the vcs change collector uses the same scheduler as the build triggering mechanism, is that correct?  Without knowing any internal details, from the outside it appears that the scheduler gets backed up with too many requests and gets hung up.  There are many blocked threads reported in the dump, and some appear to be attempting to open an svn+ssh connection.  We have a few VCS roots setup to connect over svn+ssh and I have noticed periodically that there are errors in connecting to those repositories, so maybe that is a clue into what is happening with our server.

I'll continue to monitor the server now that we are running with the right JDK and let you know if anything changes.

Thanks,

--Steve



Attachment(s):
teamcity-10715-threaddump.rtf
0
Comment actions Permalink

Hello Steve,

   Basically, you're right with the diagnostics. When a SVN+SSH connection is made, a global lock is obtained.
   The lock is needed to provide connection pooling which is used SVNKit's SVN+SSH session implementation.

   If there is a hang during obtaining the connection, other processes will be blocked on the lock.
   You can try disabling connection pooling by setting system property svnkit.ssh2.persistent=false for the
   TeamCity process, this should prevent the locking.

   It also looks like SVN+SSH connections have no connection timeout - and I'll try to fix this problem.

   Kind regards,
   KIR

0
Comment actions Permalink

Okay, thanks for the help Kirill.  I'll try out your suggestion for setting the system property.

0
Comment actions Permalink

Hi Kirill,
After setting the svnkit.ssh2.persistent property to false, the server still stops collecting changes.  Attached is a thread dump from the lastest occurence.  Am I not setting the property correclty?  I have added it as a -D parameter to CATALINA_OPTS in teamcity-server.sh:

/System/Library/Frameworks/JavaVM.framework/Home/bin/java -Djava.util.logging.config.file=/usr/local/teamcity/10715/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -server -Xms512m -Xmx512m -XX:MaxPermSize=128m -Dlog4j.configuration=file:/usr/local/teamcity/10715/bin/../conf/teamcity-server-log4j.xml -Dteamcity_logs=../logs/ -Djava.awt.headless=true -Dteamcity.data.path=../data/ -Dsvnkit.ssh2.persistent=false -Djava.endorsed.dirs=/usr/local/teamcity/10715/endorsed -classpath :/usr/local/teamcity/10715/bin/bootstrap.jar -Dcatalina.base=/usr/local/teamcity/10715 -Dcatalina.home=/usr/local/teamcity/10715 -Djava.io.tmpdir=/usr/local/teamcity/10715/temp org.apache.catalina.startup.Bootstrap start

Maybe there is something else that is hanging?

Thanks,

--Steve



Attachment(s):
teamcity-10715-threaddump2.txt.zip
0
Comment actions Permalink

Hello Steve,

   Looks like you've done everything right. The problem you're experiencing is caused by a hanging connection to SVN+SSH server.
   This connection blocks other background processes which detect changes. I'm not sure why do you have this problem with connection, may be
   server overload, bad network, something else.
   The workaround could be to avoid using SSH and use plain SVN protocol.

   May I ask to to make an experiment. Please download SVNKit library . Unzip it. This library contains bin/jsvn script, which is an analog of simple svn command.
   Please try to checkout your proect sources using bin/jsvn co svn+ssh://your.repo and compare results with native svn checkout speed.

   Regards,
   KIR

0
Comment actions Permalink

Hi Kirill,

Here are the results from my checkout test between svnkit and native svn with the 'time' command:

svnkit-

test1:
real 0m47.359s
user 0m15.412s
sys 0m8.403s

test2:
real 0m33.766s
user 0m15.436s
sys 0m8.299s

native svn-

test1:
real 1m24.897s
user 0m3.071s
sys 0m4.124s

test2:
real 1m33.117s
user 0m3.054s
sys 0m4.045s

Size of files checked out was 106MB.

Also, has there been an issue created for this that I can follow/vote on?

Thanks,

--Steve

0
Comment actions Permalink

Hello Steve,

  Thanks for your benchmarks, looks like they are normal. We've had an issue with extreme slowness of SVNKit's svn+ssh access,
  but it looks like this is not the case.

  I've posted your issue as http://youtrack.jetbrains.net/issue/TW-10843 , please watch it.

  Thanks again,
  KIR

0
Comment actions Permalink

Okay, thanks so much for your help and quick responsiveness.  Only a small set of our VCS roots are going over svn+ssh, so for now I have archived the projects that use svn+ssh repositories.  One of the developers that works on those projects said they have been seeing their own problems with svn in the last month, so it might be something on our end causing the hanging connections.  Still, it would be great to have a connection timeout option in case this is an ongoing problem.

Thanks!

--Steve

0
Comment actions Permalink

Hello Steve,

  The timeout parameter will be available in 5.0.2. Thanks for your help!

  Regards,
  KIR

0

Please sign in to leave a comment.