TeamCity failing to update Mercurial repository, hg init failure and locks
One of my TeamCity agents has suddenly started having major problems pulling from the Mercurial repo that it's supposed to build from.
We were using TortoiseHg 3.6.2 on the machine and when the problem started I updated it to 4.6.1, but I'm still seeing the same behaviour.
There are a couple of different but clearly related issues. I'm using Agent side checkout.
1. Hg init failure
Steps:
- Manually delete the whole working directory from the build agent, that is, the .hg folder and its parent folder. This way I can verify that the working folder doesn't exist at all, so TeamCity has to recreate it completely.
- Run build on TeamCity, with Clean all files selected.
- Build starts, creates directory and calls hg init.
- Error message that hg init failed because the "repository already exists".
- When I look at the directory I can see a .hg folder, and some files inside it including a wlock file.
2. Pull failure
Steps:
- Leave the working directory from problem 1 in place, including the .hg directory.
- Ensure any lock files are deleted and hg recover has been run just in case.
- Run build on TeamCity, without cleaning the directory.
- The logs show hg pull starting and bundling files, but also say "waiting for lock on working directory of E:\blah held by process '3408' on host 'BUILDAGENT'". 3408 here is an example; the number changes every time and corresponds to the hg.exe process that seems to be doing the pull.
- Eventually, after a lot of bundling and files messages, I get a message saying it timed out waiting for the lock. But of course the lock it's waiting for seems to be the lock it's holding itself!
- If I delete the wlock file during this time, I see messages saying "got lock after X seconds" and immediately after that "waiting for lock on repository E:\blah held by process '3408' on host 'BUILDAGENT'". Then eventually it fails with a message about an abandoned transaction.
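To see who holds each lock before deleting anything, the lock files can be read directly. The following is a minimal sketch, assuming the lock contents have the form host:pid (which matches the host and process shown in the log messages above); newer Mercurial versions may add extra detail to the host part. The function names are hypothetical helpers, not part of Mercurial or TeamCity.

```python
import os
from typing import List, Optional, Tuple

# Lock files Mercurial may leave behind, relative to the repository root.
LOCK_FILES = (os.path.join(".hg", "wlock"), os.path.join(".hg", "store", "lock"))


def parse_lock_holder(contents: str) -> Tuple[str, Optional[int]]:
    """Split lock contents of the assumed form 'host:pid' into (host, pid)."""
    text = contents.strip()
    host, sep, pid = text.rpartition(":")
    if not sep:
        return text, None  # unexpected format: return it unparsed
    try:
        return host, int(pid)
    except ValueError:
        return host, None


def read_lock(path: str) -> Optional[str]:
    """Return the lock's contents, or None if no lock exists at path."""
    if os.path.islink(path):
        # On Unix-like systems the holder is stored as a symlink target.
        return os.readlink(path)
    try:
        with open(path, "r") as handle:
            # On Windows the holder is stored in a regular file.
            return handle.read()
    except FileNotFoundError:
        return None


def report_locks(repo_root: str) -> List[Tuple[str, str, Optional[int]]]:
    """List (lock file, host, pid) for every lock currently present."""
    found = []
    for relative in LOCK_FILES:
        contents = read_lock(os.path.join(repo_root, relative))
        if contents is not None:
            host, pid = parse_lock_holder(contents)
            found.append((relative, host, pid))
    return found
```

Calling `report_locks(r"E:\blah")` while a build is running lists whichever of the two locks exist and the process that claims them. If the PID changes between calls without the build having restarted, more than one process is probably writing to the same working directory.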
I'm at a complete loss as to what's going on.
- I've updated TortoiseHg/Mercurial on both build agent and server to 4.6.1.
- I've tried manually pulling into the working directory, and the first build after that will often succeed, but the second one will fail.
- I've rebooted everything.
- TeamCity is running under the same domain user that I use to log in and manually pull.
Other notes are:
- I'm seeing two hg.exe processes during the pull, and they both disappear when the pull finishes.
- The repo being pulled from is a folder on the network.
- The faulty agent is Windows 7, I've also got a Windows 10 agent that's working fine.
Does anyone have any suggestions for troubleshooting steps, or have you seen this before?
I'm hoping to avoid having to rebuild my agent from scratch.
If you didn't change anything on TeamCity's side, then something on the system must have changed. If you didn't change anything there either, it might have been a Windows update that interfered with the hg installation.
We usually recommend going through these steps: https://confluence.jetbrains.com/display/TCD18/Common+Problems#CommonProblems-BuildworkslocallybutfailsormisbehavesinTeamCity
If, after following those steps, you can still only reproduce the issue in TeamCity and not locally on the same machine, I'd still recommend filing a bug in our tracker, as mentioned there. We might need to look into whether a Windows update triggered that behavior, and might need to work around it.
I can't reproduce the behaviour manually on the machine, but the failures even via TeamCity are not consistent.
Sometimes it fails complaining about the lock file, other times it says it can't process a specific file in .hg\store because "it's being used by another process". Pulls also fail randomly for no apparent reason, where literally the only information in the log is "command failed" and a line-by-line printout of manifest/bundling/files.
But equally sometimes each of these things will work fine, and then a subsequent step will fail.
I've started building a new agent VM, but I'm going to keep tinkering with the existing one to see if I can figure out the problem.
I suspect part of it is that Mercurial is crashing for some reason and leaving the lock files in place.
If I can get any further consistent information on it I'll update here and/or open a case.
This turned out to be a problem with TeamCity. The agent was behaving as if it had two identical build agents installed, so two sets of java processes were running and carrying out build steps and this caused the odd behaviour. See this related ticket.
I finally established what was going on when I noticed during a build that it started repeating the steps of the build after it was already down at step 3.
So Mercurial was being blamed wrongly.
I did a full uninstall of the agent and reinstalled it and now it's behaving correctly again.
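For anyone who wants to check for the same situation, here is a rough sketch of how to spot doubled-up agent processes from a process listing. It assumes you have already collected (pid, command line) pairs from whatever tool you prefer. The `marker` substring and the sample command lines are made up for illustration and will need adjusting to match how your agent is actually launched.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_duplicate_agents(
    processes: Iterable[Tuple[int, str]], marker: str = "buildAgent"
) -> Dict[str, List[int]]:
    """Group PIDs by command line and keep the groups that occur more than once.

    `processes` is (pid, command line) pairs from any process listing.
    `marker` is a hypothetical substring that identifies agent processes.
    """
    by_command = defaultdict(list)
    for pid, command_line in processes:
        if marker.lower() in command_line.lower():
            by_command[command_line.strip()].append(pid)
    return {cmd: pids for cmd, pids in by_command.items() if len(pids) > 1}


if __name__ == "__main__":
    # Made-up sample data: two identical agent launches plus an unrelated process.
    sample = [
        (101, r"java -cp C:\BuildAgent\lib\launcher.jar AgentMain"),
        (202, r"java -cp C:\BuildAgent\lib\launcher.jar AgentMain"),
        (303, r"hg.exe pull"),
    ]
    for command, pids in find_duplicate_agents(sample).items():
        print(f"{len(pids)} processes share one command line: {pids} -> {command}")
```

A healthy agent may legitimately run more than one java process with different command lines, so this only flags command lines that are exactly identical, which is what two copies of the same agent would produce.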