TeamCity 2020.2 - many builds hanging after upgrading
Answered
After updating our TeamCity server to 2020.2, many of our nightly builds are now hanging. Before the update, we rarely, if ever, had this issue, but over the past couple of days, after the update, every day we've had one or two builds hanging from the night before.
Is this a known issue? Is there a work-around for this?
Thanks for your help.
Jim
Hello Jim,
Could you please confirm the behavior you see: are the builds getting stuck on the agent side, are they sitting in the queue, or are they not started at all?
In the first case, please send over the teamcity-agent.log from the agent side and, if the issue is still present, several thread dumps from the agent as well.
If this is the second case and the builds stay in the queue for a prolonged period of time, what do you see on the build queue view? Are there any agents suitable for the build?
Speaking of the third case, am I correct to assume these run on a schedule trigger? If you could share the teamcity-server.log and check whether any server thread dumps were taken automatically, this would come in handy.
For file sharing, I would suggest using https://uploads.jetbrains.com/ so that the data is shared privately.
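For anyone following along, the request for "several thread dumps" can be scripted. The following is only a minimal sketch, assuming a JDK with `jstack` on the PATH and that you already know the PID of the agent's Java process; the function name `take_thread_dumps`, the output file names, and the default count and interval are illustrative choices, not anything TeamCity prescribes.

```python
import subprocess
import time
from pathlib import Path


def take_thread_dumps(pid, out_dir, count=3, interval_s=10.0,
                      run=subprocess.run, sleep=time.sleep):
    """Write `count` jstack dumps for `pid` into `out_dir`, `interval_s` apart.

    `run` and `sleep` are injectable so the function can be exercised
    without a real JVM; the defaults call the real `jstack` tool.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(count):
        # `jstack <pid>` ships with the JDK and prints all thread stacks.
        result = run(["jstack", str(pid)],
                     capture_output=True, text=True, check=True)
        # Hypothetical naming scheme: threadDump-1.txt, threadDump-2.txt, ...
        path = out / f"threadDump-{i + 1}.txt"
        path.write_text(result.stdout)
        written.append(path)
        if i < count - 1:
            sleep(interval_s)  # space the dumps out so they can be compared
    return written
```

Taking a few dumps some seconds apart, rather than just one, makes it easier to tell a thread that is genuinely stuck from one that merely happened to be busy at that instant.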
I will work on getting you the files. I'll answer some of your questions first, and mention something that's happening this morning with the server/agents.
When these builds are 'hanging', they appear, in the UI, to have been running for many hours. The build seems to start, but it never finishes. They are not sitting in the queue. I have to hit the 'stop build' button and then restart them. All of our Agents (6 of them) can run any of the builds we have set up, and have been running fine for several years now.
Many of the builds run on a schedule trigger, but many are also set up as CI builds that run after every commit.
This morning, I went to check whether any of the builds had hung, and I didn't see anything immediately in the UI. When looking at the top of the screen, at the 'Agents' indicator, it did show one agent in use. When I clicked on the 'Agents' screen, I saw a build that had been running for the past 6 hours, so it's in a 'hung' state. (It normally takes around 23 minutes.) I haven't seen this before, where it's hanging but not listed in the UI. I had to click the Agents view to see which build has the issue. This is the first time I've seen it like this.
Thanks for your help,
Jim
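The symptom described above (a build that normally takes around 23 minutes still running after 6 hours) can be checked for mechanically. The following is a rough sketch only: the `RunningBuild` record, the `find_suspected_hangs` helper, and the factor of 3 are all hypothetical, and the data would have to be filled in from wherever you track running builds; nothing here is a TeamCity API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RunningBuild:
    # Hypothetical record; fields are illustrative, not a TeamCity type.
    name: str
    started: datetime
    typical: timedelta  # how long this build usually takes


def find_suspected_hangs(builds, now, factor=3.0):
    """Return builds whose elapsed time exceeds `factor` x their typical duration."""
    return [b for b in builds if (now - b.started) > b.typical * factor]


if __name__ == "__main__":
    now = datetime(2020, 12, 11, 6, 0)
    builds = [
        # Started 6 hours ago, normally ~23 minutes: should be flagged.
        RunningBuild("nightly", now - timedelta(hours=6), timedelta(minutes=23)),
        # Started 10 minutes ago, normally ~23 minutes: still within range.
        RunningBuild("ci", now - timedelta(minutes=10), timedelta(minutes=23)),
    ]
    for b in find_suspected_hangs(builds, now):
        print(f"{b.name} has been running for {now - b.started}")
```

The multiplier is a judgment call: too low and slow-but-healthy builds get flagged, too high and a hang goes unnoticed for hours.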
I've uploaded two files.
One is a thread-dump from the agent that is currently running a build in a 'hung' state. (.txt file)
The second is a .zip file of all the thread-dumps found on the Server. (.zip file)
Here is the Upload ID: 2020_12_11_XihLZQJXYPy6Qibu (files: threadDump-2020-12-11_06.35.20.txt, Server_threadDumps.zip)
Jim
The build on the agent this morning, that was in a 'hung' state, is no longer running. The build is NOT listed in the Build Project's 'History' tab, so I have no idea what happened with it. It just seems to have disappeared.
Jim
Hello Jim,
Sorry for the delay here. The server thread dumps are fairly old (so whatever the issue was on Friday, it might have gone unnoticed there), and the agent one does not show anything noteworthy. Could you please share the teamcity-agent.log and teamcity-build.log covering the date of the freeze on the agent in question, and confirm the build configuration ID or name as well as the approximate time when the build was running, so that I can check what was happening on the agent side?
Upload ID: 2020_12_14_7W7QoiBvrG8mm7q8 (file: Agent01-Logs.zip)
The Build ID is: Monaco_WinnersWorld_PlatformDailyBuildWinnersWorld, although there have been other build projects that have hung.
The Build Number: 0.12.0.198
The Build Date/Time: 08 Dec 20 22:00 - 10:03 (12h:03m)
Looking through the build history it hung two more times recently on different agents. I zipped up the entire logs folder, just in case. Please let me know if you need something else.
Thanks,
Jim
Hello Jim,
Thanks a lot! As per teamcity-agent.log, the build was running as normal, except for the fact that the 12th step, handling ReSharper, was running from 11 PM to 10 AM. For the other builds that stalled, do the configurations use ReSharper (with InspectCode) as well? Were there any changes in the tool version recently?
Unfortunately, teamcity-build.log was already rolled over, so there are no details available; however, these lines look particularly interesting:
Could you please also turn on the Enable debug output option to allow debug entries from InspectCode into the build log?
We haven't changed anything with that in a long time. I looked through the history and the hangs started right after we updated to 2020.2. There is a hung build in this state this morning as well.
Jim
Hello Jim,
I see, thank you for the details! As per the log, the bundled version of dotCover is being used (for TeamCity 2020.2 this is dotCover 2020.2.4). If you could enable the debug output for dotCover as suggested above, I would check whether the tool is related or not. Could I also ask you to enable debug logging on one of the agents handling the build and run a few builds there to see if the issue reproduces (and share the logs with me)?
I've enabled both the ReSharper and Agent debug logging. I enabled it on all 6 agents, as I have no way to know where the build will be running when it hangs, and I didn't want to force all the builds onto one agent, as a bunch of them would queue up.
Do the agents need to be restarted?
Let me know what you will need when it happens again. Thanks,
Jim
Hello Jim,
Thank you! No, the agents do not need a restart; when the issue reoccurs, could you please share the ReSharper output log file and the teamcity-agent.log/teamcity-build.log files from the agent side?
In the meantime, I have checked the known issues with dotCover 2020.2.4 but unfortunately found none with matching symptoms so far. Thank you for your continued patience - I hope the above logging will be enough to catch the root cause.