Build agent hangs since updating to the latest version (using Docker-in-Docker)

Since updating to the latest TeamCity server and agent versions (along with Docker and Ubuntu), our build agent hangs while executing a docker command. The container is no longer running, but the agent does not seem to be aware that the execution has finished.

Docker version 27.5.1, build 9f9e405
TeamCity version 2025.11.1

The build log shows that the step has completed, but the agent stays stuck in the same step (it does not continue to the next one):

11:47:35    sut-1 exited with code 0
11:47:35   Aborting on container exit...
11:47:35    Container docker-sut-1  Stopping
11:47:35    Container docker-sut-1  Stopped
11:47:35   Process exited with code 0

Output of the agent log (the agent runs in Docker):

[2026-01-07 03:38:11,965]   INFO - erStages.start.CallRunnerStage - ----------------------------------------- [ Step 4/6: 'docker up sut (Command Line)' (simpleRunner), Build "API / CI" #16194 {id=149422, buildTypeId='Etv5Api_Build'} ] -----------------------------------------
[2026-01-07 03:38:11,966]   INFO - GenericCommandLineBuildProcess - Starting "docker compose -f "docker/docker-compose-ci.yml" up --abort-on-container-exit --force-recreate sut" in directory "/opt/buildagent/work/c6eba5b8863dd55b"
[2026-01-07 03:38:11,971]   INFO - nner2.OsProcessHandlerListener - docker compose -f "docker/docker-compose-ci.yml" up --abort-on-container-exit --force-recreate sut
[2026-01-07 03:38:12,666]   INFO -    jetbrains.buildServer.AGENT - Updating agent parameters on the server: AgentDetails{Name='teamcity-agent1-1-1', AgentId=75, BuildId=149422, AgentOwnAddress='null', AlternativeAddresses=[], Port=9090, Version='207998', PluginsVersion='207998-md5-3f2f8d5c9b958110b82c66d68140c49a', AvailableRunners=[Ant, cargo-deploy-runner, csharpScript, DockerCommand, DockerCompose, dotcover, dotnet, dotnet-tools-dupfinder, dotnet-tools-inspectcode, Duplicator, ftp-deploy-runner, gradle-runner, Inspection, jetbrains_powershell, JPS, kotlinScript, Maven2, nodejs-runner, nunit-console, python-runner, Qodana, rake-runner, SBT, simpleRunner, smb-deploy-runner, ssh-deploy-runner, ssh-exec-runner], AvailableVcs=[tfs, jetbrains.git, mercurial, svn, perforce], AuthorizationToken='9b9b13ad9c7f97c05be01796baafe3cd', PingCode='uxJ6q3E1bMc7wGv3FQ60rfZs9Mw5qwA3'}
[2026-01-07 03:47:35,272]   INFO - nner2.OsProcessHandlerListener - Process exited with code 0. Command line: docker compose -f "docker/docker-compose-ci.yml" up --abort-on-container-exit --force-recreate sut
[2026-01-07 03:47:35,272]   INFO - ildStages.RunnerStagesExecutor - Call stage jetbrains.buildServer.agent.impl.buildStages.runnerStages.start.AgentBuildStepStatusFixer
[2026-01-07 03:47:35,272]   INFO - ildStages.RunnerStagesExecutor - Call stage jetbrains.buildServer.agent.impl.buildStages.runnerStages.start.FlushArtifactsStage

Running docker ps inside the agent shows that the sut container is no longer running. It seems like some internal TeamCity issue prevents the build from continuing to the next step.
Could this be a bug in the most recent version?


The build step hangs for between 2 and 5 hours and then continues without error (see the jump from 03:47 to 06:04 in this example):

[2026-01-07 03:47:35,272]   INFO - ildStages.RunnerStagesExecutor - Call stage jetbrains.buildServer.agent.impl.buildStages.runnerStages.start.FlushArtifactsStage
[2026-01-07 06:04:30,681]   INFO - ges.RunnerFinishStagesExecutor - Call finish stage jetbrains.buildServer.agent.impl.buildStages.runnerStages.finish.PublishStepStatusFStage
[2026-01-07 06:04:30,682]   INFO - ges.RunnerFinishStagesExecutor - Call finish stage jetbrains.buildServer.agent.impl.buildStages.runnerStages.finish.UnsubscribePropertiesFileUpdaterRunnedFStage
[2026-01-07 06:04:30,682]   INFO - ges.RunnerFinishStagesExecutor - Call finish stage jetbrains.buildServer.agent.impl.buildStages.runnerStages.finish.FlushBuildLogRunnerFStage
[2026-01-07 06:04:30,683]   INFO - ges.RunnerFinishStagesExecutor - Call finish stage jetbrains.buildServer.agent.impl.buildStages.runnerStages.finish.FireRunnerFinishedFStage
[2026-01-07 06:04:30,711]   INFO - ges.RunnerFinishStagesExecutor - Call finish stage jetbrains.buildServer.agent.impl.buildStages.runnerStages.finish.FlushBuildLogRunnerFStage
[2026-01-07 06:04:30,940]   INFO - ildStages.RunnerStagesExecutor - Call stage jetbrains.buildServer.agent.impl.buildStages.runnerStages.start.CreateBuildWorkingDirectoryStage
[2026-01-07 06:04:30,940]   INFO - ildStages.RunnerStagesExecutor - Call stage jetbrains.buildServer.agent.impl.buildStages.startStages.CreateBuildTempDirectoryStage
[2026-01-07 06:04:30,941]   INFO - ildStages.RunnerStagesExecutor - Call stage jetbrains.buildServer.agent.impl.buildStages.startStages.CreateAgentTempDirectoryStage
[2026-01-07 06:04:30,941]   INFO - ildStages.RunnerStagesExecutor - Call stage jetbrains.buildServer.agent.impl.buildStages.startStages.CreateCheckoutDirectoryStage
[2026-01-07 06:04:30,941]   INFO - ildStages.RunnerStagesExecutor - Call stage jetbrains.buildServer.agent.impl.buildStages.runnerStages.start.RegisterPerBuildFileWriterPropertiesRunnerStage
[2026-01-07 06:04:30,942]   INFO - ildStages.RunnerStagesExecutor - Call stage jetbrains.buildServer.agent.impl.buildStages.runnerStages.start.FireBeforeRunnerStartedStage
[2026-01-07 06:04:30,944]   INFO - ildStages.RunnerStagesExecutor - Call stage jetbrains.buildServer.agent.impl.buildStages.runnerStages.start.CallBuildRunnerPrecondition
[2026-01-07 06:04:30,944]   INFO - rt.CallBuildRunnerPrecondition - Call BuildRunnerPrecondition: class jetbrains.buildServer.agent.feature.RubyEnvConfiguratorService
[2026-01-07 06:04:30,945]   INFO - ildStages.RunnerStagesExecutor - Call stage jetbrains.buildServer.agent.impl.buildStages.startStages.PerformFinalParametersResolveStage
[2026-01-07 06:04:30,945]   INFO - ildStages.RunnerStagesExecutor - Call stage jetbrains.buildServer.agent.impl.buildStages.runnerStages.start.SavePropertiesToFilesStage
[2026-01-07 06:04:31,408]   INFO - ildStages.RunnerStagesExecutor - Call stage jetbrains.buildServer.agent.impl.buildStages.runnerStages.start.CallRunnerStage
[2026-01-07 06:04:31,410]   INFO - erStages.start.CallRunnerStage - ----------------------------------------- [ Step 5/6: 'docker stop (Command Line)' (simpleRunner), Build "API / CI" #16194 {id=149422, buildTypeId='Etv5Api_Build'} ] -----------------------------------------
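Because the hang only shows up as a silence between two consecutive entries in teamcity-agent.log, one way to find every occurrence (and measure how long each lasted) is to scan the log for large gaps between timestamps. This is a minimal sketch, assuming the [YYYY-MM-DD HH:MM:SS,mmm] prefix shown in the excerpts above; the 30-minute threshold and the script name are arbitrary choices, and this is not a TeamCity tool.

```python
import re
import sys
from datetime import datetime, timedelta

# Timestamp prefix of an agent log line, e.g. "[2026-01-07 03:47:35,272]   INFO - ..."
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})\]")


def parse_timestamp(line):
    """Return the datetime at the start of a log line, or None if there is none."""
    match = TIMESTAMP.match(line)
    if match is None:
        return None
    base = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return base + timedelta(milliseconds=int(match.group(2)))


def find_gaps(lines, threshold=timedelta(minutes=30)):
    """Yield (gap, line_before, line_after) for each silence longer than threshold."""
    previous_time = previous_line = None
    for line in lines:
        current = parse_timestamp(line)
        if current is None:
            continue  # continuation lines (e.g. stack traces) carry no timestamp
        if previous_time is not None and current - previous_time > threshold:
            yield current - previous_time, previous_line, line
        previous_time, previous_line = current, line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_log_gaps.py teamcity-agent.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for gap, before, after in find_gaps(log):
            print(f"{gap} of silence between:\n  {before.rstrip()}\n  {after.rstrip()}")
```

An idle agent waiting between builds also produces long gaps, so the interesting ones are those that begin right after a stage line inside a running build, such as the FlushArtifactsStage line in the excerpt above.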


Hi,

Thanks for reaching out to us. Sorry for any inconvenience caused.
To proceed with a deeper investigation, could you please help collect and share the following information from the affected build agent:
1. Agent debug logs
- Please provide the teamcity-agent.log with debug logging enabled, as well as the related build log files.
- Instructions on how to enable and collect agent debug logs can be found here:
https://www.jetbrains.com/help/teamcity/viewing-build-agent-logs.html#Generic+Debug+Logging

2. Agent thread dump files

- Please also collect the agent thread dump files during the time when the build appears to be hanging.
- You can follow the instructions in this article:
https://teamcity-support.jetbrains.com/hc/en-us/articles/206545419-Agent-Thread-Dump

These logs will help us better understand where the agent is blocked and identify the root cause.
Large files can be uploaded via https://uploads.jetbrains.com/. Please let us know the upload ID once the upload is complete.

Thank you for your cooperation. 

 


Hi Tom,

Thanks for your answer.

I've tried to enable DEBUG level logging for the agent by editing both teamcity-agent-log4j2.xml and buildAgent.properties, but I'm still only getting INFO level logs, nothing at DEBUG level.

cat agent1_conf/buildAgent.properties | grep DEBUG
teamcity.agent.log.level=DEBUG

cat agent1_conf/teamcity-agent-log4j2.xml | grep DEBUG
   <Logger name="jetbrains.buildServer" level="DEBUG">

I've also recreated the container after editing these.
Is there anything else I need to do to enable debug logging when the agent runs in Docker?
 


The agent thread dump file can be found here:

Upload id: 2026_01_08_LrLYwmfUPLDV3KHkhyiYCT (file: scratch_526.txt)

Hi,

You have likely done the configuration correctly. Debug logs should be written to the teamcity-agent.log file inside the container's log directory.
Additionally, teamcity.agent.log.level in buildAgent.properties is not a valid standard property for controlling the agent's own logging; only the teamcity-agent-log4j2.xml file controls this.

For more detailed information, please refer to https://www.jetbrains.com/help/teamcity/viewing-build-agent-logs.html#Generic+Debug+Logging.
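To check whether the edited teamcity-agent-log4j2.xml has actually taken effect, it can help to count the entries per level in teamcity-agent.log: if DEBUG stays at zero while the agent is running builds, the new level is not being applied. This is a minimal sketch, assuming the line layout seen in this thread ([timestamp]  LEVEL - category - message); a custom pattern in the log4j2 file would need a different regular expression.

```python
import re
import sys
from collections import Counter

# Level token as laid out in the agent log lines in this thread, e.g.
# "[2026-01-07 03:38:11,965]   INFO - erStages.start.CallRunnerStage - ..."
LEVEL = re.compile(r"^\[[^\]]+\]\s+(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\s+-")


def count_levels(lines):
    """Count entries per log level; lines without a level prefix are skipped."""
    counts = Counter()
    for line in lines:
        match = LEVEL.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_levels.py teamcity-agent.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        counts = count_levels(log)
    for level in ("TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"):
        print(f"{level:5} {counts[level]}")
```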

Hi Tom,

Thanks. Is there any problem visible in the thread dump that I shared?

Hi,

Thank you for sharing the thread dump.

Based on the analysis, there are no signs of deadlocks, memory exhaustion, or critical system hangs. The agent is operating normally and is currently in the final phase of a build, waiting for data to be transferred to the TeamCity server.

The primary activity observed is on Thread 149, which corresponds to Build ID 149526 (Etv5Api_Build):

- Thread state: TIMED_WAITING
- Current activity: ArtifactProcessor.waitForPublishingFinish

This indicates that the build steps themselves have already completed successfully. At this stage, the agent is uploading artifacts (such as logs, test reports, or build outputs) to the TeamCity server. The thread is temporarily waiting while the artifact publishing process completes in the background.

From a resource usage perspective:

- Memory: The agent is using approximately 150 MB of heap memory out of a maximum of 1.6 GB, which indicates no memory pressure.
- CPU: The system load average is 4.05 on a machine with 48 CPU cores, suggesting the system is largely idle.

At this point, there are no indications of an agent-side issue in the thread dump. I will also reach out to our Development team to double-check this behavior. If I receive any additional feedback or recommendations, I will update you accordingly.

Hi Tom,

Thanks for investigating and the clear explanation.

It does sound like a problem with the latest TeamCity server and/or agent version.
If a build step suddenly waits hours for ‘artifact publishing’ when it produces hardly anything worth publishing apart from some console output, there must be some internal problem, right?

Hi,

You are welcome.

> It does sound like a problem with the latest TeamCity server and/or agent version.

It appears that this may be an issue in the latest TeamCity server affecting Docker-in-Docker scenarios, especially if it was working correctly before the upgrade. However, I still need to confirm this with our Development team.

Thank you for your understanding.


We have downgraded to the previous version (and restored from a recent backup, as the current state could not be downgraded), and our builds are running properly again.

So it really seems to be an issue with the newer version.

Thank you for the update.

It’s good to hear that downgrading to the previous version and restoring from a recent backup resolved the issue and that your builds are running properly again.

We have already registered this as a bug and are tracking it here:
TW-98084 – A build hangs on FlushArtifactsStage stage
https://youtrack.jetbrains.com/issue/TW-98084/A-build-hangs-on-FlushArtifactsStage-stage

Our development team will continue investigating the root cause. We will keep this ticket updated with any progress, workarounds, or fixes as they become available. You can vote for the issue and subscribe to it for updates.

Thank you for your patience and for taking the time to confirm the downgrade behavior.
