TeamCity agent 2024.07.2: Docker-in-Docker stopped working

After installing the latest TeamCity server update and updating Ubuntu 24.04 to the latest Docker version (27.2.0, build 3ab4256), the agents stopped working (we had them locked to 2022.10.3-linux-sudo).

After upgrading the agents to 2024.07.2-linux-sudo, they can no longer access Docker.

The container starts fine:

teamcity-agent2-1  | /run-services.sh
teamcity-agent2-1  | /services/run-docker.sh
teamcity-agent2-1  |  * Starting Docker: docker
teamcity-agent2-1  |    ...done.
teamcity-agent2-1  | Docker daemon started
teamcity-agent2-1  |  * Docker is running
teamcity-agent2-1  | /run-agent.sh

But at some point the following errors appear:

 

teamcity-agent2-1  | [2024-09-03 08:00:51,881]   INFO - ains.buildServer.util.FileUtil - Unable to remove directory /opt/buildagent/temp on unix platform, try to fix permissions and repeat delete operation
teamcity-agent2-1  | [2024-09-03 08:00:51,918]   WARN - ains.buildServer.util.FileUtil - Unable to remove directory /opt/buildagent/temp using command line: rm -Rf /opt/buildagent/temp
teamcity-agent2-1  | execution code: 1
teamcity-agent2-1  | std out:
teamcity-agent2-1  | std err: rm: cannot remove '/opt/buildagent/temp': Device or resource busy

 

Docker config looks like this:

  agent2:
    image: jetbrains/teamcity-agent:2024.07.2-linux-sudo
    networks:
      - teamcity
    volumes:
      - ./agent2_conf:/data/teamcity_agent/conf
      - agent2_docker_volume:/var/lib/docker
    environment:
      - SERVER_URL=http://server:8111
      - DOCKER_IN_DOCKER=start
    privileged: true
    restart: unless-stopped

Could this be related to the docker version, or any other obvious configuration issue?

Uploaded the full agent log with id: 2024_09_03_boq9dQAELRZ1eUTRAaTWMY


Eventually the agent does register with the server, but it is not compatible with our build configurations due to:

Unmet requirements:

  • docker.server.osType exists
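
The unmet docker.server.osType requirement suggests that the agent could not query the Docker daemon running inside the container. A minimal sketch for checking this by hand follows. The function names are hypothetical, the container name is taken from the logs above, and the assumption that this query reflects how TeamCity fills in the parameter is unverified.

```shell
#!/bin/sh
# Hypothetical check: can the Docker CLI inside the agent container reach
# the inner (Docker-in-Docker) daemon?

dind_os_type() {
    container="$1"   # e.g. teamcity-agent2-1
    # `docker version` exits non-zero when the inner daemon is unreachable.
    docker exec "$container" docker version --format '{{.Server.Os}}'
}

check_dind() {
    if os="$(dind_os_type "$1")" && [ -n "$os" ]; then
        echo "docker-in-docker OK on $1 (server OS: $os)"
    else
        echo "docker-in-docker NOT available on $1" >&2
        return 1
    fi
}
```

Usage, after the container has started: check_dind teamcity-agent2-1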
Hi,

Do you run TeamCity in a Kubernetes cluster by any chance?

Best regards,
Anton

Hi Anton,

No, the server and agents are configured in a docker-compose.yml file (the agent config shared above is part of that file).


The good news is that starting another agent with this command works properly:

docker run -e SERVER_URL="http://server:8111"  \
    -u 0 \
    -v /home/webdev/services/teamcity/agent4_conf:/data/teamcity_agent/conf \
    -v docker_volumes:/var/lib/docker \
    --network teamcity_teamcity \
    --privileged -e DOCKER_IN_DOCKER=start \
    jetbrains/teamcity-agent:2024.07.2-linux-sudo

I am now trying to find out what is missing from my docker compose config. I suspect it's the user that is configured to run the container.
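
One way to narrow down the difference between the compose-started and the docker run-started agent is to print the relevant settings of both containers side by side. This is only a sketch: print_settings is a hypothetical helper, and the choice of fields (privileged flag, user, mounts) is an assumption about what might differ.

```shell
#!/bin/sh
# Hypothetical helper: print privileged flag, user and mounts for each
# container name given as an argument.

print_settings() {
    for c in "$@"; do
        printf '%s: ' "$c"
        docker inspect --format \
            'privileged={{.HostConfig.Privileged}} user={{.Config.User}} mounts={{range .Mounts}}{{.Source}}->{{.Destination}} {{end}}' \
            "$c"
    done
}
```

Usage: print_settings teamcity-agent2-1 followed by the name that docker ps shows for the container started with docker run.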

Great, thank you for the information.
Please let me know if you find anything.

Best regards,
Anton

I still don't have a clue how to solve this with docker compose. Adding user: 0 does not solve the problem.

Perhaps the handling of privileged changed in a recent docker compose release.


It looks like the problem was specifically in the volumes mounted at /var/lib/docker, which needed to be purged and recreated. Perhaps this was related to a Docker update on the host system.

Fully removing all volumes and recreating the containers solved the problem.
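
For reference, the purge-and-recreate step could look roughly like the sketch below. The helper names and the DRY_RUN switch are hypothetical. The volume name is also an assumption: with a Compose project called teamcity (inferred from the teamcity_teamcity network name), the volume would probably be teamcity_agent2_docker_volume, but this should be confirmed with docker volume ls first.

```shell
#!/bin/sh
# Hypothetical helper: recreate one agent service with an empty
# /var/lib/docker volume. With DRY_RUN=1 the commands are only printed.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recreate_agent() {
    service="$1"   # Compose service name, e.g. agent2
    volume="$2"    # full volume name as listed by `docker volume ls` (assumed)
    run docker compose stop "$service" &&
    run docker compose rm -f "$service" &&
    run docker volume rm "$volume" &&
    run docker compose up -d "$service"
}
```

Usage, from the directory containing docker-compose.yml: DRY_RUN=1 recreate_agent agent2 teamcity_agent2_docker_volume prints the commands without running them. Note that removing the volume discards all images and caches stored by the inner Docker daemon.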

Thank you for letting us know. And I'm glad to hear that the issue has been resolved.

Best regards,
Anton

Apparently the problems are still present:

- I cannot open a shell in agent containers that are running (docker exec fails):

docker exec -it teamcity-agent3-1 bash
NotFound: task c29e722a27aff89d0f8a1807683a90cd9be00ab4ea621aa301ec430bab35cf48 not found: not found

- After restarting a container, Docker is unavailable again inside the container (docker.server.osType is empty again).

- After deleting the volumes and creating a new container, the problems are resolved.

One thing we do is run a docker prune inside the build agents (to avoid massive amounts of disk usage piling up):

docker exec teamcity-agent3-1 docker system prune -a --volumes -f

But when I do this manually, the container is still responsive and Docker-in-Docker still works (even after a restart).

Hi,

Just to make sure we're on the same page: are the above symptoms reproduced only when using docker compose? If you start the agent with docker run, does it work correctly?

Best regards,
Anton
