Docker-in-Docker docker-compose fails to start container: cannot enter cgroupv2 "/sys/fs/cgroup/docker" with domain controllers -- it is in an invalid state: unknown
I have an instance of TeamCity Server 2023.05.4 (build 129421) on Ubuntu 22.04, running in Docker. For testing purposes, one build agent is connected to it, also running in Docker on the same host. My build process relies on docker-compose, so I enabled Docker-in-Docker mode by passing -e DOCKER_IN_DOCKER=start to docker run. When I attach to the build agent container on startup, I can see that Docker starts successfully. I set up a build configuration that first starts the containers using docker-compose and then runs the tests for my HTTP API. The problem arises when docker-compose tries to start the containers after the images have been pulled and extracted. The following error is reported, docker-compose fails, and the build is stopped:
[13:52:15]W: [Step 1/2] Creating testing_environment_memcached_1 ... error
[13:52:15]W: [Step 1/2]
[13:52:15]W: [Step 1/2] ERROR: for testing_environment_memcached_1 Cannot start service memcached: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:385: applying cgroup configuration for process caused: cannot enter cgroupv2 "/sys/fs/cgroup/docker" with domain controllers -- it is in an invalid state: unknown
[13:52:15]W: [Step 1/2]
[13:52:15]W: [Step 1/2] ERROR: for memcached Cannot start service memcached: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:385: applying cgroup configuration for process caused: cannot enter cgroupv2 "/sys/fs/cgroup/docker" with domain controllers -- it is in an invalid state: unknown
[13:52:15]W: [Step 1/2] Encountered errors while bringing up the project.
[13:52:15]W: [Step 1/2] Process exited with code 1
I have tried many fixes to try and rectify the issue. The ones that come to mind are the following:
- This fix: https://github.com/kubernetes/minikube/pull/17032/commits/67083dc89b6cb7c5909031a4a892c5e0c33aa8df
- Forcing docker commands to be run as root using
teamcity.docker.use.sudo=true
(From the logs I can see docker does indeed run as root, but the build behavior doesn't change)
Here is my docker-compose file:
version: '3'
services:
  litespeed:
    image: litespeedtech/openlitespeed:1.7.18-lsphp81
    logging:
      driver: none
    env_file:
      - .env
    volumes:
      - ./lsws/conf:/usr/local/lsws/conf
      - ./lsws/admin-conf:/usr/local/lsws/admin/conf
      - ./bin/container:/usr/local/bin
      - ../:/usr/local/lsws/conf/vhosts/api/site
    links:
      - memcached
    ports:
      - 80:80
      - 443:443
    restart: always
    environment:
      TZ: ${TimeZone}
    deploy:
      resources:
        limits:
          memory: 4096M
    networks:
      - default
  memcached:
    image: memcached:1.6.22-alpine3.18
    networks:
      - default
    logging:
      driver: none
    deploy:
      resources:
        limits:
          memory: 512M
networks:
  default:
    driver: bridge
The host Docker version is 24.0.7, build afdd53b. I may have left out necessary information from this post; if so, please let me know and I will provide it. Thank you in advance for your help on this matter.
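For anyone debugging the same error, here is a minimal sketch for inspecting the cgroup state from inside the agent container. It assumes the default mount point /sys/fs/cgroup and the "docker" child cgroup named in the error message; the function name cgroup_report is hypothetical and not part of TeamCity or Docker. As far as I can tell, runc reports "invalid state" when the cgroup.type file of the target cgroup reads "domain invalid", so that is the value to look for.

```shell
#!/bin/sh
# Hypothetical helper (not part of TeamCity or Docker): print the cgroup
# details that are relevant to the "invalid state" error.
cgroup_report() {
    # Assumption: cgroups are mounted at /sys/fs/cgroup unless a path is given.
    root="${1:-/sys/fs/cgroup}"

    # A cgroup v2 hierarchy exposes cgroup.controllers at its top level.
    if [ -f "$root/cgroup.controllers" ]; then
        echo "version: v2"
        echo "controllers: $(cat "$root/cgroup.controllers")"
    else
        echo "version: v1 or hybrid"
    fi

    # cgroup.type of the "docker" child cgroup, for example "domain",
    # "threaded" or "domain invalid".
    type_file="$root/docker/cgroup.type"
    if [ -f "$type_file" ]; then
        echo "docker cgroup type: $(cat "$type_file")"
    else
        echo "docker cgroup type: unavailable"
    fi
}

# Report on the default location.
cgroup_report /sys/fs/cgroup
```

Running this inside the agent container right after a failed docker-compose run, and again on the host, should show whether the two environments see the cgroup hierarchy differently.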
Hi,
I can't reproduce the issue on my side with the compose file you supplied.
However, based on your description, the error seems to be caused by incorrect permissions. Could you please try running the TeamCity agent as the root user?
docker run -d --name teamcity-agent --user 0 --network my-teamcity-network -e SERVER_URL=http://teamcity-server:8111 -v /opt/dockeragent/agent/conf:/data/teamcity_agent/conf -e DOCKER_IN_DOCKER=start -v /var/run/docker.sock:/var/run/docker.sock jetbrains/teamcity-agent
If it still doesn't work, please upload teamcity-build.log and teamcity-agent.log to https://uploads.jetbrains.com/ for further investigation, and let us know the exact upload IDs afterwards.
Thank you for reaching out. I appreciate your help. I am indeed elevating the privileges allowed in the docker agent container. Below is the full docker run command that I use. Please note that the docker image I'm using is a custom image based on (FROM) TK with minor additions of build tools meant to fit with my build process.
Here is the upload ID of the two requested files:
2023_11_21_W57JRFrWeZ8DbimUuSdbJL
In an effort to help with debugging, the following are the build configurations as code, with sensitive information redacted when appropriate.
Build Configuration Template:
Build Configuration
Thank you again for your help. Let me know if you need any more information. I'll be happy to provide it.
If it helps, I can run
docker run hello-world
and the container runs just fine. For that matter, so do both of the containers from the docker-compose file when I start them separately using docker run. docker-compose is the only command that is broken and can't get the containers up and running. Docker containers that require cgroupv2 are the only ones that are broken. I also realize that the docker engine version is VERY old (20.10) and barely has a version that supports cgroupv2. Do you have any suggestions? Thank you.
I have already reviewed the files you uploaded, but I haven't found anything in particular.
However, based on the command you shared, it appears that the error is caused by the absence of -v /var/run/docker.sock:/var/run/docker.sock. When aiming to control or inspect Docker containers from within another container, it's necessary to include the -v /var/run/docker.sock:/var/run/docker.sock option.
It's important to note that giving a container access to the Docker socket provides significant privileges, as it allows the container to control the Docker daemon. This should be done cautiously, and only trusted containers should be given such access due to potential security risks.
If it doesn't work, we'll continue investigating:
1. Manually enter the container as the root user and run the docker-compose command to check whether there is any difference. If it still doesn't work, please try manually installing docker-compose in the container and run it again.
curl -L "https://github.com/docker/compose/releases/download/v2.12.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
mv /usr/local/bin/docker-compose /usr/bin/docker-compose
chmod +x /usr/bin/docker-compose
2. Use the default agent image instead of your customized agent image as a test.
Thank you again for reaching out. I was under the impression that the teamcity-agent image starts its own dockerd inside the container. Passing in the host's Docker socket does work, by the way, but it means that all of the containers and images end up “spilling” onto the host. For one, the host itself has limited storage space, and I store the agent data on a very large and fast network share to get around that limitation; using the host's Docker daemon means container images cannot use that extra storage. Second, it becomes harder to tell which images and containers belong to the host and which to the agent, and which need to be kept during routine maintenance. With more than one agent (currently the plan), this will only become more difficult. I will try a manually installed docker-compose without passing in the host's Docker socket once I have time, and report back.
I just checked the Docker Hub README and indeed my assumption is correct. I'm trying to set up docker-in-docker. Please see the following quote:
--privileged -u 0 -e DOCKER_IN_DOCKER=start
should be all I need to make this work.
I'm also finding that any ports exposed by containers started from inside the agent are actually opened on the host, because the Docker socket inside the agent container is the host's.
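To make the Docker-in-Docker setup described above concrete, here is a rough sketch of an agent start command that uses those three options and does not mount the host's Docker socket. The server URL and config path are the ones from the earlier docker run example in this thread. The function name run_dind_agent, the DRY_RUN switch, and the extra mount on /var/lib/docker (the default data directory of the inner dockerd) are assumptions added for illustration. This only reproduces the configuration; it is not a fix for the cgroup error.

```shell
#!/bin/sh
# Hypothetical wrapper: start a TeamCity agent with its own dockerd.
#   $1 = server URL, $2 = host path for agent config,
#   $3 = host path or named volume for the inner Docker data (assumption)
run_dind_agent() {
    # With DRY_RUN set, the command is printed instead of executed.
    ${DRY_RUN:+echo} docker run -d --name teamcity-agent \
        --privileged -u 0 \
        -e DOCKER_IN_DOCKER=start \
        -e SERVER_URL="$1" \
        -v "$2":/data/teamcity_agent/conf \
        -v "$3":/var/lib/docker \
        jetbrains/teamcity-agent
}
```

For example, DRY_RUN=1 run_dind_agent http://teamcity-server:8111 /opt/dockeragent/agent/conf docker_volumes prints the full command for review. Because /var/run/docker.sock is not mounted, images, containers and published ports stay inside the agent container instead of appearing on the host.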
Hi,
>I will try not passing in the host's dockerd socket with a manually-installed version of docker-compose soon once I have time, though, and report back results.
Have you had a chance to try it out yet? Take your time and let me know if there's anything I can do to assist you further.
Yes. I tried it and posted my results in the previous comments on this post. The host Docker socket works fine, but the containers and images leak onto the host's Docker (as expected), and so do the exposed ports (this is unworkable, and I did not expect it). dockerd inside the teamcity agent image works fine with docker run, but even the latest docker-compose fails with the original error. The images are downloaded just fine. Thanks for your help.
Hi,
I have submitted a ticket TW-85440 to YouTrack on your behalf. Please follow this ticket for updates.
Thank you very much. You’ve been most helpful on this matter.