Error starting docker-compose
We’re encountering an intermittent issue with TeamCity builds that run functional tests. The tests depend on services that we start with Docker Compose, specifically a PostgreSQL database and a RabbitMQ message bus. In roughly 70% of runs, the Docker Compose step fails with an error and the build stops; in the remaining 30%, the build completes successfully.
Environment
- TeamCity Version: TeamCity Professional 2024.07.3 (build 160765)
- Docker Version on Agent: Docker version 27.3.1, build ce12230
- Operating System on Agent: Linux, version 6.8.0-1014-azure
- Docker Compose File: see the attached docker-compose.teamcity.yml
Steps to Reproduce
- Configure a TeamCity build to run the following Docker Compose file:
```yaml
version: '3.8'

services:
  minio:
    user: "${UID}"
    image: minio/minio
    container_name: minio
    volumes:
      - ./volumes/minio-data:/data
    environment:
      - MINIO_ROOT_USER=minio
      - MINIO_ROOT_PASSWORD=minio
    ports:
      - "9000:9000"
    command: server --console-address :9001 /data
    healthcheck:
      test: mc ready local
      interval: 5s

  postgres:
    user: "${UID}"
    container_name: postgres
    image: postgres:latest
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    volumes:
      - ./volumes/postgres-data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
    healthcheck:
      test: pg_isready -d postgres -U $$POSTGRES_USER
      interval: 10s

  rabbit:
    user: "${UID}"
    container_name: rabbit
    hostname: rabbit
    image: rabbitmq:3.12.12-management
    ports:
      - '5672:5672'
    volumes:
      - ./volumes/rabbit-data:/var/lib/rabbitmq
    healthcheck:
      test: rabbitmq-diagnostics -q ping
      interval: 10s

  dummy:
    image: alpine:latest
    entrypoint: nc -l -p 8080
    healthcheck:
      test: exit 0
      interval: 1s
    depends_on:
      postgres:
        condition: service_healthy
      minio:
        condition: service_healthy
      rabbit:
        condition: service_healthy
```
- Run the build multiple times to observe intermittent failures.
Observed Behavior
- Error Message: "Error starting docker-compose"
- Logs: The build log shows the containers exiting shortly after they start, and the step fails with "dependency failed to start: container postgres exited (1)".
Build Log:
```
[10:20:11] Step 1/7: Start services (Docker Compose) (7s)
[10:20:11] [Step 1/7] Starting docker-compose for docker-compose.teamcity.yml
[10:20:11] [Step 1/7] Starting: /bin/sh -c docker compose -f "docker-compose.teamcity.yml" up -d
[10:20:11] [Step 1/7] in directory: /home/user/TeamCity/work/91b189cc04d05877
[10:20:11] [Step 1/7] time="2024-11-07T10:20:11Z" level=warning msg="/home/user/TeamCity/work/91b189cc04d05877/docker-compose.teamcity.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion"
[10:20:11] [Step 1/7] Network 91b189cc04d05877_default Creating
[10:20:11] [Step 1/7] Network 91b189cc04d05877_default Created
[10:20:11] [Step 1/7] Container postgres Creating
[10:20:11] [Step 1/7] Container rabbit Creating
[10:20:11] [Step 1/7] Container minio Creating
[10:20:12] [Step 1/7] Container minio Created
[10:20:12] [Step 1/7] Container rabbit Created
[10:20:12] [Step 1/7] Container postgres Created
[10:20:12] [Step 1/7] Container 91b189cc04d05877-dummy-1 Creating
[10:20:13] [Step 1/7] Container 91b189cc04d05877-dummy-1 Created
[10:20:13] [Step 1/7] Container minio Starting
[10:20:13] [Step 1/7] Container postgres Starting
[10:20:13] [Step 1/7] Container rabbit Starting
[10:20:13] [Step 1/7] Container postgres Started
[10:20:13] [Step 1/7] Container rabbit Started
[10:20:13] [Step 1/7] Container minio Started
[10:20:13] [Step 1/7] Container postgres Waiting
[10:20:13] [Step 1/7] Container minio Waiting
[10:20:13] [Step 1/7] Container rabbit Waiting
[10:20:14] [Step 1/7] Container postgres Error
[10:20:15] [Step 1/7] Container minio Error
[10:20:19] [Step 1/7] Container rabbit Error
[10:20:19] [Step 1/7] dependency failed to start: container postgres exited (1)
[10:20:19] [Step 1/7] Process exited with code 1
[10:20:19] [Step 1/7] Error starting docker-compose, see log for details
[10:20:19] [Step 1/7] Step Start services (Docker Compose) failed
```
Container Info:
```
07 Nov 24
10:20:12 Container f1fe0cfd1441… CREATE ; from: postgres:latest
10:20:12 Container 2a19adc5dc92… CREATE ; from: rabbitmq:3.12.12-management
10:20:12 Container 32c44afd7611… CREATE ; from: minio/minio
10:20:13 Container 5e03c9fbf62e… CREATE ; from: alpine:latest
10:20:13 Container f1fe0cfd1441… START
10:20:13 Container 2a19adc5dc92… START
10:20:13 Container 32c44afd7611… START
10:20:14 Container f1fe0cfd1441… DIE
10:20:15 Container 32c44afd7611… DIE
10:20:19 Container 2a19adc5dc92… DIE
```
- Agent Check: When I log into the TeamCity agent after a failure, the containers appear to start successfully but are then unexpectedly killed.
- Local Reproducibility: Running the same docker-compose.teamcity.yml file locally does not produce any issues.
Expected Behavior
The Docker Compose services should start consistently on the TeamCity agent, allowing functional tests to run reliably.
Additional Details
- Frequency of Issue: Occurs in roughly 70% of builds.
- Attachment: the docker-compose.teamcity.yml file, which includes the configuration for the PostgreSQL and RabbitMQ services.
Steps Taken to Troubleshoot
- Verified that the docker-compose.teamcity.yml file runs without issues in a local environment.
- Verified that the docker-compose.teamcity.yml file runs without issues on a TeamCity agent when started manually via the CLI.
- Checked the container status on the TeamCity agent, confirming that the containers start and then stop unexpectedly.
Request for Support
We would appreciate guidance on:
- Diagnosing what might cause this intermittent issue, particularly any settings within TeamCity or Docker that may impact Docker Compose consistency.
- Any recommended changes in TeamCity configuration or Docker Compose that could enhance stability for Docker-dependent builds.
- Logs or diagnostic files that would assist in resolving this issue.
Hi Alexey,
When using Docker Compose in TeamCity builds for functional testing, issues may stem from factors such as resource limitations or service readiness timing. Consider these steps to help resolve potential issues:
1. Check Resource Limits on the TeamCity Agent
Resource Allocation: Ensure your TeamCity agent has sufficient resources, particularly CPU and memory, since shortages can lead to intermittent container failures. Because the issue doesn’t occur locally, the agent may have fewer resources than your local machine; consider allocating additional memory to resource-intensive services like PostgreSQL and RabbitMQ.
Docker Resource Constraints: Define memory and CPU limits for each service directly in your Compose file, so that resources are shared predictably between services and no single service can starve the others.
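A minimal sketch of what this could look like for the services in docker-compose.teamcity.yml. The keys go under the existing service entries. The CPU and memory values are hypothetical placeholders, not recommendations. `limits` caps what a container may use, while a memory `reservation` is only a soft limit that Docker applies when the host is under memory pressure.

```yaml
services:
  postgres:
    deploy:
      resources:
        limits:
          cpus: "1.0"     # hypothetical: cap PostgreSQL at one CPU
          memory: 1G      # hypothetical: hard cap, the container is OOM-killed above it
        reservations:
          memory: 512M    # hypothetical: soft limit, applied under memory pressure
  rabbit:
    deploy:
      resources:
        limits:
          cpus: "1.0"     # hypothetical values, same meaning as above
          memory: 1G
        reservations:
          memory: 512M
```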
Adjust these settings to suit your needs.
2. Adjust Health Check Intervals and Timeouts
Service startup failures may occur in CI environments if a service is marked as “healthy” before it's fully ready. Modify health check intervals and increase retries to allow more time for services to start.
Apply these adjustments to each service’s health check, including PostgreSQL, RabbitMQ, and MinIO.
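For example, the health checks from the attached file could be extended as follows. The `test` commands are unchanged from docker-compose.teamcity.yml. The `timeout`, `retries`, and `start_period` values are hypothetical and should be tuned to how long each service actually takes to start on the agent. During `start_period`, failed probes do not count toward `retries`.

```yaml
services:
  postgres:
    healthcheck:
      test: pg_isready -d postgres -U $$POSTGRES_USER
      interval: 10s
      timeout: 5s         # hypothetical: fail a single probe after 5 seconds
      retries: 10         # hypothetical: mark unhealthy after 10 consecutive failures
      start_period: 30s   # hypothetical: grace period before failures are counted
  rabbit:
    healthcheck:
      test: rabbitmq-diagnostics -q ping
      interval: 10s
      timeout: 10s        # hypothetical values, same meaning as above
      retries: 10
      start_period: 30s
  minio:
    healthcheck:
      test: mc ready local
      interval: 5s
      timeout: 5s         # hypothetical values, same meaning as above
      retries: 10
      start_period: 10s
```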
3. Review Docker Logs for Error Messages
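Examine the container logs on the agent right after a failure to pinpoint errors specific to PostgreSQL and RabbitMQ, such as port conflicts or initialization errors. For example, `docker logs postgres` (and likewise `docker logs rabbit` and `docker logs minio`) prints a container’s output even after it has exited, as long as the container has not been removed.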
This can reveal any specific issues that may cause containers to exit or restart unexpectedly.
4. Enable Restart Policies
Adding restart policies in Docker Compose makes Docker restart services automatically if they exit unexpectedly, which can improve resilience.
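A sketch of a restart policy for the three backing services, assuming a bounded retry count (the value 3 is hypothetical):

```yaml
services:
  postgres:
    restart: "on-failure:3"   # hypothetical: restart up to 3 times after a non-zero exit
  rabbit:
    restart: "on-failure:3"
  minio:
    restart: "on-failure:3"
```

`on-failure` restarts a container only when it exits with a non-zero code, which matches the "container postgres exited (1)" message in the build log. A bounded retry count keeps a persistently failing service from restarting forever.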
Best Regards,
Tom
Hi Tom,
Thanks for getting back to me and sharing your recommendations. I’ve given each of them a try, but unfortunately, the issue persists.
Here’s what I’ve noticed: the problem consistently happens when TeamCity builds a commit from a branch that’s different from the one built previously. For example, if the last successful build was on the main branch and I trigger a build for the develop branch, it fails with the error we discussed. Oddly enough, if I re-trigger the build for the same revision of develop, it works fine: the services start up, and the build succeeds without any issues.
This made me think the problem might be tied to the state of the repository between builds, so I enabled the Swabra cleaning feature to ensure the repository is fully cleaned before each build. Unfortunately, that didn’t solve it either.
What makes this even more confusing is that I can’t reproduce the problem manually. When I run the same steps on the build agent through the CLI, everything works as expected: the docker-compose services always start up successfully, no matter which branch or commit I’m working on. This leads me to believe the issue might have something to do with how TeamCity handles the output from docker-compose, or perhaps with some part of its internal state management.
I’d love to hear your thoughts on this, especially if you’ve encountered something similar before or have ideas on what else I could try. Let me know if there’s any other information I can provide that might help us get to the bottom of this.
Regards,
Alexey.
Hi Alexey,
Thanks for the detailed update! The behavior you’re describing is indeed peculiar.
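Based on your description, please make sure there are no leftover containers, networks, or volumes causing interference. For example, you can stop and remove the project’s containers and networks with `docker compose -f docker-compose.teamcity.yml down --volumes --remove-orphans`, and then remove unused networks and volumes with `docker network prune` and `docker volume prune`.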
Alternatively, you can enable Clean Checkout in TeamCity so that no residual files from previous branch builds remain:
Go to the VCS settings of the build configuration and check the option Clean all files before build.
Best Regards,
Tom