Error starting docker-compose

We’re encountering an intermittent issue with TeamCity builds that are configured to run functional tests. The functional tests depend on services that we attempt to start with Docker Compose, specifically a PostgreSQL database and a RabbitMQ message bus. In approximately 70% of cases, the build fails to start with an error message, and in the remaining 30% of cases, the build completes successfully.

Environment

  • TeamCity Version: TeamCity Professional 2024.07.3 (build 160765)
  • Docker Version on Agent: Docker version 27.3.1, build ce12230
  • Operating System on Agent: Linux, version 6.8.0-1014-azure
  • Docker-Compose File: See the attached docker-compose.teamcity.yml

Steps to Reproduce

  1. Configure a TeamCity build to run the docker-compose file below:

    version: '3.8'
    
    services:
        minio:
            user: "${UID}"
            image: minio/minio
            container_name: minio
            volumes:
                - ./volumes/minio-data:/data
            environment:
                - MINIO_ROOT_USER=minio
                - MINIO_ROOT_PASSWORD=minio
            ports:
                - "9000:9000"
            command: server --console-address :9001 /data
            healthcheck:
                test: mc ready local
                interval: 5s
        
        postgres:
            user: "${UID}"
            container_name: postgres
            image: postgres:latest
            environment:
                - POSTGRES_USER=postgres
                - POSTGRES_PASSWORD=postgres
            volumes:
                - ./volumes/postgres-data:/var/lib/postgresql/data
            ports:
                - "5432:5432"
            healthcheck:
                test: pg_isready -d postgres -U $$POSTGRES_USER
                interval: 10s
        
        rabbit:
            user: "${UID}"
            container_name: rabbit
            hostname: rabbit
            image: rabbitmq:3.12.12-management
            ports:
                - '5672:5672'
            volumes:
                - ./volumes/rabbit-data:/var/lib/rabbitmq
            healthcheck:
                test: rabbitmq-diagnostics -q ping
                interval: 10s
        
        dummy:
            image: alpine:latest
            entrypoint: nc -l -p 8080
            healthcheck:
                test: exit 0
                interval: 1s
            depends_on:
                postgres:
                    condition: service_healthy
                minio:
                    condition: service_healthy
                rabbit:
                    condition: service_healthy
  2. Run the build multiple times to observe intermittent failures.

Observed Behavior

  • Error Message: "Error starting docker-compose"
  • Logs: The build logs indicate that the container failed to run.
    Build Log:

    [10:20:11]	Step 1/7: Start services (Docker Compose) (7s)
    [10:20:11]	[Step 1/7] Starting docker-compose for docker-compose.teamcity.yml
    [10:20:11]	[Step 1/7] Starting: /bin/sh -c docker compose  -f "docker-compose.teamcity.yml" up -d
    [10:20:11]	[Step 1/7] in directory: /home/user/TeamCity/work/91b189cc04d05877
    [10:20:11]	[Step 1/7] time="2024-11-07T10:20:11Z" level=warning msg="/home/user/TeamCity/work/91b189cc04d05877/docker-compose.teamcity.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion"
    [10:20:11]	[Step 1/7]  Network 91b189cc04d05877_default  Creating
    [10:20:11]	[Step 1/7]  Network 91b189cc04d05877_default  Created
    [10:20:11]	[Step 1/7]  Container postgres  Creating
    [10:20:11]	[Step 1/7]  Container rabbit  Creating
    [10:20:11]	[Step 1/7]  Container minio  Creating
    [10:20:12]	[Step 1/7]  Container minio  Created
    [10:20:12]	[Step 1/7]  Container rabbit  Created
    [10:20:12]	[Step 1/7]  Container postgres  Created
    [10:20:12]	[Step 1/7]  Container 91b189cc04d05877-dummy-1  Creating
    [10:20:13]	[Step 1/7]  Container 91b189cc04d05877-dummy-1  Created
    [10:20:13]	[Step 1/7]  Container minio  Starting
    [10:20:13]	[Step 1/7]  Container postgres  Starting
    [10:20:13]	[Step 1/7]  Container rabbit  Starting
    [10:20:13]	[Step 1/7]  Container postgres  Started
    [10:20:13]	[Step 1/7]  Container rabbit  Started
    [10:20:13]	[Step 1/7]  Container minio  Started
    [10:20:13]	[Step 1/7]  Container postgres  Waiting
    [10:20:13]	[Step 1/7]  Container minio  Waiting
    [10:20:13]	[Step 1/7]  Container rabbit  Waiting
    [10:20:14]	[Step 1/7]  Container postgres  Error
    [10:20:15]	[Step 1/7]  Container minio  Error
    [10:20:19]	[Step 1/7]  Container rabbit  Error
    [10:20:19]	[Step 1/7] dependency failed to start: container postgres exited (1)
    [10:20:19]	[Step 1/7] Process exited with code 1
    [10:20:19]	[Step 1/7] Error starting docker-compose, see log for details
    [10:20:19]	[Step 1/7] Step Start services (Docker Compose) failed

    Container Info:

    07 Nov 24 10:20:12 Container f1fe0cfd1441… CREATE ; from: postgres:latest
    10:20:12 Container 2a19adc5dc92… CREATE ; from: rabbitmq:3.12.12-management
    10:20:12 Container 32c44afd7611… CREATE ; from: minio/minio
    10:20:13 Container 5e03c9fbf62e… CREATE ; from: alpine:latest
    10:20:13 Container f1fe0cfd1441… START
    10:20:13 Container 2a19adc5dc92… START
    10:20:13 Container 32c44afd7611… START
    10:20:14 Container f1fe0cfd1441… DIE
    10:20:15 Container 32c44afd7611… DIE
    10:20:19 Container 2a19adc5dc92… DIE
  • Agent Check: When logging into the TeamCity agent after the failure, the container appears to start successfully but is then unexpectedly killed.
  • Local Reproducibility: Running the same docker-compose.teamcity.yml file locally does not produce any issues.

Expected Behavior

The Docker Compose services should start consistently on the TeamCity agent, allowing functional tests to run reliably.

Additional Details

  • Frequency of Issue: Occurs in roughly 70% of builds.
  • Attachment: docker-compose.teamcity.yml file, which includes the configuration for the Postgres and RabbitMQ services.

Steps Taken to Troubleshoot

  • Verified that the docker-compose.teamcity.yml file runs without issues in a local environment.
  • Verified that the docker-compose.teamcity.yml file runs without issues on a TeamCity agent when started manually via CLI.
  • Checked container status on the TeamCity agent, confirming it starts and then stops unexpectedly.

Request for Support

We would appreciate guidance on:

  1. Diagnosing what might cause this intermittent issue, particularly any settings within TeamCity or Docker that may impact Docker Compose consistency.
  2. Any recommended changes in TeamCity configuration or Docker Compose that could enhance stability for Docker-dependent builds.
  3. Logs or diagnostic files that would assist in resolving this issue.
3 comments

Hi Alexey,

When using Docker Compose in TeamCity builds for functional testing, issues may stem from factors such as resource limitations or service readiness timing. Consider these steps to help resolve potential issues:

1. Check Resource Limits on the TeamCity Agent

Resource Allocation: Ensure your TeamCity agent has sufficient resources, particularly CPU and memory, since limitations can lead to intermittent container failures. If these issues don’t occur locally, allocate additional memory for resource-intensive services like PostgreSQL and RabbitMQ.

Docker Resource Constraints: Define memory and CPU limits directly in your docker-compose.yml file so that no single service can starve the others. Keep in mind that limits cap what a service may use; they do not reserve resources for it. For example:

services:
  postgres:
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: '0.5'
  rabbit:
    deploy:
      resources:
        limits:
          memory: 256M
          cpus: '0.5'

Adjust these settings as needed.

2. Adjust Health Check Intervals and Timeouts

Service startup failures may occur in CI environments if a service is marked as “unhealthy” before it has had enough time to start. Modify health check intervals and increase retries to allow more time for services to start.

healthcheck:
  test: pg_isready -d postgres -U $$POSTGRES_USER
  interval: 10s
  retries: 5  # Increase the retry count for PostgreSQL to allow more time for startup
  timeout: 30s

Similarly, adjust health checks for other services, such as RabbitMQ and MinIO.
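
To see how long each service actually takes to become healthy on the agent, the health status can be polled from a build step. This is a minimal sketch, not a definitive implementation: `wait_healthy` is a hypothetical helper name, the 60-second default is an assumption, and it relies on the container having a health check defined (as all services in the posted compose file do).

```shell
#!/bin/sh
# Poll a container until Docker reports it healthy, it exits, or a deadline passes.
# wait_healthy is a hypothetical helper; container names come from the compose file.
wait_healthy() {
    container=$1
    timeout=${2:-60}   # seconds to wait before giving up (assumed default)
    elapsed=0
    state=unknown
    while [ "$elapsed" -lt "$timeout" ]; do
        # Prints e.g. "running/starting", "running/healthy" or "exited/unhealthy"
        state=$(docker inspect --format '{{.State.Status}}/{{.State.Health.Status}}' "$container" 2>/dev/null) || state=unknown
        case "$state" in
            running/healthy)
                echo "$container is healthy after ${elapsed}s"
                return 0 ;;
            exited/*)
                echo "$container exited before becoming healthy" >&2
                return 1 ;;
        esac
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "$container not healthy after ${timeout}s (last state: $state)" >&2
    return 1
}
```

For example, `wait_healthy postgres 120 && wait_healthy rabbit 120 && wait_healthy minio 120` could run right after the compose step. The reported times give a basis for choosing `interval` and `retries`.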

3. Review Docker Logs for Error Messages

Examine the Docker logs on the agent after a failure to pinpoint errors specific to PostgreSQL and RabbitMQ, such as port conflicts or initialization errors. Use:

docker logs <container_id>

This can reveal any specific issues that may cause containers to exit or restart unexpectedly.
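
Because the containers in the posted compose file have fixed names, the same check can be scripted for all of them at once. The following is a rough sketch under that assumption; `dump_container_state` is a hypothetical helper name, and the 50-line tail is an arbitrary choice.

```shell
#!/bin/sh
# Print exit code, OOM flag, error text and recent log lines for each named container.
# dump_container_state is a hypothetical helper name.
dump_container_state() {
    for container in "$@"; do
        echo "=== $container ==="
        # ExitCode, OOMKilled and Error are fields of the State object in `docker inspect`
        docker inspect --format 'exit={{.State.ExitCode}} oom={{.State.OOMKilled}} error={{.State.Error}}' "$container" \
            || echo "inspect failed for $container"
        # --tail limits the output to the most recent lines; stderr is merged in
        docker logs --tail 50 "$container" 2>&1 || echo "no logs for $container"
    done
}
```

Calling `dump_container_state postgres minio rabbit` from a build step that is set to run even when earlier steps fail would put this output into the build log next to the failure.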

4. Enable Restart Policies

Adding restart policies in Docker Compose can automatically restart services if they exit unexpectedly, improving resilience:

services:
  postgres:
    restart: always
  rabbit:
    restart: always

Best Regards,

Tom

Hi Tom,

Thanks for getting back to me and sharing your recommendations. I’ve given each of them a try, but unfortunately, the issue persists.

Here’s what I’ve noticed: the problem consistently happens when TeamCity builds a commit from a branch that’s different from the one built previously. For example, if the last successful build was on the main branch and I trigger a build for the develop branch, it fails with the error we discussed. Oddly enough, if I re-trigger the build for the same revision of develop, it works fine—the services start up, and the build succeeds without any issues.

This had me thinking the problem might be tied to the state of the repository between builds, so I enabled the Swabra cleaning feature to ensure the repository is fully cleaned before each build. Unfortunately, that didn’t solve it either.

What makes this even more confusing is that I can’t seem to reproduce the problem manually. When I run the same steps on the build agent through the CLI, everything works as expected—the docker-compose services always start up successfully, no matter which branch or commit I’m working on. This leads me to believe the issue might have something to do with how TeamCity handles the output from docker-compose or perhaps some part of its internal state management.
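
One way to narrow down the difference between the two situations is to capture the same snapshot in a TeamCity build step and in the manual CLI session, then compare the results. This is only a sketch: `snapshot_env` is a hypothetical helper name, and it assumes the bind-mounted directories live under `./volumes` as in the compose file above. It checks the exported `UID` because Compose substitutes `${UID}` from the environment of the process that runs it.

```shell
#!/bin/sh
# Print details that may differ between a TeamCity build step and an interactive shell.
# snapshot_env is a hypothetical helper; ./volumes matches the bind mounts in the compose file.
snapshot_env() {
    dir=${1:-.}
    # Only exported variables are visible to docker compose for ${UID} substitution
    env | grep '^UID=' || echo "UID is not exported"
    echo "effective_uid=$(id -u) effective_gid=$(id -g)"
    echo "workdir=$(cd "$dir" && pwd)"
    # Numeric owner and mode of each bind-mounted directory, if present
    for d in "$dir"/volumes/*; do
        [ -e "$d" ] && ls -ldn "$d"
    done
    return 0
}
```

Running `snapshot_env` immediately before the Docker Compose step on a failing build and on a successful one would show whether ownership of the volume directories or the substituted user differs between them.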

I’d love to hear your thoughts on this, especially if you’ve encountered something similar before or have ideas on what else I could try. Let me know if there’s any other information I can provide that might help us get to the bottom of this.

Regards,
Alexey.

Hi Alexey,

Thanks for the detailed update! The behavior you’re describing is indeed peculiar.

Based on your description, please ensure there are no leftover containers, networks, or volumes causing interference. You can do this by running the following commands:

docker compose -f docker-compose.teamcity.yml down --volumes --remove-orphans
docker volume prune -f
docker network prune -f

Alternatively, you can enable Clean Checkout in TeamCity to ensure no residual files remain from previous branch builds:

Go to the VCS settings of the build configuration and check the option Clean all files before build.
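
If the cleanup commands help, they could be wrapped in a small function and run as the first build step. This is a minimal sketch, assuming the `docker compose` plugin form that appears in the build log; `compose_cleanup` is a hypothetical helper name.

```shell
#!/bin/sh
# Tear down the compose project, then prune unused volumes and networks.
# compose_cleanup is a hypothetical helper name.
compose_cleanup() {
    compose_file=${1:-docker-compose.teamcity.yml}
    # `|| true` keeps the build step going when there is nothing to remove
    docker compose -f "$compose_file" down --volumes --remove-orphans || true
    docker volume prune -f || true
    docker network prune -f || true
}
```

Note that `down --volumes` removes named and anonymous volumes only. The `./volumes/...` bind mounts in the compose file are plain host directories and are left in place by these commands.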

Best Regards,

Tom
