Find out why a build is in a queue

Answered

We are using 2019.1.3 with AWS EC2 cloud instances started on demand.

Last night, we had several builds which were scheduled to trigger at 01:00, but did not execute until 08:09. There were plenty of unused licenses available and the agents should have run immediately. There was nothing else happening on the server.

How can I figure out why the builds remained in the queue for so long?

10 comments
Comment actions Permalink

Hi Dan,

Would you be able to share some logs with me? I'd like to look at your teamcity-server.log and teamcity-agent.log for the period of this build, if that would be possible. If you are able to privately share them, please sent them through the Submit a Request button on the top of the page. If this is not a possibility, let me know. 

Thanks,

Eric

0
Comment actions Permalink

Hi,

I have submitted a support request with attached teamcity-server log file for the period in question - Early morning on the 11th Dec.

I'm not sure if it's relevant, but I have noticed 2 minor issues:

1. I had a CloudProfile with 2 instances configured - 1 (which I used for testing) had a limit of 0 and an AMI which no-longer exists. I have since changed it to an AMI which does exist.

2. I have disabled a plugin on the server since building the AMI, so there are log messages related to this.

Thanks,

Dan

0
Comment actions Permalink

Thanks, Dan. I've received your log and will be reviewing it today. Since you've made these two changes, has the problem recurred? 

0
Comment actions Permalink

No, it looks like a one-off.

Just to be clear, I disabled the plugin a while back. I have not changed the plugin settings since the problem.

While looking into the issue, I noticed the error relating to the AMI, so I fixed it. The AMI problem was probably around for a few weeks/months.

The issue with the queued build doesn't seem to be related to either of these issues - I only mentioned them because you may find logs related to them.

0
Comment actions Permalink

I see a few builds during this timeframe. Could you please provide me with the Build ID that corresponds to your build log you initially shared? This will save me a bit of time. 

I found it might also be helpful to check the statistics for this Build ID, which can be found easily with our Rest API. Sometimes the queue reason is listed in there. https://www.jetbrains.com/help/teamcity/rest-api.html#RESTAPI-Statistics

You can format your request like this: <teamcity server>/app/rest/builds/<buildId>/statistics

0
Comment actions Permalink

Sure, the build ID is 379208.

 

I'll take a look at the build stats, thanks.

0
Comment actions Permalink

There doesn't seem to be much in the build stats, although 'No_available_agents' looks like a possible reason if TC was unable to launch one. But I don't know why it would have had a problem launching. I can think of a few possibilities:

1. All other agents busy? There wasn't enough else going on to use up 7 agents for 7 hours.

2. Problems calling the AWS api? I would guess we would see this in the logs?

3. AWS account limits - I don't think this is the case because it's been working fine ever since.

 

<property name="buildStageDuration:firstStepPreparation" value="6" />

<property name="buildStageDuration:sourcesUpdate" value="4452" />

<property name="buildStageDuration:toolsUpdating" value="1" />

<property name="BuildTestStatus" value="1" />

<property name="queueWaitReason:Build_settings_have_not_been_finalized" value="41" />

<property name="queueWaitReason:No_available_agents" value="25779384" />

<property name="serverSideBuildFinishing" value="66" />

<property name="SuccessRate" value="1" />

<property name="TimeSpentInQueue" value="25779404" />

<property name="VisibleArtifactsSize" value="704773" />

0
Comment actions Permalink

I'm not finding your Build ID in the log file you sent to me, can you double-check that number? As you noticed, it does look like the build was in the queue waiting for an agent for quite some time. An easy thing to check would be your agent workload statistics, to see how many agents were in use at the time: https://confluence.jetbrains.com/display/TCD18/Viewing+Agents+Workload

Since you're using the AWS agents, you could also take a look at your teamcity-cloud.log for any clues. That would be where the AWS errors would appear if there are any.

 

 

0
Comment actions Permalink

Sorry, I think I gave you incorrect logs. I will send the correct logs now.

There are no 'ERROR' logs for the teamcity-cloud.log file and only a single 'WARN' from that day:

[2019-12-12 09:06:26,218] WARN [o-8111-exec-411] - .beans.CloudTabFormFactoryImpl - Can't find projectId by projectExtId ''

Interestingly, there are no log lines in that file  for the period between 2019-12-12 00:36:55,313 - 2019-12-12 08:07:56,530. This period covers the time while the build was queued.

I did check the agent statistics for the period - no builds ran at all.

 

0
Comment actions Permalink

Hi, Dan. I am stumped as to why TeamCity didn't think there were any agents available. I'm asking around internally for any ideas and I'll let you know if I come up with anything there. If you are able to reproduce the issue, it might be interesting to look at some thread dumps from the period of queuing to the point the build started. 

0

Please sign in to leave a comment.