Find out why a build is in a queue
Answered
We are using 2019.1.3 with AWS EC2 cloud instances started on demand.
Last night, we had several builds which were scheduled to trigger at 01:00, but did not execute until 08:09. There were plenty of unused licenses available and the agents should have run immediately. There was nothing else happening on the server.
How can I figure out why the builds remained in the queue for so long?
Please sign in to leave a comment.
Hi Dan,
Would you be able to share some logs with me? I'd like to look at your teamcity-server.log and teamcity-agent.log for the period of this build, if that would be possible. If you are able to privately share them, please sent them through the Submit a Request button on the top of the page. If this is not a possibility, let me know.
Thanks,
Eric
Hi,
I have submitted a support request with attached teamcity-server log file for the period in question - Early morning on the 11th Dec.
I'm not sure if it's relevant, but I have noticed 2 minor issues:
1. I had a CloudProfile with 2 instances configured - 1 (which I used for testing) had a limit of 0 and an AMI which no-longer exists. I have since changed it to an AMI which does exist.
2. I have disabled a plugin on the server since building the AMI, so there are log messages related to this.
Thanks,
Dan
Thanks, Dan. I've received your log and will be reviewing it today. Since you've made these two changes, has the problem recurred?
No, it looks like a one-off.
Just to be clear, I disabled the plugin a while back. I have not changed the plugin settings since the problem.
While looking into the issue, I noticed the error relating to the AMI, so I fixed it. The AMI problem was probably around for a few weeks/months.
The issue with the queued build doesn't seem to be related to either of these issues - I only mentioned them because you may find logs related to them.
I see a few builds during this timeframe. Could you please provide me with the Build ID that corresponds to your build log you initially shared? This will save me a bit of time.
I found it might also be helpful to check the statistics for this Build ID, which can be found easily with our Rest API. Sometimes the queue reason is listed in there. https://www.jetbrains.com/help/teamcity/rest-api.html#RESTAPI-Statistics
You can format your request like this: <teamcity server>/app/rest/builds/<buildId>/statistics
Sure, the build ID is 379208.
I'll take a look at the build stats, thanks.
There doesn't seem to be much in the build stats, although 'No_available_agents' looks like a possible reason if TC was unable to launch one. But I don't know why it would have had a problem launching. I can think of a few possibilities:
1. All other agents busy? There wasn't enough else going on to use up 7 agents for 7 hours.
2. Problems calling the AWS api? I would guess we would see this in the logs?
3. AWS account limits - I don't think this is the case because it's been working fine ever since.
<property name="buildStageDuration:firstStepPreparation" value="6" />
<property name="buildStageDuration:sourcesUpdate" value="4452" />
<property name="buildStageDuration:toolsUpdating" value="1" />
<property name="BuildTestStatus" value="1" />
<property name="queueWaitReason:Build_settings_have_not_been_finalized" value="41" />
<property name="queueWaitReason:No_available_agents" value="25779384" />
<property name="serverSideBuildFinishing" value="66" />
<property name="SuccessRate" value="1" />
<property name="TimeSpentInQueue" value="25779404" />
<property name="VisibleArtifactsSize" value="704773" />
I'm not finding your Build ID in the log file you sent to me, can you double-check that number? As you noticed, it does look like the build was in the queue waiting for an agent for quite some time. An easy thing to check would be your agent workload statistics, to see how many agents were in use at the time: https://confluence.jetbrains.com/display/TCD18/Viewing+Agents+Workload
Since you're using the AWS agents, you could also take a look at your teamcity-cloud.log for any clues. That would be where the AWS errors would appear if there are any.
Sorry, I think I gave you incorrect logs. I will send the correct logs now.
There are no 'ERROR' logs for the teamcity-cloud.log file and only a single 'WARN' from that day:
[2019-12-12 09:06:26,218] WARN [o-8111-exec-411] - .beans.CloudTabFormFactoryImpl - Can't find projectId by projectExtId ''
Interestingly, there are no log lines in that file for the period between 2019-12-12 00:36:55,313 - 2019-12-12 08:07:56,530. This period covers the time while the build was queued.
I did check the agent statistics for the period - no builds ran at all.
Hi, Dan. I am stumped as to why TeamCity didn't think there were any agents available. I'm asking around internally for any ideas and I'll let you know if I come up with anything there. If you are able to reproduce the issue, it might be interesting to look at some thread dumps from the period of queuing to the point the build started.