EC2 Cloud agent not authorized automatically
Hi,
I am running an on premises TeamCity Professional 2024.03.2 (build 156319) and I am observing some strange behavior on my EC2 cloud agents, which seems to have started since the last update.
TL,DR:
The agents are starting up as unauthorized after the second start-up. These are the log entries from teamcity-clouds.log:
[2024-06-26 08:21:21,302] INFO [; Flush Queue 1] - .instances.StartInstanceAction - Starting cloud instance: profile 'Root Cloud Profile'{id=amazon-3, projectId=_Root}, AmazonImageInstance{id=Agent Smith, amazonId=***}, temporary id=[THIS_ID_IS_DIFFERENT_THAN_THE_AUTHORIZATION_TOKEN], reason=Automatic start: Compatible with queued build [BUILD_NAME](promotionId=13679)
[2024-06-26 08:21:51,196] WARN [nio-8111-exec-7] - .instances.RunningInstanceTask - The unauthorized agent '"Agent-Smith" {id=137}' cannot be matched with the cloud instance 'profile-id: amazon-3, image-id: Agent Smith, instance-id: Agent Smith' because there is an authorized agent associated with the same cloud instance
The first time the agents start-up, either because there's a build on the queue or by manually starting the instance on the Agents tab, everything works fine, they get authorized and available. Here are the log entries for that event:
----------
[2024-06-26 08:09:46,021] INFO [; Flush Queue 1] - .instances.StartInstanceAction - Starting cloud instance: profile 'Root Cloud Profile'{id=amazon-3, projectId=_Root}, AmazonImageInstance{id=Agent Smith, amazonId=***}, temporary id=[THIS_IS_THE_SAME_AS_THE_AUTHORIZATION_TOKEN], reason=Automatic start: Compatible with queued build [BUILD_NAME](promotionId=13677)
[2024-06-26 08:09:46,021] INFO [; Flush Queue 1] - .server.impl.CloudEventsLogger - Cloud instance start succeeded: profile 'Root Cloud Profile'{id=amazon-3, projectId=_Root}, Agent Smith
[2024-06-26 08:09:46,129] INFO [uled executor 2] - .instances.StoppedInstanceTask - Instance has changed status from stopped to Scheduled to start: Amazon Instance{instanceId=Agent Smith, imageId=Agent Smith, amazonImageId=***, amazonInstanceId=***, status: Scheduled to start}, profile 'Root Cloud Profile'{id=amazon-3, projectId=_Root}
[2024-06-26 08:09:51,129] INFO [uled executor 3] - .server.impl.CloudEventsLogger - Cloud instance entered 'starting' state, profile 'Root Cloud Profile'{id=amazon-3, projectId=_Root}, Agent Smith
[2024-06-26 08:10:18,462] INFO [nio-8111-exec-1] - .instances.RunningInstanceTask - The cloud agent '"Agent-Smith" {id=137}' with the starting token '[AUTHORIZATION_TOKEN]' is successfully matched with the cloud instance 'profile-id: amazon-3, image-id: Agent Smith, instance-id: Agent Smith'
[2024-06-26 08:10:18,463] INFO [nio-8111-exec-1] - r.impl.DBCloudStateManagerImpl - Image: AmazonImageInstance{id=Agent Smith, amazonId=***}, profile: profile 'Root Cloud Profile'{id=amazon-3, projectId=_Root} was marked to CONTAIN agent
[2024-06-26 08:10:18,463] INFO [nio-8111-exec-1] - .server.impl.CloudEventsLogger - Detected cloud agent "Agent-Smith" {id=137}, profile 'Root Cloud Profile'{id=amazon-3, projectId=_Root}, Agent Smith
[2024-06-26 08:10:21,132] INFO [uled executor 5] - .server.impl.CloudEventsLogger - Cloud instance entered 'running' state, profile 'Root Cloud Profile'{id=amazon-3, projectId=_Root}, Agent Smith
----------
Then, the agent is terminated, either stopped manually on the Agents tab, or automatically by the idle timeout (I have it at 5 min).
----------
[2024-06-26 08:20:39,498] INFO [uled executor 3] - te.IdleTimeoutTerminateFactory - Will stop instance Amazon Instance{instanceId=Agent Smith, imageId=Agent Smith, amazonImageId=***, amazonInstanceId=***, status: Running}, profile 'Root Cloud Profile'{id=amazon-3, projectId=_Root} with idle time 5 minutes (idle time is 5 minutes)
[2024-06-26 08:20:39,498] INFO [uled executor 3] - mpl.CloudInstancesProviderImpl - Instance 'Agent Smith' is marked for termination after current build finishes
[2024-06-26 08:20:39,498] INFO [uled executor 3] - mpl.CloudInstancesProviderImpl - Agent '"Agent-Smith" {id=137}' is marked for termination after current build finishes
[2024-06-26 08:20:39,498] INFO [uled executor 3] - l.instances.StopInstanceAction - Terminating instance 'Agent Smith', profile 'Root Cloud Profile'{id=amazon-3, projectId=_Root}, reason: "Idle timeout reached"
[2024-06-26 08:20:39,498] INFO [uled executor 3] - r.impl.DBCloudStateManagerImpl - Image: AmazonImageInstance{id=Agent Smith, amazonId=***}, profile: profile 'Root Cloud Profile'{id=amazon-3, projectId=_Root} was marked to CONTAIN agent
[2024-06-26 08:20:39,877] INFO [9 Stop Instance] - r.impl.DBCloudStateManagerImpl - Image: Agent Smith, Instance: Agent Smith, profile=amazon-3 is marked with state: stopped.
[2024-06-26 08:20:41,165] INFO [uled executor 4] - .server.impl.CloudEventsLogger - Cloud instance entered 'scheduled to stop' state, profile 'Root Cloud Profile'{id=amazon-3, projectId=_Root}, Agent Smith
[2024-06-26 08:20:41,597] INFO [requests pool 1] - ces.terminate.AgentRemoverImpl - Unauthorizing cloud agent "Agent-Smith" {id=137} because: Instance has been terminated. Amazon Instance{instanceId=Agent Smith, imageId=Agent Smith, amazonImageId=***, amazonInstanceId=***, status: Running}, profile 'Root Cloud Profile'{id=amazon-3, projectId=_Root}
[2024-06-26 08:21:21,167] INFO [uled executor 5] - .server.impl.CloudEventsLogger - Cloud instance entered 'stopped' state, profile 'Root Cloud Profile'{id=amazon-3, projectId=_Root}, Agent Smith
[2024-06-26 08:21:21,168] INFO [uled executor 5] - .server.impl.CloudEventsLogger - Cloud agent Unregistered agent "Agent-Smith" {id=137} is no longer recognized as cloud agent because instance has gone: profileId=amazon-3, imageId=Agent Smith, instanceId=Agent Smith
[2024-06-26 08:21:21,168] INFO [uled executor 5] - ces.terminate.AgentRemoverImpl - Unauthorizing cloud agent Unregistered agent "Agent-Smith" {id=137} because: Cloud instance (id=Agent Smith, imageId=Agent Smith, profileId=amazon-3) has gone (is not reported by cloud profile)
----------
And when attempting a second start-up, I get the log entry at the beginning, and the agent gets stuck on “Unauthorized” state until I manually authorize it.
This seems to get “restarted” after some time passes (dunno how long though, I noticed it between one day and the next one), so the next day, the first attempt is successful (???).
Oddly enough, on the second attempt, the starting-up log describes a temporary id that's different from the authorization token. Could it be that the server is generating always a new one, and that clashes with the existing one?
Thanks a lot for the help!
Please sign in to leave a comment.
It is a known issue that will be fixed in 2024.03.3, which should be published this week. A workaround is also available: https://youtrack.jetbrains.com/issue/TW-88068#focus=Comments-27-9864960.0-0.
Best regards,
Anton
Thank you so much for your quick response Anton. I will look forward for the fix and thanks for the link to the workaround.
Kind regards.