vSphere VMs are killed while booting
Hey!
we are evaluating TeamCity with the vSphere Cloud Plugin using Windows7 VMs for our Builds. Currently, three Win7 Agents are configured in the Cloud Profile using the start/stop setting and "Terminate Instance after Build finished" tagged. The VMware vSphere Version is 5.5.0
Using TC10.0.4, we observed https://youtrack.jetbrains.com/issue/TW-48904 and thus updated to 10.0.5.
Using 10.0.5, our Setup completely stopped working. The first build triggered is running fine, but every subsequent request ends with the following symptoms:
TC requests that a VM agent is started
TC immediatly(!) trys to shutdown the VM
TC kills the VM
TC requests that a VM is started
---endless loop
I have attached a screenshot of the vSphere Log. TeamCity requests a VM to start and kills it immediatley every few microseconds. The VM starts booting but well, microseconds wont bring you even near a windows desktop ;).

Since one error message states that VMWare Tools are missing, we made sure to have the correct Version installed, but the problem was not solved (i assume the tools are only needed for a graceful shutdown and the VM is terminated if not available). Reverting back to Teamcity 10.0.4 made the setup work again with the occassional canceled build (see https://youtrack.jetbrains.com/issue/TW-48904). We are not sure how to proceed from here, since the cancelled build behaviour is not desireable and the fix in 10.0.5 seems to have broken the plugin for us.
Below is a short snippet of the teamcity-clouds.log file, its essentially repeating forever. If you need more Information, please ask for it since we are new to TeamCity. Any help is really appreciated!
[2017-03-13 13:42:46,471] INFO [re'{id=vmw-1} 1] - uds.base.AbstractCloudInstance - Changing win7-vs2013-agent-1(9ee81b) status from Error to Stopped
[2017-03-13 13:42:48,358] DEBUG [8 Flush Queue 1] - .server.impl.CloudEventsLogger - BeforeStart instance: profile 'VMWare vSphere'{id=vmw-1}, VmwareCloudImage{myName='win7-vs2013-agent-1'}
[2017-03-13 13:42:48,358] INFO [8 Flush Queue 1] - .instances.StartInstanceAction - Starting cloud instance: profile 'VMWare vSphere'{id=vmw-1}, VmwareCloudImage{myName='win7-vs2013-agent-1'}, hash=zUx5rpG5DtiP6ZjJI3qbGkpDA6J28nz4, reason=Automatic start: Compatible with queued build Projects :: EzrtIdl/trunk :: vs2013 x86 debug(promotionId=541), CloudInstanceUserData{myAgentName=<none>, myAuthToken='', myServerAddress='http://ip1195.iis.fhg.de:8081', myProfileDescription='profile 'VMWare vSphere'{id=vmw-1}', myIdleTimeout=<disabled>, myCustomAgentConfigurationParameters={system.cloud.profile_id=vmw-1, teamcity.cloud.agent.terminate.after.build=true, teamcity.cloud.instance.hash=zUx5rpG5DtiP6ZjJI3qbGkpDA6J28nz4}}
[2017-03-13 13:42:48,358] INFO [8 Flush Queue 1] - clouds.vmware.VmwareCloudImage - Won't create a new instance - using original
[2017-03-13 13:42:48,358] INFO [8 Flush Queue 1] - uds.base.AbstractCloudInstance - Changing win7-vs2013-agent-1(9ee81b) status from Stopped to Scheduled to start
[2017-03-13 13:42:48,358] DEBUG [re'{id=vmw-1} 1] - nnector.CloudAsyncTaskExecutor - Starting Preparing to start new instance...
[2017-03-13 13:42:48,358] INFO [8 Flush Queue 1] - .server.impl.CloudEventsLogger - Cloud instance start succeeded: profile 'VMWare vSphere'{id=vmw-1}, VmwareCloudInstance{myInstanceName='win7-vs2013-agent-1'myState='SCHEDULED_TO_START'myStatusUpdateTime='Mon Mar 13 13:42:48 CET 2017'}
[2017-03-13 13:42:48,358] INFO [re'{id=vmw-1} 1] - uds.base.AbstractCloudInstance - Changing win7-vs2013-agent-1(9ee81b) status from Scheduled to start to Starting
[2017-03-13 13:42:48,389] DEBUG [re'{id=vmw-1} 1] - nnector.CloudAsyncTaskExecutor - Finished Preparing to start new instance...
[2017-03-13 13:42:53,210] INFO [re'{id=vmw-1} 1] - uds.base.AbstractCloudInstance - Changing win7-vs2013-agent-1(9ee81b) status from Starting to Running
[2017-03-13 13:42:53,210] INFO [re'{id=vmw-1} 1] - clouds.vmware.VmwareCloudImage - Instance started successfully
[2017-03-13 13:42:57,079] INFO [uled executor 4] - .server.impl.CloudEventsLogger - Cloud instance entered 'running' state, profile 'VMWare vSphere'{id=vmw-1}, VmwareCloudInstance{myInstanceName='win7-vs2013-agent-1'myState='RUNNING'myStatusUpdateTime='Mon Mar 13 13:42:53 CET 2017'}
[2017-03-13 13:42:57,079] WARN [uled executor 4] - ateInstancesCollectionTaskImpl - Terminating expired instance 'win7-vs2013-agent-1' as there's no agent
[2017-03-13 13:42:57,079] INFO [uled executor 4] - l.instances.StopInstanceAction - Terminating instance: VmwareCloudInstance{myInstanceName='win7-vs2013-agent-1'myState='RUNNING'myStatusUpdateTime='Mon Mar 13 13:42:53 CET 2017'}
, profile 'VMWare vSphere'{id=vmw-1}, reason: kill agent, when idle (current build finishes, if any)
[2017-03-13 13:42:57,079] INFO [uled executor 4] - r.impl.DBCloudStateManagerImpl - Image: VmwareCloudImage{myName='win7-vs2013-agent-1'}, profile: profile 'VMWare vSphere'{id=vmw-1} was marked to CONTAIN agent
[2017-03-13 13:42:57,079] DEBUG [uled executor 4] - .server.impl.CloudEventsLogger - CloudInstance terminate command called: profile 'VMWare vSphere'{id=vmw-1}, VmwareCloudInstance{myInstanceName='win7-vs2013-agent-1'myState='RUNNING'myStatusUpdateTime='Mon Mar 13 13:42:53 CET 2017'}
[2017-03-13 13:42:57,079] INFO [uled executor 4] - clouds.vmware.VmwareCloudImage - Stopping instance win7-vs2013-agent-1
[2017-03-13 13:42:57,079] INFO [uled executor 4] - uds.base.AbstractCloudInstance - Changing win7-vs2013-agent-1(9ee81b) status from Running to Scheduled to stop
[2017-03-13 13:42:57,079] INFO [uled executor 4] - uds.base.AbstractCloudInstance - Changing win7-vs2013-agent-1(9ee81b) status from Scheduled to stop to Stopping
[2017-03-13 13:42:57,110] WARN [uled executor 4] - nnector.VMWareApiConnectorImpl - Guest tools not installed or unavailable for 'win7-vs2013-agent-1'
[2017-03-13 13:42:57,110] INFO [uled executor 4] - nnector.VMWareApiConnectorImpl - Will attempt to force shutdown due to error: com.vmware.vim25.ToolsUnavailable
[2017-03-13 13:42:57,125] INFO [uled executor 4] - .server.impl.CloudEventsLogger - Cloud instance entered 'stopping' state, profile 'VMWare vSphere'{id=vmw-1}, VmwareCloudInstance{myInstanceName='win7-vs2013-agent-1'myState='STOPPING'myStatusUpdateTime='Mon Mar 13 13:42:57 CET 2017'}
[2017-03-13 13:42:59,840] INFO [re'{id=vmw-1} 1] - uds.base.AbstractCloudInstance - Changing win7-vs2013-agent-1(9ee81b) status from Stopping to Stopped
[2017-03-13 13:43:02,133] INFO [uled executor 2] - .server.impl.CloudEventsLogger - Cloud instance entered 'stopped' state, profile 'VMWare vSphere'{id=vmw-1}, VmwareCloudInstance{myInstanceName='win7-vs2013-agent-1'myState='STOPPED'myStatusUpdateTime='Mon Mar 13 13:42:59 CET 2017'}
[2017-03-13 13:43:02,133] INFO [uled executor 2] - .server.impl.CloudEventsLogger - Cloud instance entered 'stopped' state, profile 'VMWare vSphere'{id=vmw-1}, VmwareCloudInstance{myInstanceName='win7-vs2013-agent-1'myState='STOPPED'myStatusUpdateTime='Mon Mar 13 13:42:59 CET 2017'}
[2017-03-13 13:43:03,412] DEBUG [3 Flush Queue 1] - .server.impl.CloudEventsLogger - BeforeStart instance: profile 'VMWare vSphere'{id=vmw-1}, VmwareCloudImage{myName='win7-vs2013-agent-1'}
[2017-03-13 13:43:03,412] INFO [3 Flush Queue 1] - .instances.StartInstanceAction - Starting cloud instance: profile 'VMWare vSphere'{id=vmw-1}, VmwareCloudImage{myName='win7-vs2013-agent-1'}, hash=1c4QrOR1NGnwUeja5UIrAbBS9NQs0xzX, reason=Automatic start: Compatible with queued build Projects :: EzrtIdl/trunk :: vs2013 x86 debug(promotionId=541), CloudInstanceUserData{myAgentName=<none>, myAuthToken='', myServerAddress='http://ip1195.iis.fhg.de:8081', myProfileDescription='profile 'VMWare vSphere'{id=vmw-1}', myIdleTimeout=<disabled>, myCustomAgentConfigurationParameters={system.cloud.profile_id=vmw-1, teamcity.cloud.agent.terminate.after.build=true, teamcity.cloud.instance.hash=1c4QrOR1NGnwUeja5UIrAbBS9NQs0xzX}}
[2017-03-13 13:43:03,412] INFO [3 Flush Queue 1] - clouds.vmware.VmwareCloudImage - Won't create a new instance - using original
[2017-03-13 13:43:03,412] INFO [3 Flush Queue 1] - uds.base.AbstractCloudInstance - Changing win7-vs2013-agent-1(9ee81b) status from Stopped to Scheduled to start
[2017-03-13 13:43:03,412] DEBUG [re'{id=vmw-1} 1] - nnector.CloudAsyncTaskExecutor - Starting Preparing to start new instance...
[2017-03-13 13:43:03,412] INFO [3 Flush Queue 1] - .instances.StoppedInstanceTask - Instance has changed status from stopped to Scheduled to start: VmwareCloudInstance{myInstanceName='win7-vs2013-agent-1'myState='SCHEDULED_TO_START'myStatusUpdateTime='Mon Mar 13 13:43:03 CET 2017'}
, profile 'VMWare vSphere'{id=vmw-1}
Thanks!
Please sign in to leave a comment.
Hi, thank you for the report! Apparently, it's https://youtrack.jetbrains.com/issue/TW-49206. I'm going to fix it and will provide a build with fix.
Hi Sergey,
thank you for your quick response! We've encountered an additional problem after we reverted back to 10.0.4. Im not quite sure how to describe this in short and if this is correlated with the error in 10.0.5, so im just adding it to this thread.
With 10.0.4 - the cloud setup described above (vSphere 5.5, 3 win7 VMs, terminate instance after first build tagged) stops working after a short while.
In the TeamCity WebInterface, 1 agent is shown as online and idle, 2 agents are shown as offline. The build queue is filled but no build is triggered. The VM that is online and idle shows the error icon in the agents tab, but theres no error text.
The build history for the running idle agent that shows the error icon has a canceled build as last entry
in the log files, theres alot of rumble about license problems, but for evaluation purposes we are currently using a maximum of 3 agents with 18 configurations. i have uploaded a set of teamcity-clouds logs.
http://filebin.ca/3Gg42sC2bxD8/teamcity-clouds-log.7z
just hitting the "save button" in the cloud profile, without doing any changes, has at least once made the process work again until it stopped again in the same error state, so currently i can reproduce this behaviour any time.
thanks for your effort!