One portion of our build process performs an intensive database load that can take anywhere from 30 minutes to 2+ hours, depending on which client's data we're loading for analysis. I have been trying to offload this processing to the cloud.
I have configured an EC2 image to run these builds. However, I believe that the "Terminate Instance Idle Time" setting is misleading, and that the sensible approach to EC2 integration is completely missing from TeamCity.
First, because TeamCity terminates the instance when the "idle" time expires, I have had to bump the setting up to exceed our longest anticipated load time (this is not so great for our 30-minute loads, which will now incur the cost of our longest load). For this particular run, I bumped the setting up to 1h:50m because I felt that should be enough time for the process to finish and I wanted to avoid a 3rd hour of EC2 charges. However, I was watching the build agent server as it made it all the way through our extended load process and reached the stage where it was backing up the database, after which it was to copy the image to an S3 bucket. Once the build exceeded the "idle" timeout period, TeamCity unceremoniously terminated the server instance right in the middle of the build! Honestly, this makes absolutely no sense to me. Why would you shut down a server on which a build is currently running because the "idle" time expired? EC2 charges could be controlled with a "hanging build timeout" setting instead. That way I could set it to a large value (like 4h) to protect myself from excessive EC2 charges. In any case, my build agent was not "idle" by any reasonable interpretation.
Furthermore, since Amazon charges partial hours as full hours, what sense does it even make to have an "idle" timeout period? It seems to me that TeamCity knows when it fired up the server, and out of the box it should optimize server uptime around each one-hour billing period. If it fires up an EC2 instance to run a build that finishes in 15 minutes, it shouldn't shut it down at all, even after 30 minutes of true idle time. It should leave that server up until 55 minutes have elapsed, and then terminate it ONLY if no other build is running. It should never terminate a server on which a non-hanging build is currently running. If a build is running when a one-hour period expires, it should leave the server running until it approaches the end of the next hour (at 1h 55m, for example). Only then, if the server is truly idle, should the instance be terminated.
Also, I saw it mentioned elsewhere, but it would have saved me a ton of time if I could have created EC2 instances to be used as Cloud Build Agents and then told TeamCity about the instances instead of the images. Having to work from images forced me down a relatively painful path: I needed to create a separate persistent volume for the build agent's work/temp directories and write a script that executes at startup to attach the volume (and then troubleshoot a variety of problems, such as the volume not coming online at all, or coming online in Windows as "read only").