AWS protection - starting agent instances that won't start

My company recently added 20 agent licenses to our TeamCity server, bringing us up to a maximum of 120 agents. About a week later, one of our developers noticed that we had never gone over 100 agents since adding the licenses. On researching the issue, I noticed that TeamCity would start a new AWS instance and the instance would then immediately terminate. This happened for both on-demand and spot instances. We were well under our AWS instance limit, so I took the issue to JetBrains support.

Support was able to direct me to the right troubleshooting steps, and we discovered that we were running into our AWS EBS size limit. After clearing out several leftover volumes, we were able to start instances and run agents on them.
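For anyone hitting the same limit, here is a minimal sketch in Python of how leftover volumes could be totalled up. It assumes that "leftover" means volumes in the `available` (unattached) state, and it works on plain records shaped like the entries of the EC2 DescribeVolumes `Volumes` list (for example, as returned by boto3's `describe_volumes`). The function name and the sample data are made up for illustration.

```python
def summarize_unattached(volumes):
    """Return (volume_ids, total_gib) for volumes that are not attached.

    `volumes` is a list of dicts with at least "VolumeId", "Size" (GiB)
    and "State" keys, as in the EC2 DescribeVolumes response.
    Assumption: an unattached volume reports State == "available".
    """
    unattached = [v for v in volumes if v.get("State") == "available"]
    ids = [v["VolumeId"] for v in unattached]
    total_gib = sum(v["Size"] for v in unattached)
    return ids, total_gib


# Made-up sample records, not real volume IDs.
sample = [
    {"VolumeId": "vol-aaa", "Size": 50, "State": "in-use"},
    {"VolumeId": "vol-bbb", "Size": 100, "State": "available"},
    {"VolumeId": "vol-ccc", "Size": 30, "State": "available"},
]

ids, total_gib = summarize_unattached(sample)
print(f"{len(ids)} unattached volumes, {total_gib} GiB: {ids}")
```

This only reports candidates. Whether a given volume is actually safe to delete still has to be checked by hand.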

Everything is working now, but two weeks of non-stop instance spawning made our monthly AWS bill significantly higher. I was wondering what JetBrains would think of adding some retry logic that stops attempting to spawn new instances after X failures, or something similar. These types of silent failures are tough to catch and troubleshoot, and it turns out they can be very expensive as well.
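To make the suggestion concrete, here is a minimal sketch in Python of the kind of failure cap described above. It is not TeamCity code: `LaunchGuard`, `launch_with_guard`, and the `launch` callback are hypothetical names, and the threshold of 5 is an arbitrary stand-in for "X".

```python
from dataclasses import dataclass


@dataclass
class LaunchGuard:
    """Blocks further launch attempts after too many consecutive failures.

    Hypothetical helper for illustration only.
    """
    max_failures: int = 5          # the "X" from the post; arbitrary value
    consecutive_failures: int = 0

    def may_launch(self) -> bool:
        # Launching is allowed until the failure streak reaches the cap.
        return self.consecutive_failures < self.max_failures

    def record(self, started_ok: bool) -> None:
        # A successful start resets the streak; a failure extends it.
        if started_ok:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1


def launch_with_guard(guard, launch, attempts):
    """Call launch() up to `attempts` times, stopping once the guard trips.

    `launch` is any callable returning True if the instance came up and
    stayed up, False otherwise. Returns the number of launch calls made.
    """
    calls = 0
    for _ in range(attempts):
        if not guard.may_launch():
            break  # stop spawning; a real system would also alert someone
        calls += 1
        guard.record(launch())
    return calls
```

With a cap like this, a launch that always fails is attempted only `max_failures` times instead of indefinitely, and the tripped guard is a natural place to raise a visible warning.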


Hi Jasen,

Thanks for your report, and sorry for the inconvenience caused. Please open a request in our tracker here: https://youtrack.jetbrains.com/issues/TW. We will have a look at what we can do, and you will be informed of progress on it.

Permanently deleted user

Thanks for the recommendation, Denis. I'll open the request.
