Avoid multiple build sets in build queue - Configure VCS triggers to dump current build and rebuild.
Summary: Could we get a config option on VCS triggers to stop the current build when the trigger fires again? And maybe also an option to delay triggering again until the current build chain is complete?
We have a lot of projects in a snapshot dependency tree, and projects high up in the tree change fairly often. This periodically creates huge queues. I have the option of:
- Keeping a small quiet period, so devs quickly get their changes built when no one else has committed in a while. However, this easily causes the build queue to become many hours long when others have recently committed, because TC might start a build of everything every 5 minutes, while it takes an hour to complete a build.
- Keeping a huge quiet period, so we build everything within the quiet period, ensuring we only have one build in the build queue at a time. We get an OK worst-case build time, but now the average build time is huge too.
What we want is to start builds quickly, but when new commits come in, we want to drop the build of the already outdated code and restart with the most recent code. As it stands, devs in most cases wait far too long for the results of their changes. When the queue gets too long, someone has to go in and cancel builds manually, and since it's hard to target all but the most recent one, we typically need to cancel the entire queue and then manually trigger a build of master again.
We don't want to have to control when devs push code, nor do we want to wait unnecessarily for TC to start building new changes.
It would be great if we could configure build triggers to get better behavior. A configuration option to dump the current build and restart if the VCS trigger fires again would be great. Another good option would be to delay triggering again until the entire build chain caused by the previous trigger is complete.
It would be good if builds triggered on completion of the snapshot chain builds were also affected.
Hi Hakon,
I'd like to point out that the two requests in your summary work against each other. A new trigger would either override previous builds and run immediately, or wait until the current one finishes. If it cancels the currently running build, then there is nothing to wait for, because the build has just been stopped.
As for builds in the queue, they should already be replaced: https://www.jetbrains.com/help/teamcity/build-queue.html#BuildQueue-BuildQueueOptimizationbyTeamCity
On the other hand, I'm afraid it's not directly possible to stop running builds. The current workaround would be to keep a short quiet period, then add a first step in the configuration that checks the running builds for instances of the same configuration and stops them. This would make it easy to apply your own logic to the cancellation. If you only want to override in specific instances, we have an "Urgent" feature request here: https://youtrack.jetbrains.com/issue/TW-15226 . Otherwise, please feel free to add your request to the tracker.
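As a rough illustration of that workaround, a first build step could run something like the Python sketch below. It is only a sketch under assumptions: the server URL, the token, and the function names are hypothetical, the step is assumed to receive its own build id and build configuration id (for example from build parameters), and it assumes that a smaller build id means an older build. The REST paths used are the builds listing with a `running:true` locator and a `buildCancelRequest` posted to the build; branches are ignored.

```python
import json
import urllib.request

SERVER = "https://teamcity.example.com"  # hypothetical server URL
TOKEN = "REPLACE_ME"                     # hypothetical access token


def older_running_builds(running, current_build_id):
    """Return the ids of running builds that are older than the current one.

    `running` is a list of dicts shaped like REST build entries, e.g. {"id": 101}.
    Assumption: build ids grow over time, so a smaller id is an older build.
    """
    return sorted(b["id"] for b in running if b["id"] < current_build_id)


def call(method, path, body=None):
    """Send one REST request and return the response body as text."""
    headers = {"Authorization": "Bearer " + TOKEN, "Accept": "application/json"}
    data = None
    if body is not None:
        headers["Content-Type"] = "application/xml"
        data = body.encode("utf-8")
    req = urllib.request.Request(SERVER + path, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def cancel_superseded(build_type_id, current_build_id):
    """Stop every older running build of the same build configuration."""
    locator = f"buildType:(id:{build_type_id}),running:true"
    listing = json.loads(call("GET", "/app/rest/builds?locator=" + locator))
    stale = older_running_builds(listing.get("build", []), current_build_id)
    for build_id in stale:
        call(
            "POST",
            f"/app/rest/builds/id:{build_id}",
            "<buildCancelRequest comment='Superseded by a newer build' readdIntoQueue='false'/>",
        )
    return stale
```

Keeping the selection in `older_running_builds` separate from the HTTP calls makes it easy to swap in your own cancellation logic, for example only cancelling builds on a particular branch.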
Hope this helps.
As a quick follow-up, may I ask you to explain why you need to stop running builds, once they have already started, in favor of new code? It seems like cancelling them would just hide useful information about those commits and clutter the data. We struggle to find a use case for this, so it's hard to consider adding it as a feature. An explanation of its usefulness would help our devs consider it.
Why we need to stop running builds:
- We want to ensure HEAD of the master branch is in a good state and deploy HEAD to production. We don't want to wait for the HEAD-2 and HEAD-1 builds to complete before TC starts building HEAD. In practice, we might need to wait 2 hours for TC to complete older builds before HEAD actually starts building. If we have an issue in production, possibly involving downtime and customer issues, waiting 2 hours before starting to build the fix is not acceptable.
We could probably solve the queuing by using a quiet period at least as long as a full build of everything takes, but that means that in the average case, where TC is already idle and a dev pushes something, the dev gets the report of having broken the build 65 minutes after pushing instead of 5 minutes after. By that time, the dev may already have had 2-3 context switches, making the issue more tedious to fix, and the breaking change may have caused problems for many other devs who have pulled it.
If many TC builds are scheduled for the same VCS state, they are reduced to one build per config, but as long as we trigger them by VCS changes after the quiet period, we still seem to end up with a lot of builds in the queue.
Typical issue.
Utility project A is used by a lot of projects. Let's say builds B, C & D have a snapshot dependency on A and are set up to trigger builds when A changes.
In turn, projects B, C & D each have a build that creates a deployment package to deploy somewhere, triggered when they are successfully built. (These cannot have snapshot deps on the build because, to keep only the wanted artifacts, we keep the artifacts of snapshot deps of deploy configs.)
Then we have deploy-to-staging-environment configs, with snapshot deps on the package creation configs, that automatically deploy newly generated deploy packages.
- Developer John pushes a change to A. TC waits for the quiet period, currently set to 60 seconds. Builds of A, B, C & D are put in the queue. As they build, the builds that create packages and deploy these projects are queued too. It will take TeamCity 1 hour to work through all these build configs.
- Developer Jane pushes a new change to A 5 minutes after. As she has changed A, it forces a new build of everything, as there are actually VCS changes affecting all of the projects.
- Developer Kerry pushes a new change to A 5 minutes after. As he changed A again, it forces a third build of everything.
In our experience, TC now has several hours of backlog, and we would likely want whatever remains of John's and Jane's builds to be thrown away, so we can build the latest one, which will test the sum of all their changes.
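To put rough numbers on this scenario, here is a small Python sketch. It is a deliberately simplified model, not a description of how TeamCity schedules builds: it assumes only one full pass over the build chain can run at a time, ignores the 60-second quiet period, and the function name is made up for the example.

```python
def head_finish_time(push_times, build_minutes, supersede):
    """Return the minute at which the build of the last push finishes.

    Simplified model: one full pass over the chain runs at a time.
    """
    free_at = 0  # minute at which the single build lane becomes idle
    for t in sorted(push_times):
        if supersede:
            # A new push discards whatever is running and starts over at once.
            start = t
        else:
            # Every push gets its own full pass, run in push order.
            start = max(t, free_at)
        free_at = start + build_minutes
    return free_at


pushes = [0, 5, 10]  # John, Jane, Kerry (minutes)
print(head_finish_time(pushes, 60, supersede=False))  # 180
print(head_finish_time(pushes, 60, supersede=True))   # 70
```

Under these assumptions, Kerry's change is verified after about 3 hours when every push gets its own pass, versus about 70 minutes when newer pushes replace older ones.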
Typically when this happens, devs either get unnecessarily late reports of build failures or, if something important needs to be done, I have to go into the TC queue, delete all the jobs, and then trigger a "build everything altered" job to schedule a new build of master. This would be far less tedious if there was a way to clear the entire queue, and even better if old automatically triggered builds were aborted or removed from the queue automatically when the trigger fires again.
When I delete builds with snapshot deps, I get to remove more than one build at a time, but as many of the builds are triggered by the completion of others, I still need to manually kill off ~50 different builds.
This wouldn't be much of a problem if one git commit ended in 1-3 TC builds, but when 1 commit triggers 100 TC builds, we need to limit what builds are actually built, because the build process takes too long.
I hope not to have to write a separate program to list and clear the queue through the REST API.
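For reference, such a program could be fairly small. The Python sketch below keeps only the newest queued build per build configuration and cancels the rest. It is a sketch under assumptions: the server URL, token, and function names are hypothetical, it assumes a larger build id means a newer build, it ignores branches and paging of the queue listing, and cancelling a queued build that other queued builds depend on may have side effects that are not handled here.

```python
import json
import urllib.request

SERVER = "https://teamcity.example.com"  # hypothetical server URL
TOKEN = "REPLACE_ME"                     # hypothetical access token


def stale_queued_builds(queued):
    """Return ids of all queued builds except the newest per build configuration.

    `queued` is a list of dicts such as {"id": 12, "buildTypeId": "Proj_A"}.
    Assumption: a larger id means the build was queued later.
    """
    newest = {}
    for b in queued:
        key = b["buildTypeId"]
        if key not in newest or b["id"] > newest[key]:
            newest[key] = b["id"]
    return sorted(b["id"] for b in queued if b["id"] != newest[b["buildTypeId"]])


def call(method, path, body=None):
    """Send one REST request and return the response body as text."""
    headers = {"Authorization": "Bearer " + TOKEN, "Accept": "application/json"}
    data = None
    if body is not None:
        headers["Content-Type"] = "application/xml"
        data = body.encode("utf-8")
    req = urllib.request.Request(SERVER + path, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def prune_queue():
    """Remove outdated queued builds, leaving one per build configuration."""
    queue = json.loads(call("GET", "/app/rest/buildQueue"))
    stale = stale_queued_builds(queue.get("build", []))
    for build_id in stale:
        call(
            "POST",
            f"/app/rest/buildQueue/id:{build_id}",
            "<buildCancelRequest comment='Outdated, newer build queued' readdIntoQueue='false'/>",
        )
    return stale
```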
Possibly related: we get server health warnings:
SomeBuild has a redundant VCS trigger which can be deleted as triggering a build on VCS changes in dependencies is already configured in dependent build configuration(s). The redundant trigger might cause extra builds in the queue.
- If project A exists and we use it, we want to build it when it changes.
- If project B is created with a snapshot dependency on A (same VCS but different checkout rules), we want to build B when A changes.
- If I understand correctly, this creates the warning above, as both A and B have VCS triggers hitting changes in A.
- To "fix" the warning, we need to delete the trigger in A.
- When someone removes project B, A is no longer built.
When project A exists independently of B, we don't want A's config to rely on whether B exists or not. I don't think this is the cause of the problem above, but it might add to it.
Hi Hakon,
Thanks for the very detailed description. We've been discussing it internally at length, but we don't really see a way to move forward with this on our end. The issue with your scenario is that if you have an automated chain to push to production that is triggered for every single commit that devs send, there will be scenarios where builds never finish until the end of the day (leading to even longer delays than the current approach). This is not to say that your situation doesn't need addressing, but it seems like the current set of features might actually be enough.
By default, there is no limit on a given build configuration running multiple instances in parallel on multiple agents. This means that it should be perfectly possible for John's and Jane's builds to run in parallel without needing to wait for the rest. Of course, this means needing more agents to run simultaneously, but it would give feedback to both developers and not just one. In your scenario, an error introduced by John would be reported to Jane's build instead, which would eventually delay the end result even longer.
If commits can be sent every few minutes and produce multiple builds that run for several hours, then it might make sense to review your build hierarchy. It might make sense to restrict which branches are built on time-critical builds (or which ones run through everything), to restrict the packaging builds to certain branches, or to trigger them on a schedule rather than through snapshot dependencies (and of course to trigger them manually when they are needed). It doesn't seem productive to generate packages for multiple different projects after every single commit, and it seems to conflict with not caring whether those packages are created, since you would cancel their builds when a newer commit is added.
For time-critical fixes such as those for clients in production services, it might make sense to force those builds manually and push them to the top of the queue using the action available for it, and run them on idle agents (or cancel builds on agents running non-critical builds), but this seems like it should always be a manual decision.
If devs need fast feedback on their commits, as you mentioned initially, then it seems like the lack of agent availability plus the spawning of hours-long builds with every commit would be a bigger issue than their builds not being automatically cancelled, which would still delay the feedback on their build results even longer. Having one pool of agents for the builds that they need for feedback and another for the longer builds seems to be a better approach.
Now, with this in mind, please feel free to open a feature request in our tracker (https://youtrack.jetbrains.com/issues/TW) if you still think this feature would be needed. As mentioned, we have discussed this internally, and we don't really see a use case for this that doesn't generate a sizeable amount of negative consequences and that cannot be addressed differently with the currently available features.
About the health warnings, they seem to be unrelated, but your assessment is mostly correct. It seems like you might want to use a Finish build trigger rather than a VCS Trigger on B, as that will wait until A finishes to trigger B, rather than trigger both B and A at the same time. That said, a single VCS Trigger on B is usually considered ideal, on the expectation that if B is "removable", the same admin who removes B will make sure to move the trigger to A. But the Finish build trigger should get the job done and would make A "fully independent" of B, as in you could remove B and A would continue working just fine. If this solution doesn't cut it, please open a new thread and we can discuss it there.
Hope this helps.
Thanks for the detailed response.
Adding hardware and agents could drastically reduce the issue. We're not a big software house, so we're trying to keep costs down. If we needed the builds we would go this way, but as of now, we're just ~15 devs and mostly we're working on distinct parts of the code, so we haven't had issues with building four people's changes in the same build.
Pushing critical builds to the top of the queue requires constant tending if time is critical:
- The biggest issue is that we have many parts of the build triggering on completion of other builds. These will by default be scheduled at the bottom of the queue, and we'd manually have to push each of them to the top of the queue. Detecting which ones belong to the critical build and which ones don't isn't obvious, and as they get scheduled during the build, someone needs to sit there and monitor it.
- If we cannot use all the agents for the critical builds, then TC will start using agents for later builds in the queue, and when the critical build can use that agent, it is now busy.
For the reasons above, when there's something critical, we typically notice that it is taking too long, then go in and manually abort everything in the queue, and then we have to reschedule the critical build, as we have aborted that too, because it's hard to tell what's what when aborting. It would be nice to be able to just dump the queue fast.
As you say, if a dev pushes something into a utility lib that triggers a build of everything, this will take some time anyhow, so it's only the projects close to the code change that will report errors quickly. I guess we could have a decent workaround by turning off "Trigger a build on changes in snapshot dependencies", only building local builds on changes, and having a periodic trigger that schedules builds of everything that has changed, with the period set long enough to build everything before it fires again. If it were possible to configure a trigger based on the size of the current queue, we could even use a short period and react fast when TC is idle, as we'd like to.
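Lacking such a trigger, the queue-size idea could be approximated from outside TC, for example by a small script run every few minutes. The Python sketch below is only an illustration under assumptions: the server URL, token, build configuration id, and function names are hypothetical, and it only looks at queued builds, so a real version would probably also need to check for running builds before deciding that TC is idle.

```python
import json
import urllib.request

SERVER = "https://teamcity.example.com"  # hypothetical server URL
TOKEN = "REPLACE_ME"                     # hypothetical access token
BUILD_ALL_ID = "Master_BuildEverything"  # hypothetical build configuration id


def should_trigger(queued_count, max_queued=0):
    """Trigger only when the queue has drained to at most `max_queued` builds."""
    return queued_count <= max_queued


def call(method, path, body=None):
    """Send one REST request and return the response body as text."""
    headers = {"Authorization": "Bearer " + TOKEN, "Accept": "application/json"}
    data = None
    if body is not None:
        headers["Content-Type"] = "application/xml"
        data = body.encode("utf-8")
    req = urllib.request.Request(SERVER + path, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def trigger_if_idle():
    """Queue the "build everything" configuration when the queue is empty."""
    queue = json.loads(call("GET", "/app/rest/buildQueue"))
    if should_trigger(len(queue.get("build", []))):
        call("POST", "/app/rest/buildQueue",
             f"<build><buildType id='{BUILD_ALL_ID}'/></build>")
        return True
    return False
```

Raising `max_queued` would let the script trigger while a few unrelated builds are still waiting, at the cost of a longer queue.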
We have tried earlier to group builds and to keep the group of utility builds from triggering builds of everything else, but we need snapshot dependencies to create consistent builds for production deployments, and "Trigger a build on changes in snapshot dependencies" does not give us any control to rebuild when dependency A or B changes but not when dependency C or D changes.