Build parallel steps
Is it possible to run build steps in parallel? In my build there are several steps that could run at the same time, which would reduce the overall build time, and I wondered whether there is a way to configure this in TeamCity. I was reading this article about Pipelines https://bitbucket.org/blog/speed-build-parallel-steps-pipelines and they have just added this feature in
Thanks!
Hi Heather,
The TeamCity equivalent of the parallel build steps found in Pipelines and other CIs is having multiple build configurations within a build chain. All build steps within a build configuration are assumed to need to run in order. If they can run in parallel, they should be split into separate build configurations and tied together through dependencies.
A common example scenario would be:
Build B compiles; builds T1, T2 and T3 are different kinds of tests that can run in parallel. B produces artifacts, which T1, T2 and T3 select as artifact dependencies, and T1, T2 and T3 each have a snapshot dependency on B. All of them wait for B to finish before starting, pull the artifacts (usually the compiled project) and then run their tests in parallel.
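For illustration only, the chain described above can be modeled as a small dependency graph. The Python sketch below is not TeamCity code and does not use any TeamCity API; `parallel_waves` and the dictionary layout are hypothetical names chosen for this example. It only shows which builds become eligible to start together once their snapshot dependencies have finished. Whether they actually run at the same time still depends on having enough free agents.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def parallel_waves(snapshot_deps):
    """Group builds into waves.

    snapshot_deps maps each build to the builds it depends on.
    Every build in a wave has all of its dependencies finished,
    so the builds of one wave could run in parallel.
    """
    sorter = TopologicalSorter(snapshot_deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # builds with no unfinished dependency
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


# The scenario from the post: T1, T2 and T3 each depend on B.
chain = {"B": [], "T1": ["B"], "T2": ["B"], "T3": ["B"]}
print(parallel_waves(chain))  # [['B'], ['T1', 'T2', 'T3']]
```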
Hope this helps.
Hi, but in this case builds waste time on checking out sources etc. Also, in this case the user needs as many agents as build configurations. Do you have plans to implement parallel build steps somehow?
Hi Erez,
"Parallel build steps" are already implemented as separate build configurations, as already mentioned. Yes, the user needs to have as many agents as build configurations they want to run in parallel, or they need to have a build runner that can run its own tasks in parallel. Some test runners do support this and it works fine.
If checking out sources on multiple agents is a problem, you can set up a custom checkout directory in the VCS settings of the build configuration, have multiple build agents on the same machine and configure all builds to use that custom checkout directory. You will need to be careful that the runners don't step on each other, as build tools often create locks (outside of TeamCity's control) or write to the same temp folders, which could lead to issues, but this would be the same if "parallel steps" rather than parallel builds were implemented.
If you have any further feedback you would like to provide, please use this issue in our tracker for it: https://youtrack.jetbrains.com/issue/TW-13849
To say this is implemented is like saying that you can sit on a table because it has 4 legs and a piece of wood to place your bottom on. Sure you can, but it's not what it was designed for. Using different build configurations to achieve build parallelism is a hack at best. I understand that this might work for some teams and is a good interim solution. But there are serious reasons to consider actually implementing parallel build steps.
Thanks for bringing this up, as it highlights several of the most common misunderstandings about build configs and parallel tasks in TeamCity, so it's a great opportunity to clarify them.
-"Using different configurations for parallelism is a hack". The answer to this is that having everything in a single build configuration is like having a god object in OOP. Smaller units that provide reusable, independent functionality are superior, and this is our approach as well. We consider a build configuration to be the basic building block (which can be further subdivided into sequential build steps) of more complex structures, not a fully self-contained project delivery. All of our products (to my knowledge) are built using TeamCity and the vast majority of them use long, complex build chains.
-"A single build config should be a self contained unit.": Given a typical parallelization scenario and the description you give of a self-contained unit, this is a bad idea. A common scenario for parallelization is a process like "compile", "unit test", "integration test" and "ui test", where each set of tests can be run in parallel. If a build is fully self-contained and the integration tests fail because a secondary service failed rather than because of the code, you will need to repeat the full process, including compilation, because "it's a self contained unit". If you have different build configurations, you can just rerun the integration tests once you fix the problem, which will reuse the already existing compiled build. It's even clearer if you plan on integrating a deployment into the same "build configuration". If the build passes and everything works fine, but the deployment fails due to a networking issue, a problem with the remote server or anything of that sort, there is no point in recompiling and retesting everything. Having a separate deploy build configuration allows you to simply reuse the builds that already exist and deploy them. This ties back to what I said in the previous point: having smaller, more granular builds rather than long, complex unitary ones allows for this kind of flexibility, which a single self-contained unit does not. To be fair, my suggestion can also be described as a "self contained unit", just a much more granular one.
-"What if you have an agent that's got the computing power to run...". Agents are not machines, they are processes. It's not only possible, but very common, to have multiple agents on the same machine. We even have instructions for it: https://www.jetbrains.com/help/teamcity/setting-up-and-running-additional-build-agents.html#SettingupandRunningAdditionalBuildAgents-InstallingSeveralBuildAgentsontheSameMachine . Having 3, 4 or 5 build agents on the same machine allows that one machine to run parallel builds just fine, using as many resources as the build runners want. There are possible issues when build runners do not use the temporary folders we provide and use system-wide temp folders instead, since running two different builds with the same runner at the same time might conflict there, but this is a very rare scenario, as most runners do use the provided folders.
For smaller-scale parallelization, such as running multiple unit tests in parallel, TeamCity delegates each step to the build runner, which most commonly is a vendor-provided tool. Gradle builds are delegated to Gradle, VS builds to Visual Studio, and command line steps to a shell. If the runners support parallelization of tasks, TeamCity does not prevent it from working. If the runners, which specialize in running the code, do not support this themselves, it's that much harder for TeamCity to jump into the middle of it and split it into more subtasks.
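As a rough illustration of parallelism handled inside a single step rather than by TeamCity, a script invoked from one build step could start several test commands itself and report one combined result. This is a minimal sketch under stated assumptions, not a TeamCity feature: `run_in_parallel` and `step_exit_code` are hypothetical helper names, and the commands passed in are whatever test invocations your project uses.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(commands, max_workers=4):
    """Run each command (a list of arguments) concurrently.

    Returns the exit codes in the same order as the input commands.
    """
    def run(cmd):
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))


def step_exit_code(exit_codes):
    """Collapse all exit codes into one: 0 only if every command succeeded."""
    return 0 if all(code == 0 for code in exit_codes) else 1
```

The calling step would then exit with `step_exit_code(...)` so that the build fails when any of the parallel commands fails. Note that the output of the commands is interleaved in the build log, which is one reason independent reporting is easier with separate build configurations.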
Now, I will grant that TeamCity does not do a good job of explaining the setup we suggest in the UI when setting up your projects. Dependencies are just listed as an option but never brought up when setting up a project. This has been a topic of long, heated discussions in the team but designing and implementing a better UI for this is not a trivial task. We hope to be able to provide a better UI for it during the rework through the experimental UI, but there is nothing concrete about it yet. If you are interested in that aspect, please vote for the issue here: https://youtrack.jetbrains.com/issue/TW-64309
As an example, we have just introduced pipelines in our DSL: https://github.com/JetBrains/teamcity-pipelines-dsl, with full support for parallel tasks. Internally this is converted to multiple builds with snapshot dependencies, but there is currently nothing comparable in the UI that makes it clear visually.
As a last comment, I'd like to note that the team answering in the forums are for the most part support engineers. We have already provided a link to the specific issue in the tracker where the devs discuss this very specific approach to the feature. Posting here is beating a dead horse: you will gain nothing from it and will just annoy support people who have no power over the decision. We can only explain what it is, why it is, and how to do it, and address technical concerns, but the design decision will not be challenged here.
I'm pretty new to TeamCity and I still have some open questions about what you wrote. In general the idea of having multiple build configs and connecting them via (artifact) dependencies is something I think I could get used to. But ...
Let's say our build produces around 100 GB of artifacts that are required for the testing stages. Those stages run in parallel. Depending on the available HW setup, transferring those artifacts might or might not be a good option time-wise.
From what I understood, one would opt for having multiple agents installed on the same machine. Is there a mechanism to ensure that all agents running on the same machine are only used by one build chain? The build itself will consume all the available CPU power and memory of those machines because the build tool parallelizes heavily, so having other builds run on the same machine would be disadvantageous.
Or is this where the open TW-13849 steps in?
Just want to add that this seems pretty unreasonable given the pricing structure charging for build configurations, and as such makes TC a terrible choice for decent CI/CD for microservices and/or larger dev teams. You're suggesting 5+ per project and that's before environment-specific deployments (yes, you could make complicated parameterised builds, but for example it's useful to have prod deployments separate from other environments, and that seems to be the whole point of build templates, no?!). Not to mention that the overall number of build configurations required clogs up the UI. It also makes tracing all the build steps that run on a single commit more difficult. It would be appreciated if the TC stance on this could be reviewed.
@Fabzo: 100 GB of artifacts is not a common use case, particularly not for the professional edition. For particular use cases such as that one, a more in-depth analysis of the specific use case is typically required. Are those 100 GB then published as artifacts or are they discarded at the end of the build? Are they generated fully by a previous step or are they pulled from a separate place? Are they binary or text? Are they a small number of very large files or a very large number of small files? Each combination of answers leads to different suggestions. In a generic sense, you can do what you said, but it will rarely be optimal. You can either set explicit agent requirements in your build configurations, or set agents to only run some configurations. I also fail to see how parallel build steps would be any different from parallel build configurations in terms of resource usage. It would seem to be roughly identical, as you will still have n processes running the tasks at the same time. If you want advice on that specific scenario, please open a separate thread for it.
Matt Weeks: The pricing structure charging for build configurations is only partially valid for the professional (free) edition. Once you get to the point of caring about the pricing structure, you should notice that it scales up with build agents for both editions, but with build configurations only for one. That is, it always scales with the number of parallel tasks TeamCity will orchestrate on its own (it doesn't limit what the build scripts themselves can do). TeamCity is architected particularly for large teams and projects, since we use it ourselves internally for all of our products and our own needs are one of the factors when determining the direction of development. The idea of microservices is that they should be components which are independent from each other, so I don't get the concern about having them run in parallel as part of the same build. It seems like they should be built fully independently, and whether they run in parallel or not is just a matter of you having enough resources (including licensed agents) for it.
TeamCity has a custom way of displaying this kind of scenario in the UI, called build chains, with screenshots of our internal install with a few more than 5 in our docs: https://www.jetbrains.com/help/teamcity/build-chain.html. The UI clutter is a hard problem to deal with, because even if we were to include "parallel steps", reporting for each one would have to go in its own independent section, as independent reporting is critical for many parallelizable tasks. That is not to say that our approach is perfect; we keep working on it and are currently doing a large overhaul of the UI in general, including changes to parallel tasks. We do appreciate constructive feedback (even if it's negative) during the process, so if you have specific concerns about clutter, feel free to send them to us specifically (open a new thread or use our tracker).
Thanks for the response, I'll have another look into build chains and see if they accomplish what I'm after more than I thought they would. Will get back to you if I have more comments.
I would agree that parallel build steps would be useful. Yes, creating multiple build configurations (which we do for many of our build chains) does accomplish this, but it is not easy to maintain for certain tasks. For example, we have unit tests running, and the more unit tests we create, the longer the build takes. It would be nice if I could add a step to run different sets of unit tests without having to create a new configuration every time to run different E2E tests. I would like to create more than one step to split these runs within the same deploy. Also, locking the resource requires that only one build configuration can run while the "lock" is there. If I allow multiple builds for a lock, then another configuration can "sneak" in and run a deploy before the dependent tests run, causing the tests to run on a different version of the code. We have created PowerShell scripts using the TeamCity web API that control what runs and how many run in parallel, but it is tedious.
It would be helpful to be able to run the tests in parallel within a single configuration for E2E tests.
I can suggest a practical example where parallel steps would be helpful.
I have a build process which starts with a Maven build, then builds Docker images, then runs them, and so on.
The Docker images have base images, which are downloaded from a repository to the TC agent each time. This takes some time and could be done in advance, so it could be done with a "docker pull" command while Maven builds the sources. It's important that this happens on the same agent.
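One way to approximate this today, without parallel steps, is to let a single build step start the pulls in the background and run the Maven build in the foreground. The following is only a sketch of that idea, assuming the step is allowed to run a Python script on the agent; `build_with_prefetch` is a hypothetical helper name, and the Maven goal and image name in the usage comment are placeholders, not taken from the post.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_with_prefetch(build_cmd, prefetch_cmds):
    """Run build_cmd in the foreground while prefetch_cmds run in the background.

    Returns (build_exit_code, [prefetch_exit_codes]) once everything has finished.
    """
    workers = max(1, len(prefetch_cmds))  # the pool needs at least one worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Start every prefetch (e.g. a "docker pull") without waiting for it.
        prefetches = [pool.submit(subprocess.run, cmd) for cmd in prefetch_cmds]
        # Meanwhile, run the main build and wait for it.
        build_code = subprocess.run(build_cmd).returncode
        # Collect the prefetch results; this blocks until each pull is done.
        prefetch_codes = [f.result().returncode for f in prefetches]
    return build_code, prefetch_codes


# Hypothetical usage (image name is a placeholder):
# build_with_prefetch(["mvn", "package"], [["docker", "pull", "my-registry/base-image:latest"]])
```

Because both commands are started from the same step, they necessarily run on the same agent, which is the requirement described above. The trade-off is that the two outputs are mixed in one build log.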