Artifacts as cache with predefined behaviour on cache misses

Hiya,

I'm running a fairly sizeable TeamCity operation: some 20 agents, 140+ projects and nearly 1,000 build configurations. We build mainly Windows stuff, but also Linux and Mac. With the recent introduction of Git, the load on the TeamCity server and agents has increased, as the developers have started to push and build their feature branches. All that is good and well.

The build system for Windows is split up to allow effective usage of the agents. In other words, one configuration builds the debug binaries and exports them as artifacts, to be followed by, say, FxCop, unit tests and a MoMA run. Each subsequent step imports what it needs from its upstream. We are shifting from push-based to pull-based build chains.
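For illustration, one link of such a chain looks roughly like this, sketched in TeamCity's Kotlin DSL (which is newer than our setup; the configuration names and artifact rules here are made up):

```kotlin
import jetbrains.buildServer.configs.kotlin.v2019_2.*

// Upstream: compiles the debug binaries and publishes them as artifacts.
object CompileDebug : BuildType({
    name = "Compile (Debug)"
    artifactRules = "bin/Debug => debug-bin.zip"
})

// Downstream: runs on the same sources and imports the binaries it needs.
object UnitTests : BuildType({
    name = "Unit Tests"
    dependencies {
        dependency(CompileDebug) {
            snapshot {}  // pull-based: build the upstream if no suitable build exists
            artifacts { artifactRules = "debug-bin.zip!** => bin" }
        }
    }
})
```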

To conserve disk space, I initially set the global retention policy to keep the last 5 builds. However, with all those builds and branches, 5 was too low: there could easily be 10 active branches in the larger projects. Thus, I increased the retention to 15 builds, which immediately forced me to hand a few hundred GB of disk to the server. While disk in general is cheap, the SAN-mounted disks my IT dept gets are significantly pricier than your average USB-attached 3 TB disk for home use.

The problem is that when a configuration has been cleaned and a downstream configuration wants to import its artifacts, I get an error (something along the lines of 'failed to download artifacts'). The history is still available (I keep that for a year), but the artifacts are gone. This makes it very hard to reproduce an old build. Not totally impossible, as I could presumably create a "dummy" branch in Git from whatever historical commit and push something to that branch, but that is not really what I want to do.

My desired way of working would be to treat the artifacts as a cache. On a cache miss (either because the upstream configuration has never been built, or because its artifacts have been cleaned out), I want TeamCity to traverse the graph upstream and build what the downstream configuration is asking for. That would of course cascade and force the entire chain to be rebuilt if I am requesting a rebuild from last month's commit, but it would work. Right now it doesn't.
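In pseudocode, the behaviour I'm after is something like this (a hypothetical sketch in Kotlin - none of these types or functions are actual TeamCity API):

```kotlin
// Hypothetical model: a build configuration and its upstream dependencies.
data class Config(val id: String, val upstream: List<Config>)

// On artifact import, recurse up the chain and rebuild whatever is missing.
fun ensureArtifacts(config: Config, revision: String) {
    if (hasArtifacts(config, revision)) return                 // cache hit: reuse as-is
    config.upstream.forEach { ensureArtifacts(it, revision) }  // satisfy upstream first
    build(config, revision)                                    // cache miss: rebuild
}

// Assumed primitives, left abstract on purpose.
fun hasArtifacts(config: Config, revision: String): Boolean = TODO()
fun build(config: Config, revision: String): Unit = TODO()
```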

The proposed approach would also reduce the need for an overly generous retention policy, as artifacts would be recreated (by rebuilding the configuration) when needed. A _possible_ caveat is that two builds may not produce bit-identical output (due to time stamping of files, the compiler injecting the date and time into the binaries, or reports that contain the date and time). Personally I don't see that as a major problem - the benefits outweigh the risk by far, imho.
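As a toy illustration of why two runs may differ (nothing TeamCity-specific - just a pretend "build" that stamps the current time into its output):

```kotlin
import java.security.MessageDigest

// A pretend build step that injects the build time into the artifact.
fun stampedArtifact(source: String): ByteArray =
    (source + " built at " + System.currentTimeMillis()).toByteArray()

fun md5(bytes: ByteArray): String =
    MessageDigest.getInstance("MD5").digest(bytes).joinToString("") { "%02x".format(it) }

fun main() {
    val first = md5(stampedArtifact("identical sources"))
    Thread.sleep(5)
    val second = md5(stampedArtifact("identical sources"))
    println(first == second)  // false: same input, different bytes out
}
```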

If what I am requesting already exists, I'd be very happy if someone could point out how to change the configuration to make it behave that way.

--Jesper Hogstrom

3 comments

Hi Jesper.

How exactly are the build configurations linked together? Do you use snapshot dependencies or a finish build trigger?
Do you select these cleaned builds manually in the Run Custom Build dialog? Why don't you use the latest builds in those build configurations?

We don't restart historical builds automatically for exactly the reasons you mentioned - there are too many risks that a new build run will generate a different set of artifacts, so it cannot be claimed equal to the original build.


The build configurations are migrating to snapshot dependencies. I would have gone that way from day one if I had properly understood the concept. Every project that I shift from Perforce to Git is switching to pull-based builds (aka snapshot dependencies). The problem I have now is in the snapshot-dependency projects.

I used to store 5 builds' worth of artifacts. With all the branches building now, we can easily lose the tip artifacts of, say, the release branch. If we don't check in anything, there's no way to regenerate the installer for the latest build, as there is nothing new to build and the upstream artifacts (typically the release-compiled binaries) are gone. A dummy commit will resolve the issue, but I think we can both agree that's kind of kludgy. I could also mash everything together and give up the fine-grained control I have now, but that too seems like a step backward.

Given that I am aware of the potential MD5 discrepancies between builds of the same sources made today and tomorrow, any chance you could share how I could configure TC to rebuild upstream if artifacts have been cleaned? :)

If something breaks due to that, I should probably fix my build scripts and make sure all signing and such is done in-stream at the right time, and not checked in based on some arbitrary state of the files.
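In the meantime I may script a workaround myself: when the upstream artifacts turn out to be gone, queue a new build of the upstream configuration through the REST API. A sketch (the server URL, credentials and configuration ID are placeholders, and the buildQueue endpoint requires a reasonably recent TeamCity):

```kotlin
import java.net.HttpURLConnection
import java.net.URL
import java.util.Base64

// Queue a build of the given configuration via TeamCity's REST API.
fun queueBuild(serverUrl: String, user: String, pass: String, buildTypeId: String) {
    val conn = URL("$serverUrl/app/rest/buildQueue").openConnection() as HttpURLConnection
    val auth = Base64.getEncoder().encodeToString("$user:$pass".toByteArray())
    conn.requestMethod = "POST"
    conn.doOutput = true
    conn.setRequestProperty("Authorization", "Basic $auth")
    conn.setRequestProperty("Content-Type", "application/xml")
    conn.outputStream.use {
        it.write("""<build><buildType id="$buildTypeId"/></build>""".toByteArray())
    }
    check(conn.responseCode in 200..299) { "Queueing failed: HTTP ${conn.responseCode}" }
}

fun main() {
    queueBuild("https://teamcity.example.com", "builder", "secret", "Project_CompileDebug")
}
```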

All the best,

--jesper


Each build configuration has a default branch, and it is cleaned up separately from the other branches.
Today we released version 8.0, in which the cleanup logic was improved to process each active branch separately.
As I understand your use case, it should help solve the issue of release branches being left with no artifacts for their latest builds.
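For completeness: in much newer TeamCity versions the same idea can also be expressed as keep rules in the Kotlin DSL. A sketch, with the rule id and the number of builds as examples only:

```kotlin
import jetbrains.buildServer.configs.kotlin.v2019_2.*

object CompileDebug : BuildType({
    name = "Compile (Debug)"
    cleanup {
        keepRule {
            id = "KeepPerBranch"
            keepAtLeast = builds(15)   // keep 15 builds...
            applyPerEachBranch = true  // ...per active branch, not globally
            dataToKeep = everything()  // history, statistics and artifacts
        }
    }
})
```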

The Run Custom Build dialog has a "Rebuild all snapshot dependencies transitively" option, but it forces a rebuild of the whole build chain and doesn't reuse suitable artifacts.
At the moment we do not have the feature you are asking for. Please add a new feature request to our issue tracker.
