We have been using TeamCity 7.1.2 hosted on EC2 for about a year, with no problems since we set up our current configuration. We have been building our Java 1.7 project with Ant, with no changes to the build process or configuration for over six months. Everything was very stable and predictable until the first build of October 24, when we started seeing these changes, very consistently, build after build:
* ~200 unit tests that check the double values produced by lengthy numerical calculations against hard-coded expected results started to fail
* Execution time for passing tests increased, sometimes dramatically: in the worst case, three of our tests went from ~45s to ~15min!
* The following sanity-check test has started to fail (although other similar tests of BigDecimal calculations still pass):
    BigDecimal exponent = BigDecimal.valueOf(10);
    BigDecimal result = BigDecimalMath.exp(exponent, 20);
    result = result.setScale(12, BigDecimal.ROUND_DOWN);
    result = BigDecimalMath.pow(result, 4);
    assertEquals(Math.exp(40), result.doubleValue(), 0);
These changes are very consistent from build to build: the failing tests have calculated the same values all week, and while execution time varies a bit between builds, the same tests have turned up in the top 20 sorted by execution time, and the variation in run time for each of these tests has been low. We cannot reproduce these changes on our development machines. Rolling back recent changes to our source code doesn't make any difference. There have been no changes to our TeamCity, EC2, Ant, etc. configurations for months!
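Since the failures only show up on the build agent, one thing we could do is run the same small diagnostic on the agent and on a development machine and compare the output. The following is a minimal sketch of that idea, not part of our actual build: the class name `JvmFingerprint` and the choice of properties are illustrative assumptions. It prints the standard JVM and OS system properties plus the raw bit patterns of `Math.exp(40)` and `StrictMath.exp(40)`, so that even a one-ulp difference between machines is visible.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic class; run it on the build agent and on a dev machine.
public class JvmFingerprint {

    // Standard system properties that identify the JVM and the host OS.
    private static final String[] KEYS = {
        "java.version", "java.vendor", "java.vm.name", "java.vm.version",
        "os.name", "os.arch", "os.version"
    };

    // Raw IEEE 754 bit pattern as 16 hex digits.
    static String bits(double value) {
        return String.format("%016x", Double.doubleToLongBits(value));
    }

    // Collects the properties and two reference floating-point results.
    static Map<String, String> collect() {
        Map<String, String> out = new LinkedHashMap<String, String>();
        for (String key : KEYS) {
            out.put(key, System.getProperty(key, "unknown"));
        }
        out.put("availableProcessors",
                String.valueOf(Runtime.getRuntime().availableProcessors()));
        out.put("Math.exp(40)", bits(Math.exp(40)));
        out.put("StrictMath.exp(40)", bits(StrictMath.exp(40)));
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : collect().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

`StrictMath.exp` is specified to return the same result on every platform, while `Math.exp` is only required to be within 1 ulp of the exact result. So if the two machines agree on the `StrictMath` line but differ on the `Math` line or on any of the JVM or OS properties, that would point at the agent's runtime environment rather than at our code.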
Any ideas for isolating the problem and getting our build back to normal?