Slow Artifact Dependency Resolution

Hi all,



I'm wondering if anybody could assist in something I've potentially set up incorrectly, as I've not been using TeamCity for long at all.
I'm using version 6.5.6 (build 18130) and am having an issue with very slow artifact dependency resolution.


Here's the log of the job:


[11:54:32]: Skip checking for changes - changes are already collected
[11:54:37]: Clearing temporary directory: C:\BuildAgent\temp\buildTmp
[11:54:37]: Checkout directory: C:\BuildAgent\work\7d9fb3b885c52763
[11:54:37]: Updating sources: server side checkout...
[11:54:33]: [Updating sources: server side checkout...] Building incremental patch for VCS root: BlitzSDK svn
[11:54:37]: [Updating sources: server side checkout...] Repository sources transferred: 14.18Kb total
[11:54:37]: [Updating sources: server side checkout...] Updating C:\BuildAgent\work\7d9fb3b885c52763
[11:54:37]: Resolving artifact dependencies (22m:16s)
[11:54:38]: [Resolving artifact dependencies] Destination directory cleaned: C:\BuildAgent\work\7d9fb3b885c52763\Bin\win32\Optimised
[12:16:54]: [Resolving artifact dependencies] 204 files retrieved for [EditorExes*.zip!**] downloading pattern
[12:16:54]: Publishing internal artifacts
[12:16:54]: [Publishing internal artifacts] Sending build.start.properties.gz file
[12:16:54]: Step 1/2: Prepare Machine For Build (Command Line)




The step this relies on artifacts it packages into a single zip file as follows:
Bin/win32/Optimised/*.* => EditorExes-%system.build.number%.zip


And then for the current job I extract the zip to the desired folder:
EditorExes*.zip!**   to   %teamcity.build.checkoutDir%/Bin/win32/Optimised




This particular example is cleaning the directory beforehand (as I suspected this was affecting the extraction time)
The zip file itself is rather small, roughly 280MB.
Also, the folder being cleaned is roughly 1GB.

I don't have any examples of the artifact dependency without a dir clean, as the few times it ran it took a long time, and wasn't able to be 'stopped' and forced a server machine restart to clean the build queue.




Is there anything obviously wrong with this setup?
Should the cleanup be required or will the zip extraction simply overwrite any files in the directory?




Thanks,
Jim.
26 comments
Comment actions Permalink

Just a quick update, removed the clean step, and it's still taking a while.

[10:16:15]: Resolving artifact dependencies (7m:33s)
[10:23:49]: [Resolving artifact dependencies] 195 files retrieved for [EditorExes*.zip!**] downloading pattern

Nowhere near as bad, but I could still perform a 'build' of my libs in that time potentially.
I can work with 8 minutes, while not ideal, it's definitely better than 22 minutes.

Thanks,

0
Comment actions Permalink

Hello Jim.

Surely, 8 minutes is far too much for 200Mb archive.

While your set up seems ok, could you please enable sever and agent debug logging and post results, relevant to artifact resolution process, here?

0
Comment actions Permalink

Hi Nikita,

Looking at another example in our build logs suggest the following, that the build was started at 11:19 and lasted 18 minutes

#101 Success View build artifacts 17 Jan 12 11:19 18m:23s None Pin



So checking the Agent logs in teamcity-build.log, I can see the first activity relating to this step was at 11:36:
[2012-01-17 11:36:16,526] out - ----------------------------------------- [ Tools::Deploy PC HAL Packages #101 (buildId = 958) ] -----------------------------------------



So it appears that the job isn't being sent to the machine until 11:36? - here's the agent log from teamcity-agent.log:
[2012-01-17 11:36:16,526]   INFO - erStages.start.CallRunnerStage - Found runner simpleRunner for 958
[2012-01-17 11:36:16,526]   INFO - erStages.start.CallRunnerStage - Start build runner for build 958 simpleRunner
[2012-01-17 11:36:16,526]   INFO - erStages.start.CallRunnerStage - ----------------------------------------- [ Tools::Deploy PC HAL Packages #101 (buildId = 958) ] -----------------------------------------
[2012-01-17 11:36:16,555]   INFO - nner2.OsProcessHandlerListener - C:\BuildAgent\work\7d9fb3b885c52763\Tests\HAL\packages\_BuildPackages.bat PC
[2012-01-17 11:37:37,796]   INFO - nner2.OsProcessHandlerListener -  
[2012-01-17 11:37:37,797]   INFO - ildStages.RunnerStagesExecutor - Call stage jetbrains.buildServer.agent.impl.buildStages.runnerStages.start.InternalErrorsChecker
[2012-01-17 11:37:37,821]   INFO - mpl$TeamCitySpringSubConatiner - Closing jetbrains.buildServer.spring.SpringSubContainerFactroyImpl$TeamCitySpringSubConatiner@1bb4302: display name [SubContext: running build runner context id=958 runner 2 of 2]; startup date [Tue Jan 17 11:36:16 GMT 2012]; parent: jetbrains.buildServer.spring.SpringSubContainerFactroyImpl$TeamCitySpringSubConatiner@4d7fd5
[2012-01-17 11:37:37,829]   INFO - ernalArtifactsProcessingLogger - Publishing artifacts process started
[2012-01-17 11:37:37,829]   INFO - ernalArtifactsProcessingLogger - Start: Searching for artifacts to publish
[2012-01-17 11:37:37,830]   INFO - ernalArtifactsProcessingLogger - Done:  Searching for artifacts to publish
[2012-01-17 11:37:37,830]   INFO - ernalArtifactsProcessingLogger - Start: Pre-processing artifacts if needed (archiving)
[2012-01-17 11:37:37,830]   INFO - ernalArtifactsProcessingLogger - Done: Pre-processing artifacts if needed (archiving)
[2012-01-17 11:37:37,830]   INFO - ernalArtifactsProcessingLogger - Publishing artifacts process started
[2012-01-17 11:37:37,830]   INFO - ernalArtifactsProcessingLogger - Start: Searching for artifacts to publish
[2012-01-17 11:37:37,830]   INFO -    jetbrains.buildServer.AGENT - Internal artifact path teamcity-info.xml not found
[2012-01-17 11:37:37,830]   INFO - ernalArtifactsProcessingLogger - Done:  Searching for artifacts to publish
[2012-01-17 11:37:37,830]   INFO - ernalArtifactsProcessingLogger - Start: Pre-processing artifacts if needed (archiving)
[2012-01-17 11:37:37,830]   INFO - ernalArtifactsProcessingLogger - Done: Pre-processing artifacts if needed (archiving)
[2012-01-17 11:37:37,830]   INFO -    jetbrains.buildServer.AGENT - Publishing artifacts process started
[2012-01-17 11:37:37,831]   INFO -    jetbrains.buildServer.AGENT - Start: Searching for artifacts to publish
[2012-01-17 11:37:37,837]   INFO -    jetbrains.buildServer.AGENT - Done:  Searching for artifacts to publish
[2012-01-17 11:37:37,837]   INFO -    jetbrains.buildServer.AGENT - Start: Pre-processing artifacts if needed (archiving)
[2012-01-17 11:37:42,806]   INFO - ernalArtifactsProcessingLogger - Start: Sending files
[2012-01-17 11:37:42,816]   INFO - ernalArtifactsProcessingLogger - Done publishing artifacts to .teamcity/properties, total files published: 1
[2012-01-17 11:37:42,816]   INFO - ernalArtifactsProcessingLogger - Done: Sending files
[2012-01-17 11:37:42,822]   INFO - ernalArtifactsProcessingLogger - Publishing artifacts process finished
[2012-01-17 11:37:43,391]   INFO -    jetbrains.buildServer.AGENT - Done: Pre-processing artifacts if needed (archiving)



Here's hopefully the relevent info from the Server machine's log, but let me know if you need anything else:
[2012-01-17 11:16:17,682]   INFO - tbrains.buildServer.ACTIVITIES - Configuration added to queue; name=Tools :: Deploy PC HAL Packages {id=bt7}; requestor=Subversion; Runtime PC :: Run HAL PC Tests; promotion id=1286
[2012-01-17 11:19:15,200]   INFO - tbrains.buildServer.ACTIVITIES - Build started; name=Tools :: Deploy PC HAL Packages, personal=false, buildId=958
[2012-01-17 11:19:15,202]   INFO - tbrains.buildServer.ACTIVITIES - Configuration removed from queue; name=Tools :: Deploy PC HAL Packages, requestor=Subversion; Runtime PC :: Run HAL PC Tests, comment=null, user=null, promotion id=1286
[2012-01-17 11:19:15,202]   INFO -   jetbrains.buildServer.SERVER - Flush queue finished, number of builds started: 1
[2012-01-17 11:19:15,228]   INFO -   jetbrains.buildServer.SERVER - Running build saved to DB: Tools :: Deploy PC HAL Packages {id=bt7} # #101 {build id=958} on agent pc623 triggered by ##vcsName='svn' buildTypeId='bt9'
[2012-01-17 11:37:38,522]   INFO - de.impl.history.DBBuildHistory - Start creating history entry for Tools :: Deploy PC HAL Packages 958, date: 1326800263000
[2012-01-17 11:37:38,523]   INFO - de.impl.history.DBBuildHistory - End creating history entry for Tools :: Deploy PC HAL Packages 958
[2012-01-17 11:37:38,527]   INFO - tbrains.buildServer.ACTIVITIES - Finished 958



Thanks for your assistance.

0
Comment actions Permalink

Having read your response more thoroughly, I've yet to enable debug logging, and am still using the default values for logs.
I'll enable the debug logging for both server and agent, and get back to you with some more useful info.

0
Comment actions Permalink

Jim,

My mistake: would you mind attaching debug logs once you have them (instead of posting)?

0
Comment actions Permalink

ok, I think I've enabled the Debug server and agent logging, but can't see anything specifically different in them (just more of them), so perhaps I've made a mistake in following the required steps somewhere.

I've attached the relevent logs for the times we're seeing the problem.
In this example, we're looking at 'Tools::Create/Update Database #26 (buildId = 1032)'

which performs the following:

[14:54:40]: Resolving artifact dependencies (13m:36s)
[15:08:16]: [Resolving artifact dependencies] 195 files retrieved for [EditorExes*.zip!**] downloading pattern


Let me know if that helps at all.
Thanks again.


Attachment(s):
teamcity-vcs.log.zip
teamcity-build.log.zip
teamcity-agent.log.zip
teamcity-activities.log.zip
0
Comment actions Permalink

Jim,

Thank you for the logs. Though they showed no clue to root cause of the issue.

Do you have slow artifact resolution for all build configurations on the server, or just for this specific one?

Can you try to download artifact to agent machine without unpacking? To do this, just limit the pattern to EditorExes*.zip
Also, to get a comparison, would you mind to try to download artifact to agent machine manually (eg., via TeamCity UI)?

0
Comment actions Permalink

The slow artifact resolution can appear on any artifact dependent config on this server.

One thing to note is that as far as I can see, it's only happening when the artifacts are created on a different machine to the one running the current step.
Eg. PC898 builds my Editor executables, then PC623 runs my world file deployment step (which needs my Editor's exes from PC898)

Downloading the zip itself through the UI as 'open' (saving to %temp%) took roughly 2-3 mins averaging at 3MB/sec, but interestingly failed at the final percent due to a connection to the server being 'reset', tried this twice and both failed.
Downloading the zip itself through the UI as 'save' (saving to 'Downloads') was pretty much instant, and completed successfully.

I've just attempted limiting to 'EditorExes*.zip' as suggested, and the same problem persists.


Perhaps this server connection reset issue is causing the problem?

Thanks for all the suggestions so far.

EDIT: Just attempted to download the zip through the UI with Firefox, and had better luck, that final percentage causing IE to give up, took Firefox an extra minute to resolve. So that's about 4 minutes in itself.

0
Comment actions Permalink

JimParker wrote:

One thing to note is that as far as I can see, it's only happening when the artifacts are created on a different machine to the one running the current step.
Eg. PC898 builds my Editor executables, then PC623 runs my world file deployment step (which needs my Editor's exes from PC898)


If you have snapshot dependencies along  with artifact dependencies, "run build on the same agent" option may be a workaround.

0
Comment actions Permalink

Thanks for that suggestion,
I'm happy to use that option, it just makes creating an 'artifact dependency' a pointless task (as the exe's will all be on that machine anyway)

As we're only using 3 agents at the moment, the level of parallelism that can be achieved is quite low in any case, so it'll be fine for now.

My only fear is a config that relies on 2-3 artifacts destined for different machines, based on 'agent requirements'.
eg. a Testing step that relies on an executable that can only be compiled on one machine, with test data that can only be deployed on another.

I'll have a go with the most problematic cases and see if I can get a working solution.

Thanks

0
Comment actions Permalink

Forcing all of the steps previously using artifacts from my EditorExes.zip to now not rely on artifacts, and run on the same machine appears to have resolved all the artifact issues.

An example being one test case that grabs artifacts from three configs and resolves them quickly enough:

[07:35:43]: Resolving artifact dependencies (25s)
[07:35:46]: [Resolving artifact dependencies] 2 files retrieved for [TestExe*.zip!**] downloading pattern
[07:36:06]: [Resolving artifact dependencies] 1207 files retrieved for [*.zip!**] downloading pattern
[07:36:08]: [Resolving artifact dependencies] 32 files retrieved for [*.zip!**] downloading pattern


None of these zips are above 100MB in size though. (roughly 19MB, 97MB and 17MB respectively)


So there's definitely something amiss with the content of the EditorExes.zip file (which is now roughly 280MB), the fact that when manually downloading the file there's a stall on the final few percent (which causes IE to lose connection with the server) feels like it's somehow related, like the file is being downloaded, but server reconnections keep taking place until we get a successful attempt, like the connection times out or something.



In any case, my build times are much more reliable now, so thanks for that.
Does the suggestion made above sound like something that could potentially be occurring?
0
Comment actions Permalink

Though the problem definetly have its root in file transfer, http transport layer does not report any failures (no socket timeout exceptions and other things)...

Is there anything specific in your network configuration? Any firewalls or caching proxies between Teamcity server and agents? Distant agents with ureliable connections?

What really makes it interesting is that limiting builds to same machine lowered transfer times that significantly...

0
Comment actions Permalink

While I changed the EditorExe reliant steps to use the same machine and no artifact dependency, I kept my other artifact dependent steps the same, so the example provided was gathering artifacts from different machines.
All of the machines are on our internal network, and so I'm unsure if there's anything specific that could be causing this issue, I have an open ticket with our IT department to look into this further.

I've studied the 3 successful builds of my 'Run PC Tests' step, and the results make little sense.

PC898 (Gathering 3 Artifact packages from PC590) - 25 secs ~ This example used in previous post
PC590 (Gathering 2 Artifact packages from PC898) - 1 min 40 secs
PC623 (Gathering 2 artifact packages from PC623) - 3 min 10 secs

I'll follow up with our IT department.

0
Comment actions Permalink

I should add that in my set up, the TeamCity server is running on PC898.
So I'm guessing from the times specified, that archives are uploaded to the server machine upon creation? (so it doesn't matter which agent built them as I wrongly suggested in my example above)

So it appears clear that there's a potential descrepency between network speeds of our machines (PC590 and PC623), but furthermore, that if 2 mins for around 100MB of data is 'normal' in my case, we should be looking at around 6-9 mins for that larger troublesome package.

In any case, I'm still following this up internally, so thank you for your help and suggestions so far.

0
Comment actions Permalink

Jim,

Thank you very much for posting back your thoughts.

Your guess is absolutely correct: artifacts are first uploaded to TeamCity server, than dowloaded back again. We are working to improve this: http://youtrack.jetbrains.net/issue/TW-15448
So for builds running on server machine (PC898) will always have shortest artifact downloading times.

Is it possible to split troublesome package EditorExes*.zip into, say, 2 archives of smaller size? If yes, could you try it and remove "run on same agent" limitation.

0
Comment actions Permalink

Jim,

Thank you very much for posting back your thoughts.

Your guess is absolutely correct: artifacts are first uploaded to TeamCity server, than dowloaded back again. We are working to improve this: http://youtrack.jetbrains.net/issue/TW-15448
So for builds running on server machine (PC898) will always have shortest artifact downloading times.

Is it possible to split troublesome package EditorExes*.zip into, say, 2 archives of smaller size? If yes, could you try it and remove "run on same agent" limitation.

0
Comment actions Permalink

Hi again,

I split up my 'EditorExe' package into 3  smaller packages (as a result of this, I managed to remove some extra  filetypes I didn't require) and ended up with 3 zip files.
EditorExes -39MB
EditorGPIs -8.5MB
EditorDLLs - 26MB

By reducing the file sizes you've helped my create a system that is much more managable with a resolution time of:

[20:00:24]: Resolving artifact dependencies (2m:48s)
[20:01:41]: [Resolving artifact dependencies] 45 files retrieved for [EditorDLLs*.zip!**] downloading pattern
[20:02:03]: [Resolving artifact dependencies] 22 files retrieved for [EditorGPIs*.zip!**] downloading pattern
[20:03:13]: [Resolving artifact dependencies] 38 files retrieved for [EditorExes*.zip!**] downloading pattern




I notice that in the Server Admin settings, I'm allowed to set the 'Maximum  build artifact file size', so I'll reduce this value to around  50-100MB, to ensure that packages aren't getting too large, and to also  highlight steps that should perhaps be selected to 'run on same agent'  instead of sending around artifacts.


I'll mark this issue as resolved, as I'm now more aware of the process, and the limitations of our current setup.
The bug you linked would probably help us with speed, but as long as I'm a bit more thoughtful with my artifact packaging, we should be safe.


Thanks again for all of your assistance.
0
Comment actions Permalink

Having tested this a bit further, I'm still having issues.

I'm interested to see whether the problem is at download time, or the time taken to populate the working copy with these downloaded files.

I have an example here:

[14:01:09]Resolving artifact dependencies (14m:36s)
[14:15:45][Resolving artifact dependencies] 118 files retrieved for [EditorDLLs*.zip!**=>C:\Project_TC EditorGPIs*.zip!**=>C:\Project_TC EditorExes*.zip!**=>C:\Project_TC EditorScripts*.zip!**=>C:\Project_TC] downloading pattern


The zips reach a grand total of 100MB.
I've downloaded these files through the webUI on the machine in question, and even using InternetExplorer, I managed to download them in less than 2 minutes (Firefox being about 14 seconds)
So if I'm to image the worst case Windows scenario, 2 of those 15 minutes are spent downloading the files, and roughly 10 seconds extracting the zips, I'm wondering where the time's going.


Could there be potential issues/slowdown overwriting the working copy if the same (older) files already exist?


I'm totally lost, but it takes that long to actually build the solution I'm wanting to collect artifacts for, so feels like I should stick with the 'run on same agent' setting, but I wouldn't be happy doing this when we get a setup with many agents.




My ideal setup is


----------------------------------------------------> Time ------------------------------------------------------------------------>

                           -> Deploy platform1 data    ->   Run Tests on Platform1

                           -> Deploy platform2 data    ->   Run Tests on Platform2

Build our toolset   -> Deploy platform3 data    ->   Run Tests on Platform3

                           -> Deploy platform4 data    ->   Run Tests on Platform4

                           -> Deploy platform5 data    ->   Run Tests on Platform5




But with this potential issue, I'll have to perform the following
----------------------------------------------------> Time ------------------------------------------------------------------------>
Build our toolset   -> Deploy platform1 data -> Deploy platform2 data -> Deploy platform3 data -> Deploy platform4 data -> Deploy platform5 data ->   Run Tests on Platform2 ->   Run Tests on Platform3 -> etc






Artifacts feel like a really big deal for concurrent configuration running, but perhaps I'm using the system incorrectly, and there's a better way.


Any help would be really appreciated.


Thanks,
0
Comment actions Permalink

Jim,

As far as I can see, you have migrated to TeamCity 7.0. May I ask you to attach new debug logs (as described in my older reply) for slow resolution ?

0
Comment actions Permalink

Hi Nikita,

That's correct, we are now using TeamCity 7.0.2.

Apologies for the delayed response, I'd set my fast turnaround builds to run on the same agent as the dependent steps and so had no logs.
I've set things back up to be artifact dependent and available to run on any agent, and replicated the problem, and have attached some zips of the log files.

The exact case is as follows.

The problem build is buildId 14722 (Frequent Builds > Runtime Debug Smoke Tests Xbox360) and the build type id is bt21.
This was build #667 (09 Jul 08:38)

This config's specific run is snapshot and artifact dependent on
- 'Frequent Builds > Deploy Smoke Packages Xbox360' - buildId 14673, buildTypeId bt20, build #727 (06 Jul 12 18:54)
- 'Frequent Builds > Runtime Debug 2010 Libs Xbox360' - buildid 14664, buildTypeId bt5, build #1898 (06 Jul 12 18:38)

And produced the following output:
[08:38:09]Resolving artifact dependencies (11m:42s)
[08:40:46][Resolving artifact dependencies] 37 files retrieved for [*.zip!**=>C:\BlitzTech_TC] downloading pattern
[08:49:52][Resolving artifact dependencies] 2 files retrieved for [TestExe*.zip!**=>C:\BlitzTech_TC] downloading pattern

Let me know if there's anything else I can supply.

Thanks again for the assistance with this.



Attachment(s):
teamcity_agent_logs_2012-07-09.zip
teamcity_server_logs_2012-07-09.zip
0
Comment actions Permalink

Jim,

Apologize for the delay.

Unfortunately, your last build agent logs didn't have debug enabled. Please, follow the instructions here to enable the debug and reproduce the issue.

0
Comment actions Permalink

Hi,

I've hopefully added some agent logs with debug information this time.
BuildID - 20154
BuildTypeID - bt16
Build #360

I've enabled debugging, and have included every file modified during the build to ensure I've not missed anything.

Thanks,



Attachment(s):
BuildAgent logs.zip
0
Comment actions Permalink

Jim,

Thank you for the logs.
I can see that the artifact download times are heavely affected by this issue: http://youtrack.jetbrains.com/issue/TW-22001

It was fixed in recent 7.0.4 release and I recommend you to upgrade your installation, if possible.

0
Comment actions Permalink

Hi Nikita,

Thanks for the reply, I'm just in the process of upgrading to 7.1, so I'll let you know the status of this post-upgrade.

Thanks,

0
Comment actions Permalink

Hi again,

I can confirm that since moving to TeamCity 7.1, artifact resolution appears to be much faster, averaging at around the 2 minute mark.
I've marked your post requesting the move to this version as the 'correct answer' to this thread.

Thanks for all of your help.
Jim

0
Comment actions Permalink

Thank you for letting us know about it. Good luck!

0

Please sign in to leave a comment.