Long delay for initial checkout of Github repos

Hi

We have a project using a GitHub repository, with the agent in the Amazon cloud. Locally, and via the command line from the cloud box, a git clone of this 400MB repository takes seconds. However, running the build via TeamCity for the first time, to a clean directory, using checkout on the agent, takes approximately 1 hour. We need to be able to bring new boxes up more quickly than this.

We tried another, smaller repository which uses the same authentication, from within the same project; that worked correctly.

Can you please give us any pointers as to why the git clone process takes seconds via the command line but more than 50 minutes via the agent on the same box, run by the same user? Presumably the agent runs 'git clone ...'? Any help appreciated.

Thanks

Ben

24 comments

Hello Ben,
What version of the git-plugin and which protocol do you use?


Hi - In the admin section it says: VCS support: Git 13550

We use SSH / Default Private key

Thanks

Ben


Ben,

I tried to reproduce your problem, but did not succeed.
In my test I used a 1.3GB repository on a local network and my Ubuntu box.
A clone from the command line took ~7 minutes, and a clone using a TeamCity agent took ~10 minutes.
What OS is installed on your agents?
Did you try using non-Amazon agents? If so, did the problem occur on them too?


Also, please check whether you have any submodules in your repository; if you do, what is your submodule checkout policy?


Ben,

Any update after Dmitry's questions?

For our reference, can you please attach the build log (the initial checkout-related lines, up to where the build script starts) so that we can analyze the time spent in each phase?

Are you aware of any public repository which has the same issue so that we can try to reproduce it?


Hi Dmitry,
The OS on the agent is Ubuntu 10.04 LTS.

I tried the non-Amazon agent just today and it still took 50 minutes (which was a little odd). The repository itself is about 500MB, and the agent's memory setting is unchanged from the default 384MB. The important point to note is that it didn't hit an OOM error, which is what I would expect if the entire repository had to be read into memory first (that's what I've read in posts elsewhere, at least). I was starting to wonder whether swap space was being used on the box instead of memory (I'm just clutching at straws here, really).

So yes, I've now tried both an Amazon instance and a local server inside our own network, and on both it took approximately 50 minutes to download the whole Git repository.

Thanks,
Mark

PS - I work with Ben


No submodules to my knowledge. I'm not even sure what they are, to be honest, but as I don't know what they relate to, I assume it's unlikely we're using them? :-)

The submodules dropdown is set to 'checkout'.


Hi Yegor,

Thanks for your contact, I do appreciate it.

As requested, information from a build log...

[11:42:14]: Checking for changes
[11:41:41]: Clearing temporary directory: /usr/local/teamcity/temp/buildTmp
[11:41:41]: Checkout directory: /mnt/TeamCity/hotel-service
[11:41:41]: Updating sources: Agent side checkout... (47m:29s)
[11:41:41]: [Updating sources: Agent side checkout...] Will perform clean checkout. Reason: Checkout directory is empty or doesn't exist
[11:41:41]: [Updating sources: Agent side checkout...] Cleaning /mnt/TeamCity/hotel-service
[11:41:41]: [Updating sources: Agent side checkout...] VCS Root: hotel-service (47m:29s)
[11:41:41]: [VCS Root: hotel-service] revision: 9c33073345c04f0472359eb8cadbe7109d4959fa@12c504eb038
[11:41:41]: [VCS Root: hotel-service] Cleaning /mnt/TeamCity/hotel-service
[11:41:41]: [VCS Root: hotel-service] The .git directory is missing in '/mnt/TeamCity/hotel-service'. Running 'git init'...
[11:41:42]: [VCS Root: hotel-service] Fetching data for 'hotel-service'...
[12:28:29]: [VCS Root: hotel-service] Fetched  revisions up to 9c33073345c04f0472359eb8cadbe7109d4959fa Matt #Project:  Uploading new version of the hotel service project.
[12:28:29]: [VCS Root: hotel-service] Checking  out branch master in hotel-service in /mnt/TeamCity/hotel-service with  revision 9c33073345c04f0472359eb8cadbe7109d4959fa Matt #Project:  Uploading new version of the hotel service project.
[12:29:09]: [VCS Root: hotel-service] Cleaning hotel-service in /mnt/TeamCity/hotel-service the file set ALL_UNTRACKED
[12:29:11]: Start process: /mnt/TeamCity/hotel-service/megatron/trunk/run
[12:29:11]: in: /mnt/TeamCity/hotel-service/megatron/trunk


Well, we are using github.com rather than an internally hosted Git server, but I'm afraid the repository isn't going to be made public to the world anyway. Exactly the same thing happens on the cloud instance and internally within our local organisation.


Things to note:
TC server version - 5.1.4

Server TC agent OS - Ubuntu 10.04 LTS, fully patched
$ java -version (of TC agent)
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
Java HotSpot(TM) Client VM (build 17.1-b03, mixed mode, sharing)
"TeamCity server <-----> TC agent" communication is carried out via an OpenVPN point-to-point configuration (but as the agent is doing the work, this doesn't really come into the equation anyway?)
We want the agent to be doing the work; that's why we've set up the build configuration so the agent does the checkouts etc.


So I tried the checkout yet again, using Linux command-line tools such as tcpdump to see whether packets were transferring over the SSH connection the TC agent had with github.com, attaching to various processes with strace to look at what was occurring, and manually running some of the commands that the TC agent was running, to try to understand exactly what was happening. I also ran 'git clone [REPO]' to make sure everything was OK with github.com and that we could download everything required of the repository in a reasonable time: a completely fresh checkout took only 70 seconds to retrieve the entire 400MB. I then tried the same checkout to the local TC server (which also took a very short amount of time), and then a checkout to the TC server with the server pushing the files (diffs/patches) up to the agent in the cloud, which we timed at approximately 10 minutes (and with the checkout running on the server you get nice messages in the log every 2 minutes saying how much has been checked out so far, so you know things are still moving forward). While all this testing was going on, I had left the job I'd started earlier running to see if it would ever complete the checkout on the agent. After 50 minutes, to my astonishment, the checkout had actually completed.

Seeing that 'git clone [repo]' took 70 seconds to download the 400MB project from fresh when run on the server hosting the TC agent in the cloud, and that the checkout to the TC server with it then proxying the data up to the cloud agent took 10 minutes, I was a little amazed to see a checkout performed by the TC agent take 50 minutes. I then started to think about Java and whether different memory settings, if set really low, might be causing a bottleneck on the checkout, but the memory settings on the TC agent are the same as those on the TC server, Xmx384m (I'm pretty sure those are the numbers; no one has changed them from their defaults anyway).


Hope all that makes sense, Thanks and Kindest Regards,
Mark

Mark, today I created a big repo (>1GB) on GitHub and tested a
clone over SSH on an Amazon agent running Ubuntu. In my case,
both the clone from the command line and the clone done by the
TeamCity agent took ~6 minutes.

From your logs I see that the working directory of the agent is
set to /mnt/TeamCity. Did you try cloning from the command line
in this directory? Could it be that the mounted disk is slowing
everything down?

Also, please check that when you clone from the command line you
are really connected to GitHub.

Hi Dmitry,


Thanks for getting back to me. It's interesting what you've pointed out, and I have some interesting results for you on that same basis as well...
Below are three different command-line git checkouts, followed by the log of another checkout carried out on the same server into a directory that's not under /mnt.


Yes, it does seem that the /mnt partition is very slow in Amazon EC2 compared with the standard 10GB root partition, as the results below show. However, I've previously tried this same checkout on a non-EC2 instance and it still took 50 minutes. In the logs below, on the same EC2 instance, a command-line git checkout to "/home/ubuntu/hotel-service.git/" takes almost no time at all, while a TeamCity agent checkout to "/home/ubuntu/TeamCity/hotel-service" shows the huge difference in time. But yes, it's very interesting to see just how slow the default EC2 /mnt partition is!
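(For anyone hitting this later: the /mnt vs root-disk difference can be checked directly with a quick write test using dd. A rough sketch; in practice you would point the two directories at the volumes in question, e.g. /mnt/TeamCity and /home/ubuntu, which are the paths from this thread. mktemp is used here only so the sketch runs anywhere.)

```shell
# Quick write-throughput comparison between two directories.
DIR_A=$(mktemp -d)   # stand-in for e.g. /mnt/TeamCity
DIR_B=$(mktemp -d)   # stand-in for e.g. /home/ubuntu
for d in "$DIR_A" "$DIR_B"; do
    # Write 32 MiB with an fsync at the end, so the reported time
    # reflects the disk rather than the page cache; dd prints its
    # throughput summary on stderr, so capture the last line of it.
    dd if=/dev/zero of="$d/ddtest" bs=1M count=32 conv=fsync 2>&1 | tail -n 1
    rm -f "$d/ddtest"
done
rmdir "$DIR_A" "$DIR_B"
```

A large gap between the two reported rates would point at the slow volume rather than at git or the network.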



/tmp$ date ;git clone --bare git@github.com:***/hotel-service.git;date
Thu Nov 18 12:13:58 UTC 2010
Initialized empty Git repository in /tmp/hotel-service.git/
remote: Counting objects: 5411, done.
remote: Compressing objects: 100% (3684/3684), done.
remote: Total 5411 (delta 2320), reused 4521 (delta 1492)
Receiving objects: 100% (5411/5411), 483.30 MiB | 5.37 MiB/s, done.
Resolving deltas: 100% (2320/2320), done.
Thu Nov 18 12:14:57 UTC 2010


~$ date ;git clone --bare git@github.com:***/hotel-service.git;date
Thu Nov 18 12:16:27 UTC 2010
Initialized empty Git repository in /home/ubuntu/hotel-service.git/
remote: Counting objects: 5411, done.
remote: Compressing objects: 100% (3684/3684), done.
remote: Total 5411 (delta 2320), reused 4521 (delta 1492)
Receiving objects: 100% (5411/5411), 483.30 MiB | 3.11 MiB/s, done.
Resolving deltas: 100% (2320/2320), done.
Thu Nov 18 12:17:33 UTC 2010



/mnt/TeamCity# date ;git clone --bare git@github.com:***/hotel-service.git;date
Thu Nov 18 12:11:54 UTC 2010
Initialized empty Git repository in /mnt/TeamCity/hotel-service.git/
remote: Counting objects: 5411, done.
remote: Compressing objects: 100% (3684/3684), done.
remote: Total 5411 (delta 2320), reused 4521 (delta 1492)
Receiving objects: 100% (5411/5411), 483.30 MiB | 336 KiB/s, done.
Resolving deltas: 100% (2320/2320), done.
Thu Nov 18 12:40:30 UTC 2010



[12:22:57]: Checking for changes
[12:22:22]: Clearing temporary directory: /usr/local/teamcity/temp/buildTmp
[12:22:22]: Checkout directory: /home/ubuntu/TeamCity/hotel-service
[12:22:22]: Updating sources: Agent side checkout... (46m:59s)
[12:22:22]: [Updating sources: Agent side checkout...] Will perform clean checkout. Reason: Checkout directory is empty or doesn't exist
[12:22:22]: [Updating sources: Agent side checkout...] Cleaning /home/ubuntu/TeamCity/hotel-service
[12:22:22]: [Updating sources: Agent side checkout...] VCS Root: hotel-service (46m:59s)
[12:22:22]: [VCS Root: hotel-service] revision: 9c33073345c04f0472359eb8cadbe7109d4959fa@12c504eb038
[12:22:22]: [VCS Root: hotel-service] Cleaning /home/ubuntu/TeamCity/hotel-service
[12:22:22]: [VCS Root: hotel-service] The .git directory is missing in '/home/ubuntu/TeamCity/hotel-service'. Running 'git init'...
[12:22:23]: [VCS Root: hotel-service] Fetching data for 'hotel-service'...
[13:09:00]: [VCS Root: hotel-service] Fetched  revisions up to 9c33073345c04f0472359eb8cadbe7109d4959fa Matt #Project:  Uploading new version of the hotel service project.
[13:09:00]: [VCS Root: hotel-service] Checking  out branch master in hotel-service in  /home/ubuntu/TeamCity/hotel-service with revision  9c33073345c04f0472359eb8cadbe7109d4959fa Matt #Project: Uploading new  version of the hotel service project.
[13:09:19]: [VCS Root: hotel-service] Cleaning hotel-service in /home/ubuntu/TeamCity/hotel-service the file set ALL_UNTRACKED


Hope this helps and makes some sense,
Thanks and Regards,
Mark

Mark, thank you for the investigation.
Could you please clarify: was the TeamCity agent whose log you provided running on the same EC2 instance, or on your local machine?


On this occasion all logs and tests were carried out on the same instance that runs the TC agent, which is an EC2 instance. However, I see the same 50-minute checkout time on a server that's hosted internally on our own local network.

Thanks


Mark, please check that the directory /home/ubuntu/TeamCity is not mapped onto /mnt/TeamCity. Just in case..


Hi Dmitry,
I can confirm this. For one thing, I hadn't created that directory myself; I think I let the TC agent create it (or perhaps I did create it), but it's certainly not pointing at /mnt. The other thing to point out is that the manual git clone command pulls data into that exact same directory, so I would have noticed the same massive slowdown the /mnt directory has if they were linked in some way.

Cheers,
Mark


Mark, I couldn't reproduce it so far..
You used /home/ubuntu/hotel-service.git for native git and /home/ubuntu/TeamCity/hotel-service for TeamCity.
For a fair test, please try making a clone with TeamCity into /home/ubuntu/hotel-service.git.


Hi Dmitry,

Sorry, my bad; I thought I had used the same folder, though it shouldn't have made any difference. Of course I deleted the hotel-service.git directory before doing the checkout, because otherwise it would not be the test I want. I guess the log below shows the clocks are slightly out of sync between server and agent, but I can't see that having any impact. Still, as you can see, the same slow checkout occurred.

[09:53:07]: Checking for changes
[09:52:31]: Clearing temporary directory: /usr/local/teamcity/temp/buildTmp
[09:52:31]: Checkout directory: /home/ubuntu/hotel-service.git
[09:52:31]: Updating sources: Agent side checkout... (47m:14s)
[09:52:31]: [Updating sources: Agent side checkout...] Will perform clean checkout. Reason: Checkout directory is empty or doesn't exist
[09:52:31]: [Updating sources: Agent side checkout...] Cleaning /home/ubuntu/hotel-service.git
[09:52:31]: [Updating sources: Agent side checkout...] VCS Root: hotel-service (47m:14s)
[09:52:31]: [VCS Root: hotel-service] revision: 9c33073345c04f0472359eb8cadbe7109d4959fa@12c504eb038
[09:52:32]: [VCS Root: hotel-service] Cleaning /home/ubuntu/hotel-service.git
[09:52:32]: [VCS Root: hotel-service] The .git directory is missing in '/home/ubuntu/hotel-service.git'. Running 'git init'...
[09:52:32]: [VCS Root: hotel-service] Fetching data for 'hotel-service'...
[10:39:22]: [VCS Root: hotel-service] Fetched  revisions up to 9c33073345c04f0472359eb8cadbe7109d4959fa Matt #Project:  Uploading new version of the hotel service project.
[10:39:23]: [VCS Root: hotel-service] Checking  out branch master in hotel-service in /home/ubuntu/hotel-service.git  with revision 9c33073345c04f0472359eb8cadbe7109d4959fa Matt #Project:  Uploading new version of the hotel service project.
[10:39:44]: [VCS Root: hotel-service] Cleaning hotel-service in /home/ubuntu/hotel-service.git the file set ALL_UNTRACKED


Cheers,
Mark

Hello Mark,
I still cannot reproduce your problem.
What if we add an option for you to run a shallow clone on agents until we have a better solution?
It will work if you do not need the entire repository history during the build.
It would save about half of the repository size, so the clone should be roughly twice as fast.
What do you think?


Hi Dmitry,

Is there really nothing that can be done? There must be a problem somewhere, a misconfiguration or something. I don't understand why it works perfectly for you, and yet whether I run a TC agent on a cloud machine or on a machine in house, the times are very similarly long.

Is there no debugging that can be turned on? Extra logging, settings to tweak?

We don't really want to go down the route of creating scripts etc., and although a shallow checkout may work and make a difference, it's still possibly going to be slower than it should be. It is something I will try, however; if there are any guidance/notes you could provide for setting that up, it would be great. We don't really need all the history of the project in the repository (I don't think), but in any case I can try this for now just to see if it makes a difference.

Thanks,
Mark


Mark, when we clone on the agent we use a Java implementation of SSH.
Let's check whether it could be causing the slowdown.
Could you please do the following on the fast Amazon disk:

mkdir repo
cd repo
git init
git remote add origin git@github.com:<your account on github>/hotel-service.git
date; env GIT_SSH=/path/to/git-ssh.sh git fetch origin '+refs/heads/master:refs/remotes/origin/master'; date


The script git-ssh.sh is attached. You should set the variable 'agent_home_dir' to the actual value of your agent home directory.
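(For context on what GIT_SSH does: git runs the named program in place of ssh, passing it the same host and command arguments. The attached git-ssh.sh isn't reproduced in this thread; the wrapper below is only a generic illustration of the mechanism, not that script.)

```shell
# A minimal GIT_SSH wrapper (illustrative only, not the attached
# git-ssh.sh): git invokes it exactly as it would invoke ssh, so the
# wrapper can log the arguments and then delegate to the real client.
cat > /tmp/trace-git-ssh.sh <<'EOF'
#!/bin/sh
echo "git-ssh called with: $*" >> /tmp/git-ssh.log
exec ssh "$@"
EOF
chmod +x /tmp/trace-git-ssh.sh
# Usage: env GIT_SSH=/tmp/trace-git-ssh.sh git fetch origin master
```

Swapping in a different SSH client this way is what makes the test above isolate the Java SSH implementation from the rest of the fetch.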



Attachment(s):
git-ssh.sh

Hi Dmitry,

Thanks for getting back to me, and my apologies for any forcefulness on my part; I'm really just trying to get this all sorted out.

Early indications show this is exactly the problem!

I haven't waited the time needed to see how long the checkout will actually take, but I can already see it's going to take ages...

The transfer speed when running the commands you gave me is VERY slow compared with a normal 'git clone' without Java SSH involved, which shows much, much higher transfer speeds (into the several-MB/s region).

Currently it's showing a transfer speed of 174 KiB/s (Receiving objects: 10% (610/5561), 62.07 MiB | 174 KiB/s), much, much slower, which almost certainly explains why it takes so long.

A normal git clone without Java involved, running at the same time as the checkout above, shows this speed:
Receiving objects:  35% (1996/5561), 173.81 MiB | 11.65 MiB/s

Do you have any ideas, in that case, on how to combat this problem?

Many Thanks and Regards,
Mark

PPS - It just finished; it took roughly the same amount of time as the problematic checkouts.
Thu Nov 25 08:16:30 UTC 2010
remote: Counting objects: 5561, done.
remote: Compressing objects: 100% (3818/3818), done.
remote: Total 5561 (delta 2421), reused 4542 (delta 1497)
Receiving objects: 100% (5561/5561), 490.25 MiB | 195 KiB/s, done.
Resolving deltas: 100% (2421/2421), done.
From github.com:holidayextras/hotel-service
* [new branch]      master     -> origin/master
Thu Nov 25 09:04:30 UTC 2010


Mark, I created an issue you can watch/vote for: http://youtrack.jetbrains.net/issue/TW-14598.
As a workaround for the problem, I created a build of the git-plugin which uses native ssh; it is attached to the issue.
Since the plugin uses native ssh, you have to configure it properly.
In particular, you should set up ssh to automatically add new host keys to $HOME/.ssh/known_hosts.
You can do this by putting something like the following in your $HOME/.ssh/config:

Host github.com
User git
StrictHostKeyChecking no

Do not forget to save it in your Amazon image.
Please let me know how it works for you.


Hi Dmitry,

Sorry for not getting back to you sooner; I've only just had the time to sort this out.

Great news though: it works perfectly and at great speed. It now takes 1min 30secs instead of the 50mins. Fantastic! I'm really pleased; thanks very much!

A couple of follow-up questions...
-> Why would others not have come across the same problem?
-> How come you weren't able to reproduce it? Were you using HTTP rather than SSH?
-> Why wouldn't this be the default method, if the Java SSH layer causes problems like this?

For anyone else: this page is useful for seeing what happens during agent updates and where to place plugins you want to use, etc.:
http://confluence.jetbrains.net/display/TCD5/Plugins+Packaging

But really, really fantastic; thanks again.
Mark


Hello Mark,

I'm happy to hear it works.

-> Why would others have not come across the same problem?

I think the main reason is that there are not many people who have repositories as big as yours and use git checkout on the agent over SSH.

Another reason is the quality of the network connection on an Amazon agent.
When I tested checkout on a local agent, it was only about twice as slow as a native git clone, something like 100KiB/s vs 250KiB/s.
On an Amazon agent, a clone using Java SSH is still ~100KiB/s, but native git reaches 10MiB/s, so you really notice the difference.

-> How comes you weren't able to reproduce? Was it that you were using http rather than SSH?

This is still a mystery to me, but I'm sure we did use the SSH protocol.
We will investigate it further.

-> Why would this not be the method to use by default if Java ssh layer causes problems like this?

Yes, we are going to make it the default method if it turns out the Java implementation cannot be improved.

Thank you for your help!


Hi Dmitry,

Just a quick note to say thanks for answering my follow-up questions. It all makes sense and everything is great.

Thanks again for helping to fully resolve the problem,
All the Best,
Mark

