Feedback: issues with windowsservercore teamcity-minimal-agent docker images

We're moving some of our tests to run in containerized agents. We are a Windows-based shop and need to run a 32-bit application, so we have tried using the windowsservercore 1709 and ltsc2016 base images JetBrains has published. Our custom agent image includes a 32-bit Ruby install, Chrome and Node.js, which we ultimately use to run Cucumber acceptance tests against our test web server.

We have had some fundamental issues with each one which I'd like to feed back to you (as requested):

1709 seems to run in a stable manner, but we are consistently seeing the wrong time within the container: the container OS reports the correct timezone, but the time is a few hours different from the host. As I understand it, the container time should be in sync with the host. My feeling is this is an issue with either Docker for Windows or the Windows base image, but unfortunately it's a serious breaking issue that prevents us from running tests reliably.
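For anyone wanting to quantify the offset before a test run, here is a minimal Python sketch. It assumes the host's UTC time is handed to the container as an ISO 8601 string (how it gets there, for example an environment variable set at container start, is left open). The function names and the 5-second tolerance are illustrative assumptions, not part of TeamCity or Docker.

```python
from datetime import datetime, timezone


def parse_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 timestamp; a stamp without an offset is treated as UTC."""
    parsed = datetime.fromisoformat(stamp)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def clock_offset_seconds(reference_utc: datetime, container_utc: datetime) -> float:
    """Return container time minus reference (host) time, in seconds."""
    return (container_utc - reference_utc).total_seconds()


def is_in_sync(offset_seconds: float, tolerance_seconds: float = 5.0) -> bool:
    """True when the absolute offset is within the tolerance (assumed: 5 s)."""
    return abs(offset_seconds) <= tolerance_seconds


def describe_offset(host_stamp: str, container_utc: datetime) -> str:
    """Build a one-line report comparing the container clock with the host stamp."""
    offset = clock_offset_seconds(parse_utc(host_stamp), container_utc)
    state = "in sync" if is_in_sync(offset) else "OUT OF SYNC"
    return f"container clock is {offset:+.0f}s relative to host ({state})"
```

Inside the container, calling `describe_offset(host_stamp, datetime.now(timezone.utc))` with a stamp taken on the host just before start-up gives a rough figure. The container start-up time is included in the result, so only offsets much larger than that delay are meaningful.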

ltsc2016 does not exhibit the same time offset issue, but since switching to it as the base image yesterday we have seen a variety of strange, intermittent errors when running tests, which lead me to conclude it is too unstable to use:

  • Builds fail to start with "Cannot start build runner" - problems include "Unable to find build runner 'simpleRunner'" and "Unable to find build runner 'rake-runner'"
  • Having recreated the containers, we managed to get some tests passing, but then got what seem to be network timeouts/outages: Ruby reports Net::ReadTimeout. Another failed test run resulted in a large number of tests failing with "Selenium::WebDriver::Error::NoSuchWindowError: no such window: target window already closed"
  • One build started normally but then spontaneously stopped. The log shows "Canceled with comment: Build and agent have finished unexpectedly or were killed. Please check agent logs for details." There is nothing I can see in the agent logs that would indicate the cause.

As the behaviour of the agent is so erratic, I am reluctant to spend significant time troubleshooting the specific issues above. We are reverting to the 1709 image and exploring workarounds for the time offset problem, in the hope that a fix for it will be forthcoming in due course.

3 comments
Permanently deleted user

Having rolled back to my original 1709 image, and having also tried an image built on the latest version of your 1709 image, I still get some of the same problems:

  • Unable to find build runner 'simpleRunner'
  • Unable to find build runner 'rake-runner'

I am starting my containers with docker compose, and am re-using the previously created config dir (via a volume mapping). Looking at the logs for an agent instance that shows the above errors, I don't see the agent upgrade log output I would expect. I've just tried removing the config dir contents, and I then see the expected agent upgrade output in the log (although I have to re-authorize the agent). The build then runs successfully, though.


Hi Roger,

The time sync issue is a problem with Docker for Windows: https://github.com/docker/for-win/issues/1288
which was later moved to https://github.com/moby/moby/issues/37283

If it's a big issue for you, I'd suggest following up with them.

Regarding the other issues:
- Could you describe exactly when and how you get these messages?
- Could you describe the specific steps you've taken to modify the images? As you can imagine, we have tested our images and they work fine in our tests, so maybe there is something in your setup or environment that is making this fail.

Also, are you using the latest images? 2018.1? Or older ones?

Permanently deleted user

Hi Denis,

Thanks, I'm already tracking that issue. I'm currently looking at a workaround, which is to run the containers with --isolation=process (only supported on Windows Server hosts, unfortunately), and it looks promising so far. A useful comment on that issue steered me in this direction, and things seem to be more stable in general now that I'm not using Hyper-V isolation.
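For reference, a minimal Python sketch of how the workaround's command line could be assembled and then passed to `subprocess.run`. The helper name, the image tag `my-agent:1709` and the environment values are hypothetical placeholders. Only `docker run`, `--detach`, `--env` and `--isolation` (with values `default`, `process` and `hyperv`) are real Docker CLI options.

```python
from typing import Dict, List, Optional

# Isolation modes accepted by `docker run --isolation` on Windows hosts.
VALID_ISOLATION = ("default", "process", "hyperv")


def docker_run_args(
    image: str,
    isolation: str = "process",
    env: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build the argument list for starting a detached container.

    `image` and `env` are supplied by the caller; nothing here is specific
    to the TeamCity agent images.
    """
    if isolation not in VALID_ISOLATION:
        raise ValueError(f"unsupported isolation mode: {isolation!r}")
    args = ["docker", "run", "--detach", f"--isolation={isolation}"]
    # Sort for a stable, reproducible command line.
    for key, value in sorted((env or {}).items()):
        args += ["--env", f"{key}={value}"]
    args.append(image)
    return args
```

For example, `subprocess.run(docker_run_args("my-agent:1709"), check=True)` would start the (hypothetical) image with process isolation. With docker compose, the service-level `isolation` key should achieve the same thing, assuming the compose file version in use supports it.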

Yes, I'm using the latest images. I'm doing a lot of extra build steps on top of your minimal agent images; I'll follow up with more details if the fix I'm pursuing doesn't work out.
