Blocked build queue

Hi experts,

I have run into a strange problem: TC does not allocate queued builds to idle machines. We have nearly 100 agents, all registered and connected. However, nearly 2/3 of them are never allocated builds, or get a build to run only after waiting several hours. The queued builds have no dependencies and are displayed as "delayed". So I would like to ask how and when TC decides to allocate a build to an agent. Is there anything that could block the build queue? Is the server too busy, or is it something else? My server version is 8.1.1.

Thanks,
Lin

5 comments

After looking at each agent, I found that nearly all idle agents print this kind of error:

[2014-05-12 22:59:03,638]   WARN -       org.apache.xmlrpc.XmlRpc - Read timed out
java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:152)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
        at org.apache.xmlrpc.WebServer2$Connection.readLine(WebServer2.java:842)
        at org.apache.xmlrpc.WebServer2$Connection.run(WebServer2.java:718)
        at org.apache.xmlrpc.WebServer2$Runner.run(WebServer2.java:644)
        at java.lang.Thread.run(Thread.java:744)
......
Server confirmed we are still registered

Can anyone explain when this happens?

Thanks


It looks like there is a problem with the connection between the agent and the server.

The agent must be able to reach the server, and the server must be able to reach the agent as well. Both of them use the HTTP protocol for interaction. Please double-check that nothing prevents the server from accessing the agents.
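One quick way to test the server-to-agent direction is to run a small TCP connectivity check on the server machine. The following is only a minimal sketch: the helper names are made up for illustration, and the port you pass in is whatever port your agents actually listen on (often configured as ownPort in the agent's buildAgent.properties, commonly 9090, but treat that as an assumption and check your own configuration).

```python
import socket
from typing import Dict, Iterable


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_agents(hosts: Iterable[str], port: int, timeout: float = 5.0) -> Dict[str, bool]:
    """Check each agent host (hypothetical helper) and map host name to reachability."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

For example, calling check_agents(["agent-01", "agent-02"], 9090) on the server (with your real agent host names and port) returns a dictionary showing which agents accept a connection. A successful TCP connection only shows that no firewall or routing issue blocks the port; it does not prove that the agent itself is healthy.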


Hi Sergey,

Thanks for the reply. I checked the agent, and it seems it lost the connection during the build and reconnected soon afterwards. However, I start an XML-RPC service in my server plugin and connect to it from the agent-side plugin, and the communication between my custom plugins has no problems. Only the connection between the server and the agent is lost for a short time. I am sure I am not doing network operations on the main thread. By the way, I have already contacted your company through the official email address for commercial users. Yegor is already looking into it but has not found the root cause yet.

Another question: if the server and an agent lose connection with each other, why is the agent still displayed on the "connected" page? This may need a separate post, but it may also affect this problem. It is confusing that when I shut down an agent normally, it does not disappear from the "connected" page quickly but only after some time (normally 5-10 minutes). During this time, some builds are allocated to these shut-down agents, then cancelled and re-added to the build queue. This is annoying, because the incorrect status displayed on the web page causes misunderstanding. What's more, this problem exists only on Windows agents; Linux agents disconnect from TC quickly.

Thanks,
Lin


Agents disappear quickly if they send a "goodbye" message before shutting down. On Windows, this does not happen if the agent is not started as a service. In that case TC waits for a certain time (a timeout), and if the agent does not respond, TC removes it from the list of agents.
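To illustrate the difference between the two cases, here is a toy model of a "connected agents" list. It is not TeamCity's actual implementation: the class, method names, and timeout value are all hypothetical and exist only to show why an agent that says goodbye vanishes at once while a silent one lingers until the timeout expires.

```python
import time
from typing import Callable, Dict, List


class AgentRegistry:
    """Toy model (hypothetical) of a server-side list of connected agents."""

    def __init__(self, timeout_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout_seconds
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, name: str) -> None:
        """Record that the agent responded just now."""
        self._last_seen[name] = self._clock()

    def goodbye(self, name: str) -> None:
        """Clean shutdown: the agent is removed immediately."""
        self._last_seen.pop(name, None)

    def connected(self) -> List[str]:
        """Drop agents that have been silent longer than the timeout, then list the rest."""
        now = self._clock()
        expired = [n for n, seen in self._last_seen.items()
                   if now - seen > self._timeout]
        for name in expired:
            del self._last_seen[name]
        return sorted(self._last_seen)
```

In this model, an agent that calls goodbye() leaves the list right away, while an agent that simply stops sending heartbeats stays listed (and could still be handed builds) until the timeout elapses. That matches the behaviour described above for agents not started as a service.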


Thanks, that helps! So it seems the problem is that I didn't run it as a service.
