Server/Agent disconnects, notification or build cancellation?
Is there a way, internal or external, to be notified when the server or an agent is unavailable? Due to conditions that I don't have the privileges to change, machines get rebooted and agents get disconnected. This grinds CI to a halt, and nobody knows until they log into TC to check the queue and the number of agents.
Ideally I'd like to cancel a build if no compatible agent is available, but I can also work with just being notified that an agent has disconnected. As for the server itself, getting an alert if that machine goes down would be future-proofing a separate concern.
Hi Paul,
I'm afraid there is currently no way to get notifications for this kind of event in the classic sense. We have a request to add some of them here: https://youtrack.jetbrains.com/issue/TW-5299 , and a further one for more general system-wide events here: https://youtrack.jetbrains.com/issue/TW-2795
Both are pretty old but have never gained much traction, so we've kept postponing them. They do still get extra comments every now and then, so I'd recommend voting for them to try to push them over the edge.
With this in mind, it might be possible to use scripts or third-party tools that parse logs and send notifications about these kinds of events. You could also write your own plugin that hooks into some of the server events and uses your own mechanisms to provide these notifications.
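As a rough illustration of the script route, here is a minimal Python sketch that polls the server for its agent list instead of parsing logs, and picks out the agents that are not connected. Please treat the details as assumptions to check against the REST API documentation for your TeamCity version: the `/app/rest/agents` path, the `fields=agent(name,connected)` parameter, the bearer-token header and the JSON shape are not confirmed anywhere in this thread, and the function names are made up for the example.

```python
import json
import urllib.request
from typing import List


def fetch_agents(base_url: str, token: str, timeout: float = 10.0) -> dict:
    """Fetch the agent list as parsed JSON.

    ASSUMPTION: the endpoint path, the `fields` parameter and bearer-token
    authentication are illustrative; verify them for your server version.
    """
    url = base_url.rstrip("/") + "/app/rest/agents?fields=agent(name,connected)"
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/json",
            "Authorization": "Bearer " + token,
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def disconnected_agents(payload: dict) -> List[str]:
    """Return the names of agents whose `connected` flag is false.

    ASSUMPTION: the payload looks like
    {"agent": [{"name": "...", "connected": true}, ...]}.
    Agents without a `connected` field are treated as connected.
    """
    return [
        agent["name"]
        for agent in payload.get("agent", [])
        if not agent.get("connected", True)
    ]
```

You would run something like `disconnected_agents(fetch_agents(server_url, token))` from cron or a scheduled task and send a mail or chat message whenever the list is not empty. Since the script runs outside TeamCity, it keeps working even when no agent is available to run a build.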
As for the server itself, the most common approach is to have some sort of load balancing with a secondary node (usually in read-only mode). The load balancer will usually have a heartbeat system to check the status of the main server. That covers both the case where the server is shut down by an error and the case where it is unreachable for some other reason, which would be hard to cover by means tied directly to the server.
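If you don't have a load balancer in place, the same heartbeat idea can be approximated with a small external script. The sketch below is only an illustration, not how any particular load balancer works: `http_probe`, `Heartbeat` and the threshold of three consecutive failures are hypothetical choices made for the example.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status within the timeout.

    Connection errors, timeouts and HTTP error statuses all count as "down".
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


class Heartbeat:
    """Track consecutive probe failures and flag when a threshold is reached."""

    def __init__(self, probe: Callable[[], bool], threshold: int = 3) -> None:
        self.probe = probe
        self.threshold = threshold  # hypothetical default, tune to taste
        self.failures = 0

    def tick(self) -> bool:
        """Run one probe.

        Returns True only on the tick where the number of consecutive
        failures first reaches the threshold, so an alert fires once per
        outage rather than on every failed check.
        """
        if self.probe():
            self.failures = 0
            return False
        self.failures += 1
        return self.failures == self.threshold
```

Calling `tick()` once a minute from a machine other than the server, with `probe=lambda: http_probe(server_url)`, gives you a basic "server is down" alert. Requiring several failures in a row avoids false alarms from a single slow response.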
Hope this helps.