TeamCity High-Availability
Hi
I have been reading the documentation page "Multinode Setup for High Availability" (TeamCity On-Premises, jetbrains.com).
If I understand it correctly, this is a manual active/passive setup, not true high availability. The idea of having different nodes handle specific jobs is fine; the diagram on that page shows, for example, a secondary node that handles [Polling changes] and another that handles [Processing build's data]. But it is still a single-point-of-failure setup. The page says that if the main node goes offline (crash or maintenance), a secondary node takes over the main node's jobs. If that is how it works, it is a failover setup rather than HA. And if you have to perform all kinds of manual checks before returning to the normal state, it is a complicated setup.
I was thinking of something like this:
Two (or more) proxies in front: nginx or Microsoft ARR (using HTTP headers), or simply an NLB with client IP affinity on the TeamCity hosts
Two (or more) TeamCity servers handling all TeamCity jobs, each just like a single server hosting all services
A shared data directory, which could be a DFS share hosted on each node or on separate servers
The database in a SQL Server Always On availability group
In a disaster scenario this setup would remain available. When upgrading TeamCity, you could either update one host at a time (if the release contains no breaking database changes) or schedule a maintenance window.
Nevertheless, I think calling the documented multinode setup "HA" is overstated, especially when one of the sections is titled "Main vs. secondary node", which I read as active/passive.
Am I the only one looking for information on this?
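To make the "client IP affinity" part of the proposal concrete, here is a minimal sketch of how a proxy layer could pin each client to one TeamCity node while skipping nodes that fail health checks. It only illustrates the idea behind NLB client affinity or nginx's `ip_hash`; it is not their exact algorithm, and the node names and the `pick_node`/`pick_healthy_node` helpers are hypothetical.

```python
import hashlib
import ipaddress
from typing import AbstractSet, Sequence


def pick_node(client_ip: str, nodes: Sequence[str]) -> str:
    """Deterministically map a client IP to one backend node (client-IP affinity)."""
    if not nodes:
        raise ValueError("no backend nodes available")
    # Normalize the address so equivalent spellings (e.g. IPv6 forms) hash the same.
    packed = ipaddress.ip_address(client_ip).packed
    digest = hashlib.sha256(packed).digest()
    # Simple modulo hashing: the same client always lands on the same node
    # as long as the node list does not change.
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def pick_healthy_node(client_ip: str, nodes: Sequence[str],
                      healthy: AbstractSet[str]) -> str:
    """Apply affinity over only the nodes that currently pass health checks."""
    candidates = [n for n in nodes if n in healthy]
    return pick_node(client_ip, candidates)


if __name__ == "__main__":
    nodes = ["tc-node-1", "tc-node-2"]  # hypothetical hostnames
    print(pick_node("10.0.0.5", nodes))
    print(pick_healthy_node("10.0.0.5", nodes, {"tc-node-2"}))  # always tc-node-2
```

Modulo hashing keeps the sketch short, but when a node drops out many clients are remapped, not just the ones on the failed node. A real proxy would typically use consistent hashing or cookie-based stickiness to limit that churn.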
David
Bump on this. I would also really like to see a secondary node step in immediately when the main node fails. Otherwise, the setup is not much more available than an instance running on a single node. Zero downtime during an upgrade or rollout is what we were hoping for.
You can make a secondary node take over the role of the main node in case of a failure, with minimal downtime. Please refer to the Failover section of the documentation for the necessary steps: https://www.jetbrains.com/help/teamcity/2024.03/multinode-setup.html#Failover.
REST API endpoints are also available. You can use them to automate and schedule the status checks, and to change node roles when the main node is offline: https://www.jetbrains.com/help/teamcity/2024.03/multinode-setup.html#Monitoring+and+Managing+Nodes+via+REST+API.
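As a rough illustration of what such automation could look like, here is a sketch of the decision logic only. It promotes a secondary node after the main node fails several consecutive status checks, so a single missed poll does not trigger a role change. The `FailoverWatchdog` class, the `run_checks` helper, and the threshold of 3 are hypothetical. The actual health check and role-change calls would go through the REST endpoints described on the linked page; they are deliberately not reproduced here and are represented by the plain `results` and `promote` arguments.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class FailoverWatchdog:
    """Hypothetical helper: signals promotion only after `threshold`
    consecutive failed health checks of the main node."""
    threshold: int = 3
    consecutive_failures: int = 0
    promoted: bool = False

    def observe(self, main_is_healthy: bool) -> bool:
        """Record one check result; return True exactly once, when the
        secondary node should take over the main-node role."""
        if self.promoted:
            return False  # already failed over; a human decides when to fail back
        if main_is_healthy:
            self.consecutive_failures = 0  # any success resets the counter
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold:
            self.promoted = True
            return True
        return False


def run_checks(results: Iterable[bool], promote: Callable[[], None],
               threshold: int = 3) -> int:
    """Feed check results through the watchdog. Returns the index of the
    check that triggered promotion, or -1 if none did."""
    watchdog = FailoverWatchdog(threshold=threshold)
    for i, healthy in enumerate(results):
        if watchdog.observe(healthy):
            promote()  # in practice: call the REST API to reassign the main-node role
            return i
    return -1


if __name__ == "__main__":
    checks = [True, False, False, True, False, False, False]
    print(run_checks(checks, lambda: print("promoting secondary node")))
```

In a real script, `results` would come from periodic polls of the main node (for example, on a timer), and `promote` would perform the role change. Failing back is left manual on purpose, matching the documented failover steps.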
Best regards,
Anton