Issues After Upgrading


We upgraded from v9.1.1 to 2019.2 on a test server and have the following issues:

1. The TeamCity web UI takes forever to initialize components. This may be due to #2.

2. Several of the OpenJDK Platform binary processes are hitting very high CPU and killing the server. This seems to be related to the TeamCity Server service, because if I stop that service, the CPU drops. Or do these processes do some work on upgrade, and we should just allow them to complete?

3. The SQL Server process is also using a lot of CPU. We use this for our database.

4. In the TeamCity UI, we no longer have build agents available. The status reads "Agent has unregistered (will upgrade)".

Any thoughts on how to resolve these issues?


Just a follow-up: after the upgrade I am seeing a Build Configuration Parameter that wasn't there pre-upgrade, and it won't let me delete it. This is preventing me from running builds.

If I mouse over the "undeletable" text, it shows a tooltip that reads "To delete this parameter you need to remove reference to it from the settings."

How can I do this?

 


Hi,

I'll try to address your issues:

1- This usually happens on server restart. How long "forever" actually is matters here, as does how often it happens. The server logs might provide some information, but the most common cause is that the process does not have enough memory available while it loads a large number of objects. In particular, hs_err* logs from crashes, teamcity-server.log, or catalina.out might contain valuable information to assess whether this is the root of the issue.

2- As you mentioned in 1, it's very likely related. Very high CPU usage usually happens when garbage collection has to be very aggressive; if it cannot free enough memory, the JVM crashes because more memory is needed than is available. This can be confirmed by checking the logs, and is usually solved by increasing the amount of memory that the TeamCity server is allowed to use. More information here: https://www.jetbrains.com/help/teamcity/reporting-issues.html#ReportingIssues-OutOfMemoryProblems

It is not guaranteed that this is the root cause of the problem, but it's definitely one of the most common causes, particularly after an upgrade. The server's default memory values are often not enough for a long-standing production server, and if the environment variables that set the amount of memory for the server are not set, or not set appropriately, it leads to exactly this kind of issue.

3- Particularly during startup, TeamCity needs to load large amounts of information from the SQL Server. This can tax the SQL Server a bit, and if startup is delayed enough by these issues, it will also affect SQL performance. That said, I would suggest looking at the SQL Server monitoring tools to see exactly what (which queries, how many, etc.) is having the impact.

4- When the server is upgraded, the build agents need to download the newest version of the agent and upgrade themselves, as agents can only communicate with a server of the exact same version. This can take a few minutes and requires downloading files from the server. Given that the server is suffering from performance problems, this download might be problematic, and that's not counting any issues that might arise during the agent's own upgrade. If the process is taking more than a few minutes, open one of the agents' directories and check its logs (in particular teamcity-agent.log and upgrade.log) to see if any specific problem is being reported.

5- The parameter indicates that your agents are required to have the NuGet tool installed. You can control this from Administration - Tools, but the agents need to be live to report that they have the tool installed. Once the agents report this parameter, it should stop showing as a requirement.
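For point 2, here is a rough sketch of how one might check what maximum heap the server has been given. It assumes the memory options live in the TEAMCITY_SERVER_MEM_OPTS environment variable and use the standard JVM -Xmx flag; the variable name is not stated above, so please verify it against the documentation for your version. The function names are made up for illustration.

```python
import os
import re

# Matches a JVM max-heap flag such as -Xmx750m or -Xmx4g.
XMX_PATTERN = re.compile(r"-Xmx(\d+)([kKmMgG]?)")

# Conversion factors from the flag's unit suffix to megabytes.
# An empty suffix means the value is given in bytes.
UNIT_TO_MB = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}


def max_heap_mb(options):
    """Return the -Xmx value in megabytes, or None if the flag is absent."""
    match = XMX_PATTERN.search(options or "")
    if match is None:
        return None
    amount, unit = match.groups()
    return int(amount) * UNIT_TO_MB[unit.lower()]


def describe_heap_setting(environ=os.environ):
    """Summarize the heap limit found in the (assumed) TeamCity variable."""
    # Assumption: TEAMCITY_SERVER_MEM_OPTS holds the server's JVM memory flags.
    heap = max_heap_mb(environ.get("TEAMCITY_SERVER_MEM_OPTS"))
    if heap is None:
        return "No -Xmx found in TEAMCITY_SERVER_MEM_OPTS; the server default applies."
    return f"Max heap is set to {heap} MB."


if __name__ == "__main__":
    print(describe_heap_setting())
```

Run it in the same environment the server process starts from; a service may see different variables than your interactive shell does.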

 

Could you try those suggestions? If you have trouble finding the related information in the logs, consider sending them to us via the "Submit a request" button at the top of this page.
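If it helps with digging through the logs, below is a minimal sketch of a scanner for the files mentioned in points 1, 2 and 4. It looks for java.lang.OutOfMemoryError in the server logs and for WARN/ERROR lines in the agent logs, and lists any hs_err* crash files in a directory. The helper names are hypothetical, and the assumption that log lines contain an upper-case WARN or ERROR token should be checked against your actual files.

```python
import re
from pathlib import Path

# Text the JVM reports when it runs out of memory.
OOM_PATTERN = re.compile(r"java\.lang\.OutOfMemoryError")
# Assumption: problem lines carry an upper-case WARN or ERROR level token.
PROBLEM_PATTERN = re.compile(r"\b(?:WARN|ERROR)\b")


def find_matching_lines(lines, pattern):
    """Return (line_number, text) pairs for every line the pattern matches."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


def scan_file(path, pattern):
    """Scan one log file; a missing file yields an empty result."""
    path = Path(path)
    if not path.is_file():
        return []
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with path.open(encoding="utf-8", errors="replace") as handle:
        return find_matching_lines(handle, pattern)


def list_crash_logs(directory):
    """List the names of hs_err* files (JVM crash logs) found in a directory."""
    return sorted(p.name for p in Path(directory).glob("hs_err*"))
```

For example, scan_file("teamcity-server.log", OOM_PATTERN) for the server and scan_file("teamcity-agent.log", PROBLEM_PATTERN) for an agent, with the paths adjusted to wherever your installation keeps its logs.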


We increased the memory on our test build server from 2GB, which had always been enough, to 8GB, then tried the upgrade again, and it appears to have worked much better.

I was also able to resolve the agent issue by installing the latest version of NuGet and making it the default, though I suspect I could have just made our existing version the default and it would have worked as well. Not sure why it wasn't the default post-upgrade, though.

Thanks for the input and suggestions.
