Cloud support: API usage for "mark for termination"?

Hello,

I'm working on yandex-qatools/teamcity-openstack-plugin, the TeamCity OpenStack Cloud Support.
I'm trying to improve the configuration reload mechanism (triggered when the image or properties such as 'Instance Cap' are updated), which currently terminates all running instances (=> #2).

When this method is called (because configuration is reloaded):

jetbrains.buildServer.clouds.CloudClientEx.dispose()

All resources (=> images & instances) are "disposed" too. This instance termination is a little violent :-).

Without going as far as a high-level implementation like the native AWS one, an improvement could be to "mark for termination after current build finishes" these agents.
With that, on any configuration change, the current build would finish (instead of being stopped) and the agent would then be deleted.

Is there a way to set this property (mark agent for termination) from the CloudClientEx / CloudImage / CloudInstance interfaces?

Thanks in advance.
Best regards

2 comments

Hi, I was finally able to review the plugin. I'm going to write down the things I noticed. They address the pain points you mentioned above, but not only those. So,

  1. There's a so-called AgentType (which basically contains the properties of an agent). All CloudInstances under the same CloudImage share the same AgentType. An AgentType is identified by 3 parameters: CloudCode (OpenstackCloudParameters.CLOUD_TYPE), ProfileId (e.g. NOVA-1) and ImageId (CloudImage#getId()). Since the ImageId is generated every time from System.currentTimeMillis(), every restart or enable/disable of the profile makes all new agents land in the default/project pool without any history, etc. The ImageId must stay the same for the same entity. Usually (and this is recommended) we use something like the 'Agent name prefix'. So, as long as the 'agent name prefix' is the same, other changes to the image (instance size, etc.) won't affect the TC side of it (such as pool, compatibility, history, etc.).
  2. It is recommended to mark instances launched by TC, so that the CloudClient can actually pick them up on start. OpenStack supports tagging (I'm not sure about the implementation details, but at a high level this is true). You need a way to find all instances from a single profile, then the instances belonging to a certain CloudImage, then the individual instances.
  3. It is recommended to periodically synchronize the state of cloud instances with OpenStack. Usually this is done by fetching all TC-tagged instances and matching their states against TC's internal state.
  4. It is highly recommended to use the same procedure (from #3) to pick up all started and running instances when initializing a CloudProfile. If you maintain the AgentType (as mentioned in #1) and tag instances (#2), you should easily be able to do that.
  5. Concerning "mark for termination after current build finishes". This feature works automatically when you choose to stop an instance that is running a build. However, the feature itself relies on the existence of the CloudInstance you marked for termination, and if you disable a profile, the actual instances will never be terminated.
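
To make points 1 to 4 a bit more concrete, here is a rough, self-contained sketch. It does not use the TeamCity or OpenStack APIs: RemoteServer, TaggedInstanceSync and the two metadata keys are hypothetical names invented for illustration, and it assumes the cloud API lets you attach key/value metadata to a server and read it back. The idea is that the image id is derived from the agent name prefix (never from the clock), every launched instance is tagged with its profile and image, and a sync pass groups the tagged servers by image.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical view of a server as returned by the cloud API. */
final class RemoteServer {
    final String name;
    final String status;
    final Map<String, String> metadata;

    RemoteServer(String name, String status, Map<String, String> metadata) {
        this.name = name;
        this.status = status;
        this.metadata = metadata;
    }
}

final class TaggedInstanceSync {
    // Hypothetical metadata keys; any stable, plugin-specific names would do.
    static final String PROFILE_TAG = "teamcity-profile-id";
    static final String IMAGE_TAG = "teamcity-image-id";

    /** Stable image id: derived from the configured agent name prefix. */
    static String imageId(String agentNamePrefix) {
        return agentNamePrefix.trim();
    }

    /** Metadata to attach when launching an instance, so it can be found again later. */
    static Map<String, String> tagsFor(String profileId, String agentNamePrefix) {
        Map<String, String> tags = new HashMap<>();
        tags.put(PROFILE_TAG, profileId);
        tags.put(IMAGE_TAG, imageId(agentNamePrefix));
        return tags;
    }

    /** Groups servers tagged with the given profile by image id; untagged or foreign servers are skipped. */
    static Map<String, List<RemoteServer>> groupByImage(String profileId, List<RemoteServer> servers) {
        Map<String, List<RemoteServer>> byImage = new LinkedHashMap<>();
        for (RemoteServer server : servers) {
            if (!profileId.equals(server.metadata.get(PROFILE_TAG))) {
                continue; // belongs to another profile, or was not started by the plugin
            }
            String imageId = server.metadata.get(IMAGE_TAG);
            if (imageId == null) {
                continue; // tagged with a profile but no image: ignore rather than guess
            }
            byImage.computeIfAbsent(imageId, key -> new ArrayList<>()).add(server);
        }
        return byImage;
    }
}
```

The same groupByImage pass could serve both the periodic sync (#3) and the initial pick-up when a profile starts (#4); only the list of servers fetched from the cloud changes.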

So, to make this more viable, I would recommend fixing at least the following places in the code:

1) Get rid of IdGenerator and use permanent values for the CloudImage.getId() implementation (OpenstackCloudClient.java:69)

2) OpenstackCloudClient#isInitialized should return true only after the plugin has fetched the initial state from OpenStack. This can (and should) be done asynchronously. There's no problem with waiting a few seconds to let the plugin fully initialize.

3) Don't terminate all instances in OpenstackCloudImage#dispose.
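
As a rough illustration of 2) and 3), here is a minimal sketch. SketchCloudClient is a hypothetical class that does not implement the real TeamCity interfaces, and the Supplier stands in for whatever call fetches the tagged instances from OpenStack. It only shows the shape: the initial state is loaded on a background thread, isInitialized() flips to true once that load has finished, and dispose() releases local resources without touching running instances.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Hypothetical client skeleton, for illustration only. */
final class SketchCloudClient {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final List<String> knownInstances = new CopyOnWriteArrayList<>();
    private volatile boolean initialized = false;

    /** Starts fetching the initial state in the background; the constructor returns immediately. */
    SketchCloudClient(Supplier<List<String>> fetchTaggedInstances) {
        executor.execute(() -> {
            knownInstances.addAll(fetchTaggedInstances.get());
            initialized = true; // set only after the initial state is known
        });
    }

    /** True once the initial state has been loaded. */
    boolean isInitialized() {
        return initialized;
    }

    List<String> getKnownInstances() {
        return knownInstances;
    }

    /** Releases local resources only; running instances are deliberately left alone. */
    void dispose() {
        executor.shutdown();
    }
}
```

Note that in this sketch a failing fetch leaves the client uninitialized forever; a real implementation would need to retry or report the error.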

There are a few more comments, but they are minor. If you'd like, I can create a ticket at GitHub and put my notes here as well.


Hi Sergey, many thanks for these highly valuable comments.

I agree to do any follow-up on this in the plugin issue: #2.
