
OpenStack specific rolling update should not require separate update cluster process #9635

Closed
kciredor opened this issue Jul 27, 2020 · 10 comments · Fixed by #9927

Comments

@kciredor

Hi,

When rolling an OpenStack-based kops cluster, you have to open a separate terminal and run kops update cluster --yes in a never-ending loop. As far as I can tell, all this does is create new VMs to replace the ones just terminated by the rolling update.

Kops on AWS, for example, does not require this. I assume an AWS autoscaling group ensures a new instance is created, so kops does not have to bother. Unfortunately, OpenStack does not have autoscaling groups.

It would be great if the rolling update loop, after draining/deleting a node, could do something like: "if openstack then createMissingNewNodes" (borrowed from the existing kops update cluster code).

I'm not exactly sure if this is a feature request or a bug fix; it depends on how you look at it, I think ;-)

Best,
kciredor

@johngmyers
Member

It would seem possible for the openstackCloud.DeleteInstance() method to spin up a new instance.

@olemarkus
Member

That should be doable. The function name would be a bit misleading, though, so if it is used anywhere outside of rolling update, things could get interesting. It may be better to check for openstack in instancegroup.deleteInstance instead. Either way, it doesn't seem too difficult to fix.

@johngmyers
Member

In clouds with ASGs, DeleteInstance() causes the deleted instance to be replaced, assuming the group's desired count ends up greater than the count of remaining instances. Even if DeleteInstance() were called from outside of rolling update, it would be a reasonable place for a cloud provider with manual SGs to do a check on the group and spin up necessary instances.

@olemarkus
Member

I had a small peek at this again, as it annoys me a bit too. And after previously messing with code in this area, I don't find it unreasonable to use instancegroup for this. But I have no idea how to get everything needed to create the instance there. The instance creation logic runs as a task with a whole lot of reconciliation logic, and none of that seems accessible from instance deletion.

As far as I can tell, the most viable option is to somehow trigger the update cmd from within rolling update. There are other cloud providers (DO) that do not support autoscaling, so one option could be to add a function on fi.Cloud that, for these clusters, kicks off ApplyClusterCmd.Run. That does sound a bit heavy, but at the moment it is the smallest piece of work needed to run something like kops update cluster internally.

@johngmyers
Member

A way to think about this is as implementing a (semi-)ASG in kops. Have the cloudup model save all the information needed to create an instance for each group in a cloud-provider-specific file in the config store. Then the fi.Cloud code could read that file and have what it needs to create instances.

Wrap a controller around it and you could have an actual ASG.

@johngmyers
Member

So it looks like you need a Port, an Instance, and possibly a FloatingIP. It might make sense to create a Loader and run a sort of mini-apply.

@olemarkus
Member

Basically all that is needed is to run ServerGroupModelBuilder, I believe. So I think I'll try a new loader that only uses this builder.

@olemarkus
Member

So "only using this builder" ends up pulling in most of the regular apply, it seems. ServerGroup depends on BootstrapScript, which depends on a host of other things.
Also, trying to do anything in the openstack cloud implementation ends up in all sorts of dependency cycles, so this has to be done closer to the rolling update command, I think.

@johngmyers
Member

I suggest storing the bootstrapscript for each group as a file in the config store.

@kciredor
Author

kciredor commented Oct 4, 2020

Nice work, thanks very much!
