add simple way to replace a nodegroup #443
Ideally we should finish #396 beforehand and integrate it into this feature, so that node draining ensures all the old nodes are safely drained (and hence pods are migrated to the new nodes). |
Yes, we certainly need draining before we can do this. |
cc @tiffanyfay |
an interesting strategy, that could build on this work, would be to do a blue/green with two node pools, esp. with built-in self-tests like:
- create new node pool with two nodes, mark as non-schedulable
- deploy some smoke test pods (with node selectors and taint tolerations)
- mark as schedulable
- drain one old node in the old node pool
- continue the rolling update if all new pods are running and ready
- (optionally) pause at 50/50 and 99/1 for a predetermined delay
- delete old node pool
this would require an operator and a lot of effort, but would automate a lot of QA work (a sketch of the first few steps follows below) |
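A minimal sketch of the first three steps of that strategy, using a custom taint rather than cordoning so the tolerating smoke tests can still land on the pool. The `pool=green` taint key/value and the `green-ng` nodegroup name are hypothetical placeholders, and the `alpha.eksctl.io/nodegroup-name` label selector is an assumption about how eksctl labels its nodes:

```sh
# Keep the new pool non-schedulable for normal workloads via a custom taint.
# "pool=green" and "green-ng" are illustrative placeholders.
kubectl taint nodes -l alpha.eksctl.io/nodegroup-name=green-ng pool=green:NoSchedule

# Deploy a smoke-test pod that targets the tainted pool: the node selector
# pins it to the new pool, and the toleration lets it schedule despite the taint.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: smoke-test
spec:
  nodeSelector:
    alpha.eksctl.io/nodegroup-name: green-ng
  tolerations:
  - key: pool
    operator: Equal
    value: green
    effect: NoSchedule
  containers:
  - name: smoke
    image: busybox
    command: ["sh", "-c", "echo smoke test ok && sleep 3600"]
EOF

# Gate the rollout on the smoke test becoming Ready, then open the pool
# to ordinary workloads by removing the taint.
kubectl wait --for=condition=Ready pod/smoke-test --timeout=120s
kubectl taint nodes -l alpha.eksctl.io/nodegroup-name=green-ng pool=green:NoSchedule-
```

Using a dedicated taint instead of `kubectl cordon` keeps the pool closed to ordinary pods while still letting the smoke tests schedule onto it, since cordoned nodes would also reject the test pods.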
I think it is a good idea, definitely! We also have a few test workloads today for functional verification, essentially something that we could probably extract from the integration suite with a few tweaks.
|
This would depend on #642. |
Any progress on this? I've been looking for the exact same thing. The closest I can find is the |
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
Duplicate of #2774 |
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
This issue was closed because it has been stalled for 5 days with no activity. |
Any plans to add this?
We can create and delete nodegroups, but one really has to script the upgrade themselves, and it can be tedious. If they are to use e.g. a custom AMI and want to test it first, they probably know how to do tests and will split the phases themselves.
In many cases users probably want to create a new nodegroup based on an existing one. I think a good way to do this would be with `eksctl create nodegroup --replaces=<oldNodeGroup>`, which would take most configuration from `oldNodeGroup` and use it for the new nodegroup, and make sure to delete that old nodegroup once the new nodes have joined successfully. We could add a mode where the latest AMI from the same family is applied, a parameter to delay the deletion by some number of minutes, and perhaps an option to scale the old nodegroup down slowly instead of deleting it immediately (just an idea for those who prefer to play safe).
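For reference, a minimal sketch of the manual workflow that the proposed `--replaces` flag would fold into one command. The `my-cluster`, `old-ng`, and `new-ng` names are hypothetical, and the node label selector again assumes eksctl's `alpha.eksctl.io/nodegroup-name` label:

```sh
# 1. Create the replacement nodegroup (today its configuration must be
#    copied from the old one by hand; sizing here is illustrative).
eksctl create nodegroup --cluster my-cluster --name new-ng --nodes 3

# 2. Drain the old nodes one by one so pods migrate onto the new nodegroup.
#    (--delete-emptydir-data is called --delete-local-data on older kubectl.)
for node in $(kubectl get nodes -l alpha.eksctl.io/nodegroup-name=old-ng \
    -o jsonpath='{.items[*].metadata.name}'); do
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
done

# 3. Once workloads are healthy on the new nodes, remove the old nodegroup.
eksctl delete nodegroup --cluster my-cluster --name old-ng
```

Pausing between step 2 and step 3, or scaling `old-ng` down gradually instead of deleting it outright, corresponds to the "play safe" delay and slow scale-down options suggested above.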