add simple way to replace a nodegroup #443
Ideally we should finish #396 beforehand and integrate it into this feature, so that node draining ensures all the old nodes are safely drained (and hence pods are migrated to the new nodes). |
Yes, we certainly need draining before we can do this. |
cc @tiffanyfay |
an interesting strategy, that could build on this work, would be to do a blue/green with two node pools, esp. with built-in self-tests like:
- create new node pool with two nodes, mark as non-schedulable
- deploy some smoke test pods (with node selectors and taint tolerations)
- mark as schedulable
- drain one old node in the old node pool
- continue the rolling update if all new pods are running and ready
- (optionally) pause at 50/50 and 99/1 for a predetermined delay
- delete old node pool
this would require an operator and a lot of effort, but would automate a lot of QA work (a sketch of the first few steps follows below) |
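A minimal sketch of the first three steps of that strategy, using a custom taint rather than cordoning so the tolerating smoke tests can still land on the pool. The `pool=green` taint key/value and the `green-ng` nodegroup name are hypothetical placeholders, and the `alpha.eksctl.io/nodegroup-name` label selector is an assumption about how eksctl labels its nodes:

```sh
# Keep the new pool non-schedulable for normal workloads via a custom taint.
# "pool=green" and "green-ng" are illustrative placeholders.
kubectl taint nodes -l alpha.eksctl.io/nodegroup-name=green-ng pool=green:NoSchedule

# Deploy a smoke-test pod that targets the tainted pool: the node selector
# pins it to the new pool, and the toleration lets it schedule despite the taint.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: smoke-test
spec:
  nodeSelector:
    alpha.eksctl.io/nodegroup-name: green-ng
  tolerations:
  - key: pool
    operator: Equal
    value: green
    effect: NoSchedule
  containers:
  - name: smoke
    image: busybox
    command: ["sh", "-c", "echo smoke test ok && sleep 3600"]
EOF

# Gate the rollout on the smoke test becoming Ready, then open the pool
# to ordinary workloads by removing the taint.
kubectl wait --for=condition=Ready pod/smoke-test --timeout=120s
kubectl taint nodes -l alpha.eksctl.io/nodegroup-name=green-ng pool=green:NoSchedule-
```

Using a dedicated taint instead of `kubectl cordon` keeps the pool closed to ordinary pods while still letting the smoke tests schedule onto it, since cordoned nodes would also reject the test pods.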
I think it is a good idea, definitely! We also have a few test workloads today for functional verification, essentially something that we could probably extract from the integration suite with a few tweaks.
|
This would depend on #642. |
Any progress on this? I've been looking for the exact same thing. The closest I can find is the |
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
Duplicate of #2774 |
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
This issue was closed because it has been stalled for 5 days with no activity. |
Any plans to add this?
We can create and delete nodegroups, but one really has to script the upgrade themselves, and it can be tedious. If they are to use e.g. a custom AMI and want to test it first, they probably know how to do tests and will split the phases themselves.
In many cases users probably want to create a new nodegroup based on an existing one. I think a good way to do this would be with `eksctl create nodegroup --replaces=<oldNodeGroup>`, which would take most configuration from `oldNodeGroup` and use it for the new nodegroup, and make sure to delete that old nodegroup once the new nodes have joined successfully. We could add a mode where the latest AMI from the same family is applied, a parameter to delay the deletion by some number of minutes, and perhaps an option to scale the old nodegroup down slowly instead of deleting it immediately (just an idea for those who prefer to play safe).
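For reference, a minimal sketch of the manual workflow that the proposed `--replaces` flag would fold into one command. The `my-cluster`, `old-ng`, and `new-ng` names are hypothetical, and the node label selector again assumes eksctl's `alpha.eksctl.io/nodegroup-name` label:

```sh
# 1. Create the replacement nodegroup (today its configuration must be
#    copied from the old one by hand; sizing here is illustrative).
eksctl create nodegroup --cluster my-cluster --name new-ng --nodes 3

# 2. Drain the old nodes one by one so pods migrate onto the new nodegroup.
#    (--delete-emptydir-data is called --delete-local-data on older kubectl.)
for node in $(kubectl get nodes -l alpha.eksctl.io/nodegroup-name=old-ng \
    -o jsonpath='{.items[*].metadata.name}'); do
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
done

# 3. Once workloads are healthy on the new nodes, remove the old nodegroup.
eksctl delete nodegroup --cluster my-cluster --name old-ng
```

Pausing between step 2 and step 3, or scaling `old-ng` down gradually instead of deleting it outright, corresponds to the "play safe" delay and slow scale-down options suggested above.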