That variable didn't improve the stuck pod eviction during the node group upgrade. It is probably only used when the upgrade is done via Terraform and does not apply to upgrades triggered from the console.
Background
When an EKS version upgrade happens and the nodes are drained, if any pods are in CrashLoopBackOff and have a PodDisruptionBudget that cannot be violated, the node drain cannot complete. Currently we have a CloudWatch script that runs, checks for these errors, and force-deletes the affected pods so the upgrade can proceed.
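A minimal sketch of what such a force-delete pass could look like, assuming `kubectl` access to the cluster. The pod and namespace names in the example are illustrative only, and this is not the actual CloudWatch script:

```shell
# Filter "kubectl get pods -A --no-headers" output down to pods stuck in
# CrashLoopBackOff, emitting "namespace/pod" pairs. With -A and --no-headers
# the STATUS column is field 4.
find_crashloop_pods() {
  awk '$4 == "CrashLoopBackOff" {print $1 "/" $2}'
}

# Example usage against a live cluster (requires kubectl access):
#   kubectl get pods -A --no-headers | find_crashloop_pods | \
#     while IFS=/ read -r ns pod; do
#       # --grace-period=0 --force bypasses graceful termination, which is
#       # what lets the node drain proceed past the PodDisruptionBudget.
#       kubectl delete pod "$pod" -n "$ns" --grace-period=0 --force
#     done
```

Note that `kubectl delete pod` (unlike the eviction API used by `kubectl drain`) is not blocked by a PodDisruptionBudget, which is why force-deleting unblocks the drain.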
This is because of an open issue, which can be found on the EKS container roadmap and in the Kubernetes issue tracker.
In version 1.18 of the EKS Terraform module, there is a flag that can help drain the nodes forcefully when these kinds of errors occur.
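A sketch of how that flag is set, assuming the node group is ultimately managed through the AWS provider's `aws_eks_node_group` resource (how the module exposes it may differ, so check the linked issue); all names and sizes below are illustrative:

```hcl
resource "aws_eks_node_group" "example" {
  cluster_name    = "my-cluster"          # illustrative
  node_group_name = "workers"             # illustrative
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.subnet_ids

  # Force the Kubernetes version update even if pods on the node
  # cannot be drained, e.g. because of a PodDisruptionBudget.
  force_update_version = true

  scaling_config {
    desired_size = 2
    max_size     = 3
    min_size     = 1
  }
}
```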
cloudposse/terraform-aws-eks-node-group#151
Approach
Which part of the user docs does this impact
https://runbooks.cloud-platform.service.justice.gov.uk/upgrade-eks-cluster.html#upgrade-eks-cluster
Communicate changes
Questions / Assumptions
Definition of done
Reference
How to write good user stories