
Test force_update_version variable for better EKS version upgrade #5037

Closed
9 tasks
poornima-krishnasamy opened this issue Nov 21, 2023 · 1 comment

@poornima-krishnasamy
Contributor

Background

When the EKS version upgrade happens and the nodes are drained, the drain cannot complete if any pod is in CrashLoopBackOff and is covered by a PodDisruptionBudget that cannot be violated. Currently we have a CloudWatch script that runs, checks for these errors, and force-deletes the pod so that the upgrade can proceed.

This is caused by an open issue, which is tracked in the EKS container roadmap and in Kubernetes.

In version 1.18 of the EKS Terraform module, there is a flag (force_update_version) that can force the nodes to drain when these errors occur.
cloudposse/terraform-aws-eks-node-group#151
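To make the blocking condition above concrete, here is a minimal Python sketch that flags the pods which would stop a node drain: pods in CrashLoopBackOff that are selected by a PodDisruptionBudget currently allowing zero disruptions. It works on plain in-memory objects rather than the Kubernetes API, and it is not the CloudWatch script mentioned above. The Pod and PodDisruptionBudget classes and the selects/drain_blockers functions are hypothetical names; only matchLabels selectors are modelled, namespaces are ignored, and disruptions_allowed stands in for the PDB's status.disruptionsAllowed field.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Pod:
    # Hypothetical, simplified stand-in for a Kubernetes Pod.
    name: str
    labels: Dict[str, str] = field(default_factory=dict)
    # Reason from the container's "waiting" state, e.g. "CrashLoopBackOff".
    waiting_reason: Optional[str] = None


@dataclass
class PodDisruptionBudget:
    # Hypothetical, simplified stand-in for a PDB.
    name: str
    # Only matchLabels-style selectors are modelled in this sketch.
    match_labels: Dict[str, str] = field(default_factory=dict)
    # Stands in for status.disruptionsAllowed on a real PDB.
    disruptions_allowed: int = 0


def selects(pdb: PodDisruptionBudget, pod: Pod) -> bool:
    """True if every matchLabels entry of the PDB is present on the pod."""
    return all(pod.labels.get(k) == v for k, v in pdb.match_labels.items())


def drain_blockers(
    pods: List[Pod], pdbs: List[PodDisruptionBudget]
) -> List[Tuple[str, str]]:
    """Return (pod, pdb) name pairs where a crashlooping pod is protected by a
    PDB that allows no disruptions, so its eviction during a drain is refused."""
    blockers = []
    for pod in pods:
        if pod.waiting_reason != "CrashLoopBackOff":
            continue
        for pdb in pdbs:
            if selects(pdb, pod) and pdb.disruptions_allowed == 0:
                blockers.append((pod.name, pdb.name))
    return blockers


if __name__ == "__main__":
    example_pods = [
        Pod("app-1", {"app": "web"}, "CrashLoopBackOff"),
        Pod("app-2", {"app": "web"}),
        Pod("db-0", {"app": "db"}, "CrashLoopBackOff"),
    ]
    example_pdbs = [
        PodDisruptionBudget("web-pdb", {"app": "web"}, disruptions_allowed=0),
        PodDisruptionBudget("db-pdb", {"app": "db"}, disruptions_allowed=1),
    ]
    print(drain_blockers(example_pods, example_pdbs))  # [('app-1', 'web-pdb')]
```

In this example db-0 is also crashlooping, but its PDB still allows one disruption, so only app-1 is reported as a blocker.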

Approach

  • Create a cluster and add a workload that is crashlooping and covered by a PDB that would be violated by eviction.
  • Update the variable and do an EKS version upgrade.
  • Test whether the upgrade succeeds without the need for the CloudWatch script.

Which part of the user docs does this impact

https://runbooks.cloud-platform.service.justice.gov.uk/upgrade-eks-cluster.html#upgrade-eks-cluster

Communicate changes

  • post for #cloud-platform-update
  • Weeknotes item
  • Show the Thing/P&A All Hands/User CoP
  • Announcements channel

Questions / Assumptions

Definition of done

  • readme has been updated
  • user docs have been updated
  • another team member has reviewed
  • smoke tests are green
  • prepare demo for the team

Reference

How to write good user stories

@poornima-krishnasamy
Contributor Author

The variable didn't improve the eviction of stuck pods during the node group upgrade. It is probably only used when the upgrade is done via Terraform and does not apply to upgrades done in the console.

@github-project-automation github-project-automation bot moved this from 🏗 In Progress to 🥇 Done in Cloud Platform May 24, 2024