diff --git a/vertical-pod-autoscaler/enhancements/4902-delete-oom-pods/README.md b/vertical-pod-autoscaler/enhancements/4902-delete-oom-pods/README.md new file mode 100644 index 000000000000..0fa06410ea2c --- /dev/null +++ b/vertical-pod-autoscaler/enhancements/4902-delete-oom-pods/README.md @@ -0,0 +1,75 @@ +# KEP-4902: Delete OOM Pods + + +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) +- [Design Details](#design-details) + - [Test Plan](#test-plan) +- [Implementation History](#implementation-history) +- [Alternatives](#alternatives) + - [Update the eviction API](#update-the-eviction-api) + + +## Summary + +The default behaviour of VPA Updater is to evict Pods when new resource +recommendations are available. This works fine for most cases. +A problem that can arise is when there are multiple replicas and a +PodDisruptionBudget (PDB) which allows one disruption. Now the eviction is at +risk, because as soon as multiple replicas run into for example the memory +limit and get killed they will never recover as the eviciton API will not allow +any further disruptions. + +This proposal addresses the problem by allowing users to enable the deletion of +pods as a backup if the eviction fails. + +## Motivation + +The motivation behind the change is to give VPA users a way to recover a +failing deployment, if updated/increased limits would solve the problem. + +### Goals + +- Main: allow cluster administrators to enable deletion of pods +- Secondary: configurable deletion threshold to tune the deletion behaviour + +### Non-Goals + +- Get rid of or work around the existing eviction behaviour + +## Proposal + +The proposal is to add `--experimental-deletion` to the VPA to enable deletion +of pods. Currently only as an experimental, or beta feature. +To add a bit of configuration an additional an additional flag, +`--experimental-deletion-threshold`, should be addedd. + +## Design Details + +Suggested implementation is present in [PR +4898](https://github.com/kubernetes/autoscaler/pull/4898). + +### Test Plan + +Add unit tests that cover the new code paths. + +## Implementation History + +- 2022-05-19: initial version + +## Alternatives + +### Update the eviction API + +Instead of implementing this change on the client side, the VPA in this case, +it could be implemented on the API side. This would have the advantage that it +would work for all clients. On the other hand this would introduce breaking +behaviour and most likely would result in a new api version. + +Also according to some discussions the general stance seems to be: +If you don't like the drain/evict behaviour, just use delete. +(https://github.com/kubernetes/kubernetes/issues/72320, +https://github.com/kubernetes/kubernetes/pull/105296)