-
Notifications
You must be signed in to change notification settings - Fork 4k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[VPA] Enhancement proposal: Delete OOM Pods
- Loading branch information
Christoph Manns
committed
May 20, 2022
1 parent
128bfa5
commit 45a6608
Showing
1 changed file
with
75 additions
and
0 deletions.
There are no files selected for viewing
75 changes: 75 additions & 0 deletions
75
vertical-pod-autoscaler/enhancements/4902-delete-oom-pods/README.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,75 @@ | ||
# KEP-4902: Delete OOM Pods | ||
|
||
<!-- toc --> | ||
- [Summary](#summary) | ||
- [Motivation](#motivation) | ||
- [Goals](#goals) | ||
- [Non-Goals](#non-goals) | ||
- [Proposal](#proposal) | ||
- [Design Details](#design-details) | ||
- [Test Plan](#test-plan) | ||
- [Implementation History](#implementation-history) | ||
- [Alternatives](#alternatives) | ||
- [Update the eviction API](#update-the-eviction-api) | ||
<!-- /toc --> | ||
|
||
## Summary | ||
|
||
The default behaviour of VPA Updater is to evict Pods when new resource | ||
recommendations are available. This works fine for most cases. | ||
A problem that can arise is when there are multiple replicas and a | ||
PodDisruptionBudget (PDB) which allows one disruption. Now the eviction is at | ||
risk, because as soon as multiple replicas run into for example the memory | ||
limit and get killed they will never recover as the eviciton API will not allow | ||
any further disruptions. | ||
|
||
This proposal addresses the problem by allowing users to enable the deletion of | ||
pods as a backup if the eviction fails. | ||
|
||
## Motivation | ||
|
||
The motivation behind the change is to give VPA users a way to recover a | ||
failing deployment, if updated/increased limits would solve the problem. | ||
|
||
### Goals | ||
|
||
- Main: allow cluster administrators to enable deletion of pods | ||
- Secondary: configurable deletion threshold to tune the deletion behaviour | ||
|
||
### Non-Goals | ||
|
||
- Get rid of or work around the existing eviction behaviour | ||
|
||
## Proposal | ||
|
||
The proposal is to add `--experimental-deletion` to the VPA to enable deletion | ||
of pods. Currently only as an experimental, or beta feature. | ||
To add a bit of configuration an additional an additional flag, | ||
`--experimental-deletion-threshold`, should be addedd. | ||
|
||
## Design Details | ||
|
||
Suggested implementation is present in [PR | ||
4898](https://github.com/kubernetes/autoscaler/pull/4898). | ||
|
||
### Test Plan | ||
|
||
Add unit tests that cover the new code paths. | ||
|
||
## Implementation History | ||
|
||
- 2022-05-19: initial version | ||
|
||
## Alternatives | ||
|
||
### Update the eviction API | ||
|
||
Instead of implementing this change on the client side, the VPA in this case, | ||
it could be implemented on the API side. This would have the advantage that it | ||
would work for all clients. On the other hand this would introduce breaking | ||
behaviour and most likely would result in a new api version. | ||
|
||
Also according to some discussions the general stance seems to be: | ||
If you don't like the drain/evict behaviour, just use delete. | ||
(https://github.com/kubernetes/kubernetes/issues/72320, | ||
https://github.com/kubernetes/kubernetes/pull/105296) |