Adds feature to delete crashlooping pods after eviction fails #4898
Conversation
This PR adds a new experimental feature, enabled via a command-line flag, which attempts a pod deletion should the eviction fail for whatever reason. The deletion is bound to various conditions so as not to blindly delete a pod where the eviction failure might be justified. In general, the deletion only occurs if at least one container was OOMKilled, is currently in CrashLoopBackOff, and the number of restarts exceeds the configured threshold (also behind a flag).
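For readers skimming the thread, here is a minimal sketch of the kind of eligibility check described above. The helper name, package, and exact field checks are assumptions for illustration, not the PR's actual code:

```go
package updater

import apiv1 "k8s.io/api/core/v1"

// qualifiesForDeletion sketches the eligibility test described in the PR:
// at least one container must currently be in CrashLoopBackOff, must have
// been OOMKilled on its last termination, and must have restarted more
// often than the configured threshold.
func qualifiesForDeletion(pod *apiv1.Pod, restartThreshold int32) bool {
	for _, cs := range pod.Status.ContainerStatuses {
		crashLooping := cs.State.Waiting != nil && cs.State.Waiting.Reason == "CrashLoopBackOff"
		oomKilled := cs.LastTerminationState.Terminated != nil &&
			cs.LastTerminationState.Terminated.Reason == "OOMKilled"
		if crashLooping && oomKilled && cs.RestartCount > restartThreshold {
			return true
		}
	}
	return false
}
```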
Welcome @RuriRyan!
Thanks. I think it would be good to have an enhancement proposal for this. In the enhancement proposal I'd like to see:
I was writing in a hurry. Let me try once more. Thanks for sharing this. I'll take a look when I have some time (I expect next week). Reading the PR will help me understand how much complexity implementing this would add to VPA (which is one concern). I'd still like to understand in some more detail why implementing this as part of PDB is a problem. If we could have support for evicting pods in PDB, that would be better. But that's "if". On the other hand, if implementing this in PDB is not an option, I'd like to know why, but I guess it makes sense to make an improvement in VPA.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: RuriRyan The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/kind api-change
/kind documentation
LGTM
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project. |
I meant to leave those comments on the PR with the KEP
@jbartosik I'm considering this as "done" now and am patiently awaiting your review :D
Thanks, I did a first pass, mostly focusing on what was hard for me to read.
Did you test if this works as intended end-to-end?
@@ -383,6 +383,11 @@ spec:
        pods. If not specified, all fields in the `PodUpdatePolicy` are
        set to their default values.
      properties:
        deleteOomingOnEvictionError:
          description: Wheather to try to delete the pod when eviction fails
s/Wheather/Whether/
Or maybe rewrite to something simpler, like:
When true VPA will try to delete OOMing pods when eviction fails. When False it won't do that.
@@ -5,3 +5,5 @@ updater-arm64
updater-arm
updater-ppc64le
updater-s390x
Please drop the empty line
@@ -26,6 +26,9 @@ Threshold for evicting pods is specified by recommended min/max values from VPA
Priority of evictions within a set of replicated pods is proportional to sum of percentages of changes in resources
(i.e. pod with 15% memory increase 15% cpu decrease recommended will be evicted
before pod with 20% memory increase and no change in cpu).
* Deleting pods if the eviction fails and if the corresponding feature flag (`--experimental-deletion`) is enabled.
  Deletion is guarded by a treshold (`--experimental-deletion-threshold`) of how many restarts are required. The pod
  needs to be in `CrashLoopBackOff` and the LastTerminationReason needs to be `OOMKilled`.
LastTerminationReason?
deleted = true
eventRecorder.Event(podToEvict, apiv1.EventTypeNormal, "DeletedByVPA",
	"Pod was deleted by VPA Updater to apply resource recommendation.")
break
If I read this loop right it will attempt to delete 0 or 1 containers.
- 0 containers if canDelete doesn't return true for any container,
- 1 container otherwise (returns error if delete fails, breaks if it succeeds)
Is this the intended behavior (as opposed to attempting to delete all containers until we run into an error)?
If so I'd make it into its own function that returns (bool, error)
to make this easier to read.
It is the intended behaviour, because we cannot delete individual containers, but only the whole pod. So as soon as the pod is deleted, there is no reason to continue for the rest of the containers in the pod.
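If the loop were pulled out into its own function returning (bool, error), as suggested above, it could look roughly like this. This is only a sketch: `canDelete`'s real signature and how the delete call is wired in the PR may differ, so both are passed in as parameters here:

```go
package updater

import (
	apiv1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
)

// attemptDeleteOnce deletes the whole pod as soon as any single container
// qualifies, and reports whether a delete was attempted. Because deletion
// removes the entire pod, there is nothing left to do for the remaining
// containers once it succeeds.
func attemptDeleteOnce(pod *apiv1.Pod, canDelete func(apiv1.ContainerStatus) bool,
	deletePod func(*apiv1.Pod, record.EventRecorder) error,
	eventRecorder record.EventRecorder) (bool, error) {
	for _, cs := range pod.Status.ContainerStatuses {
		if !canDelete(cs) {
			continue
		}
		if err := deletePod(pod, eventRecorder); err != nil {
			return true, err // a delete was attempted but failed
		}
		return true, nil // pod deleted; stop iterating over its containers
	}
	return false, nil // no container qualified, nothing attempted
}
```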
// EvictViaDelete sends deletion instruction to api client. Returns error if pod cannot be deleted or if client returned error
// Does not check if pod was actually deleted after termination grace period.
func (e *podsEvictionRestrictionImpl) EvictViaDelete(podToEvict *apiv1.Pod, eventRecorder record.EventRecorder) error {
	cr, present := e.podToReplicaCreatorMap[getPodID(podToEvict)]
At first I was confused why we set cr here but use it much later. Please add a comment like:
// Make sure that we can map pod to replica so any problems prevent deleting the pod.
for _, pod := range pods[:1] {
	err := eviction.EvictViaDelete(pod, test.FakeEventRecorder())
	assert.Nil(t, err, "Should evict with no error")
Please use assert.NoError
}
for _, pod := range pods[1:] {
	err := eviction.EvictViaDelete(pod, test.FakeEventRecorder())
	assert.Error(t, err, "Error expected")
Why do we expect error here?
This is effectively just a copy & paste of the unit test for the "normal" eviction. I found out that only one eviction is allowed per "replica controller" for every "tick" of the updater. So when multiple pods of the same replica set should get evicted, it throws an error until the next tick.
// only try to delete the pod if the feature is enabled and if we would increase
// the resource requests or limits
if priority.ScaleUp && DeleteOomingOnEvictionError {
I'd add a function that returns bool and rewrite the condition as
if attemptEvictViaDelete(&priority, &vpa.Spec) {
//...
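A sketch of what such a helper could look like follows. The names are illustrative only; the reviewer's version also passes the priority struct itself, and the real PR might additionally consult a per-VPA setting from the spec:

```go
package updater

import (
	vpa_types "k8s.io/autoscaler/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1"
)

// deleteOomingOnEvictionError would mirror the --delete-ooming-on-eviction-error flag.
var deleteOomingOnEvictionError bool

// attemptEvictViaDelete keeps the fallback condition out of the main update
// loop: delete only when the recommendation would scale the pod up and the
// experimental feature is enabled. The spec is passed in so a per-VPA
// override (e.g. updatePolicy.deleteOomingOnEvictionError) could be
// consulted here as well.
func attemptEvictViaDelete(scaleUp bool, spec *vpa_types.VerticalPodAutoscalerSpec) bool {
	return scaleUp && deleteOomingOnEvictionError
}
```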
@@ -65,6 +65,8 @@ var (
namespace = os.Getenv("NAMESPACE")
vpaObjectNamespace = flag.String("vpa-object-namespace", apiv1.NamespaceAll, "Namespace to search for VPA objects. Empty means all namespaces will be used.")
deleteOomingOnEvictionError = flag.Bool("delete-ooming-on-eviction-error", false, "If true, updater will try to delete ooming pods when the eviction fails.")
I think there should be "by default" somewhere in the flag description
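Concretely, the flag line could read something like the following (just one possible wording, not the PR's final text):

```go
deleteOomingOnEvictionError = flag.Bool("delete-ooming-on-eviction-error", false,
	"If true, updater will try to delete OOMing pods when the eviction fails. Disabled by default.")
```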
prometheus.CounterOpts{
	Namespace: metricsNamespace,
	Name:      "deleted_pods_total",
	Help:      "Number of Pods delete by Updater to apply a new recommendation.",
Number of Pods delete**d**
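For reference, the corrected counter definition would look something like this; the variable name is an assumption and `metricsNamespace` is the constant from the quoted snippet above:

```go
var deletedCount = prometheus.NewCounter(
	prometheus.CounterOpts{
		Namespace: metricsNamespace,
		Name:      "deleted_pods_total",
		Help:      "Number of Pods deleted by Updater to apply a new recommendation.",
	},
)
```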
/hold
While I'm working on VPA 0.12.0 I don't want to make any changes to VPA that are not related to the release (I think the code is good to go, but E2E tests are timing out). I'll remove the hold after I release VPA 0.12.0.
/hold cancel
@RuriRyan I see this has a few open comments, mostly small things.
@jbartosik AFAICT all open review comments have either been fixed or answered.
We'd also love to see this PR move forward and eventually get merged. /cc @jbartosik
@timoreimann looks good to me. There's one thing remaining - checking if this works end to end. Did you run any (manual?) tests to verify this works as intended? Can you provide an e2e test that will ensure this keeps working? I thought that I could take the OOM test scenario and modify it:
I'm not sure when I'll be able to do that. I'm worried about merging this without any verification that this works as intended end to end.
@timoreimann I want to merge this but need some verification that this works e2e. I see the following options:
I can contribute e2e tests but I'm not sure when. I don't think 2. is significantly faster than 1.
Writing an e2e test sounds like the right thing to do. I'm happy to look into that when I have some spare time unless someone beats me to it.
I just saw the KEP for handling this via PDB. We also saw the original issue happen quite a few times and would want to get this fixed, but the change to PDB seems to solve the problem in a more "k8s native" way instead of working around it with the delete call. @jbartosik @avorima @timoreimann wdyt?
@voelzmo thanks for sharing the KEP. I wasn't aware of it, this does sound like the right approach going forward. That said, I think there's value in driving this PR to completion regardless: the solution in the KEP will only be available in Kubernetes 1.26 in alpha. Users on older versions will still want to have some kind of solution (at least we do given a large portion of our fleet is affected by the problem regularly). Just my 2 cents of course, final call is to be made by @jbartosik. Coincidentally, I have started working on extending the end-to-end tests as proposed just earlier today, so I could probably add the last missing piece fairly soon.
On second thought, the feature from this PR will likely only be backported so far (if at all), and/or more recent versions of VPA may be incompatible with older Kubernetes versions? So maybe there is only so much value users of older Kubernetes releases could get out of this PR? I'm not super familiar with VPA's/CA's release policy and compatibility guarantees, so throwing out additional thoughts mostly. @jbartosik still to the rescue for more clarity and preferences. 🙂 EDIT: had to remind myself that VPA is versioned separately from CA and based on its own CRD, so perhaps my second-thought concerns aren't as concerning after all, and continuing with the PR would still be beneficial.
Thanks for sharing @voelzmo, that looks promising. This PR and the VPA KEP were just created because, at the time, it seemed like this was never going to make it into the PDB spec, so I would actually be fine with closing it now. The short time that it would be useful would probably be outweighed by the effort it takes to maintain it. Only speaking for myself of course.
I'm happy to use PDB features instead of working around PDB in VPA.
Ok, what about the KEP? I'm not familiar with the process, but I suppose it needs to be removed or updated to document this decision.
I added an item about this to the next SIG meeting.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/close
@avorima: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Which component this PR applies to?
vertical-pod-autoscaler
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR adds a new experimental feature, which can be enabled via
a command line flag which attempts a pod deletion should the eviction
fail for whatever reason.
The deletion is bound to various conditions to not blindly delete a pod
where the eviction failure might be justified.
In general the deletion only occurs if at least one container was
OOMKilled and is currently in CrashLoopBackOff and the number of
restarts exceeds the configured threshold (also behind a flag).
Which issue(s) this PR fixes:
Fixes #4730
Special notes for your reviewer:
Does this PR introduce a user-facing change?