
Adds feature to delete crashlooping pods after eviction fails #4898

Closed

Conversation

@RuriRyan (Contributor) commented May 19, 2022

Which component this PR applies to?

vertical-pod-autoscaler

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR adds a new experimental feature, enabled via a command-line flag, which attempts to delete a pod when its eviction fails for any reason.
The deletion is bound to several conditions so that pods whose eviction failure might be justified are not deleted blindly.
In general, deletion only occurs if at least one container was OOMKilled, is currently in CrashLoopBackOff, and its number of restarts exceeds the configured threshold (also set via a flag).
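For illustration, a minimal sketch of the per-container check described above (the helper name, signature, and threshold plumbing are assumptions for this example, not the PR's exact code):

```go
package updater

import corev1 "k8s.io/api/core/v1"

// isDeletionCandidate reports whether a pod qualifies for the deletion
// fallback after a failed eviction: at least one container was last
// terminated with reason OOMKilled, is currently waiting in
// CrashLoopBackOff, and has restarted more than `threshold` times.
func isDeletionCandidate(pod *corev1.Pod, threshold int32) bool {
	for _, cs := range pod.Status.ContainerStatuses {
		oomKilled := cs.LastTerminationState.Terminated != nil &&
			cs.LastTerminationState.Terminated.Reason == "OOMKilled"
		crashLooping := cs.State.Waiting != nil &&
			cs.State.Waiting.Reason == "CrashLoopBackOff"
		if oomKilled && crashLooping && cs.RestartCount > threshold {
			return true
		}
	}
	return false
}
```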

Which issue(s) this PR fixes:

Fixes #4730

Special notes for your reviewer:

Does this PR introduce a user-facing change?

New feature to delete crashlooping pods after eviction fails

@k8s-ci-robot added the kind/feature and cncf-cla: yes labels on May 19, 2022
@k8s-ci-robot (Contributor):

Welcome @RuriRyan!

It looks like this is your first PR to kubernetes/autoscaler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/autoscaler has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot added the size/L label on May 19, 2022
@jbartosik (Collaborator):

Thanks. I think it would be good to have an enhancement proposal for this.

In the enhancement proposal I'd like to see:

  • A short explanation of why we want this
  • An explanation of why to do this in VPA instead of in PDB
  • A description of the implementation

@jbartosik (Collaborator):

I was writing in a hurry. Let me try once more.

Thanks for sharing this. I'll take a look when I have some time (I expect next week).

Reading the PR will help me understand how much complexity implementing this would add to VPA (which is one concern).

I would still like to understand in more detail why implementing this as part of PDB is a problem. If we could have support for evicting such pods in PDB, that would be better. But that's an "if". On the other hand, if implementing this in PDB is not an option, I'd like to know why; in that case it makes sense to make this improvement in VPA.

@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: RuriRyan
Once this PR has been reviewed and has the lgtm label, please assign jbartosik for approval by writing /assign @jbartosik in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@RuriRyan changed the title from "Adds experimental feature to delete pods" to "Adds feature to delete pods if eviction fails" on Jul 25, 2022
@jbartosik (Collaborator):

/kind api-change

@k8s-ci-robot added the kind/api-change label on Jul 25, 2022
@jbartosik (Collaborator):

/kind documentation

@k8s-ci-robot added the kind/documentation label on Jul 25, 2022
@jbartosik (Collaborator):

LGTM.
I think I need someone who knows more about the API to take a look too.

@k8s-triage-robot:

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@jbartosik removed the kind/documentation and kind/api-change labels on Jul 25, 2022
@jbartosik (Collaborator):

I meant to leave those comments on the PR with the KEP.

@RuriRyan changed the title from "Adds feature to delete pods if eviction fails" to "Adds feature to delete crashlooping pods after eviction fails" on Aug 30, 2022
@RuriRyan (Contributor, Author) commented Sep 1, 2022

@jbartosik I'm considering this "done" now and am patiently awaiting your review :D

@jbartosik (Collaborator) left a comment:

Thanks, I did a first pass, mostly focusing on what was hard for me to read.

Did you test whether this works as intended end-to-end?

@@ -383,6 +383,11 @@ spec:
pods. If not specified, all fields in the `PodUpdatePolicy` are
set to their default values.
properties:
deleteOomingOnEvictionError:
description: Wheather to try to delete the pod when eviction fails
Collaborator:

s/Wheather/Whether/

Or maybe rewrite to something simpler, like:

When true VPA will try to delete OOMing pods when eviction fails. When False it won't do that.

@@ -5,3 +5,5 @@ updater-arm64
updater-arm
updater-ppc64le
updater-s390x

Collaborator:

Please drop the empty line

@@ -26,6 +26,9 @@ Threshold for evicting pods is specified by recommended min/max values from VPA
Priority of evictions within a set of replicated pods is proportional to sum of percentages of changes in resources
(i.e. pod with 15% memory increase 15% cpu decrease recommended will be evicted
before pod with 20% memory increase and no change in cpu).
* Deleting pods if the eviction fails and if the corresponding feature flag (`--experimental-deletion`) is enabled.
Deletion is guarded by a treshold (`--experimental-deletion-threshold`) of how many restarts are required. The pod
needs to be in `CrashLoopBackOff` and the LastTerminationReason needs to be `OOMKilled`.
Collaborator:

LastTerminationReason ?

deleted = true
eventRecorder.Event(podToEvict, apiv1.EventTypeNormal, "DeletedByVPA",
"Pod was deleted by VPA Updater to apply resource recommendation.")
break
Collaborator:

If I read this loop right it will attempt to delete 0 or 1 containers.

  • 0 containers if canDelete doesn't return true for any container,
  • 1 container otherwise (returns error if delete fails, breaks if it succeeds)

Is this the intended behavior (as opposed to attempting to delete all containers until we run into an error)?

If so I'd make it into its own function that returns (bool, error) to make this easier to read.

Contributor (Author):

It is the intended behaviour, because we cannot delete individual containers, but only the whole pod. So as soon as the pod is deleted, there is no reason to continue for the rest of the containers in the pod.
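A sketch of the refactoring suggested above: a helper returning (bool, error) that issues at most one delete per pod, since only whole pods can be deleted. Names are illustrative, and the per-container check is inlined here where the PR uses its canDelete helper; the surrounding types may differ.

```go
package updater

import (
	"context"

	apiv1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/record"
)

// deletePodIfAnyContainerQualifies deletes the whole pod as soon as one
// container meets the deletion criteria and reports whether a deletion was
// attempted. Only one delete call is issued per pod because individual
// containers cannot be deleted on their own.
func deletePodIfAnyContainerQualifies(ctx context.Context, client kubernetes.Interface,
	pod *apiv1.Pod, threshold int32, eventRecorder record.EventRecorder) (bool, error) {
	for _, cs := range pod.Status.ContainerStatuses {
		qualifies := cs.LastTerminationState.Terminated != nil &&
			cs.LastTerminationState.Terminated.Reason == "OOMKilled" &&
			cs.State.Waiting != nil &&
			cs.State.Waiting.Reason == "CrashLoopBackOff" &&
			cs.RestartCount > threshold
		if !qualifies {
			continue
		}
		if err := client.CoreV1().Pods(pod.Namespace).Delete(ctx, pod.Name, metav1.DeleteOptions{}); err != nil {
			return false, err
		}
		eventRecorder.Event(pod, apiv1.EventTypeNormal, "DeletedByVPA",
			"Pod was deleted by VPA Updater to apply resource recommendation.")
		return true, nil
	}
	return false, nil
}
```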

// EvictViaDelete sends deletion instruction to api client. Returns error if pod cannot be deleted or if client returned error
// Does not check if pod was actually deleted after termination grace period.
func (e *podsEvictionRestrictionImpl) EvictViaDelete(podToEvict *apiv1.Pod, eventRecorder record.EventRecorder) error {
cr, present := e.podToReplicaCreatorMap[getPodID(podToEvict)]
Collaborator:

At first I was confused why we set cr here but use it much later. Please add a comment like

// Make sure that we can map pod to replica so any problems prevent deleting the pod.


for _, pod := range pods[:1] {
err := eviction.EvictViaDelete(pod, test.FakeEventRecorder())
assert.Nil(t, err, "Should evict with no error")
Collaborator:

Please use assert.NoError

}
for _, pod := range pods[1:] {
err := eviction.EvictViaDelete(pod, test.FakeEventRecorder())
assert.Error(t, err, "Error expected")
Collaborator:

Why do we expect error here?

Contributor (Author):

This is effectively just a copy & paste of the unit test for the "normal" eviction. I found out that only one eviction is allowed per "replica controller" for every "tick" of the updater. So when multiple pods of the same replica set should get evicted it throws an error until the next tick.


// only try to delete the pod if the feature is enabled and if we would increase
// the resource requests or limits
if priority.ScaleUp && DeleteOomingOnEvictionError {
Collaborator:

I'd add a function that returns bool and rewrite the condition as

if attemptEvictViaDelete(&priority, &vpa.Spec) {
  //...
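A sketch of the suggested helper, using local stand-in types so the example is self-contained; the real code would use the updater's priority result and the VPA API types, the per-VPA field name is taken from the CRD diff above, and the signature is varied slightly to avoid a global flag variable.

```go
package updater

// Stand-in types for illustration only; the PR uses the updater's priority
// calculation result and the VPA API's PodUpdatePolicy instead.
type podPriority struct{ ScaleUp bool }

type podUpdatePolicy struct{ DeleteOomingOnEvictionError *bool }

type vpaSpec struct{ UpdatePolicy *podUpdatePolicy }

// attemptEvictViaDelete reports whether the updater should fall back to
// deleting the pod when eviction fails: the recommendation must scale
// resources up, the feature must be enabled via the command-line flag, and
// the VPA object may opt in or out via its update policy.
func attemptEvictViaDelete(priority *podPriority, spec *vpaSpec, flagEnabled bool) bool {
	if !priority.ScaleUp || !flagEnabled {
		return false
	}
	if spec.UpdatePolicy != nil && spec.UpdatePolicy.DeleteOomingOnEvictionError != nil {
		return *spec.UpdatePolicy.DeleteOomingOnEvictionError
	}
	return true
}
```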

@@ -65,6 +65,8 @@ var (

namespace = os.Getenv("NAMESPACE")
vpaObjectNamespace = flag.String("vpa-object-namespace", apiv1.NamespaceAll, "Namespace to search for VPA objects. Empty means all namespaces will be used.")

deleteOomingOnEvictionError = flag.Bool("delete-ooming-on-eviction-error", false, "If true, updater will try to delete ooming pods when the eviction fails.")
Collaborator:

I think there should be "by default" somewhere in the flag description
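One possible wording that incorporates the "by default" suggestion (a sketch only, same flag name as in the diff above):

```go
package updater

import "flag"

// Same flag as in the diff above, with the default behavior spelled out.
var deleteOomingOnEvictionError = flag.Bool("delete-ooming-on-eviction-error", false,
	"If true, updater will try to delete OOMing pods when the eviction fails. Disabled by default.")
```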

prometheus.CounterOpts{
Namespace: metricsNamespace,
Name: "deleted_pods_total",
Help: "Number of Pods delete by Updater to apply a new recommendation.",
Collaborator:

Number of Pods delete[d]
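For reference, the counter with the reviewer's spelling fix applied (the metrics namespace value is an assumption here; the updater defines its own constant):

```go
package metrics_updater

import "github.com/prometheus/client_golang/prometheus"

// Assumed value for this sketch; the updater's metrics package defines its own.
const metricsNamespace = "vpa_updater"

// deletedCount counts pods deleted by the Updater to apply a new recommendation.
var deletedCount = prometheus.NewCounter(prometheus.CounterOpts{
	Namespace: metricsNamespace,
	Name:      "deleted_pods_total",
	Help:      "Number of Pods deleted by Updater to apply a new recommendation.",
})
```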

@jbartosik (Collaborator):

/hold

While I'm working on VPA 0.12.0 I don't want to make any changes to VPA that are not related to the release (I think code is good to go but E2E tests are timing out).

I'll remove the hold after I release VPA 0.12.0.

@k8s-ci-robot added the do-not-merge/hold label on Sep 2, 2022
@jbartosik (Collaborator):

/hold cancel

@k8s-ci-robot removed the do-not-merge/hold label on Sep 20, 2022
@jbartosik (Collaborator):

@RuriRyan I see this has a few open comments, mostly small things.

@RuriRyan (Contributor, Author) commented Oct 10, 2022

> @RuriRyan I see this has a few open comments, mostly small things.

@avorima Can you take over? :D

FYI: I have left IONOS and no longer have access to this code.

@avorima (Contributor) commented Oct 10, 2022

@jbartosik AFAICT all open review comments have either been fixed or answered

@timoreimann (Contributor):

We'd also love to see this PR move forward and eventually get merged.

/cc @jbartosik

@jbartosik (Collaborator):

@timoreimann looks good to me

There's one thing remaining: checking whether this works end to end.

Did you run any (manual?) tests to verify this works as intended?

Can you provide an e2e test that will ensure this keeps working?

I thought I could take the OOM test scenario and modify it:

  1. Same scenario but with PDB preventing VPA from evicting any pods
  2. Same scenario with PDB and VPA allowed to delete, verify that recommendation is applied

I'm not sure when I'll be able to do that. I'm worried about merging this without any verification that this works as intended end to end.
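For reference, the PDB used to block evictions in such a scenario could be created roughly like this (a minimal client-go sketch; the autoscaler e2e suite has its own helpers, and the names and labels below are placeholders):

```go
package e2e

import (
	"context"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
)

// createBlockingPDB creates a PodDisruptionBudget that forbids any voluntary
// disruption of the matched pods, so that VPA's eviction calls fail and the
// deletion fallback (or its absence) becomes observable in the test.
func createBlockingPDB(ctx context.Context, c kubernetes.Interface, namespace string) error {
	maxUnavailable := intstr.FromInt(0)
	pdb := &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{Name: "block-evictions", Namespace: namespace},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MaxUnavailable: &maxUnavailable,
			Selector:       &metav1.LabelSelector{MatchLabels: map[string]string{"app": "example"}},
		},
	}
	_, err := c.PolicyV1().PodDisruptionBudgets(namespace).Create(ctx, pdb, metav1.CreateOptions{})
	return err
}
```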

@jbartosik (Collaborator):

@timoreimann I want to merge this but need some verification that this works e2e. I see the following options:

  1. Someone writes an e2e test and sends a PR, so that we can merge them at the same time
  2. Someone verifies manually that this works end to end; we merge, then add an e2e test later

I can contribute e2e tests but I'm not sure when. I don't think 2. is significantly faster than 1.

@timoreimann (Contributor):

Writing an e2e test sounds like the right thing to do. I'm happy to look into that when I have some spare time unless someone beats me to it.

@voelzmo (Contributor) commented Nov 18, 2022

I just saw unhealthyPodEvictionPolicy was added for PDBs (see KEP and API change in PDB).

We also saw the original issue happen quite a few times and want to get this fixed, but the change to PDB seems to solve the problem in a more "k8s native" way instead of working around it with the delete call.

@jbartosik @avorima @timoreimann wdyt?

@timoreimann (Contributor) commented Nov 18, 2022

@voelzmo thanks for sharing the KEP. I wasn't aware of it, this does sound like the right approach going forward.

That said, I think there's value in driving this PR to completion regardless: the solution in the KEP will only be available in Kubernetes 1.26 in alpha. Users on older versions will still want to have some kind of solution (at least we do given a large portion of our fleet is affected by the problem regularly).
It also seems possible to transition from what this PR proposes to what the KEP outlines by checking for the new field in the PDB once it is available and honoring its value with higher precedence. At least, that's how I understand it should work.

Just my 2 cents of course, final call is to be made by @jbartosik. Coincidentally, I have started working on extending the end-to-end tests as proposed just earlier today, so I could probably add the last missing piece fairly soon.

@timoreimann (Contributor) commented Nov 18, 2022

On second thought, the feature from this PR will likely only be backported so far (if at all), and/or more recent versions of VPA may be incompatible with older Kubernetes versions. So maybe there is only so much value users of older Kubernetes releases could get out of this PR?

I'm not super familiar with VPA's / CA's release policy and compatibility guarantees, so throwing out additional thoughts mostly. @jbartosik still to the rescue for more clarity and preferences. 🙂

EDIT: had to remind myself that VPA is versioned separately from CA and based on its own CRD, so perhaps my second thought concerns aren't as concerning after all, and continuing with the PR would still be beneficial.

@avorima (Contributor) commented Nov 18, 2022

Thanks for sharing @voelzmo, that looks promising.

This PR and the VPA KEP were just created because, at the time, it seemed like this was never going to make it into the PDB spec, so I would actually be fine with closing it now. The short time that it would be useful would probably be outweighed by the effort it takes to maintain it. Only speaking for myself of course.

@jbartosik (Collaborator):

I'm happy to use PDB features instead of working around PDB in VPA. AlwaysAllow policy sounds like it does what this was intended to do (from kubernetes/kubernetes#113375):

// AlwaysAllow policy means that all running pods (status.phase="Running"),
// but not yet healthy are considered disrupted and can be evicted regardless
// of whether the criteria in a PDB is met. This means perspective running
// pods of a disrupted application might not get a chance to become healthy.
// Healthy pods will be subject to the PDB for eviction.
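For comparison, a sketch of the PDB-native alternative, assuming the policy/v1 field and constant introduced for unhealthyPodEvictionPolicy in Kubernetes 1.26 (names and values below are placeholders):

```go
package e2e

import (
	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// newAlwaysAllowPDB builds a PDB that still protects healthy pods but lets
// running-yet-unhealthy pods (e.g. crashlooping ones) be evicted regardless
// of the budget, which covers the case this PR targeted.
func newAlwaysAllowPDB(namespace string) *policyv1.PodDisruptionBudget {
	minAvailable := intstr.FromInt(1)
	alwaysAllow := policyv1.AlwaysAllow
	return &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{Name: "example-pdb", Namespace: namespace},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MinAvailable:               &minAvailable,
			Selector:                   &metav1.LabelSelector{MatchLabels: map[string]string{"app": "example"}},
			UnhealthyPodEvictionPolicy: &alwaysAllow,
		},
	}
}
```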

@avorima (Contributor) commented Nov 22, 2022

Ok, what about the KEP? I'm not familiar with the process, but I suppose it needs to be removed or updated to document this decision.

@jbartosik (Collaborator):

> Ok, what about the KEP? I'm not familiar with the process, but I suppose it needs to be removed or updated to document this decision.

I added an item about this to the next SIG meeting.

@k8s-triage-robot:

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Feb 21, 2023
@avorima (Contributor) commented Feb 21, 2023

/close

@k8s-ci-robot (Contributor):

@avorima: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels: area/vertical-pod-autoscaler, cncf-cla: yes, kind/feature, lifecycle/stale, size/L
Projects: None yet
Development: Successfully merging this pull request may close these issues: [VPA] Feature Request: Delete OOM Pods
7 participants