🐛 Fix deletion priority to avoid deleting too many machines #10087
Conversation
Welcome @ctrox!
Hi @ctrox. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
Force-pushed from 13fe3d0 to e0ee1dd
SGTM, but let's wait for others' opinions.
```diff
@@ -35,6 +35,7 @@ type (
 const (
 	mustDelete   deletePriority = 100.0
+	shouldDelete deletePriority = 75.0
```
While this might be a fine change on its own, regardless of the cause, could we have a reproducible test that shows the original bug?
Technically, the test I have added already shows the bug when run against current main: it will fail because `nodeHealthyConditionUnknownMachine` would be picked by `getMachinesToDeletePrioritized` instead of the expected `mustDeleteMachine`.
@ctrox Would it fail deterministically, or would it be random? (we're not using a stable sort)
EDIT: My bad, I missed the alphabetic sort
EDIT2: Hm but we're not setting the names in the test
Hm, looking at the test and our code, I don't see how the current code would always deterministically lead to `nodeHealthyConditionUnknownMachine` being picked over `mustDeleteMachine`. Am I missing something?
I think in general the root cause of your issue is that multiple cases got the same priority, and because of that we ended up with an alphabetical sort instead of one based on those priorities.
I think it would be fine to cover this with an additional test where we set the names in a way that would result in a different order (compared to the priority). And then let's add a short godoc comment to the test case explaining this.
I think then we're good to go.
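To make the tie-break behavior concrete, here is a minimal, self-contained sketch — not the actual Cluster API code; the type, constant values, and function names are simplified assumptions — of how sorting by priority with an alphabetical tie-break picks the "wrong" machine when two distinct cases share a priority:

```go
package main

import (
	"fmt"
	"sort"
)

// deletePriority is a simplified stand-in for the MachineSet delete policy
// priorities; the concrete values are illustrative assumptions.
type deletePriority float64

const (
	mustDelete   deletePriority = 100.0
	shouldDelete deletePriority = 75.0
)

type machine struct {
	name     string
	priority deletePriority
}

// pickFirstToDelete sorts by priority (highest first) and breaks ties by
// name. With distinct priorities the name never matters; with equal
// priorities the alphabetically smallest name wins, which is the
// name-dependent behavior discussed above.
func pickFirstToDelete(machines []machine) machine {
	sorted := append([]machine(nil), machines...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].priority != sorted[j].priority {
			return sorted[i].priority > sorted[j].priority
		}
		return sorted[i].name < sorted[j].name
	})
	return sorted[0]
}

func main() {
	// If both cases were to share a priority, "aa-unhealthy" would win
	// purely because of its name:
	fmt.Println(pickFirstToDelete([]machine{
		{"zz-must-delete", mustDelete},
		{"aa-unhealthy", mustDelete},
	}).name) // prints "aa-unhealthy"

	// With a separate, lower shouldDelete tier the priorities differ,
	// and the must-delete machine is picked regardless of its name:
	fmt.Println(pickFirstToDelete([]machine{
		{"zz-must-delete", mustDelete},
		{"aa-unhealthy", shouldDelete},
	}).name) // prints "zz-must-delete"
}
```

This also shows why a test with deliberately "adversarial" names (an alphabetically earlier name on the lower-priority machine) exposes the bug, while identical priorities with lucky naming would let it pass.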
You are right, the test I added previously did not take the name into account. I have now added two cases with different names, and one of them fails against current main since it would pick `unhealthyMachineA` instead of the expected `mustDeleteMachine`.
Perfect, thx!
/lgtm
LGTM label has been added. Git tree hash: af94e45be79c13944e50fe9368fab772a47a667a
/cherry-pick release-1.7
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: sbueringer.
@sbueringer: once the present PR merges, I will cherry-pick it on top of release-1.7 in a new PR and assign it to you.
/cherry-pick release-1.6
@sbueringer: once the present PR merges, I will cherry-pick it on top of release-1.6 in a new PR and assign it to you.
/cherry-pick release-1.5
@sbueringer: once the present PR merges, I will cherry-pick it on top of release-1.5 in a new PR and assign it to you.
@sbueringer: new pull request created: #10429
@sbueringer: new pull request created: #10430
@sbueringer: new pull request created: #10431
This introduces an additional deletePriority between betterDelete and mustDelete to avoid a race where multiple machines might be deleted at the same time.
What this PR does / why we need it:
This implements (or tries to) what has been discussed here.
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #9334
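The new tier can be sketched as follows. This is a simplified illustration under assumed condition checks and constant values — not the actual Cluster API implementation — showing how unhealthy machines now map to a priority strictly between `betterDelete` and `mustDelete`, so they can no longer tie with machines that must be deleted:

```go
package main

import "fmt"

type deletePriority float64

// Illustrative priority ladder; the exact values and names besides
// mustDelete/shouldDelete/betterDelete are assumptions for this sketch.
const (
	mustDelete   deletePriority = 100.0
	shouldDelete deletePriority = 75.0 // the new tier introduced by this PR
	betterDelete deletePriority = 50.0
	couldDelete  deletePriority = 20.0
)

// machineState is a hypothetical, reduced view of a Machine for this sketch.
type machineState struct {
	name              string
	markedForDeletion bool // e.g. explicitly selected for deletion
	nodeHealthy       bool
}

// priorityFor is a simplified sketch of the mapping: only machines
// explicitly marked for deletion get mustDelete, while unhealthy machines
// get the lower shouldDelete tier, so ordering between the two cases is
// decided by priority rather than by an alphabetical tie-break.
func priorityFor(m machineState) deletePriority {
	switch {
	case m.markedForDeletion:
		return mustDelete
	case !m.nodeHealthy:
		return shouldDelete
	default:
		return couldDelete
	}
}

func main() {
	fmt.Println(priorityFor(machineState{name: "a", markedForDeletion: true})) // prints "100"
	fmt.Println(priorityFor(machineState{name: "b", nodeHealthy: false}))      // prints "75"
	fmt.Println(priorityFor(machineState{name: "c", nodeHealthy: true}))       // prints "20"
}
```

Because the tiers are now distinct, scaling down by one machine removes exactly one machine from the highest applicable tier, avoiding the race where several machines shared the top priority and too many were deleted at once.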
/area machineset