Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix targeted scale-in logic #215

Merged

Conversation

rishabh-11
Copy link

@rishabh-11 rishabh-11 commented May 8, 2023

What this PR does / why we need it:
This PR updates the priority of machines with nodes without ToBeDeletedTaint to 3 before calling DeleteMachines to scale down the corresponding machine deployment.

Which issue(s) this PR fixes:
Fixes #159

Special notes for your reviewer:
This PR does not deal with one case of #159. It is as follows:-

  1. All Nodes in a node group: [N1, N2, N3, N4]
    At T1, CA marked [N2, N4] with ToBeDeleted taint, added priority=1, and reduced the machine deployment replicas by 2.
    Then CA restarts, which calls cleanUpIfRequired, which removes all the taints and resets the priority of N2 and N4 to 3. By this point, if MCM has not started the deletion process, then it is possible that N2 and N4 are not selected for deletion because MCM selects machines to delete based on priority annotation value.

Following test cases are added for the DeleteNodes method:-

  1. Delete single node happy path.
  2. Delete single placeholder node happy path.
  3. Delete a node with an error in updating machine priority to 1 due to timeout.
  4. Delete a node with updateMcD error timeout.
  5. Delete a node with updateMcD error resolving b4 timeout.
  6. McD is under rolling update.
  7. Machine is already in a terminating state. No scale-down should happen.
  8. Should not scale down the McD beyond the minimum.
  9. Should not delete node if it does not belong to the machine deployment.

Following test cases are added for the Refresh method:-

  1. Reset the priority of a machine having a node without ToBeDeletedTaint to 3.
  2. Don't reset the priority of a machine having a node with ToBeDeletedTaint.
  3. Failure to reset the priority of a machine

The base cluster requirements for IT are updated.
The faking (for unit tests of mcm cloud provider implementation) is done for the client only. Listers are not faked

Release note:

A bug where MCM removed a machine other than the one , CA wanted , is resolved.
`machinepriority.machine.sapcloud.io` annotation on machine is now reset to 3 by autoscaler if the corresponding node doesn't have `ToBeDeletedByClusterAutoscaler` taint
Initial implementation for `Refresh()` method of `CloudProvider` interface done
unit tests framework introduced to test implemented methods of `Cloudprovider` and `Nodegroup` interface

@rishabh-11 rishabh-11 requested review from himanshu-kun, unmarshall and a team as code owners May 8, 2023 13:14
@gardener-robot gardener-robot added needs/review Needs review size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) labels May 8, 2023
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label May 8, 2023
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels May 8, 2023
@gardener-robot
Copy link

@unmarshall, @himanshu-kun You have pull request review open invite, please check

Copy link

@himanshu-kun himanshu-kun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kindly avoid using flag as it makes the code unreadable
also try to break the main code where you reset priorities into helper functions so its more readable.

Copy link

@himanshu-kun himanshu-kun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kindly avoid using flag as it makes the code unreadable
also try to break the main code where you reset priorities into helper functions so its more readable.

@gardener-robot gardener-robot added the needs/changes Needs (more) changes label May 11, 2023
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label May 16, 2023
@gardener-robot-ci-2 gardener-robot-ci-2 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label May 16, 2023
@rishabh-11 rishabh-11 changed the title Correct the marking of machines for deletion logic Fix targeted scale-in logic May 16, 2023
@rishabh-11 rishabh-11 requested a review from himanshu-kun May 16, 2023 04:51
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label May 16, 2023
@gardener-robot-ci-2 gardener-robot-ci-2 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label May 16, 2023
@gardener-robot gardener-robot added size/l Size of pull request is large (see gardener-robot robot/bots/size.py) needs/second-opinion Needs second review by someone else and removed size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) labels May 18, 2023
@rishabh-11 rishabh-11 requested a review from himanshu-kun May 18, 2023 11:12
@gardener-robot-ci-1 gardener-robot-ci-1 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels May 18, 2023
@gardener-robot-ci-2 gardener-robot-ci-2 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Aug 14, 2023
@gardener-robot-ci-1 gardener-robot-ci-1 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Aug 14, 2023
@gardener-robot-ci-1 gardener-robot-ci-1 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Aug 16, 2023
@gardener-robot-ci-2 gardener-robot-ci-2 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Aug 16, 2023
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Aug 16, 2023
@gardener-robot-ci-1 gardener-robot-ci-1 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Aug 16, 2023
Copy link

@unmarshall unmarshall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/changes Needs (more) changes needs/review Needs review needs/second-opinion Needs second review by someone else labels Aug 17, 2023
@gardener-robot-ci-1 gardener-robot-ci-1 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Aug 17, 2023
Copy link

@himanshu-kun himanshu-kun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@himanshu-kun himanshu-kun merged commit abc1666 into gardener:machine-controller-manager-provider Aug 17, 2023
@gardener-robot gardener-robot added the status/closed Issue is closed (either delivered or triaged) label Aug 17, 2023
@rishabh-11 rishabh-11 deleted the fix-ca-2 branch April 2, 2024 08:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) reviewed/do-not-merge Has no approval for merging as it may break things, be of poor quality or have (ext.) dependencies reviewed/lgtm Has approval for merging reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) size/xl Size of pull request is huge (see gardener-robot robot/bots/size.py) status/closed Issue is closed (either delivered or triaged)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MCM doesn't remove the machine which CA wants
7 participants