Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implementation of generic scale from zero #102

Conversation

AxiomSamarth
Copy link

@AxiomSamarth AxiomSamarth commented Oct 12, 2021

What this PR does / why we need it:
This PR implements the generic scale from zero logic.

Which issue(s) this PR fixes:
Fixes #27

Special notes for your reviewer:

Release note:

Gardener Autoscaler now supports generic scale from zero. Traditionally, nodes are scaled by Autoscaler by generating node template for a new node that would accomodate the pending pods. This node template is generated either by referring to the node spec of the already existing node in the nodeGroup or by referring to the cloud provider specific catalogue which provides the node details like `cpu`, `gpu`, `memory`, `region`, `zone`, `instanceType` etc. The latter scenario is the scale from zero scenario where there are zero nodes in the `nodeGroup` that is expected to scale up and so far the existing logic limited this feature to AWS and Azure only.

Now, with the introduction of `nodeTemplate` property in the `MachineClass`, the scale from zero feature has been extended generically across all providers. 

@AxiomSamarth AxiomSamarth requested a review from a team as a code owner October 12, 2021 14:24
@gardener-robot gardener-robot added needs/review Needs review size/s Size of pull request is small (see gardener-robot robot/bots/size.py) labels Oct 12, 2021
@gardener-robot-ci-2 gardener-robot-ci-2 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Oct 12, 2021
@AxiomSamarth
Copy link
Author

The changes are tested with the below steps on AWS, OpenStack

  1. Create a Shoot with a worker pool spanning across multiple zones with minimum number of machines equal to 1 and maximum number of machines equal to number of zones the worker pool is spanning.
  2. Edit the MachineClasses on the Shoot by adding the expectedNodeAttributes field.
  3. Run the Cluster Autoscaler locally by trying to schedule pods that become unschedulable to witness the scale up happening in other zones which has machineDeployments with 0 replicas.

Copy link

@prashanth26 prashanth26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me.

  • However, improve the release notes
  • to provide more details on how to make use of it as end user
  • As well about deprecation of current scale from zero logic.
  • Finally adding some tests will help

/lgtm otherwise.
/needs second-opinion
@himanshu-kun

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/review Needs review reviewed/lgtm Has approval for merging labels Oct 14, 2021
@gardener-robot
Copy link

@prashanth26 Label needs/@himanshu-kun does not exist.

@AxiomSamarth
Copy link
Author

@prashanth26 Sure, Prashanth. I will incorporate those suggestions.

@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Oct 14, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Oct 14, 2021
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Oct 14, 2021
@gardener-robot-ci-1 gardener-robot-ci-1 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Oct 14, 2021
@gardener-robot
Copy link

@himanshu-kun, @hardikdr You have pull request review open invite, please check

Copy link

@himanshu-kun himanshu-kun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes, looks good , I will test myself once as well. Adding a unit test would be good.

@gardener-robot gardener-robot added size/xl Size of pull request is huge (see gardener-robot robot/bots/size.py) needs/second-opinion Needs second review by someone else and removed size/s Size of pull request is small (see gardener-robot robot/bots/size.py) labels Nov 7, 2021
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 7, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 7, 2021
@AxiomSamarth
Copy link
Author

AxiomSamarth commented Nov 7, 2021

The machine catalogue is updated with the new data types. However, new files are created in the mcm package (as changing in the original location under aws and azure packages would lead to CA build failures). The changes in the mcm_manager.go has been done accordingly.

Tested for backward compatibility. Works fine.

The CRD & API changes from the new MCM release should be vendored in this PR. Post which the CA build will go through successfully.

Copy link

@himanshu-kun himanshu-kun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the neat PR Samarth
As suggested I think we can change variable name MemoryMb to Memory as it could be misleading . You might have to generate the catalogues again in that case.

Also as discussed by us of not fetching region and zone from expectedNodeAttributes now, I think those changes will get incorporated in your next push. Just commenting as a reminder

@gardener-robot-ci-1 gardener-robot-ci-1 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 24, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 24, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 24, 2021
@gardener-robot-ci-2 gardener-robot-ci-2 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 24, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 30, 2021
@gardener-robot-ci-1 gardener-robot-ci-1 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 30, 2021
Copy link

@himanshu-kun himanshu-kun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor change requested and discussion

@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Dec 6, 2021
@gardener-robot-ci-2 gardener-robot-ci-2 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Dec 6, 2021
Copy link

@himanshu-kun himanshu-kun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the neat changes, just some minor picks to improve error and logging messages

@gardener-robot-ci-1 gardener-robot-ci-1 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Dec 9, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Dec 9, 2021
Copy link

@himanshu-kun himanshu-kun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@gardener-robot-ci-3 gardener-robot-ci-3 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Dec 17, 2021
@AxiomSamarth
Copy link
Author

Merging the changes as decided.

@AxiomSamarth AxiomSamarth merged commit 0f6f94c into gardener:machine-controller-manager-provider Dec 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/second-opinion Needs second review by someone else size/xl Size of pull request is huge (see gardener-robot robot/bots/size.py)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Generic Scale-to-Zero Support
7 participants