Avoids panics when VM type isn't found during scale from zero #77

@prashanth26 prashanth26 commented May 11, 2021

  • Cluster autoscaler was fetching VM details from the AWS instance type map instead of the Azure one for the MCM (OOT) provider Azure. This led to panics. It now fetches the details from the correct provider VM map.
  • This PR also improves error handling so that a VM type missing from the map returns an error instead of causing a panic.

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #76

Special notes for your reviewer:

Release note:

Avoids panics when VM type isn't found during scale from zero
Fetches the VM from the correct map for MCM provider Azure and hence doesn't panic anymore

@prashanth26 prashanth26 requested a review from hardikdr as a code owner May 11, 2021 08:41
@gardener-robot gardener-robot added needs/review Needs review size/xs Size of pull request is tiny (see gardener-robot robot/bots/size.py) labels May 11, 2021
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label May 11, 2021
@prashanth26
Author

/invite @AxiomSamarth @ialidzhikov

@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels May 11, 2021
@@ -681,8 +690,10 @@ func (m *McmManager) GetMachineDeploymentNodeTemplate(machinedeployment *Machine
 	if err != nil {
 		return nil, fmt.Errorf("Unable to convert from %s to %s for %s, Error: %v", kindMachineClass, providerAzure, machinedeployment.Name, err)
 	}

-	azureInstance := aws.InstanceTypes[providerSpec.Properties.HardwareProfile.VMSize]
+	azureInstance, exists := azure.InstanceTypes[providerSpec.Properties.HardwareProfile.VMSize]
Author

This is the key change: the panic was caused by this line, which looked up an Azure VM size in the AWS instance type map.
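
For readers unfamiliar with the failure mode, here is a minimal, self-contained sketch of why the old lookup could panic and how the comma-ok form avoids it. It assumes the instance type tables map a VM size to a pointer; the `instanceType` struct, the two tables, and `lookupInstanceType` are hypothetical stand-ins for illustration, not the actual autoscaler code. Indexing a Go map with a missing key yields the zero value, which for a pointer is `nil`, so any later field access dereferences a nil pointer.

```go
package main

import "fmt"

// instanceType is a hypothetical stand-in for a provider's VM description.
type instanceType struct {
	InstanceType string
	VCPU         int64
	MemoryMb     int64
}

// Illustrative tables only; the real provider tables are far larger.
var awsInstanceTypes = map[string]*instanceType{
	"m5.large": {InstanceType: "m5.large", VCPU: 2, MemoryMb: 8192},
}

var azureInstanceTypes = map[string]*instanceType{
	"Standard_D2_v3": {InstanceType: "Standard_D2_v3", VCPU: 2, MemoryMb: 8192},
}

// lookupInstanceType uses the comma-ok form so that a missing VM size
// becomes an error rather than a nil pointer handed to the caller.
func lookupInstanceType(table map[string]*instanceType, vmSize string) (*instanceType, error) {
	it, exists := table[vmSize]
	if !exists || it == nil {
		return nil, fmt.Errorf("unable to fetch details for VM type %s", vmSize)
	}
	return it, nil
}

func main() {
	// Looking up an Azure VM size in the AWS table finds nothing.
	// A plain index expression would return nil here, and reading
	// a field such as VCPU from it would panic.
	if _, err := lookupInstanceType(awsInstanceTypes, "Standard_D2_v3"); err != nil {
		fmt.Println("lookup in wrong table:", err)
	}

	// Looking it up in the matching table succeeds.
	it, err := lookupInstanceType(azureInstanceTypes, "Standard_D2_v3")
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Println("found:", it.InstanceType, "vCPUs:", it.VCPU)
}
```

The existence check matters even with the correct table in place, because a VM size that is simply absent from the table would otherwise hit the same nil dereference.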

Member

@ialidzhikov ialidzhikov left a comment

/lgtm

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/review Needs review labels May 11, 2021
@ialidzhikov
Member

@prashanth26, can you please file a cherry-pick and cut a v1.16.1 release? Thank you in advance!

@prashanth26
Author

> @prashanth26, can you please file a cherry-pick and cut a v1.16.1 release? Thank you in advance!

Yes, on it. Thanks.

@prashanth26
Author

prashanth26 commented May 11, 2021

I also tested this change locally. Scale from zero works as expected again, with no more panics:

I0511 08:53:08.279888       1 scale_up.go:460] Estimated 1 nodes needed in shoot--xxx--yyy-scale0-z3
I0511 08:53:08.279919       1 scale_up.go:574] Final scale-up plan: [{shoot--xxx--yyy-scale0-z3 0->1 (max: 2)}]
I0511 08:53:08.279935       1 scale_up.go:663] Scale-up: setting group shoot--xxx--yyy-scale0-z3 size to 1

@prashanth26 prashanth26 merged commit 6633475 into gardener:machine-controller-manager-provider May 11, 2021
@prashanth26 prashanth26 deleted the bugfix/azure-oot-panic branch May 11, 2021 09:07
ialidzhikov pushed a commit to ialidzhikov/autoscaler that referenced this pull request May 23, 2021
…er#77)

ialidzhikov pushed a commit to ialidzhikov/autoscaler that referenced this pull request Jul 14, 2021
…er#77)

prashanth26 added a commit that referenced this pull request Jul 14, 2021
)

Co-authored-by: Prashanth <prashanth@sap.com>
Labels
needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) reviewed/lgtm Has approval for merging size/xs Size of pull request is tiny (see gardener-robot robot/bots/size.py)
Development

Successfully merging this pull request may close these issues.

panic: runtime error: invalid memory address or nil pointer dereference
6 participants