fix: alicloud the function NodeGroupForNode return nil #6296
Welcome @guopeng0!
@x13n @BigDarkClown
/label area/provider/alicloud
@vadasambar: The label(s) In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Looks like I can't add the
Just in case, I will do a first pass tomorrow. Thank you for the PR @guopeng0! The description you wrote is easy to understand. /assign vadasambar
/lgtm @guopeng0 can you please squash the commits into a single commit? @x13n, @BigDarkClown can we get an approval on this PR 🙏
/LGTM
@ringtail: changing LGTM is restricted to collaborators In response to this:
Thanks @guopeng0.
/approve (Based on @ringtail's lgtm above.)
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: guopeng0, vadasambar, x13n.
What type of PR is this?
/kind bug
What this PR does / why we need it:
I noticed an issue in the NodeGroupForNode function in the alicloud provider of cluster-autoscaler: it sometimes returns nil for nodes that belong to a nodeGroup.
Currently, when a specified instance is not found, the implementation regenerates the cache by querying Alibaba Cloud for all instances. However, DescribeScalingInstances defaults to pageSize=10 and pageNumber=1, so a single call returns only the first page (at most 10 instances) rather than the complete set.
To address this issue, I have made the following modification:
Perform multiple paged queries based on the TotalCount in the response, so that the complete set of instances in the nodeGroup is returned.
I am a frequent user of cluster-autoscaler and have thoroughly tested and validated these modifications on Alibaba Cloud.
This bug can cause a nodeGroup's health status to be computed incorrectly, which in turn leads to scaling issues.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
NONE
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: