
Could not get a CSINode object for the node #4811

Closed · 80kk opened this issue Apr 12, 2022 · 17 comments

Labels: area/cluster-autoscaler, kind/bug (Categorizes issue or PR as related to a bug.)

Comments


80kk commented Apr 12, 2022

Which component are you using?:

cluster-autoscaler

What version of the component are you using?:

Component version: 1.20.1 (k8s.gcr.io/autoscaling/cluster-autoscaler:v1.20.1)

What k8s version are you using (kubectl version)?:

1.23.5

What environment is this in?:

AWS

Could someone please tell me what this error is about? I've found that it sometimes takes ages for the cluster to scale up, and I am wondering if this is somehow related:

I0412 08:06:16.062769       1 scheduler_binder.go:775] Could not get a CSINode object for the node "template-node-for-nodes-a.domain.net-7982597919630627426-0": csinode.storage.k8s.io "template-node-for-nodes-a.domain.net-7982597919630627426-0" not found
I0412 08:06:16.062801       1 scheduler_binder.go:801] All bound volumes for Pod "namespace/pod-75b64dff96-99vxn" match with Node "template-node-for-nodes-a.domain.net-7982597919630627426-0"
I0412 08:06:16.062828       1 filter_out_schedulable.go:157] Pod namespace.pod-75b64dff96-99vxn marked as unschedulable can be scheduled on node template-node-for-nodes-a.domain.net-7982597919630627426-0. Ignoring in scale up.
I0412 08:06:16.063127       1 scheduler_binder.go:775] Could not get a CSINode object for the node "template-node-for-nodes-c.domain.net-4246696157256546175-0": csinode.storage.k8s.io "template-node-for-nodes-c.domain.net-4246696157256546175-0" not found
I0412 08:06:16.063143       1 scheduler_binder.go:801] All bound volumes for Pod "namespace/pod-64755c698f-ghcdt" match with Node "template-node-for-nodes-c.domain.net-4246696157256546175-0"
I0412 08:06:16.063166       1 filter_out_schedulable.go:157] Pod namespace.pod-64755c698f-ghcdt marked as unschedulable can be scheduled on node template-node-for-nodes-c.domain.net-4246696157256546175-0. Ignoring in scale up.

The thing is that there is still room for new nodes in each node group, at least 5 in each.
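
For context on the message itself: the scheduler code that CAS runs could not find a CSINode object for the simulated "template node" it builds for each node group. On a real node running a CSI driver such as the EBS CSI driver, that object is created automatically; a hypothetical sketch of what it looks like, with made-up node and instance IDs:

```yaml
# Hypothetical CSINode object as registered by the AWS EBS CSI driver on a real
# node; template nodes synthesized by Cluster Autoscaler have no such object,
# hence the "Could not get a CSINode object" log line.
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: ip-10-0-0-1.eu-west-1.compute.internal   # hypothetical node name
spec:
  drivers:
    - name: ebs.csi.aws.com
      nodeID: i-0123456789abcdef0                # hypothetical EC2 instance ID
      topologyKeys:
        - topology.ebs.csi.aws.com/zone
```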

80kk added the kind/bug label Apr 12, 2022

JohnMops commented Jun 8, 2022

Did you find what the issue was?

@mohitreddy1996

@80kk did you get around this issue? We started seeing this error recently


80kk commented Jun 28, 2022

Cluster Autoscaler update has fixed the issue.

80kk closed this as completed Jun 28, 2022

afirth commented Jun 29, 2022

I think this happens when the pod requests a PVC on AWS (or others) that is not available in the AZ of the node. The real scheduler sees that this won't work, but the CAS "fake scheduler run" doesn't. After a while CAS marks the node as underutilized, kills it, and scales up again. Eventually the scale-up node lands in the right AZ, and the pod is scheduled. On other providers which support multi-zone storage, this is not a problem.
Solution: make a separate node group for each AZ.
Caveat: scale to/from 0 is broken in default EKS. Workarounds and issue at aws/containers-roadmap#608.

If a CAS update really did fix it, I'm very interested in how. If it's caused by something else, feel free to chime in here. And feel free to chat with your AWS AM about this. aws/containers-roadmap#608 and #724 have some of the most 👍 of all issues in the roadmap and aren't particularly hard to fix.
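
To make the "one node group per AZ" suggestion concrete, here is a minimal, hypothetical sketch in eksctl config terms (cluster name, region, and sizes are made up, not taken from this thread):

```yaml
# Hypothetical sketch: one managed node group per AZ, so that a scale-up for a
# pod bound to an EBS volume can target the group in the volume's zone.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster     # hypothetical
  region: eu-west-1         # hypothetical
managedNodeGroups:
  - name: nodes-eu-west-1a
    availabilityZones: ["eu-west-1a"]   # pin this group to a single AZ
    minSize: 0
    maxSize: 5
  - name: nodes-eu-west-1b
    availabilityZones: ["eu-west-1b"]
    minSize: 0
    maxSize: 5
```

With single-AZ groups (plus --balance-similar-node-groups on the autoscaler if you want them kept even), CAS can pick the group whose zone matches the volume instead of hoping a multi-AZ ASG launches the node in the right place.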

@RicHincapie

I had a brand new AWS ASG scaled to 0 and had the same issue at deploy time. It was solved by manually scaling up. Afterwards, the CAS started working as expected.


decipher27 commented Sep 12, 2022

Which version of CA has the fix? We are still seeing:
Could not get a CSINode object for the node "ip-10.xxx.x.xx..ap-south-1.compute.internal": csinode.storage.k8s.io "ip-10-xxx.xx-ap-south-1.compute.internal" not found


laxmanvallandas commented Dec 1, 2022

> Cluster Autoscaler update has fixed the issue.

It's unclear which version of the autoscaler has the fix for this. We are using CA 1.23.1 and just hit this issue after updating k8s to 1.23.
@80kk, can you post the version?

@KiranReddy230

@80kk Can you please let us know in which version this is fixed? We are facing a similar issue with CA 1.21.1, and we are planning our EKS upgrade to 1.24 soon; similarly, we will need to update the CA version as well.


afirth commented Jan 4, 2023

@ricarhincapie

> I had a brand new AWS ASG scaled to 0 and had the same issue at deploy time. It was solved by manually scaling up. Afterwards, the CAS started working as expected.

It is my understanding that CAS caches seen nodes, so it will be able to scale up from 0 until it restarts. I might be wrong.
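
For completeness, the scale-from-0 workaround tracked in aws/containers-roadmap#608 generally amounts to tagging the underlying Auto Scaling group so CAS can build a node template without ever having seen a node from that group. A hedged sketch of such tags (zone value hypothetical; apply them with whatever tool manages the ASG):

```yaml
# Hypothetical ASG tags (key: value) for scale-from-0 with EBS-backed pods.
# They tell Cluster Autoscaler which labels a node from this group would carry.
k8s.io/cluster-autoscaler/node-template/label/topology.kubernetes.io/zone: eu-west-1a
k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone: eu-west-1a
```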


afirth commented Jan 4, 2023

It seems this is fixed by #4491 in K8s 1.24+
aws/containers-roadmap#724 (comment)

@Chili-Man

We're still observing this issue on AWS EKS 1.24 with Cluster Autoscaler 1.26.1


80kk commented Jan 18, 2023

The originally reported issue was observed on a kOps-provisioned Kubernetes cluster; I am now using EKS with the Amazon EBS CSI Driver.


zentavr commented Jun 15, 2023

What was the solution for this error?

@bcouetil

I subscribed to this issue because I had the exact same error, but it was not linked to the CA; it was linked to my lack of knowledge of the AWS/EKS Terraform provider.

Configuring the addons correctly did the trick.

If it can help someone, I described my configuration in a blog post.
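
The gist of "configuring the addons correctly" is making sure the EBS CSI driver is actually installed on the cluster, for example as an EKS managed addon. A minimal, hypothetical illustration of the same idea in eksctl terms (not the Terraform from the blog post), with IAM for the driver assumed to be handled separately:

```yaml
# Hypothetical sketch: install the EBS CSI driver as an EKS managed addon so
# that nodes register CSINode objects and volume topology information.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster   # hypothetical
  region: eu-west-1       # hypothetical
addons:
  - name: aws-ebs-csi-driver
    # IAM permissions for the driver (e.g. an IRSA role) are assumed to exist
    # and can be attached via serviceAccountRoleARN.
    # serviceAccountRoleARN: arn:aws:iam::123456789012:role/ebs-csi-role   # hypothetical
```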


zentavr commented Jun 25, 2023

@bcouetil what you do in your example is create the node group in only one availability zone.

This is the same as what @afirth noted in the comment above.


bcouetil commented Jun 25, 2023

That way of segregating node pools in zones is way older than the aws-ebs-csi-driver.

For as long as I can remember, at least 4 years, I've always done that, because scaling never worked 100% for multi-zone pools.

@relaxdiego

In our case we found that CAS was trying to scale up a node group whose AZ could no longer allocate more of the specified instance type (c5n.metal in our case). The indicator for this kind of issue is that the Status of the node group will be "Degraded" and its Health Status tab will show something like:

Could not launch On-Demand Instances. InsufficientInstanceCapacity - We currently do not have sufficient c5n.metal capacity in the Availability Zone you requested (eu-central-1a). Our system will be working on provisioning additional capacity. You can currently get c5n.metal capacity by not specifying an Availability Zone in your request or choosing eu-central-1b, eu-central-1c. Launching EC2 instance failed.
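
One possible mitigation for that failure mode, sketched only: give the group more than one acceptable instance type (the instanceTypes list for EKS managed node groups is assumed here, and all names are hypothetical), or add sibling groups in the other AZs the error message suggests.

```yaml
# Hypothetical sketch: allow several similar instance types in the group so a
# capacity shortage for one type in one AZ is less likely to block scale-up.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster   # hypothetical
  region: eu-central-1    # hypothetical
managedNodeGroups:
  - name: metal-eu-central-1a
    availabilityZones: ["eu-central-1a"]
    instanceTypes: ["c5n.metal", "c5.metal", "m5n.metal"]   # assumed field; adjust to taste
    minSize: 0
    maxSize: 5
```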
