[Bug] Subnet validation fails on previously valid subnet configuration (<= 0.176) with "Error: all private subnets from [az], that the cluster was originally created on, have been deleted" #7785

Closed
fbuchmeier-abi opened this issue May 28, 2024 · 3 comments · Fixed by #7816
Assignees
Labels
area/managed-nodegroup EKS Managed Nodegroups area/nodegroup kind/bug priority/important-soon Ideally to be resolved in time for the next release

Comments

fbuchmeier-abi commented May 28, 2024

What were you trying to accomplish?

I am trying to create new private managed node groups with eksctl version > 0.176 in an existing VPC with existing subnets.

What happened?

Private node group creation fails when the keys for the existing subnets do not match the names of the availability zones specified in the nodegroups.

This has been happening since eksctl 0.177.

How to reproduce it?

  1. Create a new cluster with eksctl v0.176 and the following configuration (snippet). The cluster uses subnets in an existing VPC:

    vpc:
    [...]
      subnets:
        private:
          1:
            id: subnet-08734d6dee15f6def
          2:
            id: subnet-07a9fb4dc5dac53d5
          3:
            id: subnet-0dcd7ace42147d557
        public:
          1:
            id: subnet-0a09bea21e2304c03
          2:
            id: subnet-087489fd1da3e59f1
          3:
            id: subnet-0ad2d7b673c484024
    [...]
    managedNodeGroups:
      - name: old-orca
        instanceName: old-orca
        instanceType: t3a.xlarge
        minSize: 1
        maxSize: 3
        availabilityZones: ['eu-central-1a']
        privateNetworking: true
    [...]
  2. Install eksctl v0.177 (or newer) and try to add a new node group with the same configuration but a different name.

Logs

eksctl create nodegroup --config-file /tmp/config.3_4e3_1y

Error: all private subnets from eu-central-1a, that the cluster was originally created on, have been deleted; to create private nodegroups within eu-central-1a please manually set valid private subnets via nodeGroup.SubnetIDs'

2024-05-28 10:26:43 [ℹ]  nodegroup "old-orca" will use "ami-0a3ee3d1e25e0daa8" [AmazonLinux2/1.28]
2024-05-28 10:26:43 [ℹ]  nodegroup "doozy-dodo" will use "ami-0a3ee3d1e25e0daa8" [AmazonLinux2/1.28]
2024-05-28 10:26:43 [ℹ]  nodegroup "kind-kodiak" will use "ami-0a3ee3d1e25e0daa8" [AmazonLinux2/1.28]

Anything else we need to know?

It looks like a new validation was introduced in 00934fd and #7714, which checks whether any subnet exists under a key equal to the given availability zone:

    if _, ok := spec.VPC.Subnets.Private[az]; !ok && ng.PrivateNetworking {
        return unavailableSubnetsErr(az)
    }

In our case, the subnet keys are named differently (and at this time we do not know which subnet is in which AZ). This worked properly with eksctl <= 0.176 and broke with eksctl 0.177.
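To illustrate why this check rejects our configuration, here is a minimal sketch of a key-based lookup. The `Subnet` type and `validateByKey` function are hypothetical simplifications and not eksctl's actual types; only the `if` condition follows the snippet quoted above.

```go
package main

import "fmt"

// Subnet is a simplified, hypothetical stand-in for a subnet entry in the
// cluster config. AZ is optional, as in the config file.
type Subnet struct {
	ID string
	AZ string
}

// validateByKey mirrors the key-based check: it treats each requested
// availability zone as a map key and never looks at the subnets themselves.
func validateByKey(private map[string]Subnet, azs []string, privateNetworking bool) error {
	for _, az := range azs {
		if _, ok := private[az]; !ok && privateNetworking {
			return fmt.Errorf("all private subnets from %s, that the cluster was originally created on, have been deleted", az)
		}
	}
	return nil
}

func main() {
	// Keys "1", "2", "3" as in the configuration above.
	custom := map[string]Subnet{
		"1": {ID: "subnet-08734d6dee15f6def"},
		"2": {ID: "subnet-07a9fb4dc5dac53d5"},
		"3": {ID: "subnet-0dcd7ace42147d557"},
	}
	// The lookup for "eu-central-1a" misses, so an error is returned even
	// though the subnets still exist.
	fmt.Println(validateByKey(custom, []string{"eu-central-1a"}, true))
}
```

With keys such as `1`, `2` and `3`, the map lookup can never match an AZ name, so the check fails regardless of where the subnets actually are.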

Versions

Working up to eksctl v0.176
Broken since eksctl v0.177

wind0r (Contributor) commented May 28, 2024

We are encountering the same issue.

Our custom subnet names worked fine with versions before v0.177. We tested adding the 'az' attribute to the subnet definitions, but this did not resolve the issue. To work around it, we need to rename the subnet keys to eu-central-1a, eu-central-1b, and eu-central-1c.

It looks like the new check uses the map key instead of the az attribute. eksctl could also use the subnet id to detect the AZ.

https://eksctl.io/usage/vpc-subnet-settings/ states that az is optional and that the key doesn't need to be the AZ value.

e.g:

  subnets:
    private:
      broken-nodes-a:
        id: subnet-aaa
        az: eu-central-1a
      broken-nodes-b:
        id: subnet-bbb
        az: eu-central-1b
      broken-nodes-c:
        id: subnet-ccc
        az: eu-central-1c
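A check that honours the `az` attribute might look roughly like the sketch below. `Subnet` and `hasSubnetInAZ` are hypothetical names for illustration, not eksctl code; the fallback to the map key when `az` is unset is an assumption about how older, AZ-keyed configs could stay valid.

```go
package main

import "fmt"

// Subnet is a hypothetical, simplified subnet entry with an optional AZ.
type Subnet struct {
	ID string
	AZ string
}

// hasSubnetInAZ reports whether any subnet is declared for the given AZ.
// It prefers the explicit az attribute and only falls back to the map key
// when the attribute is empty.
func hasSubnetInAZ(subnets map[string]Subnet, az string) bool {
	for key, s := range subnets {
		if s.AZ == az || (s.AZ == "" && key == az) {
			return true
		}
	}
	return false
}

func main() {
	private := map[string]Subnet{
		"broken-nodes-a": {ID: "subnet-aaa", AZ: "eu-central-1a"},
		"broken-nodes-b": {ID: "subnet-bbb", AZ: "eu-central-1b"},
		"broken-nodes-c": {ID: "subnet-ccc", AZ: "eu-central-1c"},
	}
	fmt.Println(hasSubnetInAZ(private, "eu-central-1a"))
}
```

This only helps when `az` is set in the config. Configs that omit it would still need the AZ to be looked up from the subnet id.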

TiberiuGC (Collaborator) commented

Hi @fbuchmeier-abi @wind0r , thank you for raising this issue!

We'll have to update the validation to use the AZs resolved by the EC2::DescribeSubnets call, instead of using the keys provided in the config file.
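As a rough sketch of that direction (not the actual change that landed in the fix), the validation could resolve each subnet's AZ first and then compare against the requested AZs. The `azResolver` type stands in for the DescribeSubnets lookup and is hypothetical, as are `validateResolvedAZs`, `fakeResolver`, and the ID-to-AZ mapping used in the example.

```go
package main

import (
	"errors"
	"fmt"
)

// azResolver is a hypothetical stand-in for an EC2 DescribeSubnets lookup
// that maps a subnet ID to its availability zone.
type azResolver func(subnetID string) (string, error)

// validateResolvedAZs checks that every requested AZ is covered by at least
// one private subnet, using resolved AZs instead of the config map keys.
// privateSubnetIDs maps the config key (any name) to the subnet ID.
func validateResolvedAZs(privateSubnetIDs map[string]string, wantAZs []string, resolve azResolver) error {
	covered := make(map[string]bool)
	for key, id := range privateSubnetIDs {
		az, err := resolve(id)
		if err != nil {
			return fmt.Errorf("resolving AZ for subnet %q (%s): %w", key, id, err)
		}
		covered[az] = true
	}
	for _, az := range wantAZs {
		if !covered[az] {
			return fmt.Errorf("no private subnet found in %s", az)
		}
	}
	return nil
}

// fakeResolver returns canned answers; the mapping is invented for the example.
func fakeResolver(id string) (string, error) {
	known := map[string]string{
		"subnet-aaa": "eu-central-1a",
		"subnet-bbb": "eu-central-1b",
		"subnet-ccc": "eu-central-1c",
	}
	az, ok := known[id]
	if !ok {
		return "", errors.New("subnet not found")
	}
	return az, nil
}

func main() {
	private := map[string]string{"1": "subnet-aaa", "2": "subnet-bbb", "3": "subnet-ccc"}
	fmt.Println(validateResolvedAZs(private, []string{"eu-central-1a"}, fakeResolver))
}
```

Passing the resolver as a function keeps the check independent of the AWS SDK, so it can be exercised with canned data as shown here.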

AndrewFarley commented

Note: if anyone lands here because of the error named in the title of this issue, upgrade to 0.183 or later and the problem goes away. I was on 0.180 with this issue, and simply upgrading fixed it.

4 participants