[aws-eks] ‼️ NOTICE: Service Accounts deleted upon upgrade from version 1.61.1 or below #10388

Closed
iliapolo opened this issue Sep 16, 2020 · 3 comments
Assignees
Labels
@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service management/tracking Issues that track a subject or multiple issues

Comments

@iliapolo
Contributor

iliapolo commented Sep 16, 2020

Please add your +1 👍 to let us know you have encountered this


Status: MITIGATED

Overview:

If you are using version 1.61.1 or below and define a service account as part of your cluster, using either:

cluster.addServiceAccount(...)

OR

new ServiceAccount(...)

That service account will be deleted upon upgrade.

We recommend you DO NOT upgrade until we provide a workaround.

Note that if your cluster was created with version 1.62.0 or higher, it is safe to continue upgrading.

The root cause is that version 1.62.0 changed the logical ID of the service account construct. This was done to prevent possible circular dependencies when working with multiple stacks.

The change to the logical ID causes CloudFormation to create a new resource and delete the old one. CloudFormation first invokes a CREATE on the new resource, which eventually translates into a kubectl apply. This command succeeds and effectively does nothing, because the service account already exists.

CloudFormation then proceeds to execute a DELETE on the old resource, which translates into a kubectl delete command. Since both the old and the new resource have the same physical name (i.e. the name of the service account itself), this command deletes the exact same resource that was previously "created".
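
The sequence can be reproduced with a small in-memory model. This is only a sketch: the FakeCluster class and its apply/remove methods are hypothetical stand-ins for kubectl apply and kubectl delete, not CDK or Kubernetes APIs, and the account names are made up for illustration.

```typescript
// Hypothetical in-memory stand-in for the service accounts in a cluster.
class FakeCluster {
  private readonly serviceAccounts = new Set<string>();

  // Models `kubectl apply`: idempotent, a no-op if the object already exists.
  apply(name: string): void {
    this.serviceAccounts.add(name);
  }

  // Models `kubectl delete`: removes the object by its physical name.
  remove(name: string): void {
    this.serviceAccounts.delete(name);
  }

  has(name: string): boolean {
    return this.serviceAccounts.has(name);
  }
}

// Models a CloudFormation replacement: CREATE the resource under the new
// logical ID first, then DELETE the resource under the old logical ID.
function replaceResource(cluster: FakeCluster, oldPhysicalName: string, newPhysicalName: string): void {
  cluster.apply(newPhysicalName);
  cluster.remove(oldPhysicalName);
}

// Same physical name on both sides: the account is gone after the upgrade.
const sameName = new FakeCluster();
sameName.apply('myaccount'); // state before the upgrade
replaceResource(sameName, 'myaccount', 'myaccount');
const survivesWithSameName = sameName.has('myaccount'); // false

// Different physical name: the newly created account survives the DELETE.
const renamed = new FakeCluster();
renamed.apply('myaccount');
replaceResource(renamed, 'myaccount', 'myaccount-v2');
const survivesWhenRenamed = renamed.has('myaccount-v2'); // true
```

The model makes the ordering problem visible: because the DELETE runs after the CREATE and targets the physical name, the replacement is only safe when the two names differ.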

Complete Error Message:

No errors, but the existing service account will be deleted, which will cause dependent resources to malfunction.

Workaround:

There are two options; choose one or the other:

  1. If you are assigning a concrete name to your service account via:
cluster.addServiceAccount('ServiceAccountId', {
  // assigning a concrete name
  name: 'serviceAccountName'
})

Then you can rename your service account. If you are not assigning a concrete name, specify the name property (any value would do in this case).

This will cause the old service account to be deleted, but a new one will be created with the new name. Make sure you replace all hard-coded references to the old service account name with the new one.

This is the recommended approach.

  2. If the first approach is not applicable, add the following code to your application. It reverts the service account logical ID to its earlier value, so that a replacement is not necessary:
// skip these imports if you already have them 
import * as cdk from '@aws-cdk/core';
import * as eks from '@aws-cdk/aws-eks';

// importing the logical id generator
import { makeUniqueId } from '@aws-cdk/core/lib/private/uniqueid'

const serviceAccount = cluster.addServiceAccount(...);
revertLogicalIdToV1_61_1(serviceAccount);

function revertLogicalIdToV1_61_1(serviceAccount: eks.ServiceAccount) {

    // const theElement = serviceAccount.node.findAll().filter(c => c.node.id.includes('ServiceAccountResource'))[0].node.children[0].node.children[0] as cdk.CfnResource
    const theElement = serviceAccount.node.findAll().filter(c => c.node.id === `manifest-${serviceAccount.node.id}ServiceAccountResource`)[0].node.defaultChild!.node.defaultChild! as cdk.CfnResource
    const scopes = theElement.node.scopes;
    const stackIndex = scopes.indexOf(theElement.stack);
    const pathComponents = scopes.slice(stackIndex + 1).map(x => x.node.id)
      // filter this path because it wasn't a part of the older logical ids
      .filter(id => id !== serviceAccount.node.id);

    theElement.overrideLogicalId(makeUniqueId(pathComponents))

}
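
To make the path filtering in the function above concrete, here is a sketch on plain strings. The construct IDs in newPath are an assumed example of what the scope chain might look like for a service account with the ID 'account'; they are not taken from a real synthesized template. The helper legacyPathComponents is hypothetical, and the hashing step performed by makeUniqueId is not reproduced.

```typescript
// Hypothetical helper: drop the service account's own construct ID from the
// path, mirroring the `.filter(id => id !== serviceAccount.node.id)` step.
function legacyPathComponents(pathBelowStack: string[], serviceAccountId: string): string[] {
  return pathBelowStack.filter(id => id !== serviceAccountId);
}

// Assumed construct path below the stack in 1.62.0 or later.
const newPath = ['Cluster', 'account', 'manifest-accountServiceAccountResource', 'Resource', 'Default'];

// Path as it would have looked before the extra scope was introduced.
const oldPath = legacyPathComponents(newPath, 'account');
// oldPath: ['Cluster', 'manifest-accountServiceAccountResource', 'Resource', 'Default']
```

Feeding the shorter path to the ID generator is what lets the workaround reproduce the pre-1.62.0 logical ID.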

For Python users (can also be adapted to Java and C#):

from aws_cdk import core
from aws_cdk import aws_eks as eks


class EksPythonStack(core.Stack):

    def __init__(self, scope: core.Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        # The code that defines your stack goes here
        cluster = eks.Cluster(self, 'Cluster', version=eks.KubernetesVersion.V1_17)
        service_account = cluster.add_service_account('account', name='myaccount')

        self.revert_logical_id_to_v1_61_1(service_account)


    def revert_logical_id_to_v1_61_1(self, service_account):
        the_element = list(filter(lambda c: c.node.id == f'manifest-{service_account.node.id}ServiceAccountResource', service_account.node.find_all()))[0].node.default_child.node.default_child

        # filter out the service account scope (it didn't exist prior to 1.62.0)
        old_scopes = list(filter(lambda s: s.node.id != service_account.node.id, the_element.node.scopes))

        # recreate the CfnElement
        old_element = core.Stack(core.App(), self.stack_name)
        for s in old_scopes[2:]:
            old_element = core.CfnElement(old_element, s.node.id)

        # generate a logical id from the old element and override
        the_element.override_logical_id(self.get_logical_id(old_element))

As a further safety measure, we recommend you run cdk diff before deploying and verify that the logical ID has remained the same and the resource is not scheduled for replacement.

Please let us know if the above workarounds are not suitable for you or for some reason do not work in your scenario.

We apologize for the inconvenience.

Solution:

We will be implementing a longer-term solution to protect against this kind of situation: #10397

To resolve the current situation, the only solution is to apply one of the workarounds above. A revert is not possible, since it would create the exact same problem in the reverse direction.

Considering this issue as mitigated.

Related Issues:

Originally reported in: #9701 (comment)

@iliapolo iliapolo added bug This issue is a bug. management/tracking Issues that track a subject or multiple issues p0 @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service labels Sep 16, 2020
@njlynch njlynch pinned this issue Sep 16, 2020
@iliapolo iliapolo removed bug This issue is a bug. p0 labels Sep 17, 2020
@iliapolo iliapolo unpinned this issue Sep 23, 2020
@iliapolo iliapolo pinned this issue Sep 23, 2020
@iliapolo
Contributor Author

We consider this issue resolved by applying the proposed workaround.

@stefanolczak

stefanolczak commented Sep 24, 2020

@iliapolo can you please share code for the second workaround in Python? I was able to port most of it, but the problem is with calling makeUniqueId(), which has no jsii bindings generated. The first workaround, renaming the service account, is not an option for us.

@iliapolo
Contributor Author

@stefanolczak Thanks for the callout. I've added the python code you can use.

Please let us know if that worked for you or not.

@iliapolo iliapolo unpinned this issue Nov 2, 2020