[aws-eks] ‼️ NOTICE: Service Accounts deleted upon upgrade from version 1.61.1 or below #10388

iliapolo · 2020-09-16T11:29:29Z

Please add your +1 👍 to let us know you have encountered this

Status: MITIGATED

Overview:

If you are using version 1.61.1 or below and define a service account as part of your cluster, using either:

cluster.addServiceAccount(...)

OR

new ServiceAccount(...)

That service account will be deleted upon upgrade.

We recommend you DO NOT upgrade until we provide a workaround.

Note that if your cluster was created with version 1.62.0 or higher, it is safe to continue upgrading.

The root cause is a change introduced in version 1.62.0 that alters the logical ID of the service account construct. This was done to prevent possible circular dependencies when working with multiple stacks.

The change to the logical ID causes CloudFormation to create a new resource and delete the old one. CloudFormation first invokes a CREATE on the new resource, which eventually translates into a kubectl apply. This command succeeds and effectively does nothing, because the service account already exists.

CloudFormation then proceeds to execute a DELETE on the old resource, which translates into a kubectl delete command. Since both the old and the new resource have the same physical name (i.e. the name of the service account itself), this command deletes the exact same resource that was previously "created".
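The sequence above can be illustrated with a minimal sketch. It does not use the CDK or kubectl: `FakeCluster`, `kubectlApply`, `kubectlDelete`, and the account name `my-account` are hypothetical stand-ins, and the cluster is modeled as a plain in-memory map keyed by physical name.

```typescript
// Hypothetical in-memory model of the cluster: service account name -> manifest kind.
type FakeCluster = Map<string, string>;

// Stand-in for `kubectl apply`: creates or updates the object with that name.
function kubectlApply(cluster: FakeCluster, accountName: string, manifest: string): void {
  cluster.set(accountName, manifest);
}

// Stand-in for `kubectl delete`: removes whatever object has that name.
function kubectlDelete(cluster: FakeCluster, accountName: string): void {
  cluster.delete(accountName);
}

function simulateUpgrade(): FakeCluster {
  const cluster: FakeCluster = new Map();

  // State before the upgrade: the resource with the old logical ID created the account.
  kubectlApply(cluster, 'my-account', 'ServiceAccount');

  // 1. CREATE for the new logical ID: the account already exists, so nothing changes.
  kubectlApply(cluster, 'my-account', 'ServiceAccount');

  // 2. DELETE for the old logical ID: same physical name, so the account is removed.
  kubectlDelete(cluster, 'my-account');

  return cluster;
}

const clusterAfterUpgrade = simulateUpgrade();
console.log(clusterAfterUpgrade.has('my-account')); // false
```

Because both steps address the object by its physical name, the order (create first, delete second) is what leaves the cluster without the service account.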

Complete Error Message:

No errors, but the existing service account will be deleted, which will cause dependent resources to malfunction.

Workaround:

There are two options; choose one or the other:

If you are assigning a concrete name to your service account via:

cluster.addServiceAccount('ServiceAccountId', {
  // assigning a concrete name
  name: 'serviceAccountName'
})

Then you can rename your service account. If you are not assigning a concrete name, specify the name property (any value would do in this case).

This will cause the old service account to be deleted and a new one to be created under the new name. Make sure you replace all hard-coded references to the old service account name with the new one.

This is the recommended approach.

If the first approach is not applicable, add the following code to your application. It reverts the service account's logical ID to its earlier value, so that a replacement is not necessary:

// skip these imports if you already have them 
import * as cdk from '@aws-cdk/core';
import * as eks from '@aws-cdk/aws-eks';

// importing the logical id generator (note: this is a private module path)
import { makeUniqueId } from '@aws-cdk/core/lib/private/uniqueid';

const serviceAccount = cluster.addServiceAccount(...);
revertLogicalIdToV1_61_1(serviceAccount);

function revertLogicalIdToV1_61_1(serviceAccount: eks.ServiceAccount) {

    // locate the manifest construct created for this service account,
    // then drill down to the underlying CloudFormation resource
    // alternative lookup: serviceAccount.node.findAll().filter(c => c.node.id.includes('ServiceAccountResource'))[0].node.children[0].node.children[0] as cdk.CfnResource
    const manifestId = `manifest-${serviceAccount.node.id}ServiceAccountResource`;
    const manifest = serviceAccount.node.findAll().filter(c => c.node.id === manifestId)[0];
    const theElement = manifest.node.defaultChild!.node.defaultChild! as cdk.CfnResource;

    // collect the construct ids between the stack and the resource
    const scopes = theElement.node.scopes;
    const stackIndex = scopes.indexOf(theElement.stack);
    const pathComponents = scopes.slice(stackIndex + 1).map(x => x.node.id)
      // filter this path because it wasn't a part of the older logical ids
      .filter(id => id !== serviceAccount.node.id);

    // override the logical id with the one generated from the older path
    theElement.overrideLogicalId(makeUniqueId(pathComponents));

}
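To make the path filtering step easier to follow, here is a small standalone sketch of it. It does not call the CDK: `dropScope` is a hypothetical helper, and the path components are illustrative values assumed for a service account whose construct id is `account`, not output captured from a real application.

```typescript
// Removes every path component equal to `scopeId`, mirroring the
// `.filter(id => id !== serviceAccount.node.id)` step in the workaround.
function dropScope(pathComponents: string[], scopeId: string): string[] {
  return pathComponents.filter(id => id !== scopeId);
}

// Illustrative (assumed) path below the stack after upgrading to 1.62.0 or higher.
const newPath = ['Cluster', 'account', 'manifest-accountServiceAccountResource', 'Resource', 'Default'];

// The same path without the service account scope, which is the input
// the workaround passes to the logical id generator.
const oldPath = dropScope(newPath, 'account');
console.log(oldPath.join('/'));
// Cluster/manifest-accountServiceAccountResource/Resource/Default
```

Note that the filter removes every component matching the id, so if another scope in the path happened to share the service account's construct id, it would be dropped as well.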

For Python users (can also be adapted to Java and C#):

from aws_cdk import core
from aws_cdk import aws_eks as eks


class EksPythonStack(core.Stack):

    def __init__(self, scope: core.Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        # The code that defines your stack goes here
        cluster = eks.Cluster(self, 'Cluster', version=eks.KubernetesVersion.V1_17)
        service_account = cluster.add_service_account('account', name='myaccount')

        self.revert_logical_id_to_v1_61_1(service_account)


    def revert_logical_id_to_v1_61_1(self, service_account):
        # locate the manifest construct of the service account and drill down to the CfnResource
        manifest_id = f'manifest-{service_account.node.id}ServiceAccountResource'
        manifest = [c for c in service_account.node.find_all() if c.node.id == manifest_id][0]
        the_element = manifest.node.default_child.node.default_child

        # filter out the service account scope (it didn't exist prior to 1.62.0)
        old_scopes = [s for s in the_element.node.scopes if s.node.id != service_account.node.id]

        # recreate the CfnElement, skipping the first two scopes
        # (the app and the stack, assuming the stack is defined directly under the app)
        old_element = core.Stack(core.App(), self.stack_name)
        for s in old_scopes[2:]:
            old_element = core.CfnElement(old_element, s.node.id)

        # generate a logical id from the old element and override
        the_element.override_logical_id(self.get_logical_id(old_element))

As a further safety measure, we recommend you run cdk diff before deploying and verify that the logical ID has remained the same and the resource is not scheduled for replacement.
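If you want to script this check in addition to reading the diff, the idea can be sketched as a comparison of two templates. This is only an illustration under stated assumptions: `CfnTemplate` and `findDroppedLogicalIds` are hypothetical names, the templates are hand-written objects rather than real synthesized output, the logical IDs are made up, and the resource type string is an assumption you should check against your own template.

```typescript
// Minimal shape of a CloudFormation template: logical ID -> resource.
interface CfnTemplate {
  Resources: Record<string, { Type: string }>;
}

// Returns the logical IDs of the given resource type that exist in the deployed
// template but are missing from the newly synthesized one. A missing ID means
// CloudFormation would delete that resource on the next deployment.
function findDroppedLogicalIds(deployed: CfnTemplate, synthesized: CfnTemplate, resourceType: string): string[] {
  return Object.keys(deployed.Resources)
    .filter(logicalId => deployed.Resources[logicalId].Type === resourceType)
    .filter(logicalId => !Object.prototype.hasOwnProperty.call(synthesized.Resources, logicalId));
}

// Assumed resource type of the manifest resource; verify it in your template.
const manifestType = 'Custom::AWSCDK-EKS-KubernetesResource';

// Made-up logical IDs, for illustration only.
const deployedTemplate: CfnTemplate = { Resources: { ManifestOld1234: { Type: manifestType } } };
const withoutWorkaround: CfnTemplate = { Resources: { ManifestNew5678: { Type: manifestType } } };
const withWorkaround: CfnTemplate = { Resources: { ManifestOld1234: { Type: manifestType } } };

console.log(findDroppedLogicalIds(deployedTemplate, withoutWorkaround, manifestType)); // [ 'ManifestOld1234' ]
console.log(findDroppedLogicalIds(deployedTemplate, withWorkaround, manifestType));    // []
```

An empty result corresponds to what you want to see in cdk diff: the logical ID is unchanged and no replacement is scheduled.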

Please let us know if the above workarounds are not suitable for you or for some reason do not work in your scenario.

We apologize for the inconvenience.

Solution:

We will be implementing a longer-term solution to protect against this kind of situation: #10397

To resolve the current situation, the only solution is to apply one of the workarounds above. A revert is not possible, since it would create the exact same problem in the reverse direction.

Considering this issue as mitigated.

Related Issues:

Originally reported in: #9701 (comment)

iliapolo · 2020-09-23T20:25:18Z

We consider this issue resolved by applying the proposed workaround.

stefanolczak · 2020-09-24T07:17:33Z

@iliapolo can you please share the code for the second workaround in Python? I was able to port most of it, but the problem is calling makeUniqueId(), which has no jsii bindings generated. The first workaround, renaming the service account, is not an option for us.

iliapolo · 2020-09-24T10:46:36Z

@stefanolczak Thanks for the callout. I've added the Python code you can use.

Please let us know if that worked for you or not.

iliapolo added the bug, management/tracking, p0, and @aws-cdk/aws-eks labels Sep 16, 2020

github-actions bot assigned iliapolo Sep 16, 2020

njlynch pinned this issue Sep 16, 2020

This was referenced Sep 16, 2020

fix(eks): creating a ServiceAccount in a different stack than the Cluster creates circular dependency between the two stacks #9701

Merged

[aws-eks] Replacement of KubernetesManifest resources results in a deletion, without creation #10397

Closed

iliapolo removed the bug and p0 labels Sep 17, 2020

iliapolo unpinned this issue Sep 23, 2020

iliapolo pinned this issue Sep 23, 2020

iliapolo closed this as completed Sep 23, 2020

iliapolo unpinned this issue Nov 2, 2020
