[aws-eks] ‼️ NOTICE: Service Accounts deleted upon upgrade from version 1.61.1 or below #10388
Please add your +1 👍 to let us know you have encountered this
Status: MITIGATED
Overview:
If you are using version 1.61.1 or below and are defining a service account as part of your cluster, using either:
OR
That service account will be deleted upon upgrade.
We recommend you DO NOT upgrade until we provide a workaround.
The root cause is that in version 1.62.0, we introduced a change that alters the logical ID of the service account construct. This was done to prevent possible circular dependencies when working with multiple stacks.
The change to the logical ID causes CloudFormation to create a new resource and delete the old one. CloudFormation first invokes a CREATE on the new resource, which eventually translates into a kubectl apply. This command succeeds and effectively does nothing, because the service account already exists. CloudFormation then proceeds to execute a DELETE on the old resource, which translates into a kubectl delete command. Since both the old and the new resource have the same physical name (i.e. the name of the service account itself), this command deletes the exact same resource that was previously "created".
Complete Error Message:
No errors, but the existing service account will be deleted, which will cause dependent resources to malfunction.
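The create-then-delete sequence from the overview can be reproduced with a toy model. The sketch below is not CDK or CloudFormation code: the FakeCluster class, its apply and delete methods, and the service account name are hypothetical stand-ins for the kubectl apply and kubectl delete calls. It only illustrates why a replacement that keeps the same physical name ends with the service account gone.

```python
class FakeCluster:
    """Toy stand-in for a cluster; tracks service accounts by physical name."""

    def __init__(self):
        self.service_accounts = set()

    def apply(self, name):
        # Like kubectl apply: idempotent, a no-op if the object already exists.
        self.service_accounts.add(name)

    def delete(self, name):
        # Like kubectl delete: removes whatever object currently has this name.
        self.service_accounts.discard(name)


def replace_resource(cluster, old_name, new_name):
    """Mimic a CloudFormation replacement: CREATE the new resource, then DELETE the old one."""
    cluster.apply(new_name)   # CREATE issued for the new logical ID
    cluster.delete(old_name)  # DELETE issued for the old logical ID


cluster = FakeCluster()
cluster.apply("my-service-account")  # state before the upgrade

# The logical ID changed, but the physical name did not.
replace_resource(cluster, old_name="my-service-account", new_name="my-service-account")

print(sorted(cluster.service_accounts))  # the service account no longer exists
```

Neither step reports a failure in this model, which matches the absence of any error message during the real deployment.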
Workaround:
There are two options; choose one or the other:
Then you can rename your service account. If you are not assigning a concrete name, specify the name property (any value will do in this case).
This will cause the old service account to be deleted, but a new one will be created in its place with the new name. Make sure you replace all hard-coded references to the old service account name with the new one.
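Why renaming helps can be seen in a small set-based sketch. It assumes only what the overview describes: a replacement is a create (apply) followed by a delete, both keyed on the physical name. The function and the account names are hypothetical and do not correspond to any CDK API.

```python
def replace_resource(existing, old_name, new_name):
    """Return the names present after a CREATE (apply) followed by a DELETE."""
    after_create = existing | {new_name}  # apply: adds the name, or a no-op if present
    return after_create - {old_name}      # delete: removes by physical name


before = {"my-service-account"}

# Same physical name: the delete removes the object the create just "created".
same_name = replace_resource(before, "my-service-account", "my-service-account")

# New physical name: the delete only removes the old object.
renamed = replace_resource(before, "my-service-account", "my-service-account-v2")

print(sorted(same_name))
print(sorted(renamed))
```

In the renamed case the cluster still ends up with a service account, but under the new name, which is why references to the old name have to be updated.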
This is the recommended approach.
For Python users (can also be adapted to Java and C#):
As a further safety measure, we recommend you run cdk diff before deploying and verify that the logical ID has remained the same and the resource is not scheduled for replacement.
Please let us know if the above workarounds are not suitable for you or for some reason do not work in your scenario.
We apologize for the inconvenience.
Solution:
We will be implementing a longer-term solution to protect against these kinds of situations: #10397
To resolve the current situation, the only solution is to apply one of the workarounds mentioned above. A revert is not possible, since it would create the exact same problem in the reverse direction.
Considering this issue as mitigated.
Related Issues:
Originally reported in: #9701 (comment)