Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aws-ec2: cdk ignores ReplaceAndDelete always policy in the diff output #32472

Open
1 task
rantoniuk opened this issue Dec 11, 2024 · 6 comments
Open
1 task
Labels
@aws-cdk/aws-ec2 Related to Amazon Elastic Compute Cloud bug This issue is a bug. p2

Comments

@rantoniuk
Copy link

Describe the bug

Bear with me, as I'm not sure whether this is actually on CloudFormation or CDK side - or in the middle.

I have a Vpc-Stack that is defined as follows, that was previously deployed successfully:

import * as cdk from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';

export class VpcStack extends cdk.Stack {

  readonly vpc: ec2.Vpc;
  readonly dbSecurityGroup: ec2.SecurityGroup;
  readonly egressSubnets: ec2.SubnetSelection;
  readonly isolatedSubnets: ec2.SubnetSelection;
  readonly publicSubnets: ec2.SubnetSelection;

  constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    this.vpc = new ec2.Vpc(this, 'Vpc', {
      maxAzs: 2,
      subnetConfiguration: [
        { cidrMask: 24, name: 'Isolated', subnetType: ec2.SubnetType.PRIVATE_ISOLATED },
        { cidrMask: 24, name: 'Public', subnetType: ec2.SubnetType.PUBLIC },
        { cidrMask: 24, name: 'Egress', subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
      ],
      gatewayEndpoints: {
        S3: {
          service: ec2.GatewayVpcEndpointAwsService.S3,
        },
      },
    });

    this.isolatedSubnets = this.vpc.selectSubnets({ subnetType: ec2.SubnetType.PRIVATE_ISOLATED });
    this.egressSubnets = this.vpc.selectSubnets({ subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS });
    this.publicSubnets = this.vpc.selectSubnets({ subnetType: ec2.SubnetType.PUBLIC });

    // Create a Security Group for the db subnet
    this.dbSecurityGroup = new ec2.SecurityGroup(this, 'DbSecurityGroup', {
      // securityGroupName: 'DbSecurityGroup',
      vpc: this.vpc,
      description: 'Security group for DB subnet',
    });

    const bastionHostSG = new ec2.SecurityGroup(this, 'BastionHostSG', {
      securityGroupName: 'BastionSecurityGroup',
      vpc: this.vpc,
      description: 'Security group for BastionHosts',
    });

    new ec2.BastionHostLinux(this, 'BastionHost', {
      vpc: this.vpc,
      requireImdsv2: true,
      securityGroup: bastionHostSG,
    });

    // Allow application traffic to DB subnets
    this.egressSubnets.subnets!.forEach((subnet) => {
      this.dbSecurityGroup.addIngressRule(ec2.Peer.ipv4(subnet.ipv4CidrBlock), ec2.Port.tcp(5432), 'Allow internal PostgreSQL traffic');
      this.dbSecurityGroup.addIngressRule(ec2.Peer.ipv4(subnet.ipv4CidrBlock), ec2.Port.tcp(6379), 'Allow internal Redis traffic');
    });

  }
}

Now, today, cdk diff showed only CDK metadata that would need a REPLACE:

Stack Vpc-Stack
Conditions
[~] Condition CDKMetadata/Condition CDKMetadataAvailable: {"Fn::Or":[{"Fn::Or":[{"Fn::Equals":[{"Ref":"AWS::Region"},"af-south-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"ap-east-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"ap-northeast-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"ap-northeast-2"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"ap-south-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"ap-southeast-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"ap-southeast-2"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"ca-central-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"cn-north-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"cn-northwest-1"]}]},{"Fn::Or":[{"Fn::Equals":[{"Ref":"AWS::Region"},"eu-central-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"eu-north-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"eu-south-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"eu-west-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"eu-west-2"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"eu-west-3"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"il-central-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"me-central-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"me-south-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"sa-east-1"]}]},{"Fn::Or":[{"Fn::Equals":[{"Ref":"AWS::Region"},"us-east-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"us-east-2"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"us-west-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"us-west-2"]}]}]} to {"Fn::Or":[{"Fn::Or":[{"Fn::Equals":[{"Ref":"AWS::Region"},"af-south-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"ap-east-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"ap-northeast-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"ap-northeast-2"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"ap-northeast-3"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"ap-south-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"ap-south-2"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"ap-southeast-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"ap-southeast-2"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"ap-southeast-3"]}]},{"Fn::Or":[{"Fn::Equals":[{"Ref":"AWS::Region"},"ap-southeast-4"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"ca-central-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"ca-west-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"cn-north-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"cn-northwest-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"eu-central-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"eu-central-2"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"eu-north-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"eu-south-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"eu-south-2"]}]},{"Fn::Or":[{"Fn::Equals":[{"Ref":"AWS::Region"},"eu-west-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"eu-west-2"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"eu-west-3"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"il-central-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"me-central-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"me-south-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"sa-east-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"us-east-1"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"us-east-2"]},{"Fn::Equals":[{"Ref":"AWS::Region"},"us-west-1"]}]},{"Fn::Equals":[{"Ref":"AWS::Region"},"us-west-2"]}]}

start: Building 2575bd4cf5e3931a406f6c71ad10f7e7989285d82ef405e8f330cdc868196e97:current_account-current_region
success: Built 2575bd4cf5e3931a406f6c71ad10f7e7989285d82ef405e8f330cdc868196e97:current_account-current_region
start: Publishing 2575bd4cf5e3931a406f6c71ad10f7e7989285d82ef405e8f330cdc868196e97:current_account-current_region
success: Published 2575bd4cf5e3931a406f6c71ad10f7e7989285d82ef405e8f330cdc868196e97:current_account-current_region

However, when doing a subsequent cdk deploy, CloudFormation suddenly started to replace an EC2 instance:


Vpc-Stack
Vpc-Stack: deploying... [1/5]
Vpc-Stack: creating CloudFormation changeset...
Vpc-Stack | 0/4 | 12:41:01 PM | UPDATE_IN_PROGRESS   | AWS::CloudFormation::Stack            | Vpc-Stack User Initiated
Vpc-Stack | 0/4 | 12:41:05 PM | UPDATE_IN_PROGRESS   | AWS::CDK::Metadata                    | CDKMetadata/Default (CDKMetadata) 
Vpc-Stack | 1/4 | 12:41:06 PM | UPDATE_COMPLETE      | AWS::CDK::Metadata                    | CDKMetadata/Default (CDKMetadata) 
Vpc-Stack | 1/4 | 12:41:07 PM | UPDATE_IN_PROGRESS   | AWS::EC2::Instance                    | BastionHost/Resource (BastionHost30F9ED05) Requested update requires the creation of a new physical resource; hence creating one.
Vpc-Stack | 1/4 | 12:41:08 PM | UPDATE_IN_PROGRESS   | AWS::EC2::Instance                    | BastionHost/Resource (BastionHost30F9ED05) Resource creation Initiated
Vpc-Stack | 2/4 | 12:41:29 PM | UPDATE_COMPLETE      | AWS::EC2::Instance                    | BastionHost/Resource (BastionHost30F9ED05) 
Vpc-Stack | 3/4 | 12:41:31 PM | UPDATE_COMPLETE_CLEA | AWS::CloudFormation::Stack            | Vpc-Stack 
Vpc-Stack | 3/4 | 12:41:32 PM | DELETE_IN_PROGRESS   | AWS::EC2::Instance                    | BastionHost/Resource (BastionHost30F9ED05) 
3/4 Currently in progress: Vpc-Stack, BastionHost30F9ED05
Vpc-Stack | 2/4 | 12:42:54 PM | DELETE_COMPLETE      | AWS::EC2::Instance                    | BastionHost/Resource (BastionHost30F9ED05) 
Vpc-Stack | 3/4 | 12:42:54 PM | UPDATE_COMPLETE      | AWS::CloudFormation::Stack            | Vpc-Stack 

 ✅  Vpc-Stack

Regression Issue

  • Select this option if this issue appears to be a regression.

Last Known Working CDK Version

No response

Expected Behavior

cdk diff should show accurate resource re-creation information.

Current Behavior

cdk diff did not show that an EC2 would be replaced during deployment.

Reproduction Steps

Code above.

Possible Solution

No response

Additional Information/Context

npm list
cdk@0.1.0 
├── @eslint/js@9.16.0
├── @stylistic/eslint-plugin@2.12.0
├── @types/babel__traverse@7.20.6
├── @types/eslint__js@8.42.3
├── @types/js-yaml@4.0.9
├── @types/node@20.17.9
├── @typescript-eslint/eslint-plugin@8.18.0
├── @typescript-eslint/parser@8.18.0
├── aws-cdk-lib@2.172.0
├── aws-cdk@2.172.0
├── cdk-nag@2.34.21
├── cloudwatch-retention-setter@0.0.15
├── constructs@10.4.2
├── eslint-import-resolver-typescript@3.7.0
├── eslint-plugin-import@2.31.0
├── eslint@9.16.0
├── js-yaml@4.1.0
├── source-map-support@0.5.21
├── typescript-eslint@8.18.0
└── typescript@5.7.2

CDK CLI Version

2.172.0 (build 0f666c5)

Framework Version

No response

Node.js Version

v22.11.0

OS

MacOS

Language

TypeScript

Language Version

No response

Other information

No response

@rantoniuk rantoniuk added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Dec 11, 2024
@github-actions github-actions bot added the @aws-cdk/aws-ec2 Related to Amazon Elastic Compute Cloud label Dec 11, 2024
@rantoniuk rantoniuk changed the title ec2: inaccurate cdk diff output aws-ec2: inaccurate cdk diff output Dec 11, 2024
@ashishdhingra
Copy link
Contributor

@rantoniuk Good afternoon. Thanks for opening the issue. Unfortunately, using the same stack at my end with CDK version 2.172.0 (build 0f666c5), I do not see any differences when executing cdk diff, even for the CDK Metadata. Could you please share:

  • If you updated CDK version to 2.172.0 recently when the issue occurred! If yes, what was the last CDK version used?
  • There was a recent commit bf77e51 which added support for AL 2023 as default instance type for Bastion Host under a feature flag @aws-cdk/aws-ec2:bastionHostUseAmazonLinux2023ByDefault. The default value for this feature flag if false, recommended value as true (cdk init for new projects would set this feature flag to recommended value true). Did you enable this feature flag in cdk.json or used a new project with cdk init recently?
  • Also refer userDataCausesReplacement property for BastionHostLinux. This doesn't appear to be set in your use case, mentioning it just for reference in case you could recollect of any change to user data recently.

Thanks,
Ashish

@ashishdhingra ashishdhingra self-assigned this Dec 12, 2024
@ashishdhingra ashishdhingra added p2 response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed needs-triage This issue or PR still needs to be triaged. labels Dec 12, 2024
@rantoniuk
Copy link
Author

rantoniuk commented Dec 13, 2024

If you updated CDK version to 2.172.0 recently when the issue occurred! If yes, what was the last CDK version used?

CDK updates are done every now and then on the every opportunity when any CDK stack is touched. So answering your question, yes, recently aws-cdk-lib was updated to the current one, but I cannot answer the other part of the question. What you are actually asking is: what was the version, with which this specific stack was last deployed with - and I have no idea.
I can see in CloudFormation Events when the stack was deployed previously, but that's about it.

  1. Definitely in the meantime, the bootstrap version has changed and the account was re-bootstrap to cover that.

There was a recent commit bf77e51 which added support for AL 2023 as default instance type for Bastion Host under a feature flag @aws-cdk/aws-ec2:bastionHostUseAmazonLinux2023ByDefault. The default value for this feature flag if false, recommended value as true (cdk init for new projects would set this feature flag to recommended value true). Did you enable this feature flag in cdk.json or used a new project with cdk init recently?

No, nothing there.

$ grep -ci bastionHostUseAmazonLinux2023ByDefault cdk.context.json cdk.json
cdk.context.json:0
cdk.json:0

Also refer userDataCausesReplacement property for BastionHostLinux. This doesn't appear to be set in your use case, mentioning it just for reference in case you could recollect of any change to user data recently.

No.

On top of that, I looked at CloudFormation's last executed ChangeSet JSON on that stack, this might be helpful:

[
  {
    "type": "Resource",
    "resourceChange": {
      "policyAction": "ReplaceAndDelete",
      "action": "Modify",
      "logicalResourceId": "BastionHost30F9ED05",
      "physicalResourceId": "i-0ca5ab2f37f8e1515",
      "resourceType": "AWS::EC2::Instance",
      "replacement": "True",
      "scope": [
        "Properties"
      ],
      "details": [
        {
          "target": {
            "attribute": "Properties",
            "name": "ImageId",
            "requiresRecreation": "Always",
            "path": "/Properties/ImageId",
            "beforeValue": "ami-033067239f2d2bfbc",
            "afterValue": "ami-061dd8b45bc7deb3d",
            "attributeChangeType": "Modify"
          },
          "evaluation": "Static",
          "changeSource": "DirectModification"
        }
      ],
      "beforeContext": "{\"Properties\":{\"UserData\":\"IyEvYmluL2Jhc2g=\",\"LaunchTemplate\":{\"LaunchTemplateName\":\"VpcStackBastionHostLaunchTemplate8E005F08\",\"Version\":\"1\"},\"ImageId\":\"ami-033067239f2d2bfbc\",\"AvailabilityZone\":\"us-west-2a\",\"IamInstanceProfile\":\"Vpc-Stack-BastionHostInstanceProfile770FCA07-9lzlkFHPUa3e\",\"SubnetId\":\"subnet-02d408389aa197627\",\"InstanceType\":\"t3.nano\",\"SecurityGroupIds\":[\"sg-019dba3810c4faf7a\"],\"Tags\":[{\"Value\":\"BastionHost\",\"Key\":\"Name\"}]},\"Metadata\":{\"aws:cdk:path\":\"Vpc-Stack/BastionHost/Resource/Resource\"}}",
      "afterContext": "{\"Properties\":{\"UserData\":\"IyEvYmluL2Jhc2g=\",\"LaunchTemplate\":{\"LaunchTemplateName\":\"VpcStackBastionHostLaunchTemplate8E005F08\",\"Version\":\"1\"},\"ImageId\":\"ami-061dd8b45bc7deb3d\",\"AvailabilityZone\":\"us-west-2a\",\"IamInstanceProfile\":\"Vpc-Stack-BastionHostInstanceProfile770FCA07-9lzlkFHPUa3e\",\"SubnetId\":\"subnet-02d408389aa197627\",\"InstanceType\":\"t3.nano\",\"SecurityGroupIds\":[\"sg-019dba3810c4faf7a\"],\"Tags\":[{\"Value\":\"BastionHost\",\"Key\":\"Name\"}]},\"Metadata\":{\"aws:cdk:path\":\"Vpc-Stack/BastionHost/Resource/Resource\"}}"
    }
  },
  {
    "type": "Resource",
    "resourceChange": {
      "action": "Modify",
      "logicalResourceId": "CDKMetadata",
      "physicalResourceId": "cbdcfcf0-c5a4-11ee-946a-0a435140c109",
      "resourceType": "AWS::CDK::Metadata",
      "replacement": "Conditional",
      "scope": [
        "Properties"
      ],
      "details": [
        {
          "target": {
            "attribute": "Properties",
            "name": "Analytics",
            "requiresRecreation": "Conditionally",
            "path": "/Properties/Analytics",
            "beforeValue": "...",
            "afterValue": "...",
            "attributeChangeType": "Modify"
          },
          "evaluation": "Static",
          "changeSource": "DirectModification"
        }
      ],
      "beforeContext": "{\"Properties\":{\"Analytics\":\"..."},\"Metadata\":{\"aws:cdk:path\":\"Vpc-Stack/CDKMetadata/Default\"}}",
      "afterContext": "{\"Properties\":{\"Analytics\":\"..."},\"Metadata\":{\"aws:cdk:path\":\"Vpc-Stack/CDKMetadata/Default\"}}"
    }
  }
]

Especially, notice that part:

            "attribute": "Properties",
            "name": "ImageId",
            "requiresRecreation": "Always",
            "path": "/Properties/ImageId",

Now the question is, why wasn't this shown via cdk diff. This whole situation happened on our DEV environment and I am pretty sure the same is going to happen on PROD (just checked the cdk diff there and it's also not showing BastionHost replacement).

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Dec 13, 2024
@mwebber
Copy link

mwebber commented Dec 17, 2024

From the CDK docs for ec2.BastionHostLinux:
machine_image: the image is kept up-to-date automatically (the instance may be replaced on every deployment)

I'm guessing that a newer release of the AWS AMI for the instance is available, and that is causing the behaviour you see.

In my experience in other contexts, a launch template that specifies "always use the latest AMI" does not show any change in cdk diff (because there isn't any difference in the generated CF template). However, if a deployment is happening, then CF will notice if there is a newer AMI available, and replace the instance.

Generally, CDK's philosophy seems to be that "versions" of "things" are pinned, so that unexpected resource replacement does not occur. This obviously conflicts with the "always update if possible" requirement for security purposes, and there's not an obvious way to handle that.

@rantoniuk
Copy link
Author

rantoniuk commented Dec 17, 2024

I expect CDK that is an abstraction layer on top of CF to provide accurate deployment-time diff. That's the point of not using the --no-change-set flag.

As stated above, CloudFormation had the knowledge of the replacement need - it is available in the CF ChangeSet and CDK could fetch it from there.

Generally, CDK's philosophy seems to be that "versions" of "things" are pinned, so that unexpected resource replacement does not occur. This obviously conflicts with the "always update if possible" requirement for security purposes, and there's not an obvious way to handle that.

Not always. For example, you can deploy an Aurora Serverless RDS where by default minor upgrades are turned on and handled by the PaaS and deploying again wouldn't cause a downgrade. (Whether that's wanted or not is to be decided by the user)

@ashishdhingra
Copy link
Contributor

@rantoniuk Per details shared by you in #32472 (comment), looks like AMI ID of EC2 instance behind the scenes changed. There is also a change in UserData. Are you able to search ImageId ami-061dd8b45bc7deb3d in your account. As @mwebber pointed out in #32472 (comment), the image is kept up-to-date automatically. So even though cdk diff didn't report any differences, upon next CDK deployment, the EC2 instance might have been replaced with the new AMI. cdk diff relies on CloudFormation APIs to detect the differences. If CloudFormation API doesn't report any differences, CDK wouldn't as well. I will direct this ticket to CDK Core team for inputs.

@ashishdhingra ashishdhingra removed their assignment Dec 17, 2024
@HBobertz
Copy link
Contributor

Hm this feels like it may be a bit tricky to detect, but I do see why we would want to check this. Unexpected diffs, especially when it comes to the replacement of an instance, can be a bit scary. Gonna discuss with the team

@rantoniuk rantoniuk changed the title aws-ec2: inaccurate cdk diff output aws-ec2: cdk ignores ReplaceAndDelete always policy in the diff output Dec 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-ec2 Related to Amazon Elastic Compute Cloud bug This issue is a bug. p2
Projects
None yet
Development

No branches or pull requests

4 participants