Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(many): a number of stateful resources don't have a removal policy #12563

Closed
6 of 7 tasks
rix0rrr opened this issue Jan 18, 2021 · 10 comments · Fixed by #12920
Closed
6 of 7 tasks

(many): a number of stateful resources don't have a removal policy #12563

rix0rrr opened this issue Jan 18, 2021 · 10 comments · Fixed by #12920
Labels
@aws-cdk/core Related to core CDK functionality bug This issue is a bug. effort/small Small work item – less than a day of effort good first issue Related to contributions. See CONTRIBUTING.md needs-triage This issue or PR still needs to be triaged. p1

Comments

@rix0rrr
Copy link
Contributor

rix0rrr commented Jan 18, 2021

  • AWS::CloudFormation::Stack
  • AWS::Cognito::UserPool
  • AWS::DocDB::DBInstance
  • AWS::EC2::Volume
  • AWS::Elasticsearch::Domain
  • AWS::FSx::FileSystem
  • AWS::SQS::Queue

This is 🐛 Bug Report

@rix0rrr rix0rrr added bug This issue is a bug. p1 @aws-cdk/core Related to core CDK functionality needs-triage This issue or PR still needs to be triaged. effort/small Small work item – less than a day of effort good first issue Related to contributions. See CONTRIBUTING.md labels Jan 18, 2021
@PatMyron
Copy link
Contributor

PatMyron commented Jan 20, 2021

Does this mean they don't have default removal policies explicitly set? Or removal policy cannot be set at all? If the latter, I'd prefer a resource type agnostic fix since there are always going to be new stateful resource types

@rix0rrr
Copy link
Contributor Author

rix0rrr commented Jan 22, 2021

They don't have one by default. You can always set a RemovalPolicy via Escape Hatching:

(myResource.node.defaultChild as CfnResource).applyRemovalPolicy(...);

@johnttompkins
Copy link

Any way to surface this in docs/is this called out anywhere?

See some customers having issues with this thinking it is a settable property by default and not aware of the method to apply it after creation.

@ghost
Copy link

ghost commented Jan 26, 2021

We have this in the Developer Guide: https://docs.aws.amazon.com/cdk/latest/guide/resources.html#resources_removal

However, we don't have a list of resources that the applyRemovaPolicy() method applies to. I'm loath to include such a list because it can easily become outdated.

@ghost
Copy link

ghost commented Jan 26, 2021

I went ahead and added the list.

@rix0rrr
Copy link
Contributor Author

rix0rrr commented Feb 1, 2021

I actually don't agree with AWS::CloudFormation::Stack.

@rix0rrr
Copy link
Contributor Author

rix0rrr commented Feb 1, 2021

Also... not sure that I agree with AWS::SQS::Queue... but for a different reason

@PatMyron
Copy link
Contributor

PatMyron commented Feb 1, 2021

I misinterpreted this issue as ensuring removalPolicy can be set at all from something @jotompki mentioned


Inconsistent and changing defaults are confusing and false senses of safety

Customers are going to have higher bills and hit limits with resources being retained/snapshotted suddenly, singleton resource types that get replaced suddenly can't be updated anymore, etc.

@rix0rrr
Copy link
Contributor Author

rix0rrr commented Feb 8, 2021

If we just say removalPolicy can be configured, but it doesn't have to be (and it defaults to Delete), then we're back in the situation where CloudFormation left us, which is EXACTLY the point we're trying to avoid.

Here are the options as far as I see them:

Description Default cfn-lint Notes
1. Require user to select Policy - happy Compatibility breaking change
2. Don't emit a DeletionPolicy CFN default = Delete will yell at you Effectively 1, but without tool support
3. Do emit a policy Delete happy We are now emulating CloudFormation in the before cfn-lint days (= dangerous)
4. Do emit a policy Retain happy Maximum safety, but deployments will fail if you gave physical names to your resource.

There really are no good solutions here.

@mergify mergify bot closed this as completed in #12920 Feb 22, 2021
mergify bot pushed a commit that referenced this issue Feb 22, 2021
…12920)

(Gave this change an eye-catching title so it would stand out in the change log)

`cfn-lint` expects all stateful resources to have a removal policy, but we weren't providing that option at the L2 level yet, and the L1 API is cumbersome.

Add the `removalPolicy` option to the following resources:

* Cognito User Pools (default: RETAIN)
* EC2 Volume (default: RETAIN)
* ElasticSearch Domain (default: RETAIN)
* FSx FileSystem (default: RETAIN)
* SQS Queue (default: DESTROY)
* Nested Stack (default: DESTROY)

All L2 resources now have an `applyRemovalPolicy` method, so it can be set even for non-stateful resources.

A mechanism has been added to the codegen so that new L2 authors cannot forget about the `removalPolicy` when they're writing an L2.

I'm aware that the choice to make most of these RETAIN by default is going to be contentious. There are 2 questions here:

* Is RETAIN the correct default in general?
* Can we afford switching from implicit-DESTROY to implicit-RETAIN?

### Is RETAIN the correct default?

I would argue "yes", by process of elimination:

Defaulting `removalPolicy` for all resources to `DESTROY` if nothing is given (and writing `DESTROY` to the template) is just going to put us back to the situation in CloudFormation *before* `cfn-lint` existed (your stateful resources are going to be destroyed and nothing is going to warn you about it).

Making `removalPolicy` explicitly required for all stateful resources is a backwards-breaking change, and the point about CDK is that it will have sane defaults.

I can't come up with a better solution than picking RETAIN as default. 

### Can we afford changing the default here?

I'm sensitive to the arguments that this a breaking change which we can't afford. 

It feels like this is the value we would have given the resources had we thought about this earlier, and not being able to correct past mistakes by saying they are "locked in" is going to be a death knell for the project. 

We could go for a "future behavior" flag, but the risk here seems minimal: this is a change that REDUCES risk of data loss. It might increase risk of deployment errors (duplicate resource names), but that can be manually managed. I will fast-follow with a change that will introduce warnings for resources that have a physical name set and also a RETAIN policy, to clearly identify you're doing something dangerous in CloudFormation. I'm not sure it's worth the effort to go further than that.

Although please: discussion welcome.

Fixes #12563.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/core Related to core CDK functionality bug This issue is a bug. effort/small Small work item – less than a day of effort good first issue Related to contributions. See CONTRIBUTING.md needs-triage This issue or PR still needs to be triaged. p1
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants