Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@aws-cdk/aws-s3: notification lambda function does not handle transient s3 errors well #16811

Closed
roryj-vendia opened this issue Oct 5, 2021 · 15 comments · Fixed by #30053
Closed
Labels
@aws-cdk/aws-s3 Related to Amazon S3 bug This issue is a bug. effort/small Small work item – less than a day of effort p1

Comments

@roryj-vendia
Copy link

roryj-vendia commented Oct 5, 2021

What is the problem?

A deployment failed for our team using a CDK defined infrastructure package due to a failure in the bucket notification lambda function. After looking at the logs in our account, it appears that it was caused due to missing error handling within the bucket notification lambda function. The line that failed was this one, which is called here.

Error logs:

[ERROR] 2021-10-05T18:18:42.172Z 5948f640-82f4-44de-a218-694408fb7bfb Failed to put bucket notification configuration

put_bucket_notification_configuration(bucket, config)

botocore.exceptions.ClientError: An error occurred (OperationAborted) when calling the PutBucketNotificationConfiguration operation: A conflicting conditional operation is currently in progress against this resource. Please try again.

Reproduction Steps

Have multiple notifications attempt to update the bucket? I am not quite sure!

What did you expect to happen?

The bucket notification succeeds. To do this, adding an error handling + retry method here would be great. For transient errors like this, it seems like having more robust error handling and retrying transient errors would be very beneficial. That way all of the consumers of AWS CDK get a smoother experience here

What actually happened?

The bucket notification failed, causing the stack to get into an UPDATE_ROLLBACK_FAILED state

CDK CLI Version

1.25.0

Framework Version

n/a

Node.js Version

n/a

OS

linux

Language

Python

Language Version

3.8

Other information

List of events:

Time Resource Status Message
2021-10-05 11:18:42 UTC-0700 BucketNotifications8F2E257D UPDATE_FAILED Received response status [FAILED] from custom resource. Message returned: Error: An error occurred (OperationAborted) when calling the PutBucketNotificationConfiguration operation: A conflicting conditional operation is currently in progress against this resource. Please try again.. See the details in CloudWatch Log Stream: 2021/10/05/[$LATEST]f56d3b94003f4219971717943c98f61c (RequestId: 52f908f9-40b7-4583-b6fe-314e65a991e8)
2021-10-05 11:18:46 UTC-0700 my-stack-name UPDATE_ROLLBACK_IN_PROGRESS The following resource(s) failed to update: [BucketNotifications8F2E257D, ...].
@roryj-vendia roryj-vendia added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Oct 5, 2021
@github-actions github-actions bot added the @aws-cdk/aws-lambda Related to AWS Lambda label Oct 5, 2021
@nija-at nija-at added @aws-cdk/aws-s3 Related to Amazon S3 and removed @aws-cdk/aws-lambda Related to AWS Lambda labels Oct 7, 2021
@nija-at nija-at assigned otaviomacedo and unassigned nija-at Oct 7, 2021
@otaviomacedo otaviomacedo added p2 and removed needs-triage This issue or PR still needs to be triaged. labels Oct 8, 2021
@otaviomacedo otaviomacedo removed their assignment Oct 8, 2021
@otaviomacedo
Copy link
Contributor

Thanks for reporting this.

I am unassigning and marking this issue as p2, which means that we are unable to work on this immediately.

We use +1s to help prioritize our work, and are happy to revaluate this issue based on community feedback. You can reach out to the cdk.dev community on Slack to solicit support for reprioritization.

@otaviomacedo otaviomacedo added the effort/small Small work item – less than a day of effort label Oct 8, 2021
mergify bot pushed a commit that referenced this issue Apr 6, 2022
related to #16811, there is sometimes an issue when multiple operations
are performed on the same bucket.

To get around this in the integration test I created an additional
bucket for the import test.


----

### All Submissions:

* [ ] Have you followed the guidelines in our [Contributing guide?](https://github.com/aws/aws-cdk/blob/master/CONTRIBUTING.md)

### Adding new Unconventional Dependencies:

* [ ] This PR adds new unconventional dependencies following the process described [here](https://github.com/aws/aws-cdk/blob/master/CONTRIBUTING.md/#adding-new-unconventional-dependencies)

### New Features

* [ ] Have you added the new feature to an [integration test](https://github.com/aws/aws-cdk/blob/master/INTEGRATION_TESTS.md)?
	* [ ] Did you use `cdk-integ` to deploy the infrastructure and generate the snapshot (i.e. `cdk-integ` without `--dry-run`)?

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
StevePotter pushed a commit to StevePotter/aws-cdk that referenced this issue Apr 27, 2022
related to aws#16811, there is sometimes an issue when multiple operations
are performed on the same bucket.

To get around this in the integration test I created an additional
bucket for the import test.


----

### All Submissions:

* [ ] Have you followed the guidelines in our [Contributing guide?](https://github.com/aws/aws-cdk/blob/master/CONTRIBUTING.md)

### Adding new Unconventional Dependencies:

* [ ] This PR adds new unconventional dependencies following the process described [here](https://github.com/aws/aws-cdk/blob/master/CONTRIBUTING.md/#adding-new-unconventional-dependencies)

### New Features

* [ ] Have you added the new feature to an [integration test](https://github.com/aws/aws-cdk/blob/master/INTEGRATION_TESTS.md)?
	* [ ] Did you use `cdk-integ` to deploy the infrastructure and generate the snapshot (i.e. `cdk-integ` without `--dry-run`)?

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@github-actions
Copy link

github-actions bot commented Oct 8, 2022

This issue has not received any attention in 1 year. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Oct 8, 2022
@roryj-vendia
Copy link
Author

It doesn't look like this has been addressed. Commenting to keep issue open

@github-actions github-actions bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Oct 10, 2022
@m1n9o
Copy link

m1n9o commented Apr 10, 2023

Hi team, would you help to have a look on this small work item which wasted us for quite a long time over the last few months?

@forward2you
Copy link

This problem comes nearly every time. We already spend a long time on it but seems we can do nothing but wait for this issue fixed by your team

@adrianmace
Copy link
Contributor

The biggest problem with this is due to the mentioned issue #24762 which was closed as a duplicate of here.

There's a race condition, which makes creating and tearing down these resources in an automated / E2E manner difficult.

@adrianmace
Copy link
Contributor

Here is some reproduction code. cc: @pahud @otaviomacedo

from aws_cdk import (
	RemovalPolicy,
	aws_lambda as _lambda,
	aws_s3 as _s3,
	aws_iam as iam,
	aws_s3_notifications,
	App, Duration, Stack
)

class TestStackForAWS(Stack):
	def __init__(self, app: App, id: str) -> None:
		super().__init__(app, id)

		function = _lambda.Function(
			self, "function",
			code=_lambda.InlineCode(
				"exports.handler = function (event, context) {context.succeed('hello world');};"
			),
			handler="index.handler",
			timeout=Duration.seconds(300),
			runtime=_lambda.Runtime.NODEJS_18_X,
		)

		bucket = _s3.Bucket(self, "s3bucket", removal_policy=RemovalPolicy.DESTROY)

		notification = aws_s3_notifications.LambdaDestination(function)
		bucket.add_event_notification(_s3.EventType.OBJECT_CREATED_PUT, notification)
		bucket.add_event_notification(_s3.EventType.OBJECT_CREATED_POST, notification)

		bucket.add_to_resource_policy(
			iam.PolicyStatement(
				actions=["s3:*"],
				principals=[iam.AnyPrincipal()],
				resources=[
					bucket.bucket_arn,
					bucket.arn_for_objects('*')
				]
			)
		)

app = App()
TestStackForAWS(app, "ExampleStack")
app.synth()

Attempting to cdk deploy this will result in the following:

10:29:14 am | CREATE_FAILED        | AWS::S3::BucketPolicy         | s3bucketPolicyE79D5D76
API: s3:PutBucketPolicy Access Denied


 ❌  ExampleStack failed: Error: The stack named ExampleStack failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: API: s3:PutBucketPolicy Access Denied
    at FullCloudFormationDeployment.monitorDeployment (/opt/homebrew/lib/node_modules/aws-cdk/lib/index.js:412:10236)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Object.deployStack2 [as deployStack] (/opt/homebrew/lib/node_modules/aws-cdk/lib/index.js:415:153172)
    at async /opt/homebrew/lib/node_modules/aws-cdk/lib/index.js:415:136968

 ❌ Deployment failed: Error: The stack named ExampleStack failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: API: s3:PutBucketPolicy Access Denied
    at FullCloudFormationDeployment.monitorDeployment (/opt/homebrew/lib/node_modules/aws-cdk/lib/index.js:412:10236)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Object.deployStack2 [as deployStack] (/opt/homebrew/lib/node_modules/aws-cdk/lib/index.js:415:153172)
    at async /opt/homebrew/lib/node_modules/aws-cdk/lib/index.js:415:136968

The stack named ExampleStack failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: API: s3:PutBucketPolicy Access Denied

If you comment out the bucket.add_to_resource_policy statement and deploy it, then add that back in and deploy again, the process succeeds (since we're only doing one action at a time).

Crucially, the same thing happens when we are deleting stacks that have both a Policy and Notification configured. The delete fails as described in #24762.

I hope this helps!

@otaviomacedo otaviomacedo added p1 and removed p2 labels Jul 4, 2023
@otaviomacedo
Copy link
Contributor

Reprioritizing as p1, since there is no workaround for this. Note to whoever starts working on this, maybe we should use wait_until_exists to only put the notification configuration after we are sure the bucket has been created.

@adrianmace
Copy link
Contributor

Thank you @otaviomacedo. Is there any kind of guess on SLA for a p1 of this nature?

It would be good to be able to set expectations internally.

@ShanikaEdiriweera
Copy link

Hi
I am having a similar problem when creating a s3 bucket and adding a resource policy.

Received response status [FAILED] from custom resource. Message returned: Error: An error occurred (OperationAborted) when calling the PutBucketNotificationConfiguration operation: A conflicting conditional operation is currently in progress against this resource. Please try again.. See the details in CloudWatch Log Stream: 2023/10/17/[$LATEST]9f6959796644ca8449646270045 (RequestId: 26908c-74ad-4317-8a30-8358df2dc9)

I am adding the policy after creating the bucket

this.s3Bucket = new S3Bucket(this, 'bucket', {
      encryptionKey,
    });
    this.s3Bucket.addToResourcePolicy(
      createS3BucketSSLRequestsOnlyPolicyStatement(
        this.s3Bucket.bucketArn
      )
    );
--------------
export class S3Bucket extends s3.Bucket {
  constructor(scope: Construct, id: string, props: S3BucketProperties) {
    super(scope, id, {
      ...props,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
      encryption: s3.BucketEncryption.KMS,
      encryptionKey: props.encryptionKey,
      eventBridgeEnabled: true,
    });
  }
}

Any update on the issue or any workarounds?

@jcoetzee
Copy link

For what it's worth, adding a dependency between the bucket policy and the notifications worked for me.

import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda'
import * as s3 from 'aws-cdk-lib/aws-s3'
import * as iam from 'aws-cdk-lib/aws-iam'
import * as s3_notifications from 'aws-cdk-lib/aws-s3-notifications'
import { Construct } from 'constructs';

export class S3TestStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const func = new lambda.Function(this, 'function', {
      code: new lambda.InlineCode('exports.handler = function (event, context) {context.succeed(\'hello world\');};'),
      handler: 'index.handler',
      runtime: lambda.Runtime.NODEJS_LATEST
    })

    const bucket = new s3.Bucket(this, 'bucket', {removalPolicy: cdk.RemovalPolicy.DESTROY})

    const destination = new s3_notifications.LambdaDestination(func)

    bucket.addEventNotification(s3.EventType.OBJECT_CREATED_PUT, destination)
    bucket.addEventNotification(s3.EventType.OBJECT_CREATED_POST, destination)

    bucket.addToResourcePolicy(
      new iam.PolicyStatement({
        actions: ['s3:*'],
        principals: [new iam.AccountRootPrincipal()],
        resources: [bucket.bucketArn, bucket.arnForObjects('*')]
      })
    )

    // Add dependency between policy and notifications so they aren't applied at the same time
    bucket.node.findChild('Notifications').node.addDependency(bucket.node.findChild('Policy'))
  }
}

@haydster7
Copy link

haydster7 commented Feb 14, 2024

Same issue intermittently but when deleting a stack that contains buckets with both a policy and notifications. Doesn't seem to happen on stack creation.

The stack deletion is being called from a lambda. Before calling the stack deletion I also tried deleting the bucket notifications, and waiting until they were all successfully deleted, still getting the intermittent issue and cloudformation is still triggering a notification deletion.

@jcoetzee

For what it's worth, adding a dependency between the bucket policy and the notifications worked for me.

This workaround worked for me as well, perhaps in addition to better error handling, or as a more immediate fix, these dependencies could be picked up by cdk automatically?

@yerzhan7
Copy link
Contributor

yerzhan7 commented May 4, 2024

Proposed fix by adding dependency on Bucket Policy: #30053

Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

1 similar comment
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-s3 Related to Amazon S3 bug This issue is a bug. effort/small Small work item – less than a day of effort p1
Projects
None yet
Development

Successfully merging a pull request may close this issue.