(AwsCustomResource): fails to use latest SDK version #29891

Closed
frfavoreto opened this issue Apr 18, 2024 · 16 comments · Fixed by #29958, rwlxxvii/containers#124 or rwlxxvii/containers#140 · May be fixed by NOUIY/aws-solutions-constructs#99 or NOUIY/aws-solutions-constructs#101
Labels
@aws-cdk/custom-resources Related to AWS CDK Custom Resources documentation This is a problem with documentation. effort/medium Medium work item – several days of effort p2

Comments

@frfavoreto

frfavoreto commented Apr 18, 2024

Describe the bug

When AwsCustomResource is configured with installLatestAwsSdk: true, it fails to upgrade aws-sdk to the latest version (at this moment v3.556.0) and falls back to the default (currently v3.515.0 in Lambda).

Examples with DynamoDB:

 INFO	Installing latest AWS SDK v3: @aws-sdk/client-dynamodb
 Task timed out after 120.10 seconds
        .
        .
        .
 INFO	Installing latest AWS SDK v3: @aws-sdk/client-dynamodb
 INFO	Failed to install latest AWS SDK v3. Falling back to pre-installed version. Error: Error: Cannot find module 'mnemonist/lru-cache'
 Require stack:
 - /tmp/node_modules/@aws-sdk/endpoint-cache/dist-cjs/index.js
 - /tmp/node_modules/@aws-sdk/middleware-endpoint-discovery/dist-cjs/index.js
 - /tmp/node_modules/@aws-sdk/client-dynamodb/dist-cjs/index.js
 - /var/task/index.js
 - /var/runtime/index.mjs
          .
          .

and SSM modules:

INFO	Installing latest AWS SDK v3: @aws-sdk/client-ssm
Task timed out after 120.16 seconds 
          .
          .
 INFO	Installing latest AWS SDK v3: @aws-sdk/client-ssm
 INFO	Failed to install latest AWS SDK v3. Falling back to pre-installed version. Error: Error: Cannot find module '@smithy/shared-ini-file-loader'
Require stack:
- /tmp/node_modules/@smithy/node-config-provider/dist-cjs/index.js
- /tmp/node_modules/@smithy/middleware-endpoint/dist-cjs/adaptors/getEndpointFromConfig.js
- /tmp/node_modules/@smithy/middleware-endpoint/dist-cjs/index.js
- /tmp/node_modules/@smithy/core/dist-cjs/index.js
- /tmp/node_modules/@aws-sdk/client-ssm/dist-cjs/index.js
- /var/task/index.js
- /var/runtime/index.mjs
          .
          . 

The custom resource eventually succeeds, after falling back to the default aws-sdk.

Expected Behavior

Be able to update the Lambda Nodejs18 runtime with the latest SDKv3 version.

Current Behavior

Unable to retrieve and upgrade Lambda Nodejs18 for SDKv3 latest version

Reproduction Steps

Deploy a CDK App with a Custom Resource that has installLatestAwsSdk: true:

import * as cdk from 'aws-cdk-lib';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as cr from 'aws-cdk-lib/custom-resources';

// The statements below go inside a Stack constructor, where `this` is the stack.
const myTable = new dynamodb.Table(this, 'myTable', {
  partitionKey: {
    name: 'id',
    type: dynamodb.AttributeType.STRING,
  },
  removalPolicy: cdk.RemovalPolicy.DESTROY,
});

const myCustomResource = new cr.AwsCustomResource(this, 'myCR', {
  policy: cr.AwsCustomResourcePolicy.fromSdkCalls({
    resources: cr.AwsCustomResourcePolicy.ANY_RESOURCE,
  }),
  // Ask the handler Lambda to install the latest SDK v3 client at runtime.
  installLatestAwsSdk: true,
  onCreate: {
    service: 'DynamoDB',
    action: 'PutItem',
    parameters: {
      Item: {
        id: { S: 'test-value' },
      },
      TableName: myTable.tableName,
    },
    physicalResourceId: cr.PhysicalResourceId.of('myCRphysicalResourceID'),
  },
  onUpdate: {
    service: 'DynamoDB',
    action: 'PutItem',
    parameters: {
      Item: {
        id: { S: 'test-value' },
      },
      TableName: myTable.tableName,
    },
    physicalResourceId: cr.PhysicalResourceId.of('myCRphysicalResourceID'),
  },
});

Check the underlying Lambda Function's logs to see the messages above.

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.137.0

Framework Version

No response

Node.js Version

18

OS

Mac

Language

TypeScript

Language Version

No response

Other information

No response

@frfavoreto frfavoreto added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Apr 18, 2024
@github-actions github-actions bot added @aws-cdk/aws-lambda Related to AWS Lambda @aws-cdk/aws-ec2 Related to Amazon Elastic Compute Cloud labels Apr 18, 2024
@khushail khushail self-assigned this Apr 18, 2024
@khushail khushail added p1 effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Apr 18, 2024
@khushail
Contributor

@frfavoreto, thanks for reaching out. The team is already tracking this and working on it.

@khushail khushail removed their assignment Apr 18, 2024
@dtczest

dtczest commented Apr 19, 2024

I'm also running into this, but in my case, the custom resource never succeeds.

@jakekarnes42

Please note that this is currently causing deployments to fail for custom resources that use installLatestAwsSdk: true. The resources are timing out and/or failing when the Lambda falls back to using the older SDK. Some resources may succeed upon retrying, but if you have multiple custom resources, then it's likely that at least one of them will fail during each deployment.

@colifran
Contributor

colifran commented Apr 19, 2024

@frfavoreto @jakekarnes42 What version of the CDK are you using and do you know what version you started encountering this in?

@jakekarnes42

@colifran Today I upgraded from CDK 2.131.0 to 2.138.0 (the current latest version) and that's when the issue began. Previously successful deployments began to fail. Investigating the failures led me to the exact conclusion shared by @frfavoreto in the original issue description.

Each deployment contains about 15 custom resources, which would fail intermittently. Upon reviewing the Lambda logs, it appears that sometimes the Lambda would succeed after falling back to the default SDK. Infrequently, it would time out before successfully falling back. That could cause the Custom Resource update to fail and cascade the failure to the rest of the deployment. Since I'm deploying multiple custom resources, I found that at least one would fail on each deployment attempt.

I rolled back to CDK 2.131.0 and the issue is no longer present. I'm back to successful deployments. This appears to be a regression introduced sometime between those two versions.

I hope this helps and thanks for the quick support!

@colifran
Contributor

@jakekarnes42 Thanks for the clarification. What is strange is that I have also gone back to 2.131.0, but I'm still getting the timeout when trying to install the latest SDK version. I've looked through some of our recent changes and I'm not seeing anything that would make me think this is something on the CDK side. I'm wondering if this could be an SDK-related issue? We will continue to investigate and monitor!

@colifran
Contributor

@frfavoreto @jakekarnes42 I did some more testing and it appears that the 2 minute default timeout that is set for AwsCustomResource is no longer sufficient for installing the latest SDK version. I set the default timeout to 5 minutes and this fixed the timeout issue for me. It looks like it took close to 4 minutes to install the latest SDK version. Would one of you be able to try this out on your end? I'll continue testing this on my end.
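
For anyone who wants to try the same workaround, here is a minimal sketch of raising the timeout through the `timeout` property of `AwsCustomResource`. The stack name, construct ID, and SSM parameter name are hypothetical placeholders, not values from this issue.

```typescript
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as cr from 'aws-cdk-lib/custom-resources';

// Hypothetical stack used only to illustrate the workaround.
export class TimeoutExampleStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    new cr.AwsCustomResource(this, 'ReadParameter', {
      installLatestAwsSdk: true,
      // Raise the handler timeout from the 2 minute default to 5 minutes
      // so the SDK install has time to finish.
      timeout: cdk.Duration.minutes(5),
      // onUpdate is also used on create when onCreate is not specified.
      onUpdate: {
        service: 'SSM',
        action: 'GetParameter',
        parameters: { Name: '/hypothetical/parameter' }, // placeholder name
        physicalResourceId: cr.PhysicalResourceId.of('ReadParameter'),
      },
      policy: cr.AwsCustomResourcePolicy.fromSdkCalls({
        resources: cr.AwsCustomResourcePolicy.ANY_RESOURCE,
      }),
    });
  }
}
```

The trade-off is deployment time: with an install that takes close to 4 minutes, every cold invocation of the handler pays that cost.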

@frfavoreto
Author

@colifran When I increase the timeout setting, I get the same results you described. Now I believe it might rather be an issue with Lambda, but I'm not sure.

It happens with any aws-cdk-lib version that provisions functions with Nodejs18.x.

@dtczest

dtczest commented Apr 22, 2024

I can confirm that increasing the timeout fixed this issue for me, too.

@amizer12

This should be way more visible than it is, tbh. It causes all my custom resources to fail; increasing the timeout is an option but very costly time-wise. Since this is not fixed yet, I just set install_latest_aws_sdk=False as suggested above. This cost me multiple hours of hair pulling today :)

@colifran colifran added p2 and removed p1 labels Apr 23, 2024
@trivikr
Member

trivikr commented Apr 24, 2024

This is the source code where the timeout happens after 120 seconds:

console.log(`Installing latest AWS SDK v3: ${packageName}`);
// Both HOME and --prefix are needed here because /tmp is the only writable location
execSync(
`NPM_CONFIG_UPDATE_NOTIFIER=false HOME=/tmp npm install ${JSON.stringify(packageName)} --omit=dev --no-package-lock --no-save --prefix /tmp`,
);
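
For context, the install-then-fall-back behavior seen in the logs above can be sketched roughly as follows. This is an illustration under stated assumptions, not the CDK handler's actual code: `loadWithFallback` and its parameters are hypothetical names, and the install step is injected as a function so the pattern can be exercised without running npm.

```typescript
// Hypothetical helper illustrating the pattern; not part of aws-cdk-lib.
type LoadResult<T> = { sdk: T; source: 'latest' | 'bundled' };

function loadWithFallback<T>(
  installLatest: () => void, // e.g. a wrapper around the execSync call above
  loadLatest: () => T,       // e.g. require() from /tmp/node_modules
  loadBundled: () => T,      // e.g. require() of the runtime's own SDK
  log: (msg: string) => void = console.log,
): LoadResult<T> {
  try {
    installLatest();
    return { sdk: loadLatest(), source: 'latest' };
  } catch (e) {
    // Reached only when the install or the load throws an error.
    log(`Failed to install latest AWS SDK v3. Falling back to pre-installed version. Error: ${String(e)}`);
    return { sdk: loadBundled(), source: 'bundled' };
  }
}
```

Note that a Lambda timeout is not an exception the handler can catch: the invocation is terminated, so a `catch` block like this one never runs. That is consistent with the reports above of resources occasionally timing out before the fallback could happen.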

@TheRealAmazonKendra
Contributor

TheRealAmazonKendra commented Apr 25, 2024

This should be way more visible than it is, tbh. It causes all my custom resources to fail; increasing the timeout is an option but very costly time-wise. Since this is not fixed yet, I just set install_latest_aws_sdk=False as suggested above. This cost me multiple hours of hair pulling today :)

The installation of the sdk is a direct call to npm so we have no control over the latency here. If npm is experiencing increased latency in their downloads, we can only mitigate that by providing a version of the sdk that we know is safe to use. I did also do a quick check to see if the asset size for the sdk significantly increased recently and it looks like it has not.

There is definitely room to improve the documentation here to specify WHY you might want to increase this timeout, but there is no fix here because there is not actually a bug (on our end, there may be an issue with npm, the sdk, or somewhere else).

If increasing the timeout is too costly, then changing this setting to false is the right way to go.
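
As a rough sketch of that second option in TypeScript (the earlier comment used the Python spelling `install_latest_aws_sdk=False`), the flag can be turned off on the construct. The stack name, construct ID, and parameter name below are hypothetical.

```typescript
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as cr from 'aws-cdk-lib/custom-resources';

// Hypothetical stack used only to illustrate the setting.
export class BundledSdkExampleStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    new cr.AwsCustomResource(this, 'ReadParameter', {
      // Skip the npm install and use the SDK that ships with the Lambda runtime.
      installLatestAwsSdk: false,
      onUpdate: {
        service: 'SSM',
        action: 'GetParameter',
        parameters: { Name: '/hypothetical/parameter' }, // placeholder name
        physicalResourceId: cr.PhysicalResourceId.of('ReadParameter'),
      },
      policy: cr.AwsCustomResourcePolicy.fromSdkCalls({
        resources: cr.AwsCustomResourcePolicy.ANY_RESOURCE,
      }),
    });
  }
}
```

The cost of this choice is that API calls newer than the runtime's pre-installed SDK version may not be available to the custom resource.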

@TheRealAmazonKendra TheRealAmazonKendra added @aws-cdk/custom-resources Related to AWS CDK Custom Resources documentation This is a problem with documentation. and removed bug This issue is a bug. @aws-cdk/aws-lambda Related to AWS Lambda @aws-cdk/aws-ec2 Related to Amazon Elastic Compute Cloud labels Apr 25, 2024
@TheRealAmazonKendra
Contributor

I'm going to suspect there's something going on with the SDK and/or npm here. On https://www.npmjs.com/package/@aws-sdk/client-s3 it says the most recent version is from 9 days ago and lists 3.556.0 as the most up-to-date version. On https://github.com/aws/aws-sdk-js-v3 latest is 3.562.0 published 12 hours ago.

@trivikr
Member

trivikr commented Apr 25, 2024

On aws/aws-sdk-js-v3 latest is 3.562.0 published 12 hours ago.

This is the global version of the AWS SDK for JavaScript. We only publish the modules that are updated in a given version, but keep the version number the same across modules for easy comparison. During dev-preview, we followed independent versioning, which caused confusion among users. Users also complained about fixed versioning for all modules, where a new version doesn't contain any update. Fixed versioning only when required was a good middle path.

On npmjs.com/package/@aws-sdk/client-s3 it says the most recent version is from 9 days ago and lists 3.556.0

This is correct. There hasn't been any update in @aws-sdk/client-s3 directly (a change in the service model) or indirectly (an update in any of its dependencies) since v3.556.0. That's why no new version has been published for it.

@mergify mergify bot closed this as completed in #29958 Apr 27, 2024
mergify bot pushed a commit that referenced this issue Apr 27, 2024
### Issue # (if applicable)

Closes #29891

### Reason for this change
When AwsCustomResource is configured with installLatestAwsSdk: true, it fails to upgrade aws-sdk to the latest version. The Lambda function created to support the custom resource usually times out after 120 seconds, and in some cases it times out even after 900 seconds.

### Description of changes
Update the Lambda function created for the custom resource to set its MemorySize to 512 when the installLatestAwsSdk flag is true, instead of the default value of 128.

Also, this change will expose the MemorySize to the AwsCustomResource construct, so customers can customize the MemorySize of the Lambda function to fulfill use cases that require higher memory, CPU, or I/O performance.
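
A minimal sketch of how the exposed setting might be used, assuming the construct property is spelled `memorySize` (in MiB) in a CDK release that includes this change. The stack name, construct ID, parameter name, and the value 1024 are illustrative only.

```typescript
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as cr from 'aws-cdk-lib/custom-resources';

// Hypothetical stack used only to illustrate the new property.
export class MemorySizeExampleStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    new cr.AwsCustomResource(this, 'ReadParameter', {
      installLatestAwsSdk: true,
      // Assumed property name; overrides the handler's memory allocation.
      // More memory also gives the function more CPU in Lambda.
      memorySize: 1024,
      onUpdate: {
        service: 'SSM',
        action: 'GetParameter',
        parameters: { Name: '/hypothetical/parameter' }, // placeholder name
        physicalResourceId: cr.PhysicalResourceId.of('ReadParameter'),
      },
      policy: cr.AwsCustomResourcePolicy.fromSdkCalls({
        resources: cr.AwsCustomResourcePolicy.ANY_RESOURCE,
      }),
    });
  }
}
```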

### Description of how you validated changes
I added unit test cases and updated the integration test cases.

### Checklist
- [X] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

@aws-cdk-automation
Collaborator

Comments on closed issues and PRs are hard for our team to see. If you need help, please open a new issue that references this one.

@aws aws locked as resolved and limited conversation to collaborators Jul 25, 2024