cli: Socket timed out without establishing a connection when --asset-parallelism=true #19930

Closed
apoorvmote opened this issue Apr 15, 2022 · 24 comments · Fixed by #32317
Labels
@aws-cdk/assets Related to the @aws-cdk/assets package bug This issue is a bug. cli Issues related to the CDK CLI effort/medium Medium work item – several days of effort p1 package/tools Related to AWS CDK Tools or CLI

Comments

@apoorvmote

apoorvmote commented Apr 15, 2022

Describe the bug

I have anywhere between 20 and 50 Node.js Lambda functions in a single stack. I update their dependencies and deploy with the CDK.

Lately I have not been able to deploy updates. I get the following error when I deploy:

current credentials could not be used to assume 'arn:aws:iam::******:role/cdk-hnb659fds-lookup-role-******-us-east-1', but are for the right account. Proceeding anyway.
(To get rid of this warning, please upgrade to bootstrap version >= 8)
current credentials could not be used to assume 'arn:aws:iam::******:role/cdk-hnb659fds-file-publishing-role-******-us-east-1', but are for the right account. Proceeding anyway.
current credentials could not be used to assume 'arn:aws:iam::******:role/cdk-hnb659fds-file-publishing-role-******-us-east-1', but are for the right account. Proceeding anyway.
[9%] fail: Socket timed out without establishing a connection
[18%] fail: Socket timed out without establishing a connection

I keep retrying. Sometimes it goes through, but most of the time it doesn't.
Only stacks with a small number of Lambda functions sometimes get deployed. A stack with a large number of Lambda functions fails 100% of the time.

Expected Behavior

I expected it to deploy regardless of the number of Lambda functions in the stack. It used to deploy without any problem.

Current Behavior

current credentials could not be used to assume 'arn:aws:iam::******:role/cdk-hnb659fds-lookup-role-******-us-east-1', but are for the right account. Proceeding anyway.
(To get rid of this warning, please upgrade to bootstrap version >= 8)

I don't know how to upgrade the bootstrap version. I ran cdk bootstrap multiple times and it reports no changes.

Reproduction Steps

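import { Architecture, Runtime } from 'aws-cdk-lib/aws-lambda';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';

// Inside the stack's constructor.
const testSignUpFn = new NodejsFunction(this, 'testSignUpNodeJS', {
  runtime: Runtime.NODEJS_14_X,
  entry: `${__dirname}/../lambda-fns/sign-up/index.ts`,
  handler: 'signUp',
  architecture: Architecture.ARM_64,
  memorySize: 1024,
});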

It was working before but suddenly stopped working.

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.20.0 (build 738ef49)

Framework Version

No response

Node.js Version

v16.14.2

OS

Ubuntu 20.04 on WSL 2

Language

Typescript

Language Version

~3.9.7

Other information

No response

@apoorvmote apoorvmote added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Apr 15, 2022
@corymhall
Contributor

@apoorvmote do you know what changed between when it used to work and now? Did you recently update the version of the CDK?

@apoorvmote
Author

Of course, I regularly upgrade the CDK as new versions come out. But in my opinion the problem is not with the CDK. Rather, AWS seems to be denying my specific IP address whenever I do a large update of 30-50 functions. All the other stacks have very small updates and always go through.

I also have another CDK project that I run in a Docker development environment, and from the same computer I can deploy over 50 functions without any problem. I can run that project in Docker because all of its functions are Node.js built with esbuild. This (failing) project has functions written in Go, and I use Docker to build them. I am not able to run Docker inside Docker, so I run this project in WSL and deploy normally, and it fails. But if I build it in WSL with Docker and then deploy from the Docker development environment (possible because the Go functions are already built), it deploys.

@ryparker ryparker added p2 needs-reproduction This issue needs reproduction. labels Apr 26, 2022
@corymhall
Contributor

@apoorvmote can you try this with a version of the CDK <2.17.0/1.149.0? We added parallel asset publishing starting in those versions and I'm curious if that could be the issue.

@corymhall corymhall added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Apr 27, 2022
@peterwoodworth peterwoodworth removed the needs-triage This issue or PR still needs to be triaged. label Apr 27, 2022
@github-actions

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Apr 29, 2022
@apoorvmote
Author

I upgraded to 2.22.0 and the problem suddenly disappeared. If it appears again, I will open another issue.

@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

@agusavior

agusavior commented Jul 9, 2022

It happened to me on CDK version 2.31.1 (build 42432c6).
I was trying to deploy a CloudFront distribution with an S3 bucket.
I don't know how to fix it.

@viktorchukhantsev

I'm facing this issue regularly now on 2.39.1. When I enable a VPN and deploy again, the error disappears, so it looks like this is somehow related to establishing the connection.

@corymhall corymhall reopened this Sep 1, 2022
@corymhall corymhall removed closing-soon This issue will automatically close in 4 days unless further comments are made. response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Sep 1, 2022
@hassaanakram

> I'm facing this issue regularly now on 2.39.1. When I enable a VPN and deploy again, the error disappears, so it looks like this is somehow related to establishing the connection.

I'm facing the same issue. I can deploy several stacks that have fewer resources, but my Lambdas aren't going through. A proxy seems to do the trick.

System information:
OS: macOS Monterey v12.4
CDK version: 2.39.1

@anthony-mills

I get the same problem with CDK 2.52.0. It seems related to the IP address or the connection somehow.

I get socket timeouts while trying to deploy:

[09:54:23] Assuming role failed: Socket timed out without establishing a connection
[09:54:23] Could not assume role in target account using current credentials Socket timed out without establishing a connection . Please make sure that this role exists in the account. If it doesn't exist, (re)-bootstrap the environment with the right '--trust', using the latest version of the CDK CLI.

If I switch to another internet connection (phone hotspot or similar), the problem goes away.

System information:
OS: Linux Mint 21 Cinnamon
NodeJS Version: v18.12.1
CDK version: 2.52.0

@jscrobinson

jscrobinson commented Nov 28, 2022

This does appear to be related to the Asset Parallelism feature. Executing a deployment with --asset-parallelism=false resulted in a successful deployment.

When running without --asset-parallelism=false, the deployment failed with the following error:

Call failed: listObjectsV2({"Bucket":"cdk-hnb659fds-assets-ACCOUNT_ID-eu-west-2","Prefix":"0936406e22fea26017ecca536fcbdc550936406e22fea26017ecca536fcbdc55.zip","MaxKeys":1}) => Socket timed out without establishing a connection (code=TimeoutError)

There are only four assets in the bucket and none of them are over 50KB.

System information:
OS: Ubuntu 20.04
NodeJS Version: v16.3.0
CDK version: 2.51.1

@anthony-mills

Thanks @jscrobinson I can confirm your finding! If I try to deploy normally it fails, but try again with the --asset-parallelism=false flag and the deployment succeeds.

Just so happy to have a workaround at the moment that doesn't involve finding a new internet connection. 😃

@yuri1969

yuri1969 commented Dec 5, 2022

I've encountered the same issue using 2.53.0 (build 7690f43) deploying a fleet of tens of Lambdas with --concurrency=50.

The workaround using --asset-parallelism=false seems to fix that.

@hamilton-earthscope

We are running into this same issue when using the s3.BucketDeployment construct. Using --asset-parallelism=false fixes it. Thanks for the tip!

@madeline-k madeline-k added @aws-cdk/assets Related to the @aws-cdk/assets package p1 labels Jan 28, 2023
@madeline-k madeline-k changed the title aws-lambda-nodejs: Socket timed out without establishing a connection assets: Socket timed out without establishing a connection when --asset-parallelism=true Jan 28, 2023
@madeline-k madeline-k added effort/medium Medium work item – several days of effort package/tools Related to AWS CDK Tools or CLI cli Issues related to the CDK CLI labels Jan 28, 2023
@madeline-k madeline-k changed the title assets: Socket timed out without establishing a connection when --asset-parallelism=true cli: Socket timed out without establishing a connection when --asset-parallelism=true Jan 28, 2023
@oliversalzburg

oliversalzburg commented Jun 12, 2023

We also have to use the --asset-parallelism=false workaround to be able to deploy at all. With 2.83, a new parallelism feature was introduced to improve performance. Now our deployments are entirely broken, regardless of --asset-parallelism.

In general, a real solution for the underlying issue would be appreciated.

In case it helps, we only see the problematic behavior when deploying from GitHub Actions. If we run the same deploy locally, it completes dramatically faster and without issues. So far, all our research into environment differences has been fruitless.

@oliversalzburg

oliversalzburg commented Aug 30, 2023

Turns out our issue was caused by setting NODE_OPTIONS=--enable-source-maps in our deployment pipeline.

CDK is compiled into a single 28 MB .js file, accompanied by a 58 MB source map. This causes excessive load, especially given the high parallelism that CDK uses. I patched out all the unqueued IO processes and replaced all the hardcoded parallelization values with require("os").cpus().length. This resolved our timeouts and we were able to deploy again.
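
The idea of capping parallelism at the CPU count can be sketched roughly as follows. This is not the patch that was applied to the CDK, only an illustration: `runWithLimit` and the simulated publish tasks are hypothetical names, and only `os.cpus().length` is taken from the comment.

```typescript
import * as os from "os";

// Hypothetical helper (not CDK code): runs async tasks with at most `limit` in flight.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number = os.cpus().length,
): Promise<T[]> {
  const results: T[] = new Array<T>(tasks.length);
  let next = 0;

  // Each worker claims the next unstarted task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage: eight simulated "asset publishes", capped at the CPU count.
const fakePublishes = Array.from({ length: 8 }, (_, i) => async () => {
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  return `asset-${i}`;
});

runWithLimit(fakePublishes).then((names) => console.log(names.join(", ")));
```

Results come back in task order regardless of which worker finished first, so callers do not need to care about scheduling.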

Soon after, we realized that deployment performance improved dramatically after upgrading to Node@20. This is due to a change in Node@19.6 that introduces caching for parsed source maps, which resolves this whole problem entirely (for us). Previously, we ran Node@18 LTS, which was also the highest version supported by CDK at the time.

I stand by my point that the way CDK handles IO is ridiculous. I also think bundling a NodeJS module into a single 28 MB file, with a 58 MB source map is ridiculous.

As Node@18 is also the latest runtime supported by AWS Lambda, be cautious when using --enable-source-maps at runtime, because similar performance issues can be observed there, especially during exception handling.
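
A pipeline could check for this combination before deploying. The sketch below is only an illustration: `sourceMapsMayBeSlow` is a hypothetical helper, and the 19.6 threshold is the one reported in this comment rather than something verified against Node's release notes.

```typescript
// Hypothetical helper: true when --enable-source-maps is set on a Node version
// older than 19.6 (threshold taken from this thread, not verified here).
function sourceMapsMayBeSlow(nodeOptions: string | undefined, nodeVersion: string): boolean {
  const flags = (nodeOptions ?? "").split(/\s+/);
  if (!flags.includes("--enable-source-maps")) {
    return false;
  }
  const [major, minor] = nodeVersion.replace(/^v/, "").split(".").map(Number);
  return major < 19 || (major === 19 && minor < 6);
}

if (sourceMapsMayBeSlow(process.env.NODE_OPTIONS, process.version)) {
  console.warn("--enable-source-maps on Node < 19.6 may slow the CDK CLI considerably");
}
```

The check only inspects NODE_OPTIONS, so it would miss the flag when it is passed directly on the node command line.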

p.s.: The reason it worked for us locally was that nobody set --enable-source-maps locally, or people were already on Node@20.

@tanpenggood

tanpenggood commented Sep 16, 2023

😭 😭 😭

I encountered the same problem while deploying the project aws-samples/amazon-codewhisperer-workshop.

I also tried cdk deploy --all --asset-parallelism=false, and the same error was thrown.

Log

> cdk deploy --all

✨  Synthesis time: 7.3s

APIStack:  start: Building ba88964563976f2e7ba608a7bff3e66649cfc355fc656f357ee1cfd4981bc6aa:current_account-ap-southeast-2
APIStack:  success: Built ba88964563976f2e7ba608a7bff3e66649cfc355fc656f357ee1cfd4981bc6aa:current_account-ap-southeast-2
APIStack:  start: Building 3a167ad57f1fe716bf6aaecc1338dfc52e374149f35acd5ad6acba509938ae8d:current_account-ap-southeast-2
APIStack:  success: Built 3a167ad57f1fe716bf6aaecc1338dfc52e374149f35acd5ad6acba509938ae8d:current_account-ap-southeast-2
APIStack:  start: Publishing ba88964563976f2e7ba608a7bff3e66649cfc355fc656f357ee1cfd4981bc6aa:current_account-ap-southeast-2
IntegrationStack:  start: Building 0a7920ffc66926b7d6a37a65e729ce9c41a24a09d17a0be9db60a7c8e789a691:current_account-ap-southeast-2
IntegrationStack:  success: Built 0a7920ffc66926b7d6a37a65e729ce9c41a24a09d17a0be9db60a7c8e789a691:current_account-ap-southeast-2
IntegrationStack:  start: Building bfa23dd275e652257d6dd3b8d94380e2ff57ee161fcc742970e9a67a2268c685:current_account-ap-southeast-2
IntegrationStack:  success: Built bfa23dd275e652257d6dd3b8d94380e2ff57ee161fcc742970e9a67a2268c685:current_account-ap-southeast-2
RekognitionStack:  start: Building 7eff58b160d8d2dfb14b5ecabd9f6625f572f84607bb7941df702ea8198546cb:current_account-ap-southeast-2
RekognitionStack:  success: Built 7eff58b160d8d2dfb14b5ecabd9f6625f572f84607bb7941df702ea8198546cb:current_account-ap-southeast-2
RekognitionStack:  start: Building 2c64aeb833819272233efeec105e713968f201118f64a2a58fd01ffef5bdeca5:current_account-ap-southeast-2
RekognitionStack:  success: Built 2c64aeb833819272233efeec105e713968f201118f64a2a58fd01ffef5bdeca5:current_account-ap-southeast-2
APIStack:  fail: Socket timed out without establishing a connection

 ❌ Deployment failed: Error: Failed to publish asset ba88964563976f2e7ba608a7bff3e66649cfc355fc656f357ee1cfd4981bc6aa:current_account-ap-southeast-2
    at Deployments.publishSingleAsset (/Users/sam/.nvm/versions/node/v20.6.1/lib/node_modules/aws-cdk/lib/index.js:446:11458)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Object.publishAsset (/Users/sam/.nvm/versions/node/v20.6.1/lib/node_modules/aws-cdk/lib/index.js:446:151474)
    at async /Users/sam/.nvm/versions/node/v20.6.1/lib/node_modules/aws-cdk/lib/index.js:446:136916

Failed to publish asset ba88964563976f2e7ba608a7bff3e66649cfc355fc656f357ee1cfd4981bc6aa:current_account-ap-southeast-2

Env

> node -v
v20.6.1

> cdk --version
2.96.2 (build 3edd240)

> npm -v
9.8.1

> sw_vers
ProductName:	Mac OS X
ProductVersion:	10.15.7
BuildVersion:	19H2026

Solution

I switched the region from ap-southeast-2 to us-west-2 and successfully deployed the application.

😃 😃 😃

@juanesmendez

I solved this issue by changing the DNS server settings for my Wi-Fi network from automatic to manual. I am now using the Google and Cloudflare DNS servers (8.8.8.8, 8.8.4.4, 1.1.1.1, 1.0.0.1) and my issue is solved. Apparently the problem was with the default DNS server my ISP uses.

@rix0rrr
Contributor

rix0rrr commented Oct 18, 2024

Good find on the source maps @oliversalzburg. I wonder if disabling source maps for the CLI will help everyone.

I'm still mystified about how having a source map could lead to Socket timed out without establishing a connection... but apparently it does?

@sumupitchayan maybe you can dedicate a quick Google to this error message and see what could be causing it. Plus, make sure that our default concurrency settings aren't too insane. And remove the source maps from the CLI?

@oliversalzburg

@rix0rrr In older versions of NodeJS, the source maps were not cached. They were re-evaluated every time a call would pass through the minified module. Because this would happen for every single asset build on a tiny GitHub CI runner on the public fleet, the machine would be fully saturated with source map processing, so that it wasn't able to handle socket communication anymore. At least that's what I remember about it. Issue went away with Node20, I believe. They added a cache for the source maps and it was a whole new world.
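
One plausible way to picture this mechanism: Node runs timers and socket callbacks on the same event loop as CPU-bound work, so a long synchronous burst delays everything queued behind it. The sketch below is an assumption-laden illustration, not a reproduction of the CDK behavior. `busyWait` stands in for source map processing and `measureTimerDelay` is a hypothetical helper.

```typescript
// Blocks the event loop for roughly `ms` milliseconds of synchronous work.
function busyWait(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: stand-in for CPU-bound source map processing
  }
}

// Resolves with how late (in ms) a timer fired relative to when it was due.
function measureTimerDelay(dueMs: number, blockMs: number): Promise<number> {
  return new Promise<number>((resolve) => {
    const start = Date.now();
    setTimeout(() => resolve(Date.now() - start - dueMs), dueMs);
    busyWait(blockMs); // the timer cannot fire until this returns
  });
}

measureTimerDelay(10, 200).then((late) => {
  console.log(`timer due after 10 ms fired about ${late} ms late`);
});
```

A connection timeout handled on the same loop would be affected in a similar way: if the process is busy long enough, the timeout can expire before the socket callbacks get a chance to run.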

I still believe that publishing minified/bundled NodeJS modules is counter-productive on many levels. This could have been avoided entirely.

@rix0rrr
Contributor

rix0rrr commented Oct 22, 2024

> I still believe that publishing minified/bundled NodeJS modules is counter-productive on many levels. This could have been avoided entirely.

We are not doing that lightly either. This is a lesson learned from real experiences with problematic dependencies in the past, and from concerns around supply chain attacks in the ecosystem in general. In lieu of a properly supported shrinkwrapping mechanism that works across NPM, Yarn, PNPM and other potential JavaScript package managers, we've decided that bundling is the most reliable way to lock our dependency set to a known good one. (And if we're bundling, might as well minify...)

I understand your concerns, but from our PoV it's the lesser of two evils.

rix0rrr added a commit that referenced this issue Nov 28, 2024
Remove the source maps from the bundled CLI. The source maps are not
really useful for customers anyway, and have the following downsides:

- They are 30+MB, which we vend to customers for no benefit.
- They tend to slow down Node as it loads and processes them. We have
  reports that on some Node versions this even leads to socket timeouts
  as the Node process was too busy loading source maps (#19930).

There are 2 steps to producing a CLI build:

- First compile with TypeScript -> JavaScript. Produces sourcemaps
  that are still being loaded.
- Then bundle JavaScript -> bundle. This removes sourcemaps.

Developers running a local (non-bundled) build will benefit from
the source maps generated by TypeScript.

Two other changes in this PR that came up around this:

* Clarify what the `--debug` flag is for (debugging the CDK app)
  and what it's not for (debugging the CLI)
* Only print the stack trace in a CLI error if we're on a developer
  build; due to the minification printing the stack trace on
  a bundled copy prints a 1000-character minified line which is not
  useful to anyone.
@mergify mergify bot closed this as completed in #32317 Nov 28, 2024
@mergify mergify bot closed this as completed in 512cf95 Nov 28, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 28, 2024