npm install in a project with 2 nx packages randomly fails with npm ERR! code 135 #26517

Open
1 of 4 tasks
Den-dp opened this issue Jun 11, 2024 · 23 comments
Assignees
Labels
scope: core core nx functionality type: bug

Comments

@Den-dp
Contributor

Den-dp commented Jun 11, 2024

Current Behavior

npm install on my CI job sometimes fails to complete because of a random postinstall failure.

I think it might be related to multiple different versions of the nx package brought in by nx-dotnet:

> npm ls nx

+-- @nx-dotnet/core@2.2.0
| +-- @nx-dotnet/utils@2.2.0
| | `-- @nx/devkit@17.0.2
| |   `-- nx@18.3.5
| |     `-- @nrwl/tao@18.3.5
| |       `-- nx@18.3.5 deduped
| `-- @nx/devkit@17.0.2
|   `-- nx@18.3.5
|     `-- @nrwl/tao@18.3.5
|       `-- nx@18.3.5 deduped
+-- @nx/devkit@19.2.3
| `-- nx@19.2.3 deduped
+-- @nx/js@19.2.3
| `-- @nx/workspace@19.2.3
|   `-- nx@19.2.3 deduped
`-- nx@19.2.3
  `-- @nrwl/tao@19.2.3
    `-- nx@19.2.3 deduped

Also, I found that it never fails when I use

npm install --foreground-scripts

Expected Behavior

If it is true that two different nx versions can conflict when installing, then it would be helpful to handle it if possible

GitHub Repo

No response

Steps to Reproduce

  1. generate workspace with nx@19.2.3 and @nx-dotnet/core@2.2.0
  2. run npm install
    In my case, I use an Ubuntu-based Jenkins job without dotnet (which might be important for nx-dotnet but shouldn't be a problem for overall npm install in nx workspace)
pipeline {
  agent {
    docker { image 'mcr.microsoft.com/playwright:jammy' }
  }
  stages {
    stage('Install dependencies') {
      steps {
        sh 'npm install --verbose'
      }
    }
  }
}

Nx Report

Node   : 20.13.1
 OS     : linux-x64
 npm    : 10.5.2
 
 nx                 : 19.2.3
 @nx/js             : 19.2.3
 @nx/jest           : 19.2.3
 @nx/linter         : 19.2.3
 @nx/eslint         : 19.2.3
 @nx/workspace      : 19.2.3
 @nx/devkit         : 19.2.3
 @nx/eslint-plugin  : 19.2.3
 @nx/playwright     : 19.2.3
 @nrwl/tao          : 19.2.3
 typescript         : 5.4.5
 ---------------------------------------
 Registered Plugins:
 @nx-dotnet/core
 @nx/eslint/plugin
 @nx/jest/plugin
 ---------------------------------------
 Community plugins:
 @nx-dotnet/core : 2.2.0

Failure Logs

npm info run @nx-dotnet/core@2.2.0 postinstall node_modules/@nx-dotnet/core node ./src/tasks/post-install
 npm info run @swc/core@1.5.7 postinstall node_modules/@swc/core node postinstall.js
 npm info run nx@19.2.3 postinstall node_modules/nx node ./bin/post-install
 npm info run @nx-dotnet/core@2.2.0 postinstall { code: 0, signal: null }
 npm info run nx@18.3.5 postinstall node_modules/@nx-dotnet/core/node_modules/nx node ./bin/post-install
 npm info run @swc/core@1.5.7 postinstall { code: 0, signal: null }
 npm info run nx@18.3.5 postinstall node_modules/@nx-dotnet/utils/node_modules/nx node ./bin/post-install
 npm info run nx@18.3.5 postinstall { code: 0, signal: null }
 npm info run nx@18.3.5 postinstall { code: 135, signal: null }
 npm info run nx@19.2.3 postinstall { code: 0, signal: null }
 npm verb stack Error: command failed
 npm verb stack     at ChildProcess.<anonymous> (/usr/lib/node_modules/npm/node_modules/@npmcli/promise-spawn/lib/index.js:53:27)
 npm verb stack     at ChildProcess.emit (node:events:519:28)
 npm verb stack     at maybeClose (node:internal/child_process:1105:16)
 npm verb stack     at Socket.<anonymous> (node:internal/child_process:457:11)
 npm verb stack     at Socket.emit (node:events:519:28)
 npm verb stack     at Pipe.<anonymous> (node:net:338:12)
 npm verb pkgid nx@18.3.5
 npm verb cwd /home/jenkins/agent/workspace/acme-main
 npm verb Linux 5.15.146+
 npm verb node v20.13.1
 npm verb npm  v10.5.2
 npm ERR! code 135
 npm ERR! path /home/jenkins/agent/workspace/acme-main/node_modules/@nx-dotnet/core/node_modules/nx
 npm ERR! command failed
 npm ERR! command sh -c node ./bin/post-install
 npm ERR! Bus error (core dumped)
 npm verb exit 135
 npm verb unfinished npm timer reify 1718135852618
 npm verb unfinished npm timer reify:build 1718135868493
 npm verb unfinished npm timer build 1718135868494
 npm verb unfinished npm timer build:deps 1718135868495
 npm verb unfinished npm timer build:run:postinstall 1718135868529
 npm verb unfinished npm timer build:run:postinstall:node_modules/@nx-dotnet/core/node_modules/nx 1718135868568
 npm verb code 135

Package Manager Version

No response

Operating System

  • macOS
  • Linux
  • Windows
  • Other (Please specify)

Additional Information

/cc @AgentEnder

@whygee-dev

Having the same error randomly in our pipeline

@Daniel-Griffiths

Having a similar issue here but it usually gets a 129 status code. yarn install works ok locally but sometimes fails on CI.

@FrozenPandaz FrozenPandaz added the scope: core core nx functionality label Jun 13, 2024
@nbalu2

nbalu2 commented Jun 27, 2024

129 should be separated from this story. We've also run into the same issue with GHA runners.

It's not predictable when the error happens, but 10 to 25% of our builds fail in the postinstall -> node ./bin/post-install step.

The problem is that SIGBUS indicates that the error is actually a native memory access issue.

------------------------ LOCAL NX report ------------------------------
NX   Report complete - copy this into the issue template

Node   : 18.18.0
OS     : win32-x64
pnpm   : 9.4.0

nx                 : 19.2.3
@nx/js             : 19.2.3
@nx/jest           : 19.2.3
@nx/linter         : 19.2.3
@nx/eslint         : 19.2.3
@nx/workspace      : 19.2.3
@nx/angular        : 19.2.3
@nx/eslint-plugin  : 19.2.3
@nx/storybook      : 19.2.3
@nx/web            : 19.2.3
typescript         : 5.4.5
---------------------------------------
Registered Plugins:
some-workspace-plugin
---------------------------------------
Community plugins:
@ngneat/spectator        : 18.0.2
@storybook/angular       : 8.1.6
angular-auth-oidc-client : 17.1.0
nx-stylelint             : 17.1.5
---------------------------------------
Local workspace plugins:
         some-workspace-plugin

We are using GHA hosted agents with ubuntu-latest so OS is different on CI.

Current runner version: '2.317.0'
Operating System
  Ubuntu
  LTS
Runner Image
  Image: ubuntu-22.04
  Version: 20240616.1.0
  Included Software: https://github.com/actions/runner-images/blob/ubuntu22/20240616.1/images/ubuntu/Ubuntu2204-Readme.md
  Image Release: https://github.com/actions/runner-images/releases/tag/ubuntu22%2F20240616.1

Error:
image

We are using PNPM for package management, so symlinks are probably present (this might be relevant later).

After seeing the issue I've captured the core dumps.
image

Unfortunately, the backtrace can't really help without debug symbols (at least for me). At least we know that there are 2 modules the core dump was trying to map: node and nx-native-file-cache.

Trying to turn off the cache by setting the workflow variables NX_SKIP_NX_CACHE: true and NX_CACHE_PROJECT_GRAPH: false didn't help, and the core dump was almost identical. At least its backtrace...

Is there any way we can skip nx-native-file-cache?

@Daniel-Griffiths

Daniel-Griffiths commented Jun 27, 2024

A temporary fix for the meantime was to disable the nx postinstall script by using yarn to patch it.

https://yarnpkg.com/cli/patch

image

@Den-dp
Contributor Author

Den-dp commented Jun 27, 2024

As I mentioned in the issue, I was able to workaround it by opting into sequential postinstall script execution via:

npm install --foreground-scripts

@nbalu2

nbalu2 commented Jun 27, 2024

Both great, though have to look into a PNPM version of it. 😄

@hevans90

hevans90 commented Aug 21, 2024

Getting this error 20-30% of the time on my builds in both github action runners and heroku builders, can you explain which module's package.json you patched @Daniel-Griffiths ? I can't seem to find it.

EDIT: nvm got it!

@hevans90

Well crap, IDK if I am doing something wrong but I used this package: https://www.npmjs.com/package/patch-package to allow me to patch a dependency using yarn 1.x.x and it seems like the patch happens after postinstalls are run (which kinda defeats the point)...

@Daniel-Griffiths are you using yarn 2.x.x by any chance, and does patching with yarn patch actually work?

(also does anyone know the root cause of this issue? it is driving me mad and I've never seen it before. I've done a lot of nx monorepos before so I am surprised)

@Daniel-Griffiths

@hevans90 That's the same issue I ran into, that's why I ended up using yarn patch, as it runs before the postinstall script.

Yep! I was using yarn 2.x

@aagnone3

Also seeing rc 135 happen inconsistently -- occurs in all of these environments:

  • GHA runners
  • M1 Macbook in an ARM-installed node environment
  • M1 Macbook in an x86-installed node environment (via Rosetta)

yvalentin added a commit to betagouv/mission-transition-ecologique that referenced this issue Sep 4, 2024
attempted fix for the errors on GitHub's CI related to
`nx` (see: nrwl/nx#26517)
@cyrilluce

FYI, pnpm doesn't support the --foreground-scripts param. I just tried pnpm install --ignore-scripts, and it works fine for CI.

@meeroslav
Contributor

One of the customers is using pnpm install --frozen-lockfile --ignore-scripts and it's still happening.

@FrozenPandaz would you have time to look into it?

@190n

190n commented Oct 1, 2024

I debugged this for a while at Bun. I found that the bus error occurs inside dlopen, while Node is trying to load the module /tmp/nx-native-file-cache-d859752/19.8.2-nx.linux-x64-gnu.node. dlopen eventually calls memset which triggers a bus error on a memory write. The specific code is BUS_ADRERR, which can be caused by out-of-bounds access of a file mapped with mmap.

This StackOverflow post has someone getting a similar error in Java, and the issue was that they had two processes, one of which had partially extracted a library file, while the other tried to open it and hit this exception because the file wasn't complete yet. I haven't looked at very much code from nx's install scripts, but I think something similar's probably going on here, and it explains why forcing the scripts to run sequentially or not at all fixes this error (it did for me, too). I have a core dump if anyone from nx wants that to debug.

@Den-dp
Contributor Author

Den-dp commented Oct 1, 2024

As running post-install scripts sequentially (or disabling them) has become a common workaround here, I'd like to emphasize that a much better solution is to eliminate multiple versions of the nx package (npm ls nx can help with this).

This naturally resolves the race condition between the nx daemon and multiple concurrent nx/bin/post-install.ts scripts, resulting in a healthier dependency graph for your workspace.
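
A minimal sketch of how that check could be automated in CI. It assumes you feed it the parsed output of `npm ls nx --json`, whose nested `dependencies` objects carry a `version` field; `distinctVersions` and the sample tree are hypothetical and only mirror a trimmed version of the tree from the original report.

```javascript
// Walk the `dependencies` tree of parsed `npm ls <name> --json` output and
// return every distinct version found for `packageName`, sorted.
function distinctVersions(tree, packageName) {
  const found = new Set();
  const visit = (node) => {
    for (const [name, info] of Object.entries(node.dependencies || {})) {
      if (name === packageName && info.version) {
        found.add(info.version);
      }
      visit(info);
    }
  };
  visit(tree);
  return [...found].sort();
}

// Sample input (assumption): a trimmed tree shaped like the one in the report.
const sample = {
  name: 'workspace',
  dependencies: {
    '@nx-dotnet/core': {
      version: '2.2.0',
      dependencies: {
        '@nx/devkit': {
          version: '17.0.2',
          dependencies: { nx: { version: '18.3.5' } },
        },
      },
    },
    '@nx/devkit': {
      version: '19.2.3',
      dependencies: { nx: { version: '19.2.3' } },
    },
    nx: { version: '19.2.3' },
  },
};

const versions = distinctVersions(sample, 'nx');
if (versions.length > 1) {
  console.log(`Multiple nx versions installed: ${versions.join(', ')}`);
}
```

In a real pipeline the `sample` object would be replaced by `JSON.parse` of the command's output, and the job could fail when more than one version is reported.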

@aagnone3

aagnone3 commented Oct 1, 2024

I use python in my monorepo alongside node/react, which indeed caused different transitive nx package dependencies.

I solved my particular situation by ensuring the transitive nx dependency of @nxlv/python aligned with the non-python nx packages. As of today, that meant @nxlv/python@19.1.2 and v19.7.3 of all other packages related to nx.

@Cammisuli
Member

I just want everyone to know that we are looking into this. It's quite annoying when it happens and I have a hunch that it is related to multiple versions of Nx like @Den-dp @aagnone3 suggested.

We are actively discussing this.

@Squixx

Squixx commented Oct 29, 2024

> One of the customers is using pnpm install --frozen-lockfile --ignore-scripts and it's still happening.
>
> @FrozenPandaz would you have time to look into it?

That's me.

We're running a single nx version, so I'm not sure that could be the issue, honestly. We also use pnpm and install on our CI side with pnpm install --frozen-lockfile --ignore-scripts

I've tried running --verbose, but it crashes before any logging is done.

However, in our case we get this when running nx affected, not when running pnpm i

> nx "affected" "-t" "lint,test,sonar,e2e,build" "-c" "ci" "--exclude=features,vattenfallnl-pat-e2e,btb-web-e2e" "--parallel=3" "--base=red" "--head=red"

 WARN  Unsupported engine: wanted: {"node":">=20.9.0"} (current: {"node":"v18.20.4","pnpm":"9.12.3"})
Bus error (core dumped)
 ELIFECYCLE  Command failed with exit code 135.

nx report

NX Report complete - copy this into the issue template

Node : 20.18.0
OS : darwin-arm64
Native Target : aarch64-macos
pnpm : 9.12.3

nx (global) : 20.0.0
nx : 20.0.6
@nx/js : 20.0.6
@nx/jest : 20.0.6
@nx/eslint : 20.0.6
@nx/workspace : 20.0.6
@nx/angular : 20.0.6
@nx/cypress : 20.0.6
@nx/devkit : 20.0.6
@nx/eslint-plugin : 20.0.6
@nx/express : 20.0.6
@nx/node : 20.0.6
@nx/plugin : 20.0.6
@nx/rollup : 20.0.6
@nx/storybook : 20.0.6
@nx/web : 20.0.6
@nx/webpack : 20.0.6
typescript : 5.5.4

Community plugins:
@compodoc/compodoc : 1.1.26
@ionic/angular : 8.3.3
@ionic/angular-toolkit : 11.0.1
@jsverse/transloco : 7.5.0
@ngneat/spectator : 19.0.0
@ngrx/effects : 18.1.1
@ngrx/operators : 18.1.1
@ngrx/router-store : 18.1.1
@ngrx/store : 18.1.1
@ngrx/store-devtools : 18.1.1
@ngxs/storage-plugin : 18.1.4
@ngxs/store : 18.1.4
@nxext/capacitor : 20.0.1
@nxext/ionic-angular : 20.0.2
@storybook/angular : 8.3.6
angular-auth-oidc-client : 18.0.2
ng-mocks : 14.13.1
ng2-charts : 6.0.1
ngx-toastr : 19.0.0

Local workspace plugins:
@vf/generators

edit: this is still happening on 20.0.10

@leandroaguiar-lr

leandroaguiar-lr commented Nov 26, 2024

I'm also seeing this when running multiple nx processes in parallel inside a single fresh CI agent.

@Cammisuli based on the reports from @nbalu2 and @190n I went looking and found this:

// We override the _load function so that when a native file is required,
// we copy it to a cache directory and require it from there.
// This prevents the file being loaded from node_modules and causing file locking issues.
// Will only be called once because the require cache takes over afterwards.
Module._load = function (request, parent, isMain) {
  const modulePath = request;
  if (
    nxPackages.has(modulePath) ||
    localNodeFiles.some((f) => modulePath.endsWith(f))
  ) {
    const nativeLocation = require.resolve(modulePath);
    const fileName = basename(nativeLocation);
    // we copy the file to a workspace-scoped tmp directory and prefix with nxVersion to avoid stale files being loaded
    const nativeFileCacheLocation = getNativeFileCacheLocation();
    const tmpFile = join(nativeFileCacheLocation, nxVersion + '-' + fileName);
    if (existsSync(tmpFile)) {
      return originalLoad.apply(this, [tmpFile, parent, isMain]);
    }
    if (!existsSync(nativeFileCacheLocation)) {
      mkdirSync(nativeFileCacheLocation, { recursive: true });
    }
    copyFileSync(nativeLocation, tmpFile);
    return originalLoad.apply(this, [tmpFile, parent, isMain]);
  } else {
    // call the original _load function for everything else
    return originalLoad.apply(this, arguments);
  }
};

Since my processes are running in parallel, possibly one of the processes finishes copying the native library to the final location and tries to load it while the others are still writing to the same location, causing the SIGBUS signal on the process that finished first.

To test this theory I added a previous step to my CI pipeline where I ensure NX runs once without concurrency so the library is properly copied (nx reset will do), and the SIGBUS error is seemingly gone.
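
For illustration only, here is a sketch of one common way to avoid this kind of race: copy to a process-unique temporary name in the same directory, then rename it into place, so other processes see either no file or a complete one. `copyFileAtomically` is a hypothetical helper, not part of nx and not necessarily how the maintainers would fix it; it relies on rename within a single directory replacing the target in one step on POSIX filesystems.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper (not nx code): copy `source` to `destination` so that
// concurrent readers never observe a partially written destination file.
function copyFileAtomically(source, destination) {
  if (fs.existsSync(destination)) {
    return destination;
  }
  fs.mkdirSync(path.dirname(destination), { recursive: true });
  // Unique per process, so concurrent copiers never write to the same file.
  const tmp = `${destination}.${process.pid}.${Date.now()}.tmp`;
  try {
    fs.copyFileSync(source, tmp);
    // The destination only ever appears as a fully written file.
    fs.renameSync(tmp, destination);
  } catch (err) {
    fs.rmSync(tmp, { force: true });
    // Another process may have won the race; its complete copy is usable.
    if (!fs.existsSync(destination)) {
      throw err;
    }
  }
  return destination;
}
```

With a helper like this, the `existsSync(tmpFile)` check in the snippet above would only succeed once a complete copy is in place.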

@ComradePashka

Hello!

I have a similar problem on GHA and DO App Platform. I've reread this issue multiple times, but I still don't understand: how can I manage nx if this happens randomly at the very first CI step, when I run yarn install?

@johnbwoodruff
Contributor

We recently upgraded to Nx v20 and have begun to see this issue intermittently in our CI pipelines as well. Usually a rerun fixes it, it's not a consistent failure.

npm error code 135
npm error path /opt/atlassian/pipelines/agent/build/client/node_modules/@angular-eslint/schematics/node_modules/nx
npm error command failed
npm error command sh -c node ./bin/post-install
npm error Bus error (core dumped)
npm error A complete log of this run can be found in: /root/.npm/_logs/2024-12-20T17_51_30_509Z-debug-0.log

@alumni

alumni commented Dec 22, 2024

This looks like a duplicate of nx-dotnet/nx-dotnet#911 and should be fixed there.

Seems that nx-dotnet@2.4.5 requires @nx/devkit@19.5.3 as a direct dependency. I'm actually surprised they force an exact version of nx, this is not what packages published to be used in other projects should do, they should allow at least a major version, and, in this case, they should probably mark it as a peer dependency.

As a workaround, you could try to add resolutions/overrides (depending on your package manager) to force a single @nx/devkit version, if there aren't any breaking changes in the new version.
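
A rough sketch of where such an entry would go for each package manager, written as a small script over a parsed package.json. The field names are the ones each tool reads (`overrides` for npm, `resolutions` for Yarn, `pnpm.overrides` for pnpm); `pinDependency` and the version shown are hypothetical, and npm may reject an override that conflicts with the range declared for a direct dependency.

```javascript
// Hypothetical helper: return a copy of a parsed package.json with an entry
// pinning `name` to `version` in the field the given package manager reads.
function pinDependency(pkg, manager, name, version) {
  const next = JSON.parse(JSON.stringify(pkg));
  switch (manager) {
    case 'npm':
      next.overrides = { ...next.overrides, [name]: version };
      break;
    case 'yarn':
      next.resolutions = { ...next.resolutions, [name]: version };
      break;
    case 'pnpm': {
      const pnpm = next.pnpm || {};
      next.pnpm = {
        ...pnpm,
        overrides: { ...pnpm.overrides, [name]: version },
      };
      break;
    }
    default:
      throw new Error(`Unknown package manager: ${manager}`);
  }
  return next;
}

// Illustrative version only: align it with the nx version your workspace uses.
const pinned = pinDependency({ name: 'workspace' }, 'npm', '@nx/devkit', '19.2.3');
console.log(JSON.stringify(pinned, null, 2));
```

After changing the field, reinstall and run npm ls nx (or your package manager's equivalent) again to confirm that only one version remains.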

@mnkyjs

mnkyjs commented Jan 8, 2025

We ran into the same problem. We fixed it (for the moment) by using npm ci --foreground-scripts. Maybe this will help you as well 🤷‍♂

@DeffiDeffi

DeffiDeffi commented Jan 21, 2025

I ran into this today while trying to run npm install in my Angular project under WSL.
I've tried all the fixes mentioned here with no luck. What did work for me was installing and running as root, i.e.

rm -rf node_modules  
sudo npm install  
sudo npx nx serve <project>

I know that, unfortunately, this is a huge security risk. It is however the only way to get my project to build at all on my machine at the moment, so I thought I'd drop it in here in case someone else tries to run their project without success.

Gum-Joe added a commit to AIgnostic/AIgnostic that referenced this issue Jan 22, 2025