Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

npm ERR! Exit handler never called! #6409

Closed
1234567890sak opened this issue Apr 30, 2023 · 17 comments · Fixed by nodejs/node#48464
Closed

npm ERR! Exit handler never called! #6409

1234567890sak opened this issue Apr 30, 2023 · 17 comments · Fixed by nodejs/node#48464

Comments

@1234567890sak
Copy link

npm ERR! Exit handler never called!

npm ERR! This is an error with npm itself. Please report this error at:
npm ERR! https://github.com/npm/cli/issues

image

@davesag
Copy link

davesag commented Apr 30, 2023

I'm getting the same on macOS with NPM version 9.6.5 running npm outdated

npm ERR! Exit handler never called!

npm ERR! This is an error with npm itself. Please report this error at:
npm ERR!     <https://github.com/npm/cli/issues>

npm ERR! A complete log of this run can be found in: /Users/dsag/.npm/_logs/2023-04-30T22_51_38_861Z-debug-0.log

when I check the log file it's empty.

@Er4twXz
Copy link

Er4twXz commented May 2, 2023

I get the same error when trying to create a new gatsby project

npm init gatsby


Edit: Despite getting this error I can run npm update and npm run develop to start the project


Npm 9.6.5 Node v. 20
2023-05-02T13_21_51_152Z-debug-0.log

@danez
Copy link

danez commented May 4, 2023

Same in our repositories: https://github.com/netlify/cli/actions/runs/4875634980/jobs/8698093846 happened since Github Action started using node 20 and npm 9.6.4

@ananfito
Copy link

I have received the same error message when attempting to install Netlify's CLI:

~ % npm install netlify-cli -g
npm ERR! Exit handler never called!

npm ERR! This is an error with npm itself. Please report this error at:
npm ERR!     <https://github.com/npm/cli/issues>

npm ERR! A complete log of this run can be found in: /Users/anthonynanfito/.npm/_logs/2023-05-11T02_45_44_212Z-debug-0.log

System environment:

  • MacOS Monterey v12.6.5
  • npm 9.5.1
  • node v18.16.0

@johnthagen
Copy link

@johnthagen
Copy link

We are experiencing this intermittently within our CI, which is running a node:20 (debian-based) container within GItLab CI Docker executor.

...
docker image sha256:109baa4cc978d34d3f3fbac170fc96a72178c4cc600ca938886f2f932385afd9 for node:20 with digest node@sha256:68ba78ba4a648b0eb50aa35b58126c88f6caa0ccbb5560b7487a7d4fb4b8e9e9 ...
...
$ node --version
v20.2.0
$ npm --version
9.6.6
$ npm clean-install
npm ERR! Exit handler never called!
npm ERR! This is an error with npm itself. Please report this error at:
npm ERR!     <https://github.com/npm/cli/issues>
npm ERR! A complete log of this run can be found in: /root/.npm/_logs/2023-05-18T12_24_18_695Z-debug-0.log

This started when we upgraded from node:18 to node:20 container for CI.

@jmcelroy58
Copy link

We are running npm 8.1 with Nodejs v16.3,

We are seeing this same messaeg when tryiing to run npm login to connect with our JFrog SaaS artifactory for building our artifact. We had done no other code changes up to that point:

Prepare all required actions
Run ./sat-shared/.github/actions/node-react
Run echo "Setting npm loglevel"
Setting npm loglevel
Exporting NPM_TOKEN
Setting NPM config
npm verb cli [
npm verb cli '/opt/hostedtoolcache/node/16.13.0/x64/bin/node',
npm verb cli '/opt/hostedtoolcache/node/16.13.0/x64/bin/npm',
npm verb cli 'config',
npm verb cli 'set',
npm verb cli 'registry',
npm verb cli 'https://jfrog.io/artifactory/api/npm/npm-repo/',
npm verb cli '--auth-type=legacy',
npm verb cli '--always-auth'
npm verb cli ]
npm info using npm@8.1.0
npm info using node@v16.13.0
npm timing npm:load:whichnode Completed in 1ms
npm timing config:load:defaults Completed in 1ms
npm timing config:load:file:/opt/hostedtoolcache/node/16.13.0/x64/lib/node_modules/npm/npmrc Completed in 0ms
npm timing config:load:builtin Completed in 0ms
npm verb config auth-type This method of SSO/SAML/OAuth is deprecated and will be removed in
npm verb config a future version of npm in favor of web-based login.
npm timing config:load:cli Completed in 2ms
npm timing config:load:env Completed in 0ms
npm timing config:load:file:/runner/_work///.npmrc Completed in 1ms
npm timing config:load:project Completed in 2ms
npm timing config:load:file:/home/runner/.npmrc Completed in 1ms
npm timing config:load:user Completed in 1ms
npm timing config:load:file:/opt/hostedtoolcache/node/16.13.0/x64/etc/npmrc Completed in 0ms
npm timing config:load:global Completed in 0ms
npm timing config:load:validate Completed in 1ms
npm timing config:load:credentials Completed in 1ms
npm timing config:load:setEnvs Completed in 0ms
npm timing config:load Completed in 8ms
npm timing npm:load:configload Completed in 8ms
npm timing npm:load:setTitle Completed in 0ms
npm timing npm:load:setupLog Completed in 1ms
npm timing config:load:flatten Completed in 2ms
npm timing npm:load:cleanupLog Completed in 1ms
npm timing npm:load:configScope Completed in 0ms
npm timing npm:load:projectScope Completed in 1ms
npm timing npm:load Completed in 20ms
npm info config set "registry" "https://jfrog.io/artifactory/api/npm/npm-repo/"
npm timing config:load:flatten Completed in 0ms
npm timing command:config Completed in 5ms
npm verb exit 0
npm timing npm Completed in 120ms
npm info ok
Running NPM Login
npm verb cli [
npm verb cli '/opt/hostedtoolcache/node/16.13.0/x64/bin/node',
npm verb cli '/opt/hostedtoolcache/node/16.13.0/x64/bin/npm',
npm verb cli 'login'
npm verb cli ]
npm info using npm@8.1.0
npm info using node@v16.13.0
npm timing npm:load:whichnode Completed in 1ms
npm timing config:load:defaults Completed in 1ms
npm timing config:load:file:/opt/hostedtoolcache/node/16.13.0/x64/lib/node_modules/npm/npmrc Completed in 0ms
npm timing config:load:builtin Completed in 0ms
npm timing config:load:cli Completed in 1ms
npm timing config:load:env Completed in 1ms
npm timing config:load:file:/runner/_work///.npmrc Completed in 1ms
npm timing config:load:project Completed in 1ms
npm timing config:load:file:/home/runner/.npmrc Completed in 1ms
npm timing config:load:user Completed in 1ms
npm timing config:load:file:/opt/hostedtoolcache/node/16.13.0/x64/etc/npmrc Completed in 0ms
npm timing config:load:global Completed in 0ms
npm timing config:load:validate Completed in 2ms
npm timing config:load:credentials Completed in 0ms
npm timing config:load:setEnvs Completed in 1ms
npm timing config:load Completed in 8ms
npm timing npm:load:configload Completed in 8ms
npm timing npm:load:setTitle Completed in 0ms
npm timing npm:load:setupLog Completed in 1ms
npm timing config:load:flatten Completed in 2ms
npm timing npm:load:cleanupLog Completed in 1ms
npm timing npm:load:configScope Completed in 0ms
npm timing npm:load:projectScope Completed in 0ms
npm timing npm:load Completed in 15ms
npm notice Log in on https://jfrog.io/artifactory/api/npm/npm-repo/
npm verb web login before first POST
npm verb web login not supported, trying couch
npm timing npm Completed in 237ms
npm verb unfinished npm timer command:adduser 1684919152420
npm info ok
npm ERR! Exit handler never called!

npm ERR! This is an error with npm itself. Please report this error at:
npm ERR! https://github.com/npm/cli/issues

npm ERR! A complete log of this run can be found in:
npm ERR! /home/runner/.npm/_logs/2023-05-24T09_05_52_546Z-debug.log
Username: (���Tb�f�����atjQ�� ����.�v��Tb��l����H���3���(�V|F)
Error: Process completed with exit code 1.

@johnthagen
Copy link

When we reverted our CI to use node:18 image, our intermittent npm clean-install errors stopped happening.

Using docker image sha256:78b037dbb659142a12986b522897824ab60209d194a336c15a5006a55fffe9b0 for node:18 with digest node@sha256:8cdf7234449f35e579f7491137807ddb3a089f028862f7ab69af437cc9f47ff1 ...
$ node --version
v18.16.0
$ npm --version
9.5.1
# ... works.

So for us, it seems like either Node 20 or npm 9.6.x (or the combination) is causing our failures.

@dharesign
Copy link

We've been hitting this too, and after much debugging I think I figured out why.

The issue seems to be caused by the new auto select family stuff in Node 20.

If I run with NODE_OPTIONS="--no-network-family-autoselection" then the issue does not occur.

How the bug manifests to cause this is a bit convoluted, but here goes:

  • npm ci (etc.) uses pacote, which uses npm-registry-fetch, which uses make-fetch-happen, which uses agentkeepalive along with Node's http module (for http:// npm registry URLs).
  • When there are multiple IPs resolved for the npm registry host, the new code in Node 20 will attempt to connect to each one in turn until something works.
  • When one connection times out, it was setting context[kTimeoutTriggered] to true, and triggers a new connection on the second IP.
  • The afterConnectMultiple for the second IP completes, it was incorrectly checking context[kTimeoutTriggered] and calling handle.close(), where handle is the TCPWrap object.
  • The timeout clearing was below the above check, so the timeout still hit for this second IP, and eventually it will have gone through all of the IPs and destroy the socket with the ERR_SOCKET_CONNECTION_TIMEOUT error.
  • destroy on the socket doesn't emit the close event, but it does call closeSocketHandle which emits the close event once the handle is closed.
  • As afterConnectMultiple has already closed the handle, closeSocketHandle's callback never gets called and the close event is not emitted.
  • As the close event is not emitted, the http's Agent does not call agent.removeSocket.
  • As agent.removeSocket is not called, the agent thinks the socket is still in use.
  • When make-fetch-happen tries to retry the connection after receiving the error event, it makes a new http request, which calls into Agent.addRequest which notice we are at the maximum number of sockets, and emits wait for socket.
  • Nothing is seemingly then keeping the Node event loop alive, so the process emits the beforeExit event and then the exit event.
  • npm then catches that exit event and reports the Exit handler never called! error.

The fix in nodejs/node#47860 looks like it will mostly address this issue.

I am concerned about one additional case though. If you consider there are two IPs to connect to, the first of which times out, the second of which fails, then I think this._handle will be the closed first handle, and the same bug will occur.

I think to address that, the kReinitializeHandle block should move up above the status !== 0 check. CC: @ShogunPanda

@ShogunPanda
Copy link

@dharesign Thanks for the deep analysis. I can confirm the behavior was broken, especially with TLS.

Several PR have landed on main now, so they will be likely release in 20.3.0 or 20.4.0.
Do you mind checking this with a nightly build to confirm this has been solved?

@dharesign
Copy link

Just from a code analysis point of view, I don’t think it has. The handle reinitialization is still below the error-and-retry block, so it won’t happen if the connect attempt errors for any connection other than the first, leaving _handle set to the previous attempt’s handle.

The desire seems to be to have _handle being the current attempts handle. My suggestion was to make that happen in afterConnectMultiple by moving it above the error check, but you could also move the _handle reinitialization to internalConnectMultiple I guess.

Not sure how testable/mockable all this is in CI, but it would be good to have a test for the case I mention:

  • First address succeeds, but after the timeout
  • Second address errors
  • No additional addresses
  • Ensure socket’s close event fires
    • Currently it does not

@ShogunPanda
Copy link

We don't care about reinitializing the socket if there is no active connection. In which case you might need anything like that?

  • First address succeeds, but after the timeout

This is now tested in test/internet/test-https-autoselectfamily-slow-timeout.js

  • Second address errors

This is now tested in test/parallel/test-net-autoselectfamily.js

  • No additional addresses

What do you mean? A single address in the resolution? In that case the multiple attempt logic is completely skipped.
See `lib.net/#1

  • Ensure socket’s close event fires

If you're talking about a close even if no connection succeded, I don't think it should. Can you clarify an example of this?

@dharesign
Copy link

The case I describe is having a domain resolve to two addresses. The first address succeeds connecting but after the timeout, and the second address fails with an error.

If we look through the code we can see what will happen in this situation:

  1. It goes into lookupAndConnect, then if autoSelectFamily is true, it'll go into `lookupAndConnectMultiple.
  2. Let's say the lookup yields two A records: 10.0.0.1 and 10.0.0.2.
    • Aside: it's a bit strange that this autoSelectFamily option also means "try connect to all addresses", rather than just trying the first IPv4 address and the first IPv6 address. The behavior is probably not unreasonable, but the option name is potentially misleading.
  3. It calls internalConnectMultiple.
  4. It's the first address, so it's using self._handle.
  5. Let's say it gets past all of the error handling there, and then as it's not the last connection it will set up the connect timeout.
  6. The timeout hits, it goes into internalConnectMultipleTimeout.
  7. handle is self._handle, and it gets closed. It then calls internalConnectMultiple for the second IP.
  8. It's not the first address, so it creates a new TCP(SOCKET) for handle.
  9. Now either the connection fails synchronously, in which case internalConnectMultiple will call internalConnectMultiple again, or it'll fail in afterConnectMultiple, which will also call internalConnectMultiple.
    • In both cases, self._handle is not updated, and is still the closed handle for the first attempted IP address.
  10. internalConnectMultiple has now tried all connections, so it destroys itself with ERR_SOCKET_CONNECTION_TIMEOUT or some aggregate of various errors.
  11. destroy() on the base class calls Socket.prototype._destroy, which goes into the (if this._handle) block
  12. this.resetAndClosing is false
  13. this._closeAfterHandlingError is false
  14. It then calls into closeSocketHandle
  15. closeSocketHandle roughly does self._handle.close(() => self.emit('close')), but as self._handle was closed in step 7 above, the callback never gets called, and the close event is never emitted.
  16. http's Agent requires close to be emitted, or the socket is not removed from the Agent.
  17. The Agent then eventually hits the limit on the number of "open" sockets, and waits for one to be closed, but that never happens.

If you're talking about a close even if no connection succeded, I don't think it should.

If you compare the above with what happens when there's a single connection, the handle never gets closed in afterConnect. In the error case, afterConnect calls self.destroy() which will follow the steps from 11 above, and end up emitting the close event from closeSocketHandle.

http's Agent seems to rely on the socket emitting close when there has been an error, so I do think this behavior should be preserved with this new multiple connect logic.

@dharesign
Copy link

It seems to me that the desire is that there's really only ever a single handle open at any given point in time.

Rather than waiting until a successful connection from IP 2 - N, and then updating _handle accordingly, I don't see why it doesn't just update _handle as soon as it initiates a new connection attempt.

My original suggestion was to move the handle reinitialization above the error check in afterConnectMultiple, but I don't think even that would be the full fix, as internalConnectMultiple can exit early if there is a synchronous error connecting.

I think the handle reinitialization should therefore move into internalConnectMultiple. For the avoidance of doubt, here is my suggestion: dharesign/node@4715fe1

dharesign added a commit to dharesign/node that referenced this issue Jun 5, 2023
If the first connection is successful, but only after the connect
timeout, and all subsequent connections result in an error, the socket
will not emit the 'close' event, causing issues in places like 'http's
'Agent' which rely on the 'close' event to free up sockets.

This commit moves the reassignment of '_handle' to happen as soon as a
new connection is attempted, so that if the socket is destroyed, the
'_handle' will represent the last connection attempt (even if it
resulted in an error) and closing the handle will then cause the 'close'
event to be emitted.

See npm/cli#6409 for more background.
@ShogunPanda
Copy link

Hi @dharesign!
It seems your analysis is correct.
I followed your advice and, inspired by your solution, implemented a slightly different change, which is now here.
Thanks for your contribution.
Once that Iands on main I'll let you know so that you can eventually test it for this issue using a compiled or nightly version of Node.

RafaelGSS pushed a commit to nodejs/node that referenced this issue Jul 3, 2023
PR-URL: #48464
Fixes: npm/cli#6409
Fixes: KararTY/dank-twitch-irc#13
Fixes: #47644
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Ceres6 pushed a commit to Ceres6/node that referenced this issue Aug 14, 2023
PR-URL: nodejs#48464
Fixes: npm/cli#6409
Fixes: KararTY/dank-twitch-irc#13
Fixes: nodejs#47644
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
Ceres6 pushed a commit to Ceres6/node that referenced this issue Aug 14, 2023
PR-URL: nodejs#48464
Fixes: npm/cli#6409
Fixes: KararTY/dank-twitch-irc#13
Fixes: nodejs#47644
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
Reviewed-By: Marco Ippolito <marcoippolito54@gmail.com>
@hildjj
Copy link

hildjj commented Jun 24, 2024

@ShogunPanda this is still broken in node v22.3.0, npm v10.8.1 for "npm publish" on a machine with both ipv4 and ipv6. NODE_OPTIONS="--no-network-family-autoselection" fixes the issue. Going back to node v17 (or probably v18 before the breakage) fixes the issue.

cyraxx added a commit to TrustNXT/c2pa-ts that referenced this issue Jul 18, 2024
@Madhu-mohan86
Copy link

yeah +1 this npm works in bizarre way

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.