fix: gracefully shutdown server #59551
@@ -266,8 +266,9 @@ export async function startServer(
try {
const cleanup = (code: number | null) => {
debug('start-server process cleanup')
server.close()
process.exit(code ?? 0)
server.close(() => {
https://nodejs.org/api/net.html#serverclosecallback
This function is asynchronous, the server is finally closed when all connections are ended and the server emits a 'close' event. The optional callback will be called once the 'close' event occurs.
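The behaviour quoted above can be checked in-process. The following is a minimal sketch, not code from this PR: `closeGracefully` and `demoGracefulClose` are hypothetical names, and the slow route stands in for a long-lived response. It asks the server to close while one request is still in flight and records the order of events on the server side.

```typescript
import * as http from "node:http";
import type { AddressInfo } from "node:net";

// Promise wrapper around server.close(callback): settles once 'close' fires.
export function closeGracefully(server: http.Server): Promise<void> {
  return new Promise((resolve, reject) => {
    server.close((err) => (err ? reject(err) : resolve()));
  });
}

// Hypothetical demo: one slow request is in flight when close is requested.
export async function demoGracefulClose(delayMs = 200): Promise<string[]> {
  const events: string[] = [];
  const server = http.createServer((_req, res) => {
    setTimeout(() => {
      events.push("response sent");
      res.end("done");
    }, delayMs);
  });
  await new Promise<void>((resolve) => server.listen(0, "127.0.0.1", resolve));
  const { port } = server.address() as AddressInfo;

  // Wait until the server has received the request before closing,
  // otherwise the connection could be refused instead of drained.
  const received = new Promise<void>((resolve) => {
    server.once("request", () => resolve());
  });
  const body = new Promise<string>((resolve, reject) => {
    http
      .get(
        { host: "127.0.0.1", port, agent: false, headers: { connection: "close" } },
        (res) => {
          let text = "";
          res.setEncoding("utf8");
          res.on("data", (chunk) => (text += chunk));
          res.on("end", () => resolve(text));
        },
      )
      .on("error", reject);
  });
  await received;

  events.push("close requested");
  await closeGracefully(server); // resolves only after the slow response ended
  events.push("server closed");
  events.push(`client got: ${await body}`);
  return events;
}
```

Calling `process.exit` right after a bare `server.close()` would skip the "response sent" step entirely, which is the problem this change addresses.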
I'm having a hard time thinking of a good way to test this in an automated way. The way I've tested manually is by hitting an endpoint with a long-lived (~10s) response and then send a Also tested with
An interesting behavior that's more noticeable after adding the log is that
@@ -279,7 +281,8 @@ export async function startServer(
// This is the render worker, we keep the process alive
console.error(err)
}
process.on('exit', (code) => cleanup(code))
https://nodejs.org/api/process.html#event-exit
Listener functions must only perform synchronous operations. The Node.js process will exit immediately after calling the `'exit'` event listeners causing any additional work still queued in the event loop to be abandoned.
Once `process.exit` is called we don't really have a chance to clean up since `server.close` is asynchronous. Best we could do is log a message, but not sure that's very useful.
Also calling the same cleanup code, now that there's a log, was resulting in the log printing twice after a signal was received.
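One way to keep the two concerns apart is sketched below. This is an illustration under assumptions, not the code in this PR: `registerShutdown`, the `Closable` type, and the injectable `proc`, `exit` and `log` parameters are hypothetical and exist only so the behaviour can be exercised without ending the real process. Signal handlers start the asynchronous close once, while the `'exit'` listener does synchronous work only.

```typescript
import { EventEmitter } from "node:events";

// Anything with a callback-style close(), e.g. an http.Server.
type Closable = { close(cb: (err?: Error) => void): unknown };

export function registerShutdown(
  server: Closable,
  proc: EventEmitter = process, // injectable for tests (assumption)
  exit: (code: number) => void = (code) => process.exit(code),
  log: (msg: string) => void = console.log,
): void {
  let shuttingDown = false;

  const onSignal = () => {
    if (shuttingDown) return; // a repeated signal must not restart cleanup
    shuttingDown = true;
    log("shutting down");
    // Asynchronous: exit only after in-flight requests have finished.
    server.close(() => exit(0));
  };

  proc.on("SIGINT", onSignal);
  proc.on("SIGTERM", onSignal);

  // 'exit' listeners must be synchronous, so this one only logs and
  // never calls the async cleanup above (which would also log twice).
  proc.on("exit", (code: number) => log(`exited with code ${code}`));
}
```

With the defaults, `registerShutdown(server)` wires the handlers onto the real `process`.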
This is breaking the test test/integration/config-output-export/test/index.test.ts
Head branch was pushed to by a user without write access
process.on('exit', cleanup)
process.on('SIGINT', cleanup)
process.on('SIGTERM', cleanup)
the other handlers were already killing the child - this one always sends SIGTERM. However, we still need to add a special handler for the exit event since it needs to be synchronous
packages/next/src/cli/next-dev.ts
Outdated
process.on('SIGINT', () => handleSessionStop('SIGINT'))
process.on('SIGTERM', () => handleSessionStop('SIGTERM'))
process.on('SIGINT', (code) => handleSessionStop(code))
process.on('SIGTERM', (code) => handleSessionStop(code))
just sending `handleSessionStop` works as well but typescript seems to do better with this (e.g. if I set `signal: object`, this way would give a typescript error, but `process.on("SIGINT", handleSessionStop)` wouldn't raise any errors).
@huozhi I did a little more cleanup and got the tests passing. I have an idea on how to add a test for this but not really sure where to put it. It might involve adding an example app with a route that forces a response to take a while, then having a test that calls the route, sends a SIGTERM (or SIGINT), then makes sure the request comes back successfully before the server finishes.
Woah, the tests were passing before I refactored. I must have gotten some logic backwards. I'll take a look in a bit
Ah, OK I had made the condition function async so it always returned a promise, and Looking a lot better now
still one test failing related to teardown. I need to look into how to run these tests locally
process.on('SIGTERM', () => handleSessionStop('SIGKILL'))

// exit event must be synchronous
process.on('exit', () => child?.kill('SIGKILL'))
changed these to SIGKILL to make sure the child process stops immediately. In a production server we would want to pass along the SIGINT or SIGTERM but if you're killing the dev server you probably wouldn't expect it to do any cleanup. This should also be consistent with the current behavior that exits right away with process.exit
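As a rough illustration of the dev-server behaviour described here, the sketch below kills a child process with `SIGKILL` and keeps the `'exit'` handler synchronous. The helper names (`spawnIdleChild`, `killAndWait`, `installDevHandlers`) are made up for this example and are not part of `next-dev.ts`; the real `handleSessionStop` does more than this.

```typescript
import { spawn, ChildProcess } from "node:child_process";

// Hypothetical stand-in for the dev server's child: a Node process that idles.
export function spawnIdleChild(): ChildProcess {
  return spawn(process.execPath, ["-e", "setInterval(() => {}, 1000)"], {
    stdio: "ignore",
  });
}

// Sends SIGKILL and resolves with the signal that terminated the child.
export function killAndWait(child: ChildProcess): Promise<NodeJS.Signals | null> {
  return new Promise((resolve) => {
    child.once("exit", (_code, signal) => resolve(signal));
    child.kill("SIGKILL");
  });
}

// Sketch of the handler wiring: every path stops the child immediately.
export function installDevHandlers(getChild: () => ChildProcess | undefined): void {
  const stop = () => {
    getChild()?.kill("SIGKILL"); // synchronous call, safe inside 'exit'
  };
  process.on("SIGINT", stop);
  process.on("SIGTERM", stop);
  // exit event must be synchronous
  process.on("exit", stop);
}
```

A production server would more likely forward the original `SIGINT` or `SIGTERM` so the child can finish in-flight work, as the comment above notes.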
@huozhi I was able to debug the issue and got tests all passing. I think it's ready for review again.
test/lib/next-test-utils.ts
Outdated
try {
await waitForCondition(2000, () => !isAppRunning(instance))
} catch {
await killProcess(instance.pid, 'SIGKILL')
}
It throws an error, but the error is caught and the process is killed again, so we won't know what the error was. Also, adding 2000ms here might increase the duration of all tests.
Can we remove the changes to the util and add a separate test for this?
try {
await waitForCondition(2000, () => !isAppRunning(instance))
} catch {
await killProcess(instance.pid, 'SIGKILL')
}
Because it's sending `SIGTERM` by default, the apps will run indefinitely. Maybe we just have it send `SIGKILL` by default instead, and then a separate test for the clean shutdown would send a `SIGTERM` instead and make sure it shuts down properly.
Since `SIGTERM` now waits for the app to shut down properly, it'll still require at least that change to the util
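The "wait, then fall back to `SIGKILL`" pattern under discussion might look roughly like the sketch below. It is not the helper in `test/lib/next-test-utils.ts`: this `waitForCondition` is a simplified stand-in with an assumed signature, and `stopGracefully` is a hypothetical name. The condition is deliberately a synchronous function returning a boolean, since an `async` condition would return a promise, which is always truthy.

```typescript
type StopSignal = "SIGTERM" | "SIGKILL";

// Polls a synchronous condition until it holds or the timeout elapses.
export async function waitForCondition(
  timeoutMs: number,
  condition: () => boolean,
  intervalMs = 20,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!condition()) {
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs}ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Asks for a clean shutdown first; escalates only if the app is still running.
// Returns the last signal that had to be sent.
export async function stopGracefully(
  sendSignal: (signal: StopSignal) => void,
  isRunning: () => boolean,
  timeoutMs = 2000,
): Promise<StopSignal> {
  sendSignal("SIGTERM");
  try {
    await waitForCondition(timeoutMs, () => !isRunning());
    return "SIGTERM";
  } catch {
    sendSignal("SIGKILL");
    return "SIGKILL";
  }
}
```

The timeout is only paid in full when the app fails to stop, so a suite where apps shut down promptly should not see the full 2000ms per test; whether that holds for the real test utilities is not something this sketch can show.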
- Both the standalone server and the `startServer` function it calls attempt to stop the server on `SIGINT` and `SIGTERM` in different ways. This lets `server.js` yield to `startServer`
- The cleanup function in `startServer` was not waiting for the server to close before calling `process.exit`. This lets it wait for any in-flight requests to finish processing before exiting the process

fixes: vercel#53661
Thank you!
Hi @huozhi - thanks for your patience with me on this one :) Just checking if there was anything else before it was ready to merge - I saw you approved it yesterday.
@redbmk sorry The PR is reverted on canary branch due to showing failing tests after merging with canary. If you'd like to file another one we can take another round of look again. Would be awesome if we can get a new test associated with it 🙏
Ah bummer - OK yeah I'll see if I can get a test for it and see if I can figure out why the tests were failing in canary. Do more tests run in the
Should be the same, feels like it was reported incorrectly on the PR. It was failing in this job but marked as successful