Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: gracefully shutdown server #59551

Merged
merged 1 commit into from
Dec 19, 2023
Merged

fix: gracefully shutdown server #59551

merged 1 commit into from
Dec 19, 2023

Conversation

redbmk
Copy link
Contributor

@redbmk redbmk commented Dec 13, 2023

  • Both the standalone server and the startServer function it calls attempt to stop the server on SIGINT and SIGTERM in different ways. This lets server.js yield to startServer
  • The cleanup function in startServer was not waiting for the server to close before calling process.exit. This lets it wait for any in-flight requests to finish processing before exiting the process

fixes: #53661

@@ -266,8 +266,9 @@ export async function startServer(
try {
const cleanup = (code: number | null) => {
debug('start-server process cleanup')
server.close()
process.exit(code ?? 0)
server.close(() => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://nodejs.org/api/net.html#serverclosecallback

This function is asynchronous, the server is finally closed when all connections are ended and the server emits a 'close' event. The optional callback will be called once the 'close' event occurs.

@huozhi huozhi added the CI approved Approve running CI for fork label Dec 13, 2023
@ijjk
Copy link
Member

ijjk commented Dec 13, 2023

Tests Passed

@ijjk
Copy link
Member

ijjk commented Dec 13, 2023

Stats from current PR

Default Build
General
vercel/next.js canary redbmk/next.js fix-53661 Change
buildDuration 12.7s 12.8s N/A
buildDurationCached 7.1s 6.2s N/A
nodeModulesSize 200 MB 200 MB N/A
nextStartRea..uration (ms) 423ms 424ms N/A
Client Bundles (main, webpack)
vercel/next.js canary redbmk/next.js fix-53661 Change
170-HASH.js gzip 26.9 kB 26.9 kB N/A
199.HASH.js gzip 181 B 185 B N/A
3f784ff6-HASH.js gzip 53.3 kB 53.3 kB
framework-HASH.js gzip 45.2 kB 45.2 kB
main-app-HASH.js gzip 240 B 242 B N/A
main-HASH.js gzip 31.7 kB 31.7 kB N/A
webpack-HASH.js gzip 1.7 kB 1.7 kB N/A
Overall change 98.5 kB 98.5 kB
Legacy Client Bundles (polyfills)
vercel/next.js canary redbmk/next.js fix-53661 Change
polyfills-HASH.js gzip 31 kB 31 kB
Overall change 31 kB 31 kB
Client Pages
vercel/next.js canary redbmk/next.js fix-53661 Change
_app-HASH.js gzip 195 B 194 B N/A
_error-HASH.js gzip 183 B 182 B N/A
amp-HASH.js gzip 501 B 501 B
css-HASH.js gzip 321 B 321 B
dynamic-HASH.js gzip 2.5 kB 2.5 kB N/A
edge-ssr-HASH.js gzip 255 B 255 B
head-HASH.js gzip 349 B 350 B N/A
hooks-HASH.js gzip 368 B 369 B N/A
image-HASH.js gzip 4.27 kB 4.27 kB N/A
index-HASH.js gzip 255 B 256 B N/A
link-HASH.js gzip 2.61 kB 2.6 kB N/A
routerDirect..HASH.js gzip 311 B 309 B N/A
script-HASH.js gzip 384 B 384 B
withRouter-HASH.js gzip 307 B 306 B N/A
1afbb74e6ecf..834.css gzip 106 B 106 B
Overall change 1.57 kB 1.57 kB
Client Build Manifests
vercel/next.js canary redbmk/next.js fix-53661 Change
_buildManifest.js gzip 484 B 482 B N/A
Overall change 0 B 0 B
Rendered Page Sizes
vercel/next.js canary redbmk/next.js fix-53661 Change
index.html gzip 529 B 526 B N/A
link.html gzip 542 B 540 B N/A
withRouter.html gzip 523 B 522 B N/A
Overall change 0 B 0 B
Edge SSR bundle Size
vercel/next.js canary redbmk/next.js fix-53661 Change
edge-ssr.js gzip 93.7 kB 93.7 kB N/A
page.js gzip 146 kB 146 kB N/A
Overall change 0 B 0 B
Middleware size
vercel/next.js canary redbmk/next.js fix-53661 Change
middleware-b..fest.js gzip 627 B 623 B N/A
middleware-r..fest.js gzip 151 B 151 B
middleware.js gzip 37.4 kB 37.4 kB N/A
edge-runtime..pack.js gzip 1.92 kB 1.92 kB
Overall change 2.07 kB 2.07 kB
Next Runtimes
vercel/next.js canary redbmk/next.js fix-53661 Change
app-page-exp...dev.js gzip 168 kB 168 kB
app-page-exp..prod.js gzip 94.1 kB 94.1 kB
app-page-tur..prod.js gzip 94.8 kB 94.8 kB
app-page-tur..prod.js gzip 89.4 kB 89.4 kB
app-page.run...dev.js gzip 138 kB 138 kB
app-page.run..prod.js gzip 88.7 kB 88.7 kB
app-route-ex...dev.js gzip 24 kB 24 kB
app-route-ex..prod.js gzip 16.6 kB 16.6 kB
app-route-tu..prod.js gzip 16.6 kB 16.6 kB
app-route-tu..prod.js gzip 16.2 kB 16.2 kB
app-route.ru...dev.js gzip 23.4 kB 23.4 kB
app-route.ru..prod.js gzip 16.2 kB 16.2 kB
pages-api-tu..prod.js gzip 9.37 kB 9.37 kB
pages-api.ru...dev.js gzip 9.64 kB 9.64 kB
pages-api.ru..prod.js gzip 9.37 kB 9.37 kB
pages-turbo...prod.js gzip 21.9 kB 21.9 kB
pages.runtim...dev.js gzip 22.5 kB 22.5 kB
pages.runtim..prod.js gzip 21.9 kB 21.9 kB
server.runti..prod.js gzip 49.4 kB 49.4 kB
Overall change 931 kB 931 kB
Commit: 7f15d3f

@redbmk redbmk force-pushed the fix-53661 branch 2 times, most recently from 0da5d6c to 3db2e2c Compare December 13, 2023 20:18
@redbmk
Copy link
Contributor Author

redbmk commented Dec 13, 2023

I'm having a hard time thinking of a good way to test this in an automated way. The way I've tested manually is by hitting an endpoint with a long-lived (~10s) response and then send a kill -TERM signal to the process. Without this change it stops immediately and interrupts the response. With this change it waits for the response to finish before the process exits.

Also tested with ctrl-c on the server and in docker compose using ctrl-c (which should be the equivalent of docker compose stop, which sends a SIGTERM).

@redbmk
Copy link
Contributor Author

redbmk commented Dec 13, 2023

An interesting behavior that's more noticeable after adding the log is that process.exit triggers the on('exit' callback, which will also log Shutting down server, but right as its exiting. Might make more sense to either use a slightly different callback for that or just remove the exit handler and let it shut down immediately (essentially the same behavior as before since server.close is asynchronous)

@@ -279,7 +281,8 @@ export async function startServer(
// This is the render worker, we keep the process alive
console.error(err)
}
process.on('exit', (code) => cleanup(code))
Copy link
Contributor Author

@redbmk redbmk Dec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://nodejs.org/api/process.html#event-exit

Listener functions must only perform synchronous operations. The Node.js process will exit immediately after calling the 'exit' event listeners causing any additional work still queued in the event loop to be abandoned.

Once process.exit is called we don't really have a chance to clean up since server.close is asynchronous. Best we could do is log a message, but not sure that's very useful.

Also calling the same cleanup code, now that there's a log, was resulting in the log printing twice after a signal was received.

huozhi
huozhi previously approved these changes Dec 14, 2023
@huozhi huozhi enabled auto-merge (squash) December 14, 2023 14:18
Copy link
Member

@huozhi huozhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is breaking the test test/integration/config-output-export/test/index.test.ts

@huozhi huozhi dismissed their stale review December 14, 2023 15:53

test failing

auto-merge was automatically disabled December 14, 2023 17:45

Head branch was pushed to by a user without write access

@redbmk redbmk changed the title fix: gracefully shutdown standalone server fix: gracefully shutdown server Dec 14, 2023

process.on('exit', cleanup)
process.on('SIGINT', cleanup)
process.on('SIGTERM', cleanup)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other handlers were already killing the child - this one always sends SIGTERM. However, we still need to add a special handler for the exit event since it needs to be synchronous

process.on('SIGINT', () => handleSessionStop('SIGINT'))
process.on('SIGTERM', () => handleSessionStop('SIGTERM'))
process.on('SIGINT', (code) => handleSessionStop(code))
process.on('SIGTERM', (code) => handleSessionStop(code))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just sending handleSessionStop works as well but typescript seems to do better with this (e.g. if I set signal: object this way would give a typescript error, but process.on("SIGINT", handleSessionStop) wouldn't raise any errors.

@redbmk
Copy link
Contributor Author

redbmk commented Dec 14, 2023

@huozhi I did a little more cleanup and got the tests passing. I have an idea on how to add a test for this but not really sure where to put it.

It might involve adding an example app with a route that forces a response to take a while, then having a test that calls the route, sends a SIGTERM (or SIGINT), then makes sure the request comes back successfully before the server finishes.

@redbmk
Copy link
Contributor Author

redbmk commented Dec 14, 2023

Woah, the tests were passing before I refactored. I must have gotten some logic backwards. I'll take a look in a bit

@redbmk
Copy link
Contributor Author

redbmk commented Dec 15, 2023

Ah, OK I had made the condition function async so it always returned a promise, and !(new Promise()) would always be false...

Looking a lot better now

@redbmk
Copy link
Contributor Author

redbmk commented Dec 15, 2023

still one test failing related to teardown. I need to look into how to run these tests locally

process.on('SIGTERM', () => handleSessionStop('SIGKILL'))

// exit event must be synchronous
process.on('exit', () => child?.kill('SIGKILL'))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed these to SIGKILL to make sure the child process stops immediately. In a production server we would want to pass along the SIGINT or SIGTERM but if you're killing the dev server you probably wouldn't expect it to do any cleanup. This should also be consistent with the current behavior that exits right away with process.exit

@redbmk
Copy link
Contributor Author

redbmk commented Dec 17, 2023

@huozhi I was able to debug the issue and got tests all passing. I think it's ready for review again.

Comment on lines 543 to 547
try {
await waitForCondition(2000, () => !isAppRunning(instance))
} catch {
await killProcess(instance.pid, 'SIGKILL')
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's throwing error but still get caught and killed again, won't know the error. And adding 2000ms here might increase duration for all tests.

Can we remove the changes to the util and add a separate test for this?

Suggested change
try {
await waitForCondition(2000, () => !isAppRunning(instance))
} catch {
await killProcess(instance.pid, 'SIGKILL')
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it's throwing SIGTERM by default, the apps will run indefinitely. Maybe we just have it throw SIGKILL by default instead and then adding a separate test for the clean shutdown would look like sending a SIGTERM instead and making sure it shuts down properly.

Since SIGTERM now waits for the app to shut down properly, it'll still require at least that change to the util

- Both the standalone server and the `startServer` function it calls
  attempt to stop the server on `SIGINT` and `SIGTERM` in different
  ways. This lets `server.js` yield to `startServer`
- The cleanup function in `startServer` was not waiting for the server
  to close before calling `process.exit`. This lets it wait for any
  in-flight requests to finish processing before exiting the process

fixes: vercel#53661
Copy link
Member

@huozhi huozhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@redbmk
Copy link
Contributor Author

redbmk commented Dec 19, 2023

Hi @huozhi - thanks for your patience with me on this one :)

Just checking if there was anything else before it was ready to merge - I saw you approved it yesterday.

@huozhi huozhi merged commit 2c48b87 into vercel:canary Dec 19, 2023
sokra added a commit that referenced this pull request Dec 20, 2023
feedthejim pushed a commit that referenced this pull request Dec 20, 2023
This reverts commit 2c48b87.

It breaks some test cases

Closes PACK-2168
@huozhi
Copy link
Member

huozhi commented Dec 20, 2023

@redbmk sorry The PR is reverted on canary branch due to showing failing tests after merging with canary. If you'd like to file another one we can take another round of look again. Would be awesome if we can get a new test associated with it 🙏

@redbmk
Copy link
Contributor Author

redbmk commented Dec 20, 2023

Ah bummer - OK yeah I'll see if I can get a test for it and see if I can figure out why the tests were failing in canary. Do more tests run in the canary branch that wouldn't run in the PR?

@huozhi
Copy link
Member

huozhi commented Dec 20, 2023

Should be the same, feels like it was reported incorrectly on the PR. It was failing in this job but marked as successful

@redbmk redbmk mentioned this pull request Dec 30, 2023
@github-actions github-actions bot added the locked label Jan 4, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 4, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
CI approved Approve running CI for fork locked type: next
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Close http server in cleanup
3 participants