Current behavior
This implementation is incorrect: see #2421 (to read first) and #2422.
On receiving the SIGTERM signal, set the readiness probe to fail with 503, to tell the orchestrator to stop sending requests.
Wait X seconds to be sure Kubernetes has stopped forwarding traffic to the app (X should match the readiness probe interval plus a few seconds, so the orchestrator is aware the pod should stop receiving traffic).
Proceed to close the web server (finish processing any long-running requests that are still in flight).
Proceed to close database connections and other connections, then shut down the app.
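As a rough illustration of how the X-second wait could be derived from the probe settings, here is a minimal sketch. The helper name `drainDelayMs` and the two-second default margin are hypothetical. Including `failureThreshold` is an assumption that goes beyond the text above: Kubernetes only marks a pod unready after that many consecutive probe failures.

```javascript
// Hypothetical helper: estimates how long to keep serving after SIGTERM.
// periodSeconds and failureThreshold mirror the readiness probe fields of
// the same names; marginSeconds is the "few seconds" of safety on top.
function drainDelayMs({ periodSeconds, failureThreshold = 1, marginSeconds = 2 }) {
  return (periodSeconds * failureThreshold + marginSeconds) * 1000;
}

// A probe every 5 seconds plus a 2 second margin.
console.log(drainDelayMs({ periodSeconds: 5 })); // 7000
```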
Minimum reproduction code
Load test your NestJS app running in a Kubernetes environment, and trigger a new deployment during this load test. You should notice a few failed requests.
Here is a simple example of a load test you can run with k6:
cat << 'EOF' | k6 run -
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  scenarios: {
    constant_request_rate: {
      executor: 'constant-arrival-rate',
      rate: 5, // 5 iterations per second
      timeUnit: '1s', // 1 second
      duration: '2m', // 2 minutes
      preAllocatedVUs: 5, // Number of VUs to pre-allocate
      maxVUs: 10, // Maximum number of VUs to allow if needed
    },
  },
};

export default function () {
  http.get('https://your-endpoint.com/livez');
  sleep(1);
}
EOF
Steps to reproduce
No response
Expected behavior
The expected graceful shutdown behaviour of a production-ready NestJS app should be:
On receiving the SIGTERM signal, set the readiness probe to fail with 503, to tell the orchestrator to stop sending requests.
Wait X seconds to be sure Kubernetes has stopped forwarding traffic to the app.
Proceed to close the web server (finish processing any long-running requests that are still in flight).
Proceed to close database connections and other connections, then shut down the app.
Therefore, if the load balancer still sends a request before it is aware that the endpoint has been removed, the request won't be seen as failed with a 502. It will still be processed, so a rolling update does not lead to downtime.
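To make the ordering concrete, here is a minimal sketch of that sequence using only Node's built-in `http` module rather than NestJS or Terminus. The helper names (`createShutdownState`, `gracefulShutdown`, `start`), the `/readyz` path, and the 15 second delay are all assumptions made for illustration. This is not how Terminus implements it.

```javascript
const http = require('node:http');

// Hypothetical helper: remembers whether shutdown has started.
function createShutdownState() {
  let shuttingDown = false;
  return {
    markShuttingDown() {
      shuttingDown = true;
    },
    // Status code the readiness endpoint should answer with.
    readinessStatus() {
      return shuttingDown ? 503 : 200;
    },
  };
}

// Plain HTTP server with a readiness route; '/readyz' is an assumed path.
function createServer(state) {
  return http.createServer((req, res) => {
    res.statusCode = req.url === '/readyz' ? state.readinessStatus() : 200;
    res.end();
  });
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs the four steps in order. `closeConnections` is a caller-supplied
// async function (database pools, queues, and so on).
async function gracefulShutdown({ state, server, drainDelayMs, closeConnections }) {
  state.markShuttingDown(); // 1. readiness probe now fails with 503
  await sleep(drainDelayMs); // 2. keep serving while traffic drains away
  await new Promise((resolve, reject) => {
    // 3. stop accepting new connections; in-flight requests may finish
    server.close((err) => (err ? reject(err) : resolve()));
  });
  await closeConnections(); // 4. release the remaining resources
}

// Example wiring; not invoked automatically.
function start(port = 3000) {
  const state = createShutdownState();
  const server = createServer(state);
  server.listen(port);
  process.once('SIGTERM', () => {
    gracefulShutdown({ state, server, drainDelayMs: 15000, closeConnections: async () => {} })
      .then(() => process.exit(0))
      .catch(() => process.exit(1));
  });
  return server;
}
```

Calling `start()` would serve on port 3000 and keep answering normal requests during the drain delay, which is the property the paragraph above relies on: only the readiness route changes its answer before the server is closed.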
@Lp-Francois You're right, I should update the CONTRIBUTING and README files. Basically, you can run npm build and then npm link to make the package linkable from any Node project. You can then go to any project and just run npm link @nestjs/terminus.
If you want to work with the samples folder to test things, npm run build:all builds all the samples. You only need to do that once. After that, npm build in the root of the project should suffice (it should re-link all the samples with the newly built files).
Package version
latest
NestJS version
latest
Node.js version
latest
In which operating systems have you tested?
Other
Resources that explain why the few seconds of sleep are necessary:
https://learnk8s.io/graceful-shutdown
In the meantime, setting the sleep to 0s in Terminus and adding a lifecycle preStop hook that sleeps X seconds is enough to fix the behaviour.
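For the preStop part of that workaround, the sketch below builds a Deployment patch as JSON, which kubectl accepts as well as YAML. The function name, the container name `app`, and the 15 and 45 second values are placeholders. It also assumes the container image ships a `sleep` binary. The Terminus side of the workaround is not shown.

```javascript
// Builds a strategic-merge patch that adds a preStop sleep to one container.
// The grace period must be longer than the sleep, because the termination
// countdown starts before the preStop hook runs.
function preStopSleepPatch({ containerName, sleepSeconds, gracePeriodSeconds }) {
  if (gracePeriodSeconds <= sleepSeconds) {
    throw new Error('terminationGracePeriodSeconds must exceed the preStop sleep');
  }
  return {
    spec: {
      template: {
        spec: {
          terminationGracePeriodSeconds: gracePeriodSeconds,
          containers: [
            {
              name: containerName, // containers are matched by name when patching
              lifecycle: {
                preStop: { exec: { command: ['sleep', String(sleepSeconds)] } },
              },
            },
          ],
        },
      },
    },
  };
}

const patch = preStopSleepPatch({ containerName: 'app', sleepSeconds: 15, gracePeriodSeconds: 45 });
console.log(JSON.stringify(patch, null, 2));
```

The printed JSON could then be passed to `kubectl patch deployment <name> --patch '<json>'`, or the same `lifecycle` fields could be copied into an existing manifest by hand.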