Provide jobs with an AbortSignal to attempt to gracefully interrupt jobs after shutdown timeout occurs #412

Open
wants to merge 1 commit into
base: main
jbielick (Owner) commented Jan 9, 2025

Purpose

A job may be interrupted when a worker shuts down. Two mechanisms support graceful interruption: the shutdown timeout and the execution context AbortSignal. The shutdown timeout is configured in WorkerOptions.timeout. When a worker is instructed to stop (via process signal or server message), it stops accepting new work (i.e. quiet) and waits up to the configured duration for any in-progress jobs to complete uninterrupted. If this duration elapses and jobs are still in progress, those jobs are notified through the AbortSignal available at Context.signal. They are then FAILed on the Faktory server, which allows them to retry later.

The abort signal can be used to interrupt asynchronous processes and perform some cleanup tasks before an abrupt exit (process.exit). After the abort signal is sent, a job will have 3 seconds to perform cleanup before the process is abruptly exited.
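As an illustration of the cleanup step described above, here is a minimal sketch of event-based cleanup for work that cannot take a signal directly. The names onAbort and exampleJob are hypothetical and not part of faktory-worker; the `signal` argument stands in for Context.signal.

```javascript
// Hypothetical helper (not part of faktory-worker): runs `cleanup` once when
// `signal` aborts, and returns a function that unregisters the listener.
function onAbort(signal, cleanup) {
  if (signal.aborted) {
    // The signal already fired, so the "abort" event will not fire again.
    cleanup();
    return () => {};
  }
  signal.addEventListener("abort", cleanup, { once: true });
  return () => signal.removeEventListener("abort", cleanup);
}

// Hypothetical job body; `signal` stands in for Context.signal.
async function exampleJob({ signal }, work) {
  const removed = [];
  // Stand-in for real cleanup such as deleting tempfiles.
  const unregister = onAbort(signal, () => removed.push("tempfile"));
  try {
    await work();
  } finally {
    // Avoid running cleanup after the job has already finished.
    unregister();
  }
  return removed;
}
```

Cleanup registered this way should be quick, since the description above gives a job only 3 seconds after the abort before the process exits.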

Approach

  • A private AbortController is created upon .work().
  • After stop() is called and the shutdown timeout elapses, in-progress jobs are sent an abort signal and subsequently FAILed on the server.
  • A 3-second delay follows the abort signal to allow cleanup.
  • The process exit code is 1 when jobs are aborted at the end of the grace period.
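
The sequence in the list above can be sketched roughly as follows. This is not the implementation in this PR; stopWithGrace and its parameters (jobs, controller, timeoutMs, cleanupMs, exit) are hypothetical names, and the exit code 0 for a clean shutdown is an assumption. A real worker would also FAIL the aborted jobs on the server, which is omitted here.

```javascript
// Hypothetical sketch of the shutdown sequence, not the library's code.
// `jobs` is an array of promises for in-progress jobs, `controller` is the
// AbortController whose signal the jobs received, and `exit` stands in for
// process.exit so the flow can be exercised without ending the process.
async function stopWithGrace({ jobs, controller, timeoutMs, cleanupMs = 3000, exit }) {
  let timer;
  const timedOut = new Promise((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  const finished = Promise.allSettled(jobs).then(() => "finished");

  // Wait for in-progress jobs, but no longer than the shutdown timeout.
  const outcome = await Promise.race([finished, timedOut]);
  clearTimeout(timer);

  if (outcome === "finished") {
    return exit(0); // Assumption: a clean shutdown exits with code 0.
  }

  // Shutdown timeout elapsed: notify the remaining jobs.
  controller.abort();
  // Give jobs a short window (3 seconds by default) to clean up.
  await new Promise((resolve) => setTimeout(resolve, cleanupMs));
  return exit(1);
}
```

Passing `exit` as a parameter is a choice made for this sketch only, so the two outcomes can be observed in isolation.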

Example - A long-running subprocess:

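import faktory from "faktory-worker";
import { execa } from "execa";

faktory.register("JobWithAbort", (...args) => async ({ signal }) => {
  try {
    // cancelSignal terminates the ffmpeg subprocess when the signal aborts.
    await execa("ffmpeg", [/* arg1, arg2, ..., argN */], { cancelSignal: signal });
  } catch (e) {
    if (e.code === "ABORT_ERR") {
      // Remove some tempfiles or perform other cleanup here.
      // Propagating the ABORT_ERR is not necessary: the job will be FAILed
      // if it was in progress at the end of the shutdown timeout.
      return;
    }
    // Any other error is a genuine failure, so rethrow it instead of swallowing it.
    throw e;
  }
});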
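
A job that does not spawn a subprocess can use the same signal with any signal-aware API. The following is a minimal sketch using Node's promise-based timers; pollWithAbort, intervalMs, and maxSteps are hypothetical names chosen for illustration, and `signal` stands in for Context.signal.

```javascript
const { setTimeout: sleep } = require("node:timers/promises");

// Hypothetical job body: does work in steps and stops early when aborted.
async function pollWithAbort({ signal }, { intervalMs = 50, maxSteps = 100 } = {}) {
  let steps = 0;
  try {
    while (steps < maxSteps) {
      // Rejects with an AbortError (code "ABORT_ERR") if `signal` aborts,
      // including when it was already aborted before the call.
      await sleep(intervalMs, undefined, { signal });
      steps += 1;
    }
  } catch (e) {
    // Only swallow the abort; anything else is a real failure.
    if (e.code !== "ABORT_ERR") throw e;
    // Cleanup for the interrupted step would go here.
  }
  return { steps, aborted: signal.aborted };
}
```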

Closes #409
Closes #251

Successfully merging this pull request may close these issues.

Feature Request: Pass cancellation signal to job functions so they can gracefully shut down if necessary