-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace covert pipe with self-pipe SIGCHLD handler #2550
Open
ntrrgc
wants to merge
1
commit into
ninja-build:master
Choose a base branch
from
ntrrgc:2025-01-10-sigchld
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+119
−7
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid complicated logic in the signal handler. Instead write the child pid to a self-pipe that can be waited on in DoWork() (ensure the pipe descriptors are not leaked to spawned commands too). This avoids leaking the file descriptor to a global table, which makes reasoning about lifecycle difficult (e.g. there are code paths where this descriptor will never be closed properly in your code).
Using a linked list or any kind of map to find the fd from the pid is probably not needed. Just scan the array of Subprocess instances linearly, since command termination is not in the critical performance path (even when 1000+ of commands are launched in parallel).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered this originally, but I don't know how many processes could end up in that table if someone really runs a lot of jobs share terminal, or if that is ever an use case.
I was trying to avoid modifications on
DoWork()
. I considered writing to pipes, but that has the additional risk that a write can potentially deadlock if the pipe buffer is full.However, if we can rely on ppoll/pselect() returning EINTR after the first SIGCHLD signal handler execution, we could instead use a simple
int
field to communicate between the two, similar to how it's done for SIGINT:Before ppoll(), set the "terminated PID" field to -1. Call ppoll(), if you get EINTR, check whether that field got a value other than -1. If it did, that's a process that is done. At that point we wouldn't even need the pipes, although they may still be useful to keep things orthogonal between the console and non-console use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it turns out that Ninja, in its current design, only allows to run a single "console" sub-process at a time (this is implemented elsewhere and is not visible in this file). I think this can be leveraged to avoid using a self-pipe entirely.
Technically, this is extremely unlikely. In this code, the signal handler can only run during the
pselect()
/ppoll()
call. It would require thousands of processes to all terminate during that exact syscall to block the pipe buffer (which are very large these days, see https://github.com/afborchert/pipebuf?tab=readme-ov-file#some-results for some not-so-recent practical results).But we can avoid pipes nonetheless.
There is no actual guarantee that the system call would return after only a single SIGCHLD signal was handled.
On the other hand, because there is only one console subprocess, it should be possible to write its pid value to a global variable that the signal handler compares to. In case of success, it would set an atomic flag to true that can be trivially checked in DoWork(). More specifically:
Add
SubprocessSet::console_subproc_
as aSubprocess*
pointer to the current console process if any.Ensure that starting a new subprocess updates the pointer if needed (and assert that only one can exist).
Add two global sig_atomic_t values. One
s_console_pid
, will contain the pid of the console subprocess after it is started, or a special value (e.g. -1) to indicate there is no console process currently (which would be written in Subprocess::Finish). The seconds_console_exited
will be used as a boolean flag.The SIGCHLD handler simply compares the signal's pid to the value of
s_console_pid
. If they match, it setss_console_exited
to 1.In DoWork(), set
s_console_exited
to 0 before callingpselect()
orppoll()
, and look at its value after the call.Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sad. Do you have a source on this by any chance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I am trying to wrap my head around signals again, this is all so subtle, some details coming back:
SIGCHLD
is a standard signals, not a real one, which means that it is not queued. When several processes terminate outside of thepselect()
call, they are collapsed into a single signal handler call during the syscall (probably passing the pid of the last process). Seehttps://stackoverflow.com/questions/48750382/can-not-receive-all-the-sigchld
In other words, you we can only treat
SIGCHLD
as a boolean flag that says "some child has stopped", then have to usewaitpid(..., WNOHANG)
to scan the state of all processes of interest. Luckily for Ninja, that would be looking at the state of the single console process.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, that makes
si_pid
in SIGCHLD near-useless. If that is true, sigaction(2) should be updated to point it out.