Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libstd/sys/unix/process.rs: reap a zombie who didn't get through to exec(2) #19454

Merged
merged 1 commit into from
Dec 5, 2014

Conversation

nodakai
Copy link
Contributor

@nodakai nodakai commented Dec 2, 2014

After the library successfully called fork(2), the child does several
setup works such as setting UID, GID and current directory before it
calls exec(2). When those setup works failed, the child exits but the
parent didn't call waitpid(2) and left it as a zombie.

This patch also add several sanity checks. They shouldn't make any
noticeable impact to runtime performance.

The new test case in libstd/io/process.rs calls the ps command to check
if the new code can really reap a zombie.
The output of ps -A -o pid,sid,command should look like this:

  PID   SID COMMAND
    1     1 /sbin/init
    2     0 [kthreadd]
    3     0 [ksoftirqd/0]
...
12562  9237 ./spawn-failure
12563  9237 [spawn-failure] <defunct>
12564  9237 [spawn-failure] <defunct>
...
12592  9237 [spawn-failure] <defunct>
12593  9237 ps -A -o pid,sid,command
12884 12884 /bin/zsh
12922 12922 /bin/zsh
...

where ./spawn-failure is my test program which intentionally leaves many
zombies. Filtering the output with the "SID" (session ID) column is
a quick way to tell if a process (zombie) was spawned by my own test
program. Then the number of "defunct" lines is the number of zombie
children.

@rust-highfive
Copy link
Collaborator

warning Warning warning

  • These commits modify unsafe code. Please review it carefully!

return match input.read(&mut bytes) {
Ok(4) => {
Ok(8) => {
assert!(b'N' == bytes[4] && b'O' == bytes[5] &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be cleaned up to assert_eq!(b"NOEX", bytes[4..8]).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also probably should have added some comments quite some time ago, but could you add some words explaining what this is doing? Basically just mention that the pipe is flagged with FIOCLEX so it'll close on a successful exec, and otherwise an errno will be written into it from the child if some syscall fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially what you do here is (de)serialization of a 32 bit integer into/from a byte-oriented stream. I believe we should always add some kinds of headers or footers for sanity check / debugging aid whenever we do serialization. Its runtime cost should be negligible compared to that of the fork(2) syscall. The choice of b"NOEX" was just arbitrary. "No exec call".

I worried if b"NOEX" == bytes[4..8] might be compiled into a costly code. Perhaps it was just my taste. If I really optimize it I would have used a helper function like this:

#[inline]
fn combine(a: u8, b: u8, c: u8, d: u8) -> u32 {
    let aa = a as u32;
    let bb = b as u32;
    let cc = c as u32;
    let dd = d as u32;

    aa | (bb << 8) | (cc << 16) | (dd << 24)
}

@alexcrichton
Copy link
Member

Nice find @nodakai! I totally forgot about this case!

…xec(2).

After the library successfully called fork(2), the child does several
setup works such as setting UID, GID and current directory before it
calls exec(2).  When those setup works failed, the child exits but the
parent didn't call waitpid(2) and left it as a zombie.

This patch also add several sanity checks.  They shouldn't make any
noticeable impact to runtime performance.

The new test case run-pass/wait-forked-but-failed-child.rs calls the ps
command to check if the new code can really reap a zombie.  When
I intentionally create many zombies with my test program
./spawn-failure, The output of "ps -A -o pid,sid,command" should look
like this:

  PID   SID COMMAND
    1     1 /sbin/init
    2     0 [kthreadd]
    3     0 [ksoftirqd/0]
...
12562  9237 ./spawn-failure
12563  9237 [spawn-failure] <defunct>
12564  9237 [spawn-failure] <defunct>
...
12592  9237 [spawn-failure] <defunct>
12593  9237 ps -A -o pid,sid,command
12884 12884 /bin/zsh
12922 12922 /bin/zsh
...

Filtering the output with the "SID" (session ID) column is a quick way
to tell if a process (zombie) was spawned by my own test program.  Then
the number of "defunct" lines is the number of zombie children.

Signed-off-by: NODA, Kai <nodakai@gmail.com>
@nodakai nodakai force-pushed the libstd-reap-failed-child branch from 149fa65 to 74fb798 Compare December 5, 2014 02:27
@nodakai
Copy link
Contributor Author

nodakai commented Dec 5, 2014

  1. As for message validation on the CLOEXEC pipe by my NOEX footer, I hope I made my intention clearer.
  2. Introduce a utility fn combine(u8, u8, u8, u8) -> i32
  3. Before errno: int is serialized into a pipe, it is cast to u32.
  4. assert! on p.wait(0) have a custom error message.
  5. Use Command::output() as suggested by @alexcrichton
  6. Test code is moved into a new run-pass test wait-forked-but-failed-child.rs

@alexcrichton alexcrichton merged commit 74fb798 into rust-lang:master Dec 5, 2014
@nodakai nodakai deleted the libstd-reap-failed-child branch December 17, 2014 04:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants