Fix child processes not being reaped when `Process.detach` used #3314
02096b5 to 83f53bd
Thanks for the PR. LGTM, but I'd like to run some additional tests. Currently knee deep in some dumb s*&(...
Does your latest comment #3313 (comment) change anything for these changes @stanhu?
@dentarg I'm still investigating how it's possible for the application to interfere with I think this PR can't hurt, and it restores the previous behavior of checking known PIDs in addition to checking child processes.
I think it would be worth it. Note that an alternative to running as PID 1 is to declare the process as a SUBREAPER: https://github.com/Shopify/pitchfork/blob/50f3e3389218e6e82c65638ab3c91f805ec02c4b/ext/pitchfork_http/child_subreaper.h It's a Linux-only API, but it would allow doing this as part of the main test suite when run on Linux. I made a gem a while ago that you could use: https://rubygems.org/gems/child_subreaper
83f53bd to bd08e8a
I've updated this pull request to reflect that it appears that this works around a Ruby 3.1/3.2 bug with I think a test is a good idea, but that will take me longer to get to at the moment.
Would be worth reporting upstream. Even if it's fixed in 3.3, it may be worth a backport.
Good idea. I filed this bug: https://bugs.ruby-lang.org/issues/20181
bd08e8a to b20415f
@MSP-Greg @dentarg I've added an integration test that appears to reproduce https://bugs.ruby-lang.org/issues/20181. This test fails in
It seems this bug was already reported in https://bugs.ruby-lang.org/issues/19837 and fixed in the
Starting with Puma v6.4.1, we observed that killed Puma cluster workers were never being restarted when the parent was run as PID 1. For example, I issued a `kill 44` and PID 44 remained in the `defunct` state:

```
git@gitlab-webservice-default-78664bb757-2nxvh:/var/log/gitlab$ ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
git            1       0  0 Jan09 ?        00:01:39 puma 6.4.1 (tcp://0.0.0.0:8080) [gitlab-puma-worker]
git           23       1  0 Jan09 ?        00:05:46 /usr/local/bin/gitlab-logger /var/log/gitlab
git           41       1  0 Jan09 ?        00:01:55 ruby /srv/gitlab/bin/metrics-server
git           44       1  0 Jan09 ?        00:02:41 [ruby] <defunct>
git           46       1  0 Jan09 ?        00:02:38 puma: cluster worker 1: 1 [gitlab-puma-worker]
git           48       1  0 Jan09 ?        00:02:42 puma: cluster worker 2: 1 [gitlab-puma-worker]
git           49       1  0 Jan09 ?        00:02:41 puma: cluster worker 3: 1 [gitlab-puma-worker]
git         5205       0  0 21:57 pts/0    00:00:00 bash
git         5331    5205  0 22:00 pts/0    00:00:00 ps -ef
```

Further investigation showed that the `Process.wait2(-1, Process::WNOHANG)` call introduced in puma#3255 never appears to return anything when `Process.detach` is run on some process that has not exited. This bug appears to be present from Ruby 2.6 to 3.2, but has been fixed in Ruby 3.3: https://bugs.ruby-lang.org/issues/19837

Previously `Process.wait(w.pid, Process::WNOHANG)` was called on each known worker PID. puma#3255 changed this behavior to do this only if the `fork_worker` config parameter is enabled, but it seems that we should always do this to ensure that terminated workers are reaped in a timely manner.

Closes puma#3313
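A minimal sketch of the per-PID approach described above, assuming plain forked children stand in for cluster workers. `reap_known_pids` and the variable names are hypothetical and are not taken from Puma's source; only `Process.wait` with `Process::WNOHANG` comes from the commit message.

```ruby
# Hypothetical helper, not Puma's actual implementation: poll each known
# worker PID with WNOHANG and return the PIDs that have terminated.
def reap_known_pids(pids)
  pids.select do |pid|
    begin
      # Returns the PID once the child has exited, or nil while it is running.
      !Process.wait(pid, Process::WNOHANG).nil?
    rescue Errno::ECHILD
      # Not our child (or already reaped elsewhere): treat it as gone.
      true
    end
  end
end

# Stand-ins for cluster workers: three children that exit immediately.
# exit! skips at_exit handlers inherited from the parent.
workers = Array.new(3) { fork { exit!(0) } }

# Poll until every worker is reaped, giving up after five seconds.
remaining = workers.dup
deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + 5
until remaining.empty? || Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
  remaining -= reap_known_pids(remaining)
  sleep 0.05
end

puts "reaped #{workers.size - remaining.size} of #{workers.size} workers"
```

Because each PID is polled individually, this loop does not depend on what `Process.wait2(-1, Process::WNOHANG)` returns.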
b20415f to 883b630
@MSP-Greg Is there anything else I can help with to get this merged?
Do you know how to clone one's self? Sorry. Soon, like no later than Sunday. I apologize for the delay, 'when it rains, it pours'
Ruby 3.2.3 is released...
@MSP-Greg Sorry to bother you again, but would you have a moment to review?
@stanhu No problem. Did you see the comment above about line 189 in test_integration_cluster.rb? I tried this without and with the lib patch on several Ruby versions. As you've mentioned, it passes on some 'current' patch releases and 'head', but fails on many older ones.
@MSP-Greg I did not see the comment. Is it published?
Sorry, my mistake. I didn't click 'Publish Review'. Until one does that, the reviewer can see it, but no one else... Do you see it now?
This test ensures that Puma handles the `Process.detach` bug described in https://bugs.ruby-lang.org/issues/19837.
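A rough probe of the scenario this test targets might look like the following. It is a sketch, not the integration test added in this PR: the variable names are hypothetical, it assumes a Unix-like system with a `sleep` executable, and what `Process.wait2(-1, Process::WNOHANG)` returns is expected to vary with the Ruby version, per the linked bug report.

```ruby
# A long-lived, unrelated child handed to Process.detach, as an
# application running under Puma might do.
sleeper = spawn('sleep', '30')
watcher = Process.detach(sleeper)

# Stand-in for a cluster worker that terminates.
worker = fork { exit!(0) }
sleep 0.5 # give the worker time to exit

# First attempt: ask for any terminated child without blocking.
any_child = begin
  Process.wait2(-1, Process::WNOHANG)
rescue Errno::ECHILD
  nil
end
reaped = !any_child.nil? && any_child.first == worker

# Fallback: poll the known worker PID directly for up to five seconds.
deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + 5
until reaped || Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
  begin
    reaped = Process.wait(worker, Process::WNOHANG) == worker
  rescue Errno::ECHILD
    break # the worker is no longer our child to wait on
  end
  sleep 0.05 unless reaped
end

puts "wait2(-1) returned #{any_child.inspect}; worker reaped: #{reaped}"

# Clean up the detached child so the probe leaves nothing behind.
Process.kill('TERM', sleeper)
watcher.join(5)
```

If the first call comes back `nil` while the worker has already exited, the per-PID fallback is what ends up collecting it.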
883b630 to d4ac708
Thank you for the PR. Sorry for the delay.
…ed (puma#3314)" This reverts commit 9bd838b. Did this start to happen after this commit? Sure looks like that so far https://github.com/dentarg/puma/actions/runs/7709969145/job/21012318760#step:10:43
@nateberkopec Would you mind releasing an update with this? We're blocked on Puma 6.4.0 until this gets shipped.
@nateberkopec Sorry to bother you again. Could you find some time to release a new version of Puma?
FYI, this pull request is still needed in Ruby 3.1 and 3.2 because
Even if it was fixed in tiny releases, I'd still recommend keeping the workaround until the entire line is no longer supported by puma.
I agree. I'm just trying to raise the need for a release because any application that launches a subprocess with Puma will find that its cluster workers are no longer reaped in Puma 6.4.1 through 6.4.2.
Description
Starting with Puma v6.4.1, we observed that killed Puma cluster workers were never being restarted when the parent was run as PID 1. For example, I issued a `kill 44` and PID 44 remained in the `defunct` state.

Further investigation showed that the `Process.wait2(-1, Process::WNOHANG)` call introduced in #3255 never appears to return anything when `Process.detach` is run on some process that has not exited. This bug appears to be present from Ruby 2.6 to 3.2, but has been fixed in Ruby 3.3: https://bugs.ruby-lang.org/issues/19837

Previously `Process.wait(w.pid, Process::WNOHANG)` was called on each known worker PID. #3255 changed this behavior to do this only if the `fork_worker` config parameter is enabled, but it seems that we should always do this to ensure that terminated workers are reaped in a timely manner.

Closes #3313