Skip to content

Commit

Permalink
Fix child processes not being reaped when Process.detach used (puma…
Browse files Browse the repository at this point in the history
…#3314)

* Fix child processes not being reaped when `Process.detach` used

Starting with Puma v6.4.1, we observed that killed Puma cluster
workers were never being restarted when the parent was run as PID
1. For example, I issued a `kill 44` and PID 44 remained in the
`defunct` state:

```
git@gitlab-webservice-default-78664bb757-2nxvh:/var/log/gitlab$ ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
git            1       0  0 Jan09 ?        00:01:39 puma 6.4.1 (tcp://0.0.0.0:8080) [gitlab-puma-worker]
git           23       1  0 Jan09 ?        00:05:46 /usr/local/bin/gitlab-logger /var/log/gitlab
git           41       1  0 Jan09 ?        00:01:55 ruby /srv/gitlab/bin/metrics-server
git           44       1  0 Jan09 ?        00:02:41 [ruby] <defunct>
git           46       1  0 Jan09 ?        00:02:38 puma: cluster worker 1: 1 [gitlab-puma-worker]
git           48       1  0 Jan09 ?        00:02:42 puma: cluster worker 2: 1 [gitlab-puma-worker]
git           49       1  0 Jan09 ?        00:02:41 puma: cluster worker 3: 1 [gitlab-puma-worker]
git         5205       0  0 21:57 pts/0    00:00:00 bash
git         5331    5205  0 22:00 pts/0    00:00:00 ps -ef
```

Further investigation showed that the introduction of
`Process.wait2(-1, Process::WNOHANG)` in puma#3255 never appears to return
anything when `Process.detach` is run on some process that has not
exited. This bug appears to be present from Ruby 2.6 to 3.2, but has
been been fixed in Ruby 3.3: https://bugs.ruby-lang.org/issues/19837

Previously `Process.wait(w.pid, Process::WNOHANG)` was called on each
known worker PID. puma#3255 changed this behavior to do this only if the
`fork_worker` config parameter were enabled, but it seems that we
should always do this to ensure that terminated workers are
reaped in a timely manner.

Closes puma#3313

* Add integration test for Puma worker reaping

This test ensures that Puma handles the `Process.detach` bug described
in https://bugs.ruby-lang.org/issues/19837.
  • Loading branch information
stanhu authored Jan 26, 2024
1 parent f6ab4e9 commit 9bd838b
Show file tree
Hide file tree
Showing 3 changed files with 47 additions and 5 deletions.
9 changes: 6 additions & 3 deletions lib/puma/cluster.rb
Original file line number Diff line number Diff line change
Expand Up @@ -563,9 +563,12 @@ def wait_workers
@workers.reject! do |w|
next false if w.pid.nil?
begin
# When `fork_worker` is enabled, some worker may not be direct children, but grand children.
# Because of this they won't be reaped by `Process.wait2(-1)`, so we need to check them individually)
if reaped_children.delete(w.pid) || (@options[:fork_worker] && Process.wait(w.pid, Process::WNOHANG))
# We may need to check the PID individually because:
# 1. From Ruby versions 2.6 to 3.2, `Process.detach` can prevent or delay
# `Process.wait2(-1)` from detecting a terminated process: https://bugs.ruby-lang.org/issues/19837.
# 2. When `fork_worker` is enabled, some worker may not be direct children,
# but grand children. Because of this they won't be reaped by `Process.wait2(-1)`.
if reaped_children.delete(w.pid) || Process.wait(w.pid, Process::WNOHANG)
true
else
w.term if w.term?
Expand Down
11 changes: 11 additions & 0 deletions test/config/process_detach_before_fork.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
worker_shutdown_timeout 0

before_fork do
pid = fork do
sleep 30 # This has to exceed the test timeout
end

pid_filename = File.join(Dir.tmpdir, 'process_detach_test.pid')
File.write(pid_filename, pid)
Process.detach(pid)
end
32 changes: 30 additions & 2 deletions test/test_integration_cluster.rb
Original file line number Diff line number Diff line change
Expand Up @@ -175,6 +175,34 @@ def test_stuck_external_term_spawn
end
end

# From Ruby 2.6 to 3.2, `Process.detach` can delay or prevent
# `Process.wait2(-1)` from detecting a terminated child:
# https://bugs.ruby-lang.org/issues/19837. However,
# `Process.wait2(<child pid>)` still works properly. This bug has
# been fixed in Ruby 3.3.
def test_workers_respawn_with_process_detach
skip_unless_signal_exist? :KILL

config = 'test/config/process_detach_before_fork.rb'

worker_respawn(0, workers, config) do |phase0_worker_pids|
phase0_worker_pids.each do |pid|
Process.kill :KILL, pid
end
end

# `test/config/process_detach_before_fork.rb` forks and detaches a
# process. Since MiniTest attempts to join all threads before
# finishing, terminate the process so that the test can end quickly
# if it passes.
pid_filename = File.join(Dir.tmpdir, 'process_detach_test.pid')
if File.exist?(pid_filename)
pid = File.read(pid_filename).chomp.to_i
File.unlink(pid_filename)
Process.kill :TERM, pid if pid > 0
end
end

# mimicking stuck workers, test restart
def test_stuck_phased_restart
skip_unless_signal_exist? :USR1
Expand Down Expand Up @@ -681,10 +709,10 @@ def usr1_all_respond(unix: false, config: '')
end
end

def worker_respawn(phase = 1, size = workers)
def worker_respawn(phase = 1, size = workers, config = 'test/config/worker_shutdown_timeout_2.rb')
threads = []

cli_server "-w #{workers} -t 1:1 -C test/config/worker_shutdown_timeout_2.rb test/rackup/sleep_pid.ru"
cli_server "-w #{workers} -t 1:1 -C #{config} test/rackup/sleep_pid.ru"

# make sure two workers have booted
phase0_worker_pids = get_worker_pids
Expand Down

0 comments on commit 9bd838b

Please sign in to comment.