Skip to content

Commit

Permalink
Fix Deadlock with parallel stop-world/fork calls in MT (#15096)
Browse files Browse the repository at this point in the history
Trying to stop the world from a thread while threads are forking leads
to a deadlock situation in glibc (at least) because we disable the
reception of *all* signals while we fork.

Since the suspend and resume signals are handled directly and not
processed through the event loop (for obvious reasons) we can safely
keep these signals enabled. Apparently it's even safer.
  • Loading branch information
ysbaddaden authored Oct 19, 2024
1 parent 796a0d2 commit 57017f6
Show file tree
Hide file tree
Showing 3 changed files with 39 additions and 1 deletion.
7 changes: 7 additions & 0 deletions src/crystal/system/unix/process.cr
Original file line number Diff line number Diff line change
Expand Up @@ -176,7 +176,14 @@ struct Crystal::System::Process
newmask = uninitialized LibC::SigsetT
oldmask = uninitialized LibC::SigsetT

# block signals while we fork, so the child process won't forward signals it
# may receive to the parent through the signal pipe, but make sure to not
# block stop-the-world signals as it appears to create deadlocks in glibc
# for example; this is safe because these signal handlers musn't be
# registered through `Signal.trap` but directly through `sigaction`.
LibC.sigfillset(pointerof(newmask))
LibC.sigdelset(pointerof(newmask), System::Thread.sig_suspend)
LibC.sigdelset(pointerof(newmask), System::Thread.sig_resume)
ret = LibC.pthread_sigmask(LibC::SIG_SETMASK, pointerof(newmask), pointerof(oldmask))
raise RuntimeError.from_errno("Failed to disable signals") unless ret == 0

Expand Down
19 changes: 18 additions & 1 deletion src/crystal/system/unix/pthread.cr
Original file line number Diff line number Diff line change
Expand Up @@ -249,7 +249,8 @@ module Crystal::System::Thread
end
end

# the suspend/resume signals follow BDWGC
# the suspend/resume signals try to follow BDWGC but aren't exact (e.g. it may
# use SIGUSR1 and SIGUSR2 on FreeBSD instead of SIGRT).

private SIG_SUSPEND =
{% if flag?(:linux) %}
Expand All @@ -266,6 +267,22 @@ module Crystal::System::Thread
{% else %}
LibC::SIGXCPU
{% end %}

def self.sig_suspend : ::Signal
if GC.responds_to?(:sig_suspend)
GC.sig_suspend
else
::Signal.new(SIG_SUSPEND)
end
end

def self.sig_resume : ::Signal
if GC.responds_to?(:sig_resume)
GC.sig_resume
else
::Signal.new(SIG_RESUME)
end
end
end

# In musl (alpine) the calls to unwind API segfaults
Expand Down
14 changes: 14 additions & 0 deletions src/gc/boehm.cr
Original file line number Diff line number Diff line change
Expand Up @@ -164,6 +164,8 @@ lib LibGC

fun stop_world_external = GC_stop_world_external
fun start_world_external = GC_start_world_external
fun get_suspend_signal = GC_get_suspend_signal : Int
fun get_thr_restart_signal = GC_get_thr_restart_signal : Int
end

module GC
Expand Down Expand Up @@ -483,4 +485,16 @@ module GC
def self.start_world : Nil
LibGC.start_world_external
end

{% if flag?(:unix) %}
# :nodoc:
def self.sig_suspend : Signal
Signal.new(LibGC.get_suspend_signal)
end

# :nodoc:
def self.sig_resume : Signal
Signal.new(LibGC.get_thr_restart_signal)
end
{% end %}
end

0 comments on commit 57017f6

Please sign in to comment.