Skip to content

Commit

Permalink
Internal monitor impl not using coop mutex causing deadlocks on Android.
Browse files Browse the repository at this point in the history
On Android we have seen ANR issues, like the one described in
#111485. After investigating
several different dumps including all threads it turns out that we
could end up in a deadlock when init a monitor since that code path
didn't use a coop mutex and owner of lock could end up in GC code
while holding that lock, leading to deadlock if another thread was
about to lock the same monitor init lock. In several dumps we see
the following two threads:

Thread 1:

syscall+28
__futex_wait_ex(void volatile*, bool, int, bool, timespec const*)+14
NonPI::MutexLockWithTimeout(pthread_mutex_internal_t*, bool, timespec const*)+384
sgen_gc_lock+105
mono_gc_wait_for_bridge_processing_internal+70
sgen_gchandle_get_target+288
alloc_mon+358
ves_icall_System_Threading_Monitor_Monitor_wait+452
ves_icall_System_Threading_Monitor_Monitor_wait_raw+583

Thread 2:

syscall+28
__futex_wait_ex(void volatile*, bool, int, bool, timespec const*)+144
NonPI::MutexLockWithTimeout(pthread_mutex_internal_t*, bool, timespec const*)+652
alloc_mon+105
ves_icall_System_Threading_Monitor_Monitor_wait+452
ves_icall_System_Threading_Monitor_Monitor_wait_raw+583

So in this scenario Thread 1 holds monitor_mutex that is not a coop
mutex and end up trying to take GC lock, since it calls,
mono_gc_wait_for_bridge_processing_internal, but since a GC is already
started (waiting on STW to complete), Thread 1 will block holding
monitor_mutex.

Thread 2 will try to lock monitor_mutex as well, and since its not a
coop mutex it will block on OS __futex_wait_ex without changing
Mono thread state to blocking, preventing the STW from processing.

Fix is to switch to coop aware implementation of monitor_mutex.

Normally this should have been resolved on Android since we run
hybrid suspend meaning we should be able to run a signal handler on the
blocking thread that would suspend it meaning that STW would continue,
but for some reason the signal can't have been executed in this case putting the app under coop suspend limitations.

This fix will take care of the deadlock, but if there are issues running
Signals on Android, then threads not attached to runtime using
coop attach methods could end up in similar situations
blocking STW.
  • Loading branch information
lateralusX authored and github-actions committed Feb 10, 2025
1 parent 6cbac3f commit b99fad6
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions src/mono/mono/metadata/monitor.c
Original file line number Diff line number Diff line change
Expand Up @@ -82,9 +82,9 @@ struct _MonitorArray {
MonoThreadsSync monitors [MONO_ZERO_LEN_ARRAY];
};

#define mono_monitor_allocator_lock() mono_os_mutex_lock (&monitor_mutex)
#define mono_monitor_allocator_unlock() mono_os_mutex_unlock (&monitor_mutex)
static mono_mutex_t monitor_mutex;
#define mono_monitor_allocator_lock() mono_coop_mutex_lock (&monitor_mutex)
#define mono_monitor_allocator_unlock() mono_coop_mutex_unlock (&monitor_mutex)
static MonoCoopMutex monitor_mutex;
static MonoThreadsSync *monitor_freelist;
static MonitorArray *monitor_allocated;
static int array_size = 16;
Expand Down Expand Up @@ -255,7 +255,7 @@ lock_word_new_flat (gint32 owner)
void
mono_monitor_init (void)
{
mono_os_mutex_init_recursive (&monitor_mutex);
mono_coop_mutex_init_recursive (&monitor_mutex);
}

static int
Expand Down

0 comments on commit b99fad6

Please sign in to comment.