-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[release/8.0] Internal monitor impl not using coop mutex causing deadlocks on Android. #112374
Merged
steveisok
merged 1 commit into
release/8.0-staging
from
backport/pr-112358-to-release/8.0
Feb 12, 2025
Merged
[release/8.0] Internal monitor impl not using coop mutex causing deadlocks on Android. #112374
steveisok
merged 1 commit into
release/8.0-staging
from
backport/pr-112358-to-release/8.0
Feb 12, 2025
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
On Android we have seen ANR issues, like the one described in #111485. After investigating several different dumps including all threads it turns out that we could end up in a deadlock when init a monitor since that code path didn't use a coop mutex and owner of lock could end up in GC code while holding that lock, leading to deadlock if another thread was about to lock the same monitor init lock. In several dumps we see the following two threads: Thread 1: syscall+28 __futex_wait_ex(void volatile*, bool, int, bool, timespec const*)+14 NonPI::MutexLockWithTimeout(pthread_mutex_internal_t*, bool, timespec const*)+384 sgen_gc_lock+105 mono_gc_wait_for_bridge_processing_internal+70 sgen_gchandle_get_target+288 alloc_mon+358 ves_icall_System_Threading_Monitor_Monitor_wait+452 ves_icall_System_Threading_Monitor_Monitor_wait_raw+583 Thread 2: syscall+28 __futex_wait_ex(void volatile*, bool, int, bool, timespec const*)+144 NonPI::MutexLockWithTimeout(pthread_mutex_internal_t*, bool, timespec const*)+652 alloc_mon+105 ves_icall_System_Threading_Monitor_Monitor_wait+452 ves_icall_System_Threading_Monitor_Monitor_wait_raw+583 So in this scenario Thread 1 holds monitor_mutex that is not a coop mutex and end up trying to take GC lock, since it calls, mono_gc_wait_for_bridge_processing_internal, but since a GC is already started (waiting on STW to complete), Thread 1 will block holding monitor_mutex. Thread 2 will try to lock monitor_mutex as well, and since its not a coop mutex it will block on OS __futex_wait_ex without changing Mono thread state to blocking, preventing the STW from processing. Fix is to switch to coop aware implementation of monitor_mutex. Normally this should have been resolved on Android since we run hybrid suspend meaning we should be able to run a signal handler on the blocking thread that would suspend it meaning that STW would continue, but for some reason the signal can't have been executed in this case putting the app under coop suspend limitations. This fix will take care of the deadlock, but if there are issues running Signals on Android, then threads not attached to runtime using coop attach methods could end up in similar situations blocking STW.
This was referenced Feb 11, 2025
Open
jeffschwMSFT
approved these changes
Feb 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. please get a code review. we will take for consideration in 8.0.x
BrzVlad
approved these changes
Feb 11, 2025
lateralusX
approved these changes
Feb 12, 2025
/ba-g BA showing known tests |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Backport of #112358 to release/8.0
/cc @steveisok @lateralusX
Customer Impact
As reported in #111485, customers experienced deadlocks and ultimately crashes on their Android apps. This is due to certain code paths not using a cooperative mutex and ending up in GC code where another thread attempts to lock the same monitor init lock.
Regression
Normally, android operates in hybrid suspend, so a deadlock like this should never happen. It could be due to a native component they are using taking ownership of a signal handler we use for coop suspend and/or a bug in coop suspend itself.
Testing
Manual testing
Risk
Low
IMPORTANT: If this backport is for a servicing release, please verify that:
release/X.0-staging
, notrelease/X.0
.Package authoring no longer needed in .NET 9
IMPORTANT: Starting with .NET 9, you no longer need to edit a NuGet package's csproj to enable building and bump the version.
Keep in mind that we still need package authoring in .NET 8 and older versions.