Add thread safety to zthr_{cancel|resume}() #8087
Currently, zthr_resume() is not safe to call while zthr_cancel() is running. This is problematic because zthr_cancel() is called from spa_export_common() without holding any locks to prevent this race. This patch simply changes zthr_resume() so that it can handle races against other calls to zthr_cancel() and zthr_resume().
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Codecov Report
@@ Coverage Diff @@
## master #8087 +/- ##
==========================================
- Coverage 78.46% 78.45% -0.01%
==========================================
Files 377 377
Lines 114518 114519 +1
==========================================
- Hits 89851 89846 -5
- Misses 24667 24673 +6
Related to #8070.
I think there may still be potential for a hang in a scenario involving a running zthr and simultaneous calls to zthr_cancel() and zthr_resume(). Could this happen?
- The zthr is running and a call is made to zthr_cancel(), which acquires the zthr_lock, broadcasts to wake the zthr, and then cv_waits.
- A call is made to zthr_resume() as well. It acquires the zthr_lock before the zthr thread does, ends up in the loop waiting for the in-progress cancel to complete, and cv_waits.
- The zthr thread then acquires the lock and exits, setting zthr_thread = NULL and broadcasting.
- Both the resume and cancel threads are woken, but the resume thread wins the race and gets the lock. There is still an ongoing cancel, so it stays in the loop and cv_waits again.
- The cancel thread then acquires the zthr_lock, completes the cancel, and gives up the lock.
- The resume thread is left hanging with no one to wake it.
One way to eliminate that hang, assuming it's possible, is to add a cv_signal() to the end of zthr_cancel().
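The scenario and the suggested wakeup can be sketched with plain pthreads. This is only an illustration under stated assumptions, not the zthr code: `sketch_zthr_t`, `sketch_init()`, `sketch_resume()`, `sketch_cancel()` and the `running`/`cancel`/`cancelling` flags are hypothetical stand-ins, and a broadcast is used at the end of the cancel path where the comment above proposes cv_signal().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a zthr; not the OpenZFS struct. */
typedef struct sketch_zthr {
	pthread_mutex_t lock;
	pthread_cond_t cv;
	pthread_t thread;
	bool running;		/* worker exists (models zthr_thread != NULL) */
	bool cancel;		/* worker has been asked to exit */
	bool cancelling;	/* a sketch_cancel() call is in progress */
} sketch_zthr_t;

/* Worker: sleeps until asked to exit, then announces that it is gone. */
static void *
sketch_worker(void *arg)
{
	sketch_zthr_t *t = arg;

	pthread_mutex_lock(&t->lock);
	while (!t->cancel)
		pthread_cond_wait(&t->cv, &t->lock);
	t->running = false;
	pthread_cond_broadcast(&t->cv);	/* wakes cancel AND resume waiters */
	pthread_mutex_unlock(&t->lock);
	return (NULL);
}

void
sketch_init(sketch_zthr_t *t)
{
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->cv, NULL);
	t->running = false;
	t->cancel = false;
	t->cancelling = false;
}

void
sketch_resume(sketch_zthr_t *t)
{
	pthread_mutex_lock(&t->lock);
	/* Wait out any in-progress cancel; rechecked after every wakeup. */
	while (t->cancelling)
		pthread_cond_wait(&t->cv, &t->lock);
	if (!t->running) {
		int rc = pthread_create(&t->thread, NULL, sketch_worker, t);
		assert(rc == 0);
		t->running = true;
	}
	pthread_mutex_unlock(&t->lock);
}

void
sketch_cancel(sketch_zthr_t *t)
{
	pthread_mutex_lock(&t->lock);
	/* Serialize concurrent cancels. */
	while (t->cancelling)
		pthread_cond_wait(&t->cv, &t->lock);
	if (t->running) {
		t->cancelling = true;
		t->cancel = true;
		pthread_cond_broadcast(&t->cv);
		while (t->running)
			pthread_cond_wait(&t->cv, &t->lock);
		/* The worker no longer needs the lock, so joining here is safe. */
		pthread_join(t->thread, NULL);
		t->cancel = false;
		t->cancelling = false;
		/*
		 * The wakeup discussed above: without it, a resume that went
		 * back to sleep after the worker's broadcast is never woken.
		 */
		pthread_cond_broadcast(&t->cv);
	}
	pthread_mutex_unlock(&t->lock);
}
```

In this sketch the resume path only ever sleeps while `cancelling` is set, so the final broadcast in `sketch_cancel()` is the one wakeup that is guaranteed to arrive after the condition it is waiting on has changed.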
@brad-lewis That's a good point. I spoke with @sdimitro about this a while ago, and we were saying that the zthr code could use a bit of work to remove races like this in general. @sdimitro, has that happened yet, or have other plans been made?
Closing in favor of the approach in #8229.
How Has This Been Tested?
Observed and verified with ztest.