-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ZTHR refactoring - eliminates multiple races #8070
Conversation
There are a couple of similar race conditions with code as it is right now: 1] Race Condition 1 As we are running the zthr_func(), zthr_cancel()` raises the cancel flag and blocks until zthr_thread is NULL, dropping the lock in the meantime. If`zthr_resume() is called at that point, it grabs the lock and it assumes that the cancel flag is not raised, failing the assertion. In reality, this doesn't usually happen because the spa_export() code always checks if zthr_isrunning() before calling zthr_resume(), but still there is a small window where zthr_isrunning() drops the lock and zthr_resume() has to grab it again. With multiple threads there could be scenarios where the same assertion fails. This race condition also sets up the basis for race condition 2. This patch: Puts the check of zthr_isrunning() inside zthr_resume(). This takes care of aforementioned window between zthr_isrunning() and zthr_cancel(). 2] Race Condition 2 zthr_cancel() was called while the zthr_func was running and the cancel flag was raised. zthr_cancel() waits for zthr_thread to be set to NULL. This doesn't happen until zthr_procedure calls zthr_exit() which sets zthr_thread to NULL and drops the zthr_lock. Then, ideally, the zthr_cancel() thread would wake up and set the cancel flag 0. Unfortunately, if a zthr_resume() is called at that point `zthr_thread` is NULL but the cancel flag is still raised, thus we can fail the assertion again. This patch: Ensures that the zthr is the one reseting the cancel flag to 0. This is nice for 2 reasons: (1) Any flag raised from other threads requesting cancellation, resumption, or waiting on the zthr, are reset from the zthr itself. This is makes reasoning easier. (current flags: zthr_cancel, zthr_initializing) (2) The zthr_procedure() sets zthr_thread to NULL, resets the cancel flag and then drops the lock. So even if a zthr_resume()` comes up and sees that zthr_thread is NULL it won't fail the assertion because the cancel flag has been reset. Withe the above change though another problem comes up. If a zthr_cancel() raises the cancel flag and waits for zthr_thread to be set to NULL. The zthr gets cancelled, resets the flag and sets zthr_thread to NULL. Then, before the zthr_cancel() wakes up again, a zthr_resume() comes up and spawns a new thread, essentially repopulating zthr_thread. As a result, zthr_cancel() hangs forever waiting for zthr_thread to be NULL. The way this patch deals with the above, is by introducing the zthr_cancellation_stamp which is basically a logical timestamp for each cancellation. The timestamp is incremented every time the zthr gets cancelled. The idea is that before dropping the lock, zthr_cancel() looks at the value of the current at the time timestamp. This way, even if the above scenario happens when zthr_thread has been repopulated the cancellation timestamp of the zthr would have changed and the zthr_cancel() thread will detect that and go on with its flow instead of going to sleep again. Other Side-changes: * Ripped out zthr_rc (zthr return code) and change the signature of zthr_func_t to return void. This code was never really needed. * Ripped out zthr_exit(). It wasn't really used anywhere outside the ZTHR code (e.g. zthr_procedure()) and even there we were dropping and grabbing the zthr lock for no reason. * Introduced zthr_initializing variables. Basically the flag answers the question "Are we initializing the zthr now?". This deals with any race conditions that can take place while the zthr is initializing (this is critical especially for later when the fast clone deletion feature will add the zthr_wait() changes). Signed-off-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
8627a78
to
a023970
Compare
Codecov Report
@@ Coverage Diff @@
## master #8070 +/- ##
==========================================
+ Coverage 78.47% 78.51% +0.03%
==========================================
Files 377 377
Lines 114494 114501 +7
==========================================
+ Hits 89851 89899 +48
+ Misses 24643 24602 -41
Continue to review full report at Codecov.
|
kmutex_t zthr_lock; | ||
kcondvar_t zthr_cv; | ||
boolean_t zthr_cancel; | ||
uint64_t zthr_cancellation_stamp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's potential confusion around the term "stamp" / "timestamp", since this is not wall-clock time, hrtime, or ticks. How about calling this the "generation" instead (zthr_cancel_gen
could be OK for brevity)?
|
||
/* | ||
* Mark the end of initialization and notify waiters | ||
* (e.g. callers of zthr_resume()) that we are done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"done" -> "ready" or "running"? or "done initializing"?
* Wait until the zthr has finished initializing. | ||
* This ensures that the only time another thread | ||
* can obtain the lock is when the zthr func is | ||
* running or the zthr is asleep. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does zthr_initializing accomplish this? It looks like it's just used when starting/resuming the thread to make sure we don't return until the thread has grabbed the lock. So maybe it helps this thread not be able to call back into the zthr code before the thread has started running, but I think other threads could still call concurrently and see zthr_initializing=TRUE
.
Superseded by #8229 |
Motivation and Context
This fixes issue: #7744
It also fixes a bunch of other related and non-related races mentioned below
Description
There are a couple of similar race conditions with code
as it is right now:
1] Race Condition 1
As we are running the zthr_func(), zthr_cancel() raises
the cancel flag and blocks until zthr_thread is NULL,
dropping the lock in the meantime. If`zthr_resume() is
called at that point, it grabs the lock and it assumes
that the cancel flag is not raised, failing the assertion.
In reality, this doesn't usually happen because the
spa_export() code always checks if zthr_isrunning()
before calling zthr_resume(), but still there is a small
window where zthr_isrunning() drops the lock and
zthr_resume() has to grab it again. With multiple threads
there could be scenarios where the same assertion fails.
This race condition also sets up the basis for race
condition 2.
This patch:
Puts the check of zthr_isrunning() inside zthr_resume().
This takes care of aforementioned window between
zthr_isrunning() and zthr_cancel().
2] Race Condition 2
zthr_cancel() was called while the zthr_func was running
and the cancel flag was raised. zthr_cancel() waits for
zthr_thread to be set to NULL. This doesn't happen until
zthr_procedure calls zthr_exit() which sets zthr_thread
to NULL and drops the zthr_lock. Then, ideally, the
zthr_cancel() thread would wake up and set the cancel
flag 0. Unfortunately, if a zthr_resume() is called at
that point
zthr_thread
is NULL but the cancel flag isstill raised, thus we can fail the assertion again.
This patch:
Ensures that the zthr is the one reseting the cancel flag
to 0. This is nice for 2 reasons:
(1) Any flag raised from other threads requesting
cancellation, resumption, or waiting on the zthr, are reset
from the zthr itself. This is makes reasoning easier.
(current flags: zthr_cancel, zthr_initializing)
(2) The zthr_procedure() sets zthr_thread to NULL, resets
the cancel flag and then drops the lock. So even if a
zthr_resume()` comes up and sees that zthr_thread is NULL
it won't fail the assertion because the cancel flag has
been reset.
Withe the above change though another problem comes up. If
a zthr_cancel() raises the cancel flag and waits for
zthr_thread to be set to NULL. The zthr gets cancelled,
resets the flag and sets zthr_thread to NULL. Then, before
the zthr_cancel() wakes up again, a zthr_resume() comes up
and spawns a new thread, essentially repopulating zthr_thread.
As a result, zthr_cancel() hangs forever waiting for
zthr_thread to be NULL.
The way this patch deals with the above, is by introducing
the zthr_cancellation_stamp which is basically a logical
timestamp for each cancellation. The timestamp is incremented
every time the zthr gets cancelled. The idea is that before
dropping the lock, zthr_cancel() looks at the value of the
current at the time timestamp. This way, even if the above
scenario happens when zthr_thread has been repopulated
the cancellation timestamp of the zthr would have changed
and the zthr_cancel() thread will detect that and go on with
its flow instead of going to sleep again.
Other Side-changes:
signature of zthr_func_t to return void. This code was never
really needed.
outside the ZTHR code (e.g. zthr_procedure()) and even there
we were dropping and grabbing the zthr lock for no reason.
answers the question "Are we initializing the zthr now?".
This deals with any race conditions that can take place
while the zthr is initializing (this is critical especially
for later when the fast clone deletion feature will add the
zthr_wait() changes).
How Has This Been Tested?
Running ztest for a while in Linux (at the time of this writing for ~2 hours).
In illumos I had this patch running for around week with
ztest_spa_create_destroy
having a frequency of always.Types of changes
Checklist:
Signed-off-by
.