ZTHR refactoring - eliminates multiple races #8070

sdimitro · 2018-10-29T22:00:20Z

Motivation and Context

This fixes issue: #7744
It also fixes a bunch of other related and non-related races mentioned below

Description

There are a couple of similar race conditions with code
as it is right now:

1] Race Condition 1

As we are running the zthr_func(), zthr_cancel() raises
the cancel flag and blocks until zthr_thread is NULL,
dropping the lock in the meantime. If`zthr_resume() is
called at that point, it grabs the lock and it assumes
that the cancel flag is not raised, failing the assertion.
In reality, this doesn't usually happen because the
spa_export() code always checks if zthr_isrunning()
before calling zthr_resume(), but still there is a small
window where zthr_isrunning() drops the lock and
zthr_resume() has to grab it again. With multiple threads
there could be scenarios where the same assertion fails.
This race condition also sets up the basis for race
condition 2.

This patch:
Puts the check of zthr_isrunning() inside zthr_resume().
This takes care of aforementioned window between
zthr_isrunning() and zthr_cancel().

2] Race Condition 2
zthr_cancel() was called while the zthr_func was running
and the cancel flag was raised. zthr_cancel() waits for
zthr_thread to be set to NULL. This doesn't happen until
zthr_procedure calls zthr_exit() which sets zthr_thread
to NULL and drops the zthr_lock. Then, ideally, the
zthr_cancel() thread would wake up and set the cancel
flag 0. Unfortunately, if a zthr_resume() is called at
that point zthr_thread is NULL but the cancel flag is
still raised, thus we can fail the assertion again.

This patch:
Ensures that the zthr is the one reseting the cancel flag
to 0. This is nice for 2 reasons:
(1) Any flag raised from other threads requesting
cancellation, resumption, or waiting on the zthr, are reset
from the zthr itself. This is makes reasoning easier.
(current flags: zthr_cancel, zthr_initializing)
(2) The zthr_procedure() sets zthr_thread to NULL, resets
the cancel flag and then drops the lock. So even if a
zthr_resume()` comes up and sees that zthr_thread is NULL
it won't fail the assertion because the cancel flag has
been reset.

Withe the above change though another problem comes up. If
a zthr_cancel() raises the cancel flag and waits for
zthr_thread to be set to NULL. The zthr gets cancelled,
resets the flag and sets zthr_thread to NULL. Then, before
the zthr_cancel() wakes up again, a zthr_resume() comes up
and spawns a new thread, essentially repopulating zthr_thread.
As a result, zthr_cancel() hangs forever waiting for
zthr_thread to be NULL.

The way this patch deals with the above, is by introducing
the zthr_cancellation_stamp which is basically a logical
timestamp for each cancellation. The timestamp is incremented
every time the zthr gets cancelled. The idea is that before
dropping the lock, zthr_cancel() looks at the value of the
current at the time timestamp. This way, even if the above
scenario happens when zthr_thread has been repopulated
the cancellation timestamp of the zthr would have changed
and the zthr_cancel() thread will detect that and go on with
its flow instead of going to sleep again.

Other Side-changes:

Ripped out zthr_rc (zthr return code) and change the
signature of zthr_func_t to return void. This code was never
really needed.
Ripped out zthr_exit(). It wasn't really used anywhere
outside the ZTHR code (e.g. zthr_procedure()) and even there
we were dropping and grabbing the zthr lock for no reason.
Introduced zthr_initializing variables. Basically the flag
answers the question "Are we initializing the zthr now?".
This deals with any race conditions that can take place
while the zthr is initializing (this is critical especially
for later when the fast clone deletion feature will add the
zthr_wait() changes).

How Has This Been Tested?

Running ztest for a while in Linux (at the time of this writing for ~2 hours).
In illumos I had this patch running for around week with ztest_spa_create_destroy having a frequency of always.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the ZFS on Linux code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
All new and existing tests passed.
All commit messages are properly formatted and contain Signed-off-by.

There are a couple of similar race conditions with code as it is right now: 1] Race Condition 1 As we are running the zthr_func(), zthr_cancel()` raises the cancel flag and blocks until zthr_thread is NULL, dropping the lock in the meantime. If`zthr_resume() is called at that point, it grabs the lock and it assumes that the cancel flag is not raised, failing the assertion. In reality, this doesn't usually happen because the spa_export() code always checks if zthr_isrunning() before calling zthr_resume(), but still there is a small window where zthr_isrunning() drops the lock and zthr_resume() has to grab it again. With multiple threads there could be scenarios where the same assertion fails. This race condition also sets up the basis for race condition 2. This patch: Puts the check of zthr_isrunning() inside zthr_resume(). This takes care of aforementioned window between zthr_isrunning() and zthr_cancel(). 2] Race Condition 2 zthr_cancel() was called while the zthr_func was running and the cancel flag was raised. zthr_cancel() waits for zthr_thread to be set to NULL. This doesn't happen until zthr_procedure calls zthr_exit() which sets zthr_thread to NULL and drops the zthr_lock. Then, ideally, the zthr_cancel() thread would wake up and set the cancel flag 0. Unfortunately, if a zthr_resume() is called at that point `zthr_thread` is NULL but the cancel flag is still raised, thus we can fail the assertion again. This patch: Ensures that the zthr is the one reseting the cancel flag to 0. This is nice for 2 reasons: (1) Any flag raised from other threads requesting cancellation, resumption, or waiting on the zthr, are reset from the zthr itself. This is makes reasoning easier. (current flags: zthr_cancel, zthr_initializing) (2) The zthr_procedure() sets zthr_thread to NULL, resets the cancel flag and then drops the lock. So even if a zthr_resume()` comes up and sees that zthr_thread is NULL it won't fail the assertion because the cancel flag has been reset. Withe the above change though another problem comes up. If a zthr_cancel() raises the cancel flag and waits for zthr_thread to be set to NULL. The zthr gets cancelled, resets the flag and sets zthr_thread to NULL. Then, before the zthr_cancel() wakes up again, a zthr_resume() comes up and spawns a new thread, essentially repopulating zthr_thread. As a result, zthr_cancel() hangs forever waiting for zthr_thread to be NULL. The way this patch deals with the above, is by introducing the zthr_cancellation_stamp which is basically a logical timestamp for each cancellation. The timestamp is incremented every time the zthr gets cancelled. The idea is that before dropping the lock, zthr_cancel() looks at the value of the current at the time timestamp. This way, even if the above scenario happens when zthr_thread has been repopulated the cancellation timestamp of the zthr would have changed and the zthr_cancel() thread will detect that and go on with its flow instead of going to sleep again. Other Side-changes: * Ripped out zthr_rc (zthr return code) and change the signature of zthr_func_t to return void. This code was never really needed. * Ripped out zthr_exit(). It wasn't really used anywhere outside the ZTHR code (e.g. zthr_procedure()) and even there we were dropping and grabbing the zthr lock for no reason. * Introduced zthr_initializing variables. Basically the flag answers the question "Are we initializing the zthr now?". This deals with any race conditions that can take place while the zthr is initializing (this is critical especially for later when the fast clone deletion feature will add the zthr_wait() changes). Signed-off-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>

codecov · 2018-10-30T08:37:53Z

Codecov Report

Merging #8070 into master will increase coverage by 0.03%.
The diff coverage is 95.23%.

@@            Coverage Diff             @@
##           master    #8070      +/-   ##
==========================================
+ Coverage   78.47%   78.51%   +0.03%     
==========================================
  Files         377      377              
  Lines      114494   114501       +7     
==========================================
+ Hits        89851    89899      +48     
+ Misses      24643    24602      -41

Flag	Coverage Δ
#kernel	`78.72% <95.23%> (-0.12%)`	⬇️
#user	`67.58% <88.09%> (-0.01%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bea7578...a023970. Read the comment docs.

ahrens · 2018-10-30T16:32:11Z

module/zfs/zthr.c

+	kmutex_t	zthr_lock;
+	kcondvar_t	zthr_cv;
+	boolean_t	zthr_cancel;
+	uint64_t	zthr_cancellation_stamp;


There's potential confusion around the term "stamp" / "timestamp", since this is not wall-clock time, hrtime, or ticks. How about calling this the "generation" instead (zthr_cancel_gen could be OK for brevity)?

ahrens · 2018-10-30T16:33:30Z

module/zfs/zthr.c

+
+	/*
+	 * Mark the end of initialization and notify waiters
+	 * (e.g. callers of zthr_resume()) that we are done.


"done" -> "ready" or "running"? or "done initializing"?

ahrens · 2018-10-30T23:10:39Z

module/zfs/zthr.c

+		* Wait until the zthr has finished initializing.
+		* This ensures that the only time another thread
+		* can obtain the lock is when the zthr func is
+		* running or the zthr is asleep.


How does zthr_initializing accomplish this? It looks like it's just used when starting/resuming the thread to make sure we don't return until the thread has grabbed the lock. So maybe it helps this thread not be able to call back into the zthr code before the thread has started running, but I think other threads could still call concurrently and see zthr_initializing=TRUE.

sdimitro · 2019-01-02T17:13:19Z

Superseded by #8229

sdimitro requested review from ahrens, behlendorf, tcaputi and shartse October 29, 2018 22:00

behlendorf added the Status: Code Review Needed Ready for review and testing label Oct 29, 2018

sdimitro force-pushed the zthr_refactor branch from 8627a78 to a023970 Compare October 29, 2018 22:14

ahrens reviewed Oct 30, 2018

View reviewed changes

behlendorf mentioned this pull request Nov 5, 2018

Add thread safety to zthr_{cancel|resume}() #8087

Closed

12 tasks

sdimitro closed this Jan 2, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ZTHR refactoring - eliminates multiple races #8070

ZTHR refactoring - eliminates multiple races #8070

sdimitro commented Oct 29, 2018 •

edited

Loading

codecov bot commented Oct 30, 2018 •

edited

Loading

ahrens Oct 30, 2018

ahrens Oct 30, 2018

ahrens Oct 30, 2018

sdimitro commented Jan 2, 2019

ZTHR refactoring - eliminates multiple races #8070

ZTHR refactoring - eliminates multiple races #8070

Conversation

sdimitro commented Oct 29, 2018 • edited Loading

Motivation and Context

Description

How Has This Been Tested?

Types of changes

Checklist:

codecov bot commented Oct 30, 2018 • edited Loading

Codecov Report

ahrens Oct 30, 2018

Choose a reason for hiding this comment

ahrens Oct 30, 2018

Choose a reason for hiding this comment

ahrens Oct 30, 2018

Choose a reason for hiding this comment

sdimitro commented Jan 2, 2019

sdimitro commented Oct 29, 2018 •

edited

Loading

codecov bot commented Oct 30, 2018 •

edited

Loading