Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZTHR refactoring - eliminates multiple races #8070

Closed
wants to merge 1 commit into from

Conversation

sdimitro
Copy link
Contributor

@sdimitro sdimitro commented Oct 29, 2018

Motivation and Context

This fixes issue: #7744
It also fixes a bunch of other related and non-related races mentioned below

Description

There are a couple of similar race conditions with code
as it is right now:

1] Race Condition 1

As we are running the zthr_func(), zthr_cancel() raises
the cancel flag and blocks until zthr_thread is NULL,
dropping the lock in the meantime. If`zthr_resume() is
called at that point, it grabs the lock and it assumes
that the cancel flag is not raised, failing the assertion.
In reality, this doesn't usually happen because the
spa_export() code always checks if zthr_isrunning()
before calling zthr_resume(), but still there is a small
window where zthr_isrunning() drops the lock and
zthr_resume() has to grab it again. With multiple threads
there could be scenarios where the same assertion fails.
This race condition also sets up the basis for race
condition 2.

This patch:
Puts the check of zthr_isrunning() inside zthr_resume().
This takes care of aforementioned window between
zthr_isrunning() and zthr_cancel().

2] Race Condition 2
zthr_cancel() was called while the zthr_func was running
and the cancel flag was raised. zthr_cancel() waits for
zthr_thread to be set to NULL. This doesn't happen until
zthr_procedure calls zthr_exit() which sets zthr_thread
to NULL and drops the zthr_lock. Then, ideally, the
zthr_cancel() thread would wake up and set the cancel
flag 0. Unfortunately, if a zthr_resume() is called at
that point zthr_thread is NULL but the cancel flag is
still raised, thus we can fail the assertion again.

This patch:
Ensures that the zthr is the one reseting the cancel flag
to 0. This is nice for 2 reasons:
(1) Any flag raised from other threads requesting
cancellation, resumption, or waiting on the zthr, are reset
from the zthr itself. This is makes reasoning easier.
(current flags: zthr_cancel, zthr_initializing)
(2) The zthr_procedure() sets zthr_thread to NULL, resets
the cancel flag and then drops the lock. So even if a
zthr_resume()` comes up and sees that zthr_thread is NULL
it won't fail the assertion because the cancel flag has
been reset.

Withe the above change though another problem comes up. If
a zthr_cancel() raises the cancel flag and waits for
zthr_thread to be set to NULL. The zthr gets cancelled,
resets the flag and sets zthr_thread to NULL. Then, before
the zthr_cancel() wakes up again, a zthr_resume() comes up
and spawns a new thread, essentially repopulating zthr_thread.
As a result, zthr_cancel() hangs forever waiting for
zthr_thread to be NULL.

The way this patch deals with the above, is by introducing
the zthr_cancellation_stamp which is basically a logical
timestamp for each cancellation. The timestamp is incremented
every time the zthr gets cancelled. The idea is that before
dropping the lock, zthr_cancel() looks at the value of the
current at the time timestamp. This way, even if the above
scenario happens when zthr_thread has been repopulated
the cancellation timestamp of the zthr would have changed
and the zthr_cancel() thread will detect that and go on with
its flow instead of going to sleep again.

Other Side-changes:

  • Ripped out zthr_rc (zthr return code) and change the
    signature of zthr_func_t to return void. This code was never
    really needed.
  • Ripped out zthr_exit(). It wasn't really used anywhere
    outside the ZTHR code (e.g. zthr_procedure()) and even there
    we were dropping and grabbing the zthr lock for no reason.
  • Introduced zthr_initializing variables. Basically the flag
    answers the question "Are we initializing the zthr now?".
    This deals with any race conditions that can take place
    while the zthr is initializing (this is critical especially
    for later when the fast clone deletion feature will add the
    zthr_wait() changes).

How Has This Been Tested?

Running ztest for a while in Linux (at the time of this writing for ~2 hours).
In illumos I had this patch running for around week with ztest_spa_create_destroy having a frequency of always.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Oct 29, 2018
There are a couple of similar race conditions with code
as it is right now:

1] Race Condition 1

As we are running the zthr_func(), zthr_cancel()` raises
the cancel flag and blocks until zthr_thread is NULL,
dropping the lock in the meantime. If`zthr_resume() is
called at that point, it grabs the lock and it assumes
that the cancel flag is not raised, failing the assertion.
In reality, this doesn't usually happen because the
spa_export() code always checks if zthr_isrunning()
before calling zthr_resume(), but still there is a small
window where zthr_isrunning() drops the lock and
zthr_resume() has to grab it again. With multiple threads
there could be scenarios where the same assertion fails.
This race condition also sets up the basis for race
condition 2.

This patch:
Puts the check of zthr_isrunning() inside zthr_resume().
This takes care of aforementioned window between
zthr_isrunning() and zthr_cancel().

2] Race Condition 2
zthr_cancel() was called while the zthr_func was running
and the cancel flag was raised. zthr_cancel() waits for
zthr_thread to be set to NULL. This doesn't happen until
zthr_procedure calls zthr_exit() which sets zthr_thread
to NULL and drops the zthr_lock. Then, ideally, the
zthr_cancel() thread would wake up and set the cancel
flag 0. Unfortunately, if a zthr_resume() is called at
that point `zthr_thread` is NULL but the cancel flag is
still raised, thus we can fail the assertion again.

This patch:
Ensures that the zthr is the one reseting the cancel flag
to 0. This is nice for 2 reasons:
(1) Any flag raised from other threads requesting
cancellation, resumption, or waiting on the zthr, are reset
from the zthr itself. This is makes reasoning easier.
(current flags: zthr_cancel, zthr_initializing)
(2) The zthr_procedure() sets zthr_thread to NULL, resets
the cancel flag and then drops the lock. So even if a
zthr_resume()` comes up and sees that zthr_thread is NULL
it won't fail the assertion because the cancel flag has
been reset.

Withe the above change though another problem comes up. If
a zthr_cancel() raises the cancel flag and waits for
zthr_thread to be set to NULL. The zthr gets cancelled,
resets the flag and sets zthr_thread to NULL. Then, before
the zthr_cancel() wakes up again, a zthr_resume() comes up
and spawns a new thread, essentially repopulating zthr_thread.
As a result, zthr_cancel() hangs forever waiting for
zthr_thread to be NULL.

The way this patch deals with the above, is by introducing
the zthr_cancellation_stamp which is basically a logical
timestamp for each cancellation. The timestamp is incremented
every time the zthr gets cancelled. The idea is that before
dropping the lock, zthr_cancel() looks at the value of the
current at the time timestamp. This way, even if the above
scenario happens when zthr_thread has been repopulated
the cancellation timestamp of the zthr would have changed
and the zthr_cancel() thread will detect that and go on with
its flow instead of going to sleep again.

Other Side-changes:
* Ripped out zthr_rc (zthr return code) and change the
signature of zthr_func_t to return void. This code was never
really needed.
* Ripped out zthr_exit(). It wasn't really used anywhere
outside the ZTHR code (e.g. zthr_procedure()) and even there
we were dropping and grabbing the zthr lock for no reason.
* Introduced zthr_initializing variables. Basically the flag
answers the question "Are we initializing the zthr now?".
This deals with any race conditions that can take place
while the zthr is initializing (this is critical especially
for later when the fast clone deletion feature will add the
zthr_wait() changes).

Signed-off-by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
@codecov
Copy link

codecov bot commented Oct 30, 2018

Codecov Report

Merging #8070 into master will increase coverage by 0.03%.
The diff coverage is 95.23%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8070      +/-   ##
==========================================
+ Coverage   78.47%   78.51%   +0.03%     
==========================================
  Files         377      377              
  Lines      114494   114501       +7     
==========================================
+ Hits        89851    89899      +48     
+ Misses      24643    24602      -41
Flag Coverage Δ
#kernel 78.72% <95.23%> (-0.12%) ⬇️
#user 67.58% <88.09%> (-0.01%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bea7578...a023970. Read the comment docs.

kmutex_t zthr_lock;
kcondvar_t zthr_cv;
boolean_t zthr_cancel;
uint64_t zthr_cancellation_stamp;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's potential confusion around the term "stamp" / "timestamp", since this is not wall-clock time, hrtime, or ticks. How about calling this the "generation" instead (zthr_cancel_gen could be OK for brevity)?


/*
* Mark the end of initialization and notify waiters
* (e.g. callers of zthr_resume()) that we are done.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"done" -> "ready" or "running"? or "done initializing"?

* Wait until the zthr has finished initializing.
* This ensures that the only time another thread
* can obtain the lock is when the zthr func is
* running or the zthr is asleep.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does zthr_initializing accomplish this? It looks like it's just used when starting/resuming the thread to make sure we don't return until the thread has grabbed the lock. So maybe it helps this thread not be able to call back into the zthr code before the thread has started running, but I think other threads could still call concurrently and see zthr_initializing=TRUE.

@sdimitro
Copy link
Contributor Author

sdimitro commented Jan 2, 2019

Superseded by #8229

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Code Review Needed Ready for review and testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants