deadlock between spa_errlog_lock and dp_config_rwlock #14289
I feel like we might want a test case for this to avoid future regressions, but otherwise, this looks good to me.
After this change the code now reliably asserts when running the
@behlendorf thanks for bringing that to my attention. I could have changed the lock order there too, but I decided to make a more extensive change to simplify this code by removing the ability to get the exact number of entries that "would be" returned by `spa_get_errlog()`. In the course of testing that, I realized that there were many missing checks for overflowing the user's buffer.
Looks good. I see this code snippet would now have an out-of-date comment:
```c
static int
status_callback(zpool_handle_t *zhp, void *data)
{
	/* <snip> */

	/*
	 * If the approximate error count is small, get a
	 * precise count by fetching the entire log and
	 * uniquifying the results.
	 */
	if (nerr > 0 && nerr < 100 && !cbp->cb_verbose &&
	    zpool_get_errlog(zhp, &nverrlist) == 0) {
		nvpair_t *elem;

		elem = NULL;
		nerr = 0;
		while ((elem = nvlist_next_nvpair(nverrlist,
		    elem)) != NULL) {
			nerr++;
		}
	}
```
This code has been around since the beginning of time, so I don't have much context on its usefulness, but updating the comment seems warranted.
@grwilson I agree that behavior is potentially confusing. I've changed it to always display the "approximate" error count.
If you can squash and rebase this we should be able to get a cleaner test run on all the builders.
We can see a regression on DilOS with the original commit. The test /redundancy/redundancy_stripe has unstable results. On DilOS this degradation began after importing this commit: could you please check this use case in your testing?
@behlendorf rebased and squashed.

@ikozhukhov I think you're saying that the below error was introduced by 0409d33, which also caused the bug that I'm fixing here?
If that's the case, then please file a separate bug report, as I don't think it's related to the deadlock that I'm addressing in this PR. That said, it looks like your bug is that the test suite (test-runner.py) is calling a Python function with the wrong arguments? I don't see any changes to Python files in 0409d33, so I'm not sure how it would cause this, but I could be missing something.
Yes, we see the issue with the original commit that you are trying to fix in this PR.
There is a lock order inversion deadlock between `spa_errlog_lock` and `dp_config_rwlock`:

A thread in `spa_delete_dataset_errlog()` is running from a sync task. It is holding the `dp_config_rwlock` for writer (see `dsl_sync_task_sync()`), and waiting for the `spa_errlog_lock`.

A thread in `dsl_pool_config_enter()` is holding the `spa_errlog_lock` (see `spa_get_errlog_size()`) and waiting for the `dp_config_rwlock` (as reader).

Note that this was introduced by openzfs#12812.

This commit addresses this by defining the lock ordering to be `dp_config_rwlock` first, then `spa_errlog_lock` / `spa_errlist_lock`. `spa_get_errlog()` and `spa_get_errlog_size()` can acquire the locks in this order, and then `process_error_block()` and `get_head_and_birth_txg()` can verify that the `dp_config_rwlock` is already held.

Additionally, a buffer overrun in `spa_get_errlog()` is corrected. Many code paths didn't check if `*count` got to zero, instead continuing to overwrite past the beginning of the userspace buffer at `uaddr`.

Tested by having some errors in the pool (via `zinject -t data /path/to/file`), one thread running `zpool iostat 0.001`, and another thread running `zfs destroy` (in a loop, although it hits the first time). This reproduces the problem easily without the fix, and works with the fix.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes openzfs#14239
Closes openzfs#14289
Motivation and Context
There is a lock order inversion deadlock between `spa_errlog_lock` and `dp_config_rwlock`:

A thread in `spa_delete_dataset_errlog()` is running from a sync task. It is holding the `dp_config_rwlock` for writer (see `dsl_sync_task_sync()`), and waiting for the `spa_errlog_lock`.

A thread in `dsl_pool_config_enter()` is holding the `spa_errlog_lock` (see `spa_get_errlog_size()`) and waiting for the `dp_config_rwlock` (as reader).

Note that this was introduced by #12812 @gamanakis.
See #14239
Description
This commit addresses this by defining the lock ordering to be `dp_config_rwlock` first, then `spa_errlog_lock` / `spa_errlist_lock`. `spa_get_errlog()` and `spa_get_errlog_size()` can acquire the locks in this order, and then `process_error_block()` and `get_head_and_birth_txg()` can verify that the `dp_config_rwlock` is already held.
How Has This Been Tested?
Tested by having some errors in the pool (via `zinject -t data /path/to/file`), one thread running `zpool iostat 0.001`, and another thread running `zfs destroy` (in a loop, although it hits the first time). This reproduces the problem easily without the fix, and works with the fix.