-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More ztest fixes #8061
More ztest fixes #8061
Conversation
Currently, zthr_resume() is not safe to call while zthr_cancel() is running. This is problematic because zthr_cancel() is called from spa_export_common() without holding any locks to prevent this race. This patch simply changes zthr_resume() so that it can handle races against other calls to zthr_cancel() and zthr_resume(). Signed-off-by: Tom Caputi <tcaputi@datto.com>
b25311a
to
cae3d8d
Compare
Hey Tom! I think I know what's the issue that you are trying to fix here. @shartse and I have worked in some fixes in the zthr infrastructure (some of them are in her fast clone deletion project stuck behind send/receive and the rest are in an internal review where I fix this issue #7744 ). I think Sara's code specifically fix the above issue. I think it would be best to take some time and maybe extract the zthr fixes from Sara's project and combine them with mine, so there are no code differences in the future. Does this sound good to you? Also, out of curiosity, does this issue come up very often? |
@sdimitro That sounds fine. I'm all for anything that gets the problem fixed. I would just want to see what you guys came up with to fix this issue so I can verify we're talking about the same problem.
With all the ztest fixes that went in yesterday, I was actually able to run ztest locally for 12 hours with this being the only failure, which only occurred once. I bet you could get it to happen fairly frequently if you configured ztest to just run |
zfs_rename_006_pos has been flaky in the past because it was missing a call to block_device_wait to ensure the zvols it creates are present before running dd. Whenever this this happened, zfs_rename_009_neg would also fail because the first test would leak a zvol clone that it did not know how to clean up. This patch fixes the root cause and reenables the test. It also fixes some minor grammar errors. Signed-off-by: Tom Caputi <tcaputi@datto.com>
This patch simply corrects an issue where vdev_dtl_reassess() could attempt to dirty the vdev config even when the spa was not elligable for writing. Signed-off-by: Tom Caputi <tcaputi@datto.com>
vdev_clear() can call vdev_set_deferred_resilver() with a non-leaf vdev to setup a deferred resilver. However, this function is currently written to only handle leaf vdevs. This patch makes this function recursive so that it can find appropriate vdevs to resilver and set vdev_resilver_deferred on them. Signed-off-by: Tom Caputi <tcaputi@datto.com> TEST_ZTEST_TIMEOUT=7200
c753555
to
f3af06a
Compare
Let's get independent PRs open for each of these. |
c693cc6
to
d9df89d
Compare
This patch ensures that logs are replayed on all datasets prior to starting ztest workers. This ensures that the call to vdev_offline() a log device in ztest_fault_inject() will not fail due to the log device being required for replay. This patch also fixes a small issue found during testing where spa_keystore_load_wkey() does not check that the dataset specified is an encryption root. This check was present in libzfs, however. Signed-off-by: Tom Caputi <tcaputi@datto.com> TEST_ZTEST_TIMEOUT=3600
This patch fixes a race condition where the end of vdev_remove_replace_with_indirect(), which holds svr_lock, would race against spa_vdev_removal_destroy(), which destroys the same lock and is called asynchronously via dsl_sync_task_nowait(). Signed-off-by: Tom Caputi <tcaputi@datto.com>
This patch simply ensures that vdev_indirect_splits_damage() cannot hit a divide by zero exception if a split has no children with valid data. The normal reconstruction code path in vdev_indirect_reconstruct_io_done() already has this check. Signed-off-by: Tom Caputi <tcaputi@datto.com> TEST_ZTEST_TIMEOUT=3600
d9df89d
to
a065662
Compare
Codecov Report
@@ Coverage Diff @@
## master #8061 +/- ##
==========================================
- Coverage 78.43% 78.35% -0.08%
==========================================
Files 377 377
Lines 114518 114521 +3
==========================================
- Hits 89817 89737 -80
- Misses 24701 24784 +83
Continue to review full report at Codecov.
|
This PR is a follow-up to #8010 with more fixes for ztest.
Signed-off-by: Tom Caputi tcaputi@datto.com
How Has This Been Tested?
All problems and fixes in this patch were discovered and verified with ztest (as much as ztest can verify anything).
Types of changes
Checklist:
Signed-off-by
.