Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

txg_sync should handle write errors in ZIL #14283

Merged
merged 1 commit into from
Mar 10, 2023

Conversation

ryao
Copy link
Contributor

@ryao ryao commented Dec 12, 2022

Motivation and Context

The txg_sync thread will see certain buffers in a DR_IN_DMU_SYNC state when ZIL is writing them out. Then it waits until the state changes, but has an assertion to check that they were not DR_NOT_OVERRIDDEN. If the data write failed with an error, ZIL will put it into the DR_NOT_OVERRIDDEN state. It looks like the code will handle that state without an issue, so we can just delete the assertion.

This was done by Klara Systems and sponsored by Wasabi Technology, Inc.

Description

Delete unnecessary assertion.

How Has This Been Tested?

It has been runtime tested on ZFS by running a benchmark used by Wasabi Technology, Inc on top of a pool using a patched driver. The driver had other WIP patches on top, but this issue is believed to be present in the code without those patches.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Dec 14, 2022
The txg_sync thread will see certain buffers in a DR_IN_DMU_SYNC state
when ZIL is writing them out. Then it waits until the state changes, but
has an assertion to check that they were not DR_NOT_OVERRIDDEN. If the
data write failed with an error, ZIL will put it into the
DR_NOT_OVERRIDDEN state. It looks like the code will handle that state
without an issue, so we can just delete the assertion.

Sponsored-By: Wasabi Technology, Inc.
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
@ryao
Copy link
Contributor Author

ryao commented Feb 27, 2023

@behlendorf I have rebased this on master and repushed. While I did not keep a count of the precise number of test runs that included this patch, I have records for 1328 runs. While 20 of those runs had issues, none were attributable to this patch. As previously described in another pull request, the testing done on this patch was as follows:

We have a system with a 14 disk pool (typically all in a single raidz3 vdev, but also sometimes all as individual top level vdevs for testing purposes) and a script that will:

- Run a ZIL intensive workload that creates dozens of gigabytes of data
- Power-off 5 drives at the backplane while the workload is running
- After some time has elapsed and the pool has long faulted, power-on the drives
- Run zpool clear on the pool to unsuspend it.
- (As of a few weeks ago) Do a scrub.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Mar 10, 2023
@behlendorf behlendorf merged commit 7316fdd into openzfs:master Mar 10, 2023
mcmilk pushed a commit to mcmilk/zfs that referenced this pull request Mar 13, 2023
The txg_sync thread will see certain buffers in a DR_IN_DMU_SYNC state
when ZIL is writing them out. Then it waits until the state changes, but
has an assertion to check that they were not DR_NOT_OVERRIDDEN. If the
data write failed with an error, ZIL will put it into the
DR_NOT_OVERRIDDEN state. It looks like the code will handle that state
without an issue, so we can just delete the assertion.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Sponsored-By: Wasabi Technology, Inc.
Closes openzfs#14283
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Mar 16, 2023
The txg_sync thread will see certain buffers in a DR_IN_DMU_SYNC state
when ZIL is writing them out. Then it waits until the state changes, but
has an assertion to check that they were not DR_NOT_OVERRIDDEN. If the
data write failed with an error, ZIL will put it into the
DR_NOT_OVERRIDDEN state. It looks like the code will handle that state
without an issue, so we can just delete the assertion.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Sponsored-By: Wasabi Technology, Inc.
Closes openzfs#14283
datacore-rm pushed a commit to DataCoreSoftware/openzfs that referenced this pull request Aug 14, 2023
The txg_sync thread will see certain buffers in a DR_IN_DMU_SYNC state
when ZIL is writing them out. Then it waits until the state changes, but
has an assertion to check that they were not DR_NOT_OVERRIDDEN. If the
data write failed with an error, ZIL will put it into the
DR_NOT_OVERRIDDEN state. It looks like the code will handle that state
without an issue, so we can just delete the assertion.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Sponsored-By: Wasabi Technology, Inc.
Closes openzfs#14283
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants