
Refine split block reconstruction #7934

Merged · 3 commits into openzfs:master · Oct 1, 2018

Conversation

behlendorf (Contributor)

Motivation and Context

Address the concerns raised by @sdimitro in openzfs/openzfs#625.
This follow up patch implements the suggested improvements.

Description

Due to a flaw in 4589f3a the number of unique combinations
could be calculated incorrectly. This could result in the
random combinations reconstruction being used when it would
have been possible to check all combinations.

This change fixes the unique combinations calculation and
simplifies the reconstruction logic by maintaining a per-
segment list of unique copies.
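Conceptually, the corrected count is just the product, over all split segments, of each segment's number of unique children. A minimal standalone sketch (the helper name and array representation are illustrative, not the actual vdev_indirect code):

```c
#include <stdint.h>

/*
 * Illustrative sketch only: the total number of reconstruction
 * combinations is the product, across all split segments, of each
 * segment's count of *unique* child copies.  Duplicate copies are
 * filtered out first, so two identical copies contribute 1, not 2.
 */
uint64_t
split_combinations(const uint64_t *unique_children, int nsegments)
{
	uint64_t combinations = 1;

	for (int i = 0; i < nsegments; i++)
		combinations *= unique_children[i];

	return (combinations);
}
```

For example, segments with 2, 1, and 3 unique copies yield 2 × 1 × 3 = 6 combinations; when that total is at or below the attempt budget, exhaustive enumeration can be used instead of random sampling.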

How Has This Been Tested?

For debugging, the code was instrumented to intentionally damage
random segment copies, then log the number of possible combinations
and how many attempts were required before a successful reconstruction.
This was done in addition to the normal damage which is tested by ztest.

A 12 hour run of ztest with the additional debugging showed, as expected,
that in the normal case where a copy was not damaged there was only a
single possible combination. When damage was intentionally injected
the unique copies were detected and the number of combinations
increased.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

@behlendorf behlendorf added the "Status: Work in Progress" (Not yet ready for general review) label Sep 20, 2018
@ahrens ahrens (Member) left a comment

this is great! I was worried that the linked list would make the code messier but it actually makes a few things more clear. Thanks!

* number of unique combinations when attempting reconstruction.
* Determine the unique children for a split segment and add them
* to the is_unique_child list. By restricting reconstruction
* attempts to these children only unique combinations will
ahrens (Member):

should read will *be* considered

Personally, I'd add commas after children and after considered

behlendorf (Contributor, Author):

Will do.

is->is_child[j].ic_data) == 0) {
is->is_child[j].ic_duplicate = i;
}
ASSERT3P(ic_j->ic_duplicate, ==, NULL);
ahrens (Member):

this seems self-evident given the code 1 line above.

behlendorf (Contributor, Author):

Removed, it was a bit of extra debug code I forgot to drop.

(two further review threads on module/zfs/vdev_indirect.c, resolved)
@behlendorf behlendorf (Contributor, Author) left a comment

Thanks for the quick feedback, I've refreshed the PR.

I also bumped up the default number of reconstruction attempts from 100 to 256. Making this a power of two feels more correct considering we expect two copies to be the common case.
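To illustrate why a power-of-two budget lines up with the two-copy common case: with two unique copies per segment, exhaustive enumeration needs 2^n attempts for n segments, so 256 attempts fully cover up to eight segments. A hypothetical helper (not part of the PR) makes this concrete:

```c
#include <stdint.h>

/*
 * Hypothetical helper: how many split segments, each with `copies`
 * unique children, can be exhaustively enumerated within
 * `attempts_max` attempts?  Returns -1 for degenerate input.
 */
int
max_enumerable_segments(uint64_t copies, uint64_t attempts_max)
{
	int segments = 0;
	uint64_t combinations = 1;

	if (copies < 2)
		return (-1);

	while (combinations * copies <= attempts_max) {
		combinations *= copies;
		segments++;
	}

	return (segments);
}
```

With two copies per segment, a budget of 256 covers eight segments exhaustively, whereas the previous budget of 100 covered only six.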

@behlendorf behlendorf added the "Status: Code Review Needed" (Ready for review and testing) label and removed the "Status: Work in Progress" (Not yet ready for general review) label Sep 20, 2018
@ahrens ahrens (Member) commented Sep 20, 2018

Did your testing cover the case where we have to try randomly because there are too many combinations? Just checking, since you've done such a good job reducing the number of possible combinations that it's a lot harder to hit the randomization code now.

@behlendorf behlendorf (Contributor, Author)

@ahrens yes, I took the opportunity to manually inflict some severe damage to test that code. I was tempted to leave that debug code in place to make it easier to test, but in the end decided against it since it was a bit ugly. If you'd like I'm happy to put something like it back in.

zio->io_error = EIO;
vdev_indirect_all_checksum_errors(zio);
zio_checksum_verified(zio);
return;
}

combinations *= is_copies;
combinations *= is->is_unique_children;
is->is_good_child = list_head(&is->is_unique_child);
}

for (;;) {
sdimitro (Contributor):

So I remember looking at this code the first time, and it does make sense, although I found it a bit hard to read because we loop back here and then decide again whether we are enumerating or randomly picking (plus reasoning about attempts and more).

I believe it is fine to leave the code as it is; on the other hand, I thought of the layout below, which may be a bit simpler. In place of the infinite loop we would have the following:

int ret;
if (combinations <= attempts_max) {
   ret = vdev_indirect_splits_enumerate_all(iv, zio);
} else {
   ret = vdev_indirect_splits_enumerate_randomly(iv, zio, max_attempts);
}

if (ret != 0) {
    /* All attempted combinations failed. */
    zio->io_error = ret;
    vdev_indirect_all_checksum_errors(zio);
    zio_checksum_verified(zio);
}

where:

int
vdev_indirect_splits_checksum_validate(iv)
{
    for (indirect_split_t *is = list_head(&iv->iv_splits);
        is != NULL; is = list_next(&iv->iv_splits, is)) {
        ASSERT3P(is->is_good_child->ic_data, !=, NULL);
        ASSERT3P(is->is_good_child->ic_duplicate, ==, NULL);
        abd_copy_off(zio->io_abd, is->is_good_child->ic_data,
            is->is_split_offset, 0, is->is_size);
    }
    ... zbc;
    return (zio_checksum_error(zio, &zbc));
}

int
vdev_indirect_splits_enumerate_all(iv, zio)
{
    ... do the enumeration part ... assert that you don't do more than the *combinations* variable ....
   ... check every combination with vdev_indirect_splits_checksum_validate() ... check its return value ...
}

int
vdev_indirect_splits_enumerate_randomly(iv, zio, max_attempts)
{
    for (attempt = 0; attempt < max_attempts; attempt++) {
      ... do the spa_get_random_part , and check every time with vdev_indirect_splits_checksum_validate() ...
      ... break when you match .... [you can also have a dprintf saying ("matched after %d attempts", attempt)]
    }
}

The disadvantage of the above layout is that we add auxiliary functions, but it hopefully makes the code more readable: instead of one big loop that decides on every iteration whether we are randomly sampling or enumerating everything, we have two separate loops with more straightforward logic. An additional benefit may be that the auxiliary functions make things easier to trace with DTrace/eBPF and give us more introspection into which strategy was chosen.

Again, if you think it isn't worth it or there is something wrong, feel free to drop this.
The code LGTM as it is.
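Stripped of the ZFS structures, the randomized strategy in the sketch above boils down to: pick one copy per segment at random, validate the assembled result, and stop on a match or after max_attempts. A toy, self-contained version (all names here are made up for illustration; validation is reduced to matching a known-good combination, standing in for the checksum check):

```c
#include <stdlib.h>

/*
 * Toy illustration of the randomized strategy: each attempt picks a
 * random copy index per segment; "validation" here simply compares
 * against a known-good target combination.  Returns the 1-based
 * attempt number on success, or -1 if max_attempts were exhausted.
 */
int
enumerate_randomly(const int *ncopies, const int *target, int nsegments,
    int max_attempts)
{
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		int match = 1;

		for (int i = 0; i < nsegments; i++) {
			if (rand() % ncopies[i] != target[i])
				match = 0;
		}
		if (match)
			return (attempt);
	}
	return (-1);
}
```

With two segments of two copies each, any single attempt matches with probability 1/4, so success within a few dozen attempts is overwhelmingly likely.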

@ahrens ahrens (Member) commented Sep 21, 2018

If you'd like I'm happy to put something like it back in.

I don't think that's necessary. Just wanted to be sure that the rare-case code got exercised at least once :)

Due to a flaw in 4589f3a the number of unique combinations
could be calculated incorrectly.  This could result in the
random combinations reconstruction being used when it would
have been possible to check all combinations.

This change fixes the unique combinations calculation and
simplifies the reconstruction logic by maintaining a per-
segment list of unique copies.

The vdev_indirect_splits_damage() function was introduced
to validate both the enumeration and random reconstruction
logic with ztest.  It is implemented such that it will never
make a known recoverable block unrecoverable.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
@behlendorf behlendorf (Contributor, Author)

Refreshed, please take another look.

I've updated the patch to use the refactoring @sdimitro proposed. Overall I do think it is an improvement, and it allowed me to add a vdev_indirect_splits_damage() function which is used by ztest to stress both the enumeration and random reconstruction logic.

For comparison purposes the previous version of this PR was moved to the https://github.com/behlendorf/zfs/tree/issue-6900-original branch.

for (indirect_split_t *is = list_head(&iv->iv_splits);
is != NULL; is = list_next(&iv->iv_splits, is)) {
if ((is->is_good_child = list_next(
&is->is_unique_child, is->is_good_child)) != NULL) {
ahrens (Member):

stylistically, I find it easier to read without side effects in if statements where practical. e.g.:

is_good_child = list_next();
if (is_good_child != NULL) {

behlendorf (Contributor, Author):

Sure, me too as a general rule. I'll update it.


abd_zero(ic->ic_data, ic->ic_data->abd_size);
}
iv->iv_attempts_max *= 2;
ahrens (Member):

could we add an assertion that this does not overflow? Or cap it to UINT64_MAX? Or is there something else that prevents that from happening?

behlendorf (Contributor, Author):

Let's instead stop damaging copies if this gets out of control; making this take longer doesn't improve the test coverage.
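For reference, a saturating double along the lines of the overflow concern raised above would look roughly like this (illustrative only; the PR instead limits how much damage ztest inflicts, so the counter never grows unboundedly):

```c
#include <stdint.h>

/* Double a 64-bit counter, saturating at UINT64_MAX instead of wrapping. */
uint64_t
double_saturating(uint64_t v)
{
	if (v > UINT64_MAX / 2)
		return (UINT64_MAX);
	return (v * 2);
}
```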

}

/* Attempt to select a valid one randomly. */
int error = vdev_indirect_splits_enumerate_randomly(iv, zio);
ahrens (Member):

You could be more specific: set each is_good_child to a randomly-selected child


zfs_dbgmsg("reconstruction failed (%d) after %llu / %llu "
"allowed attempts, %llu unique combination(s)\n", error,
(u_longlong_t)iv->iv_attempts,
ahrens (Member):

I'm not sure if we want to spam dbgmsg if a disk is silently damaged. I don't think we do that today.

/*
* Checksum failed; try a different combination of split
* children.
* The checksum has been successfully validated issue
ahrens (Member):

I think you mean ... validated. Issue ...

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
@codecov codecov bot commented Sep 27, 2018

Codecov Report

Merging #7934 into master will increase coverage by 0.05%.
The diff coverage is 91.96%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7934      +/-   ##
==========================================
+ Coverage   78.64%    78.7%   +0.05%     
==========================================
  Files         377      377              
  Lines      114014   114078      +64     
==========================================
+ Hits        89668    89786     +118     
+ Misses      24346    24292      -54
Flag      Coverage           Δ
#kernel   78.9%  <2.8%>      (-0.02%) ⬇️
#user     67.93% <92.79%>    (-0.11%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d126145...e1aea23.

@ahrens ahrens added the "Status: Accepted" (Ready to integrate: reviewed, tested) label and removed the "Status: Code Review Needed" (Ready for review and testing) label Sep 28, 2018
Due to the extensive damage which ztest can inflict on the
pool during testing, increasing the default maximum number
of attempts is still required.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
@behlendorf behlendorf (Contributor, Author)

I've updated this PR to keep the increased number of reconstruction attempts when ztest is calling zdb (though it was reduced to 65536). While things are significantly improved, we did hit this in the automated testing, which is why I put it back. See the end of the following ztest log.

http://build.zfsonlinux.org/builders/Ubuntu%2018.04%20x86_64%20%28TEST%29/builds/1360/steps/shell_6/logs/1.ztest.out

@behlendorf behlendorf merged commit 1258bd7 into openzfs:master Oct 1, 2018
GregorKopka pushed a commit to GregorKopka/zfs that referenced this pull request Jan 7, 2019
Due to a flaw in 4589f3a the number of unique combinations
could be calculated incorrectly.  This could result in the
random combinations reconstruction being used when it would
have been possible to check all combinations.

This change fixes the unique combinations calculation and
simplifies the reconstruction logic by maintaining a per-
segment list of unique copies.

The vdev_indirect_splits_damage() function was introduced
to validate both the enumeration and random reconstruction
logic with ztest.  It is implemented such that it will never
make a known recoverable block unrecoverable.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#6900 
Closes openzfs#7934
@behlendorf behlendorf deleted the issue-6900 branch April 19, 2021 21:03
Labels: Status: Accepted (Ready to integrate: reviewed, tested)