Refine split block reconstruction #7934
Conversation
Force-pushed from 91b1322 to 0bc944c
this is great! I was worried that the linked list would make the code messier, but it actually makes a few things clearer. Thanks!
module/zfs/vdev_indirect.c
Outdated
* number of unique combinations when attempting reconstruction.
* Determine the unique children for a split segment and add them
* to the is_unique_child list. By restricting reconstruction
* attempts to these children only unique combinations will
should read will *be* considered.
Personally, I'd add commas after children and after considered.
Will do.
module/zfs/vdev_indirect.c
Outdated
is->is_child[j].ic_data) == 0) {
is->is_child[j].ic_duplicate = i;
}
ASSERT3P(ic_j->ic_duplicate, ==, NULL);
this seems self-evident given the code 1 line above.
Removed; it was a bit of extra debug code I forgot to drop.
Thanks for the quick feedback; I've refreshed the PR.
I also bumped up the default number of reconstruction attempts from 100 to 256. Making this a power of two feels more correct considering we expect two copies to be the common case.
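To make the scale concrete (illustrative numbers only, not from the patch): the number of combinations is the product of the unique copies in each split segment, so with the common case of two copies per segment a 256-attempt budget covers an exhaustive search for up to eight segments (2^8 = 256). A tiny sketch of that arithmetic:

/* Illustrative sketch: how a 256-attempt budget relates to 2 copies/segment. */
#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	const uint64_t attempts_max = 256;	/* proposed default */
	uint64_t combinations = 1;
	int segments;

	/* combinations = unique_copies^segments, assuming 2 copies each. */
	for (segments = 0; combinations * 2 <= attempts_max; segments++)
		combinations *= 2;

	printf("exhaustive search covers up to %d segments "
	    "(%llu combinations)\n", segments,
	    (unsigned long long)combinations);
	return (0);
}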
Did your testing cover the case where we have to try randomly because there are too many combinations? Just checking, since you've done such a good job reducing the number of possible combinations that it's a lot harder to hit the randomization code now.
@ahrens yes, I took the opportunity to manually inflict some severe damage to test that code. I was tempted to leave that debug code in place to make it easier to test, but in the end decided against it since it was a bit ugly. If you'd like, I'm happy to put something like it back in.
Force-pushed from a46dfa1 to cc6b6c7
module/zfs/vdev_indirect.c
Outdated
zio->io_error = EIO;
vdev_indirect_all_checksum_errors(zio);
zio_checksum_verified(zio);
return;
}

combinations *= is_copies;
combinations *= is->is_unique_children;
is->is_good_child = list_head(&is->is_unique_child);
}

for (;;) {
So I remember looking at this code the first time and it does make sense, although I found it a bit hard to read due to the fact that we loop back here and then decide again whether we are enumerating or randomly picking (plus thinking about attempts and more).
I believe it is fine leaving the code as it is; on the other hand, I thought of the layout below, which may be a bit simpler. Basically, instead of having this infinite loop, we would have the following:
int ret;
if (combinations <= attempts_max) {
	ret = vdev_indirect_splits_enumerate_all(iv, zio);
} else {
	ret = vdev_indirect_splits_enumerate_randomly(iv, zio, max_attempts);
}
if (ret != 0) {
	/* All attempted combinations failed. */
	zio->io_error = ret;
	vdev_indirect_all_checksum_errors(zio);
	zio_checksum_verified(zio);
}
where:
int
vdev_indirect_splits_checksum_validate(iv)
{
	for (indirect_split_t *is = list_head(&iv->iv_splits);
	    is != NULL; is = list_next(&iv->iv_splits, is)) {
		ASSERT3P(is->is_good_child->ic_data, !=, NULL);
		ASSERT3P(is->is_good_child->ic_duplicate, ==, NULL);
		abd_copy_off(zio->io_abd, is->is_good_child->ic_data,
		    is->is_split_offset, 0, is->is_size);
	}
	... zbc;
	return (zio_checksum_error(zio, &zbc));
}
int
vdev_indirect_splits_enumerate_all(iv, zio)
{
	... do the enumeration part ... assert that you don't do more than the *combinations* variable ...
	... check every combination with vdev_indirect_splits_checksum_validate() ... check its return value ...
}
int
vdev_indirect_splits_enumerate_randomly(iv, zio, max_attempts)
{
	for (attempt = 0; attempt < max_attempts; attempt++) {
		... do the spa_get_random part, and check every time with vdev_indirect_splits_checksum_validate() ...
		... break when you match ... [you can also have a dprintf saying ("matched after %d attempts", attempt)]
	}
}
The disadvantage of the above layout is that we add auxiliary functions, but hopefully it makes the code more readable (instead of having one big loop that decides, on every iteration, whether we are randomly enumerating over some stuff or enumerating through everything, we have two separate loops with more straightforward logic). An additional benefit may be that we can more easily trace things with DTrace/eBPF because of the auxiliary functions, which gives us more introspection into which strategy we decided on.
Again, if you think it isn't worth it or there is something wrong, feel free to drop this.
The code LGTM as it is.
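To make the proposed structure concrete, here is a minimal, self-contained sketch of the same dispatch pattern in plain C; the split_t stub, checksum_validate(), and main() driver are simplified stand-ins for illustration, not the actual vdev_indirect.c types or functions.

/*
 * Illustration of the proposed layout: exhaustively enumerate copies when
 * the combination count is small, otherwise pick copies at random.
 */
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

#define	NCHILDREN	4	/* copies of a single split segment */

typedef struct {
	int good;	/* hidden index of the only valid copy */
	int pick;	/* copy currently selected for reconstruction */
} split_t;

/* Stand-in for vdev_indirect_splits_checksum_validate(). */
static int
checksum_validate(split_t *s)
{
	return (s->pick == s->good ? 0 : EIO);
}

/* Try every copy in order; stand-in for the enumerate_all() helper. */
static int
enumerate_all(split_t *s)
{
	for (s->pick = 0; s->pick < NCHILDREN; s->pick++)
		if (checksum_validate(s) == 0)
			return (0);
	return (EIO);
}

/* Try randomly chosen copies; stand-in for the enumerate_randomly() helper. */
static int
enumerate_randomly(split_t *s, int max_attempts)
{
	for (int attempt = 0; attempt < max_attempts; attempt++) {
		s->pick = rand() % NCHILDREN;
		if (checksum_validate(s) == 0)
			return (0);
	}
	return (EIO);
}

int
main(void)
{
	split_t s = { .good = 2, .pick = 0 };
	int combinations = NCHILDREN;
	int attempts_max = 3;

	/* The dispatch the comment above proposes. */
	int ret = (combinations <= attempts_max) ?
	    enumerate_all(&s) : enumerate_randomly(&s, attempts_max);
	printf("reconstruction %s\n", ret == 0 ? "succeeded" : "failed");
	return (0);
}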
I don't think that's necessary. Just wanted to be sure that the rare-case code got exercised at least once :)
Due to a flaw in 4589f3a the number of unique combinations could be calculated incorrectly. This could result in the random combinations reconstruction being used when it would have been possible to check all combinations. This change fixes the unique combinations calculation and simplifies the reconstruction logic by maintaining a per-segment list of unique copies. The vdev_indirect_splits_damage() function was introduced to validate both the enumeration and random reconstruction logic with ztest. It is implemented such that it will never make a known recoverable block unrecoverable. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Force-pushed from cc6b6c7 to 31c7f44
Refreshed, please take another look. I've updated the patch to use the refactoring @sdimitro proposed. Overall I do think it is an improvement, and it allowed me to add a vdev_indirect_splits_damage() function to better exercise the reconstruction code with ztest. For comparison purposes the previous version of this PR was moved to the https://github.com/behlendorf/zfs/tree/issue-6900-original branch.
module/zfs/vdev_indirect.c
Outdated
for (indirect_split_t *is = list_head(&iv->iv_splits);
is != NULL; is = list_next(&iv->iv_splits, is)) {
if ((is->is_good_child = list_next(
&is->is_unique_child, is->is_good_child)) != NULL) {
stylistically, I find it easier to read without side effects in if statements where practical. e.g.:
is_good_child = list_next();
if (is_good_child != NULL) {
Sure, me too as a general rule. I'll update it.
abd_zero(ic->ic_data, ic->ic_data->abd_size);
}
iv->iv_attempts_max *= 2;
could we add an assertion that this does not overflow? Or cap it to UINT64_MAX? Or is there something else that prevents that from happening?
Let's instead stop damaging copies if this gets out of control; making this take longer doesn't improve the test coverage.
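If the doubling were kept instead, a guarded version is straightforward; a minimal sketch assuming a hypothetical ATTEMPTS_CAP bound (not something defined in the patch):

#include <stdint.h>

#define	ATTEMPTS_CAP	(1ULL << 32)	/* hypothetical sanity bound */

/* Double the attempt budget without overflowing or exceeding the cap. */
static uint64_t
attempts_max_bump(uint64_t attempts_max)
{
	if (attempts_max >= ATTEMPTS_CAP / 2)
		return (ATTEMPTS_CAP);
	return (attempts_max * 2);
}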
}

/* Attempt to select a valid one randomly. */
int error = vdev_indirect_splits_enumerate_randomly(iv, zio);
You could be more specific: set each is_good_child to a randomly-selected child
module/zfs/vdev_indirect.c
Outdated
zfs_dbgmsg("reconstruction failed (%d) after %llu / %llu "
"allowed attempts, %llu unique combination(s)\n", error,
(u_longlong_t)iv->iv_attempts,
I'm not sure if we want to spam dbgmsg if a disk is silently damaged. I don't think we do that today.
module/zfs/vdev_indirect.c
Outdated
/*
* Checksum failed; try a different combination of split
* children.
* The checksum has been successfully validated issue
I think you mean ... validated. Issue ...
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Codecov Report
@@ Coverage Diff @@
## master #7934 +/- ##
==========================================
+ Coverage 78.64% 78.7% +0.05%
==========================================
Files 377 377
Lines 114014 114078 +64
==========================================
+ Hits 89668 89786 +118
+ Misses 24346 24292 -54
Continue to review full report at Codecov.
Due to the extensive damage which ztest can inflict on the pool during testing, increasing the default maximum number of attempts is still required. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
I've updated this PR to keep the increased number of reconstruction attempts when running under ztest.
Due to a flaw in 4589f3a the number of unique combinations could be calculated incorrectly. This could result in the random combinations reconstruction being used when it would have been possible to check all combinations. This change fixes the unique combinations calculation and simplifies the reconstruction logic by maintaining a per-segment list of unique copies. The vdev_indirect_splits_damage() function was introduced to validate both the enumeration and random reconstruction logic with ztest. It is implemented such that it will never make a known recoverable block unrecoverable.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#6900
Closes openzfs#7934
Motivation and Context
Address the concerns raised by @sdimitro in openzfs/openzfs#625.
This follow-up patch implements the suggested improvements.
Description
Due to a flaw in 4589f3a the number of unique combinations
could be calculated incorrectly. This could result in the
random combinations reconstruction being used when it would
have been possible to check all combinations.
This change fixes the unique combinations calculation and
simplifies the reconstruction logic by maintaining a per-
segment list of unique copies.
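Conceptually, the fix amounts to deduplicating identical copies before counting combinations. A simplified sketch of that idea follows (plain C with byte buffers standing in for the real abd_t data; the names and structure are illustrative, not the patch itself):

#include <stddef.h>
#include <string.h>

typedef struct {
	const void	*data;		/* this copy of the segment's data */
	size_t		 size;
	int		 duplicate;	/* index of an identical earlier copy, or -1 */
} child_t;

/*
 * Mark duplicate copies; only children left with duplicate == -1 would be
 * added to the per-segment unique-copy list, so the combination count only
 * reflects genuinely different data.
 */
static int
mark_unique_children(child_t *child, int nchildren)
{
	int unique = 0;

	for (int i = 0; i < nchildren; i++) {
		child[i].duplicate = -1;
		for (int j = 0; j < i; j++) {
			if (child[j].duplicate == -1 &&
			    child[i].size == child[j].size &&
			    memcmp(child[i].data, child[j].data,
			    child[i].size) == 0) {
				child[i].duplicate = j;
				break;
			}
		}
		if (child[i].duplicate == -1)
			unique++;
	}
	return (unique);	/* unique copies for this split segment */
}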
How Has This Been Tested?
For debugging, the code was instrumented to intentionally damage
random segment copies, then log the number of possible combinations
and how many attempts were required before a successful reconstruction.
This was done in addition to the normal damage which is tested by ztest.
A 12 hour run of ztest with the additional debugging showed, as expected,
that in the normal case where a copy was not damaged there was only a
single possible combination. When damage was intentionally injected
the unique copies were detected and the number of combinations
increased.
Types of changes
Checklist:
Signed-off-by.