zpool reports 16E expandsize on disks with oddball number of sectors #8391
Conversation
This makes good sense, thank you for the clear explanation and concise fix. As for a ZTS test case we could probably work something up, possibly using a zvol and some debug code, but I agree it's not critical.
```diff
-		psize = available;
-	else
-		psize = bdev_capacity(bdev);
+	psize = MAX(available, bdev_capacity(bdev));
```
Your analysis looks correct and I think it would be good to document that here.
Nice fix! Could you update the comment in
Does it make sense to also
@loli10K that should always be true, but I'd be reluctant to add it as an ASSERT since it is based on values returned to us by the underlying devices. If those devices are buggy/damaged it would be best not to assert, and instead simply consider them as unavailable. How about adding a normal error exit path for this instead in
With the following changes on the local builder I am simulating a disk reporting osize=10G and max_osize=1G:

```diff
diff --git a/module/zfs/vdev.c b/module/zfs/vdev.c
index 81c34da07..343f0cbf8 100644
--- a/module/zfs/vdev.c
+++ b/module/zfs/vdev.c
@@ -1642,6 +1642,17 @@ vdev_open(vdev_t *vd)
 	error = vd->vdev_ops->vdev_op_open(vd, &osize, &max_osize, &ashift);
 
+	/*
+	 * Physical volume size should never be larger than its max size, unless
+	 * the disk has shrunk while we were reading it or the device is buggy
+	 * or damaged.
+	 */
+	if (osize > max_osize) {
+		vdev_set_state(vd, B_TRUE, VDEV_STATE_CANT_OPEN,
+		    VDEV_AUX_OPEN_FAILED);
+		return (SET_ERROR(EOVERFLOW));
+	}
+
 	/*
 	 * Reset the vdev_reopening flag so that we actually close
 	 * the vdev on error.
```
module/zfs/vdev.c (outdated)
```c
	if (osize > max_osize) {
		vdev_set_state(vd, B_TRUE, VDEV_STATE_CANT_OPEN,
		    VDEV_AUX_OPEN_FAILED);
		return (SET_ERROR(EOVERFLOW));
```
Rather than add `EOVERFLOW` to `zpool_standard_error_fmt()`, why don't we return `ENXIO` here instead to maintain compatibility. `EOVERFLOW` is more descriptive, but for this very unlikely case I think the generic error, "one or more devices is currently unavailable", would be acceptable.
The issue is caused by a small discrepancy in how userland creates the partition layout and the kernel estimates available space:

* zpool command: subtract 9M from the usable device size, then align to 1M boundary. 9M is the sum of 1M "start" partition alignment + 8M EFI "reserved" partition.
* kernel module: subtract 10M from the device size. 10M is the sum of 1M "start" partition alignment + 1M "end" partition alignment + 8M EFI "reserved" partition.

For devices where the number of sectors is not a multiple of the alignment size the zpool command will create a partition layout which reserves less than 1M after the 8M EFI "reserved" partition:

```
Disk /dev/sda: 1024 MiB, 1073739776 bytes, 2097148 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 49811D40-16F4-4E41-84A9-387703950D7F

Device       Start     End Sectors  Size Type
/dev/sda1     2048 2078719 2076672 1014M Solaris /usr & Apple ZFS
/dev/sda9  2078720 2095103   16384    8M Solaris reserved 1
```

When the kernel module vdev_open()s the device its max_asize ends up being slightly smaller than asize: this results in a huge number (16E) reported by metaslab_class_expandable_space(). This change prevents bdev_max_capacity() from returning a size smaller than bdev_capacity().

Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Codecov Report

```
@@            Coverage Diff             @@
##           master    #8391      +/-   ##
==========================================
- Coverage   78.55%   78.53%   -0.02%
==========================================
  Files         380      380
  Lines      116004   116005       +1
==========================================
- Hits        91125    91106      -19
- Misses      24879    24899      +20
```

Continue to review full report at Codecov.
Looks great - thanks for adding the clear explanations
Motivation and Context
Fix #1468
Description
The issue is caused by a small discrepancy in how userland creates the partition layout and the kernel estimates available space:
* zpool command: subtract 9M from the usable device size, then align to 1M boundary. 9M is the sum of 1M "start" partition alignment + 8M EFI "reserved" partition.
* kernel module: subtract 10M from the device size. 10M is the sum of 1M "start" partition alignment + 1M "end" partition alignment + 8M EFI "reserved" partition.

For devices where the number of sectors is not a multiple of the alignment size the zpool command will create a partition layout which reserves less than 1M after the 8M EFI "reserved" partition (see the fdisk output in the commit message above).
When the kernel module `vdev_open()`s the device, its `max_asize` ends up being slightly smaller than `asize`: this results in a huge number (16E) reported by `metaslab_class_expandable_space()`.

This change prevents `bdev_max_capacity()` from returning a size smaller than `bdev_capacity()`.
How Has This Been Tested?
This change has been tested manually with virtual disks on a Debian builder: unfortunately this issue affects only wholedisk devices whose size is not a multiple of 1M, so we cannot use device mapper/loop/scsi_debug devices to test this in the ZFS Test Suite.