btrfs-progs: cmds/qgroup: use sysfs to detect inconsistent status early #952

adam900710 · 2025-02-13T06:49:43Z

Currently if "btrfs qgroup show" detects the qgroup is in an inconsistent, it will output a warning:

WARNING: qgroup data inconsistent, rescan recommended

But the detection is based on the tree search ioctl, and qgroup tree is only updated at transaction commit time.

This means if qgroup is marked inconsistent, and the transaction is not commit, there can be a huge window as long as 30s before "btrfs qgroup show" to give a proper warning about inconsistent qgroup numbers.

To address this window, use the
'/sys/fs/btrfs//qgroup/inconsistent' file to do extra check.

That file is updated at real time, thus there is no delay, and can give an early warning about inconsistent qgroup numbers.

Link: https://bugzilla.suse.com/show_bug.cgi?id=1235765

Unify the error message style a bit, print the relevant path if missing. Signed-off-by: David Sterba <dsterba@suse.com>

The preferred error message should have a prefix with problem description and then the errno description as we use the negative errno convention almost everywhere. - drop additional %d in the message if %m is present - replace %d with %m Signed-off-by: David Sterba <dsterba@suse.com>

The subvolumes created during mkfs are not printed in the summary because btrfs_mkfs_fill_dir() deletes them from the list as they get created. Signed-off-by: David Sterba <dsterba@suse.com>

Print indicators in the summary if the subvolume is read-write, read-only or default: $ mkfs.btrfs --subvol ro:subvolro --subvol rw:subvolrw --subvol default-ro:defaultro --rootdir /rootdir/path img ... Rootdir from: /rootdir/path Compress: no Subvolume (rw): subvolrw Subvolume (ro): subvolro Subvolume (dro): defaultro ... The path is relative to the rootdir path and may not be a subvolume in the source directory so drop the rootdir as this may be confusing. Signed-off-by: David Sterba <dsterba@suse.com>

Fix missing or wrong formatting, enhance descriptions or add more references. Signed-off-by: David Sterba <dsterba@suse.com>

Docker warns about CMD syntax, which is harmless in our case but let's fix the warning by using the json (quoted strings in array) syntax. JSONArgsRecommended: JSON arguments recommended for CMD to prevent unintended behavior related to OS signals (line 29) Explained in https://docs.docker.com/reference/build-checks/json-args-recommended . Signed-off-by: David Sterba <dsterba@suse.com>

Enable CodeQL code scanning checks. For that the automatically detected build steps don't work as the documentation is expected to build. Add the manual steps. Signed-off-by: David Sterba <dsterba@suse.com>

[ci skip] Signed-off-by: David Sterba <dsterba@suse.com>

The values of UNITS_ modes are supposed to be unique and non-overlapping. Due to wrong definition this was not true for UNITS_DECIMAL which set 2 bits. This breaks code that strictly depends on the uniqueness of the bits and not on the order of processing of the constants. Signed-off-by: David Sterba <dsterba@suse.com>

There's a special case when scrub rate is printed, the device sizes and rate use different units. The size can be in terabytes while the rate can be in hundreds of megabytes (contemporary 10-20T disks, 250MB/s). The sizes use what is set on command line (or human readable by default), while the rate is always human readable with exception to the option --raw to provide a way to print the exact numbers without any conversions. This got broken in commit ec3c842 ("btrfs-progs: scrub status: with --si, show rate in metric units") that forced the command line mode to the rate as well. Instead of that we need to detect the SI/IEC mode and set it to the human readable format of rate. Signed-off-by: David Sterba <dsterba@suse.com>

Add currently implemented types and update recommendations. Signed-off-by: David Sterba <dsterba@suse.com>

Thanks to hunspell. [ci skip] Signed-off-by: David Sterba <dsterba@suse.com>

Add sections for each directory so it's more visible where the sections start and end. Update formatting, enhance descriptions. Signed-off-by: David Sterba <dsterba@suse.com>

The references used in :doc: need to reference relative path of the document, otherwise this leads to a warning. btrfs-progs/Documentation/Send-receive.rst:19: WARNING: unknown document: 'dev-send-stream' [ref.doc] btrfs-progs/Documentation/dev/CmdLineConventions.rst:60: WARNING: unknown document: 'btrfstune' [ref.doc] Signed-off-by: David Sterba <dsterba@suse.com>

The kernel commit 08fe4db170b419 ("Btrfs: Fix uninitialized root flags for subvolumes") from 2011 sets the flag BTRFS_INODE_ROOT_ITEM_INIT on root items, to work around a bug where flags and byte_limit weren't being set. Copy this behaviour in mkfs, to prevent the kernel from having to do it on the first mount. We memset the btrfs_root_item, so there's no corruption issue as there once was. We already do this in btrfs_make_subvolume(), as otherwise the readonly flag of any subvolumes would get reset. Signed-off-by: Mark Harmstone <maharmstone@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>

The kernel adds a zeroed btrfs_dev_stats_item for each device on the first mount. Preempt this by doing it at mkfs time. Signed-off-by: Mark Harmstone <maharmstone@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>

Follow the kernel by setting the BIG_METADATA incompat flag if nodesize is greater than the page size. This flag was introduced with commit 727011e07cbdf8 ("Btrfs: allow metadata blocks larger than the page size") in 2010, as kernels before 2.6.36 would crash due to a buggy page cache implementation. The flag has no real meaning anymore but we can at least set it at mkfs time. Signed-off-by: Mark Harmstone <maharmstone@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>

Just like all qgroup functions, if a qgroup is marked inconsistent, limit will not work as expected. In fact with recent kernels, limit and qgroup number updating will be fully skipped if qgroup is already inconsistent. Add one extra note on `btrfs qgroup limit` subcommand for it. Link: https://bugzilla.suse.com/show_bug.cgi?id=1235765 Reported-by: Vojtech Lacina <vlacina@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com>

…wer redundancy [BUG] There is a bug report that when deleting a device using sysfs /sys/block/<dev>/device/delete, the kernel module will still try to read and write the device. Normally it's fine as long as all chunks can tolerate that removed device (e.g. all RAID1). But the problem is when one is trying to lower the redundancy by converting to another profile: # mkfs.btrfs -f -m raid1 -d raid1 /dev/sdd /dev/sde # mount /dev/sdd /mnt # echo 1 > /sys/block/sde/device/delete # btrfs balance start --force -mdup -dsingle /mnt This will lead to the filesystem mounted RO, with the following error messages: sd 6:0:0:0: [sde] Synchronizing SCSI cache ata7.00: Entering standby power mode btrfs: attempt to access beyond end of device sde: rw=6145, sector=21696, nr_sectors = 32 limit=0 btrfs: attempt to access beyond end of device sde: rw=6145, sector=21728, nr_sectors = 32 limit=0 btrfs: attempt to access beyond end of device sde: rw=6145, sector=21760, nr_sectors = 32 limit=0 BTRFS error (device sdd): bdev /dev/sde errs: wr 1, rd 0, flush 0, corrupt 0, gen 0 BTRFS error (device sdd): bdev /dev/sde errs: wr 2, rd 0, flush 0, corrupt 0, gen 0 BTRFS error (device sdd): bdev /dev/sde errs: wr 3, rd 0, flush 0, corrupt 0, gen 0 BTRFS error (device sdd): bdev /dev/sde errs: wr 3, rd 0, flush 1, corrupt 0, gen 0 btrfs: attempt to access beyond end of device sde: rw=145409, sector=128, nr_sectors = 8 limit=0 BTRFS warning (device sdd): lost super block write due to IO error on /dev/sde (-5) BTRFS error (device sdd): bdev /dev/sde errs: wr 4, rd 0, flush 1, corrupt 0, gen 0 btrfs: attempt to access beyond end of device sde: rw=14337, sector=131072, nr_sectors = 8 limit=0 BTRFS warning (device sdd): lost super block write due to IO error on /dev/sde (-5) BTRFS error (device sdd): bdev /dev/sde errs: wr 5, rd 0, flush 1, corrupt 0, gen 0 BTRFS error (device sdd): error writing primary super block to device 2 BTRFS info (device sdd): balance: start -dconvert=single -mconvert=dup -sconvert=dup BTRFS info (device sdd): relocating block group 1372585984 flags data|raid1 BTRFS error (device sdd): bdev /dev/sde errs: wr 5, rd 0, flush 2, corrupt 0, gen 0 BTRFS warning (device sdd): chunk 2446327808 missing 1 devices, max tolerance is 0 for writable mount BTRFS: error (device sdd) in write_all_supers:4044: errno=-5 IO failure (errors while submitting device barriers.) BTRFS info (device sdd state E): forced readonly BTRFS warning (device sdd state E): Skipping commit of aborted transaction. BTRFS error (device sdd state EA): Transaction aborted (error -5) BTRFS: error (device sdd state EA) in cleanup_transaction:2017: errno=-5 IO failure BTRFS info (device sdd state EA): balance: ended with status: -5 [CAUSE] Btrfs doesn't have any runtime device error handling, it fully rely on the extra copy provided. For the sysfs block device removal, normally there is a device shutdown callback to the running fs, but unfortunately btrfs doesn't support this callback yet. Thus even with that device removed, btrfs will still access that removed device (both read and write, even if they will fail). Normally for a full RAID1 btrfs, it will still be fine reading/write the fs as usual. The proper action is to replace the removed/missing/failing device with a newer one using `btrfs device replace`. But when doing the convert, btrfs will allocate new metadata chunks on to the removed device (which will lose all writes). And since the new metadata profile is DUP, which can not handle any missing device of that metadata chunk, finally it triggers the final protection at transaction commit time, and flips the filesystem to RO, before causing any real data loss. [DOC ENHANCEMENT] Add a warning to the `convert` filter about the dangerous doing convert to a lower redundancy profile when there is a known failing/removed device. And mention the proper way to handle such failing/missing device. The root fix is to introduce a failing/removed device detection for btrfs, but that will be a pretty big feature and will take quite some time before landing it upstream. Link: https://lore.kernel.org/linux-btrfs/2cb1d81e-12a8-4fb1-b3fc-e7e83d31e059@siddall.name/ Reported-by: Jeff Siddall <news@siddall.name> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

There already is misc/065-csum-conversion-inject . Signed-off-by: David Sterba <dsterba@suse.com>

The 065 and 066 are taken. Signed-off-by: David Sterba <dsterba@suse.com>

There are two tests 064, and we're missing 061 so fill the gap. Signed-off-by: David Sterba <dsterba@suse.com>

Use 'symbolic' as the term. Signed-off-by: David Sterba <dsterba@suse.com>

The templates need to be renamed, so make it more suggestive by prepending the expected name of the script test.sh. Also fix the permissions and make it 755 so it's not missed later as this would not execute the script. Signed-off-by: David Sterba <dsterba@suse.com>

Minor fixes and updates that reflect current state. [ci skip] Signed-off-by: David Sterba <dsterba@suse.com>

For quick checks before a push of non-code changes we may want to do only the spellchecking workflow. Any branch pushed matching the prefix "codespell/" will be picked by this. Signed-off-by: David Sterba <dsterba@suse.com>

Currently if "btrfs qgroup show" detects the qgroup is in an inconsistent, it will output a warning: WARNING: qgroup data inconsistent, rescan recommended But the detection is based on the tree search ioctl, and qgroup tree is only updated at transaction commit time. This means if qgroup is marked inconsistent, and the transaction is not commit, there can be a huge window as long as 30s before "btrfs qgroup show" to give a proper warning about inconsistent qgroup numbers. To address this window, use the '/sys/fs/btrfs/<fsid>/qgroup/inconsistent' file to do extra check. That file is updated at real time, thus there is no delay, and can give an early warning about inconsistent qgroup numbers. Link: https://bugzilla.suse.com/show_bug.cgi?id=1235765 Signed-off-by: Qu Wenruo <wqu@suse.com>

kdave and others added 27 commits February 11, 2025 20:43

btrfs-progs: mkfs: update subvolume relatd error messages

a77705a

Unify the error message style a bit, print the relevant path if missing. Signed-off-by: David Sterba <dsterba@suse.com>

btrfs-progs: mkfs: fix printing of subvolumes in the summary

83c3bc4

The subvolumes created during mkfs are not printed in the summary because btrfs_mkfs_fill_dir() deletes them from the list as they get created. Signed-off-by: David Sterba <dsterba@suse.com>

btrfs-progs: docs: update mkfs.btrfs manual page

8f580ad

Fix missing or wrong formatting, enhance descriptions or add more references. Signed-off-by: David Sterba <dsterba@suse.com>

btrfs-progs: ci: add workflow for CodeQL

800d7b0

Enable CodeQL code scanning checks. For that the automatically detected build steps don't work as the documentation is expected to build. Add the manual steps. Signed-off-by: David Sterba <dsterba@suse.com>

btrfs-progs: README: add status badge of codeql workflow

6609489

[ci skip] Signed-off-by: David Sterba <dsterba@suse.com>

btrfs-progs: docs: update json API

4382c2f

Add currently implemented types and update recommendations. Signed-off-by: David Sterba <dsterba@suse.com>

btrfs-progs: docs: fix typos

c5c9456

Thanks to hunspell. [ci skip] Signed-off-by: David Sterba <dsterba@suse.com>

btrfs-progs: docs: update sysfs descriptions

9e7853c

Add sections for each directory so it's more visible where the sections start and end. Update formatting, enhance descriptions. Signed-off-by: David Sterba <dsterba@suse.com>

btrfs-progs: mkfs: add dev stats and device items

80cd98e

The kernel adds a zeroed btrfs_dev_stats_item for each device on the first mount. Preempt this by doing it at mkfs time. Signed-off-by: Mark Harmstone <maharmstone@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>

btrfs-progs: tests: adjust number of misc/065-image-filename

e1085e6

There already is misc/065-csum-conversion-inject . Signed-off-by: David Sterba <dsterba@suse.com>

btrfs-progs: tests: adjust number of misc/065-btrfstune-simple-quota

92ab5b5

The 065 and 066 are taken. Signed-off-by: David Sterba <dsterba@suse.com>

btrfs-progs: tests: adjust number of misc/064-reverse-receive

792fe15

There are two tests 064, and we're missing 061 so fill the gap. Signed-off-by: David Sterba <dsterba@suse.com>

btrfs-progs: tests: rename and update convert/027-large-symbol-link

ab0d5e2

Use 'symbolic' as the term. Signed-off-by: David Sterba <dsterba@suse.com>

btrfs-progs: tests: update README

d12e76e

Minor fixes and updates that reflect current state. [ci skip] Signed-off-by: David Sterba <dsterba@suse.com>

btrfs-progs: ci: add option to run only codespell workflow

a1926e3

For quick checks before a push of non-code changes we may want to do only the spellchecking workflow. Any branch pushed matching the prefix "codespell/" will be picked by this. Signed-off-by: David Sterba <dsterba@suse.com>

kdave force-pushed the devel branch 3 times, most recently from 1480f4a to 103338d Compare February 17, 2025 17:30

kdave force-pushed the devel branch from 103338d to 8c4549c Compare February 17, 2025 20:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

btrfs-progs: cmds/qgroup: use sysfs to detect inconsistent status early #952

btrfs-progs: cmds/qgroup: use sysfs to detect inconsistent status early #952

adam900710 commented Feb 13, 2025

btrfs-progs: cmds/qgroup: use sysfs to detect inconsistent status early #952

Are you sure you want to change the base?

btrfs-progs: cmds/qgroup: use sysfs to detect inconsistent status early #952

Conversation

adam900710 commented Feb 13, 2025