
Bad block pointer and poor user experience for recovering #12245

Closed
tcaputi opened this issue Jun 16, 2021 · 6 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@tcaputi
Contributor

tcaputi commented Jun 16, 2021

System information

Type                  Version/Name
Distribution Name     Ubuntu
Distribution Version  20.04
Linux Kernel          5.4.0-74-generic
Architecture          x86_64
ZFS Version           9a15da17 + 2 extra cherry-picked commits for dropbear support

Describe the problem you're observing

My personal server has a bad block pointer. The server hosts several personal projects of mine, including backup services for devices on my network, a Minecraft server, a number of websites, and a service that downloads and categorizes tweets in a MySQL database for research projects. The Twitter service is the most intensive workload on the server.

The code is hitting zfs_panic_recover(), so it seems that we somehow wrote out a correct checksum for this bad block pointer, which is very concerning. See the logs for more details.
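
For context, the warnings in the logs below come from per-field sanity checks on the block pointer during pool traversal; in OpenZFS these are reported through zfs_panic_recover(), which panics unless the zfs_recover module parameter is set (on Linux, under /sys/module/zfs/parameters/). A minimal standalone sketch of that style of check, using a simplified hypothetical blkptr layout and illustrative limits rather than the real on-disk blkptr_t and ZFS constants:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified block pointer -- not the real ZFS blkptr_t. */
typedef struct {
	uint8_t  type;		/* object type id */
	uint8_t  checksum;	/* checksum algorithm id */
	uint64_t dva_vdev[3];	/* vdev id of each DVA */
	uint64_t lsize;		/* logical size in bytes */
} fake_blkptr_t;

/* Illustrative limits, not the real ZFS constants. */
#define FAKE_MAX_TYPE		54
#define FAKE_MAX_CHECKSUM	14
#define FAKE_MAX_BLOCKSIZE	(16ULL << 20)	/* 16 MiB */

/* Count out-of-range fields instead of panicking on the first one. */
static int
fake_blkptr_verify(const fake_blkptr_t *bp, uint64_t nvdevs)
{
	int errors = 0;

	if (bp->type >= FAKE_MAX_TYPE) {
		fprintf(stderr, "blkptr has invalid TYPE %u\n", bp->type);
		errors++;
	}
	if (bp->checksum >= FAKE_MAX_CHECKSUM) {
		fprintf(stderr, "blkptr has invalid CHECKSUM %u\n",
		    bp->checksum);
		errors++;
	}
	for (int d = 0; d < 3; d++) {
		if (bp->dva_vdev[d] >= nvdevs) {
			fprintf(stderr, "blkptr DVA %d has invalid VDEV"
			    " %llu\n", d,
			    (unsigned long long)bp->dva_vdev[d]);
			errors++;
		}
	}
	if (bp->lsize > FAKE_MAX_BLOCKSIZE) {
		fprintf(stderr, "blkptr has invalid LSIZE %llu\n",
		    (unsigned long long)bp->lsize);
		errors++;
	}
	return (errors);
}

int
main(void)
{
	/* Field values loosely modeled on the corrupted pointer below. */
	fake_blkptr_t bad = {
		.type = 175,
		.checksum = 21,
		.dva_vdev = { 10679131, 7885259, 10410665 },
		.lsize = 36028797018963967ULL,	/* absurdly large size */
	};
	return (fake_blkptr_verify(&bad, 4) == 0 ? 0 : 1);
}
```

That pattern matches what the logs below show: a burst of per-field warnings (because zfs_recover was set), followed by the VERIFY3 panic once the bogus size reaches zio_buf_alloc().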

In addition to the inconvenience of corrupted data, the experience of attempting to recover from this situation was not great:

  • The systemd services hang indefinitely without a timeout.
  • It is not obvious to the user that they can try setting zfs_recover = 1 to attempt to resolve the issue, and it's not clear what this will actually do.
  • In my case, my server panics anyway once this flag is set, because one of the bad values in the block pointer is an absurdly large allocated size.
  • We don't print much helpful information anywhere about the actual bp that's corrupted. It is hard to know what data has been lost, or even to find it in the pool for debugging purposes.

Describe how to reproduce the problem

Unfortunately I don't know what caused this, but it does look like a code problem. We saw this at Datto several times in production and never fully got to the bottom of it; it was simply too infrequent to debug, and hard to identify when the problem was actually introduced.

Include any warning/errors/backtraces from the system logs

Backtraces from dmesg (note the WARNING messages):

[ 1092.230029] INFO: task zpool:58350 blocked for more than 120 seconds.
[ 1092.230034]       Tainted: P           OE     5.4.0-74-generic #83-Ubuntu
[ 1092.230036] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1092.230038] zpool           D    0 58350      1 0x00000000
[ 1092.230040] Call Trace:
[ 1092.230047]  __schedule+0x2e3/0x740
[ 1092.230050]  schedule+0x42/0xb0
[ 1092.230051]  schedule_preempt_disabled+0xe/0x10
[ 1092.230053]  __mutex_lock.isra.0+0x182/0x4f0
[ 1092.230054]  ? _cond_resched+0x19/0x30
[ 1092.230056]  __mutex_lock_slowpath+0x13/0x20
[ 1092.230057]  mutex_lock+0x2e/0x40
[ 1092.230107]  spa_open_common+0x61/0x4d0 [zfs]
[ 1092.230154]  spa_get_stats+0x57/0x580 [zfs]
[ 1092.230192]  zfs_ioc_pool_stats+0x39/0x90 [zfs]
[ 1092.230229]  zfsdev_ioctl_common+0x657/0x810 [zfs]
[ 1092.230232]  ? __kmalloc_node+0x267/0x330
[ 1092.230268]  zfsdev_ioctl+0x55/0xe0 [zfs]
[ 1092.230271]  do_vfs_ioctl+0x407/0x670
[ 1092.230274]  ? do_user_addr_fault+0x216/0x450
[ 1092.230275]  ksys_ioctl+0x67/0x90
[ 1092.230276]  __x64_sys_ioctl+0x1a/0x20
[ 1092.230279]  do_syscall_64+0x57/0x190
[ 1092.230280]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1092.230282] RIP: 0033:0x7f9ae346d37b
[ 1092.230285] Code: Bad RIP value.
[ 1092.230286] RSP: 002b:00007fffc2b33938 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 1092.230288] RAX: ffffffffffffffda RBX: 0000557baa298740 RCX: 00007f9ae346d37b
[ 1092.230288] RDX: 00007fffc2b33960 RSI: 0000000000005a05 RDI: 0000000000000003
[ 1092.230289] RBP: 00007fffc2b36f50 R08: 0000557baa298890 R09: 00007f9ae3541eb0
[ 1092.230289] R10: 00007f9ae3723508 R11: 0000000000000202 R12: 00007fffc2b33960
[ 1092.230290] R13: 0000557baa296470 R14: 0000000000000000 R15: 00007fffc2b36f64
[ 1180.442357] WARNING: pool: blkptr at 000000001daeb35d has invalid TYPE 175
[ 1180.442360] WARNING: pool: blkptr at 000000001daeb35d has invalid CHECKSUM 21
[ 1180.442361] WARNING: pool: blkptr at 000000001daeb35d DVA 0 has invalid VDEV 10679131
[ 1180.442362] WARNING: pool: blkptr at 000000001daeb35d DVA 1 has invalid VDEV 7885259
[ 1180.442363] WARNING: pool: blkptr at 000000001daeb35d DVA 2 has invalid VDEV 10410665
[ 1180.442371] VERIFY3(c < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT) failed (36028797018963967 < 32768)
[ 1180.442376] PANIC at zio.c:293:zio_buf_alloc()
[ 1180.442378] Showing stack for process 44732
[ 1180.442381] CPU: 8 PID: 44732 Comm: zpool Tainted: P           OE     5.4.0-74-generic #83-Ubuntu
[ 1180.442381] Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 3402 01/13/2021
[ 1180.442382] Call Trace:
[ 1180.442390]  dump_stack+0x6d/0x8b
[ 1180.442399]  spl_dumpstack+0x29/0x2b [spl]
[ 1180.442403]  spl_panic+0xd4/0xfc [spl]
[ 1180.442406]  ? __wake_up_common_lock+0x8a/0xc0
[ 1180.442408]  ? _cond_resched+0x19/0x30
[ 1180.442412]  ? kmem_cache_alloc+0x178/0x230
[ 1180.442416]  ? spl_kmem_cache_alloc+0x82/0x830 [spl]
[ 1180.442419]  ? spl_kmem_cache_alloc+0xa1/0x830 [spl]
[ 1180.442476]  ? zio_taskq_member.isra.0.constprop.0+0x60/0x60 [zfs]
[ 1180.442525]  ? spa_taskq_dispatch_ent+0x65/0xc0 [zfs]
[ 1180.442573]  zio_buf_alloc+0x5b/0x60 [zfs]
[ 1180.442620]  abd_alloc_linear+0x6e/0xf0 [zfs]
[ 1180.442664]  abd_alloc+0x343/0x4a0 [zfs]
[ 1180.442696]  ? arc_space_consume+0x54/0xe0 [zfs]
[ 1180.442728]  arc_get_data_abd.isra.0+0x4f/0x60 [zfs]
[ 1180.442759]  arc_hdr_alloc_abd+0x55/0xb0 [zfs]
[ 1180.442791]  arc_hdr_alloc+0xd1/0x1c0 [zfs]
[ 1180.442823]  arc_read+0x559/0x11a0 [zfs]
[ 1180.442825]  ? __kmalloc_node+0x213/0x330
[ 1180.442863]  traverse_prefetch_metadata+0xc2/0x100 [zfs]
[ 1180.442898]  traverse_visitbp+0x433/0xa30 [zfs]
[ 1180.442900]  ? __kmalloc_node+0x213/0x330
[ 1180.442934]  traverse_visitbp+0x4a3/0xa30 [zfs]
[ 1180.442969]  traverse_dnode+0xba/0x1f0 [zfs]
[ 1180.443002]  traverse_visitbp+0x8ed/0xa30 [zfs]
[ 1180.443034]  ? arc_buf_destroy_impl+0x14e/0x300 [zfs]
[ 1180.443068]  traverse_impl+0x1e3/0x480 [zfs]
[ 1180.443102]  traverse_dataset_resume+0x4c/0x60 [zfs]
[ 1180.443146]  ? spa_sync+0xff0/0xff0 [zfs]
[ 1180.443180]  traverse_pool+0x184/0x1c0 [zfs]
[ 1180.443220]  ? spa_sync+0xff0/0xff0 [zfs]
[ 1180.443260]  spa_load+0x124e/0x14e0 [zfs]
[ 1180.443299]  spa_load_best+0x57/0x2d0 [zfs]
[ 1180.443336]  spa_import+0x1eb/0x820 [zfs]
[ 1180.443341]  ? nvpair_value_common.part.0+0xc2/0x140 [znvpair]
[ 1180.443390]  zfs_ioc_pool_import+0x138/0x150 [zfs]
[ 1180.443436]  zfsdev_ioctl_common+0x657/0x810 [zfs]
[ 1180.443438]  ? __kmalloc_node+0x267/0x330
[ 1180.443484]  zfsdev_ioctl+0x55/0xe0 [zfs]
[ 1180.443487]  do_vfs_ioctl+0x407/0x670
[ 1180.443490]  ? do_user_addr_fault+0x216/0x450
[ 1180.443492]  ksys_ioctl+0x67/0x90
[ 1180.443494]  __x64_sys_ioctl+0x1a/0x20
[ 1180.443496]  do_syscall_64+0x57/0x190
[ 1180.443498]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
@tcaputi tcaputi added Status: Triage Needed New issue which needs to be triaged Type: Defect Incorrect behavior (e.g. crash, hang) labels Jun 16, 2021
@behlendorf
Contributor

behlendorf commented Jun 16, 2021

@tcaputi if your pool is still damaged I'd suggest applying the patch from #12054. While the original motivation behind that change was to resolve a deadlock when removing L2ARC devices, the fix happens to move some block pointer verification code into arc_read(). That should let arc_read() detect the damage in many cases and handle it like any other checksum error instead of asserting.

Of course this doesn't address any of your points about this not being particularly easy to resolve, nor does it explain how the block pointer was damaged in the first place. But it's a start.
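
A rough sketch of that approach (hedged; this is modeled on the behavior described above, not the actual diff from #12054), reusing the hypothetical fake_blkptr_verify() from the sketch earlier in the thread: validate the bp at the top of the read path and turn a failure into an ordinary error return, rather than letting the bogus size reach the buffer allocator and trip a VERIFY.

```c
#include <errno.h>

/*
 * Sketch only -- not the actual arc_read() change. Assumes the
 * hypothetical fake_blkptr_t / fake_blkptr_verify() from the sketch
 * earlier in this thread.
 */
static int
fake_arc_read(const fake_blkptr_t *bp, uint64_t nvdevs)
{
	/*
	 * Reject a damaged block pointer up front so the caller sees a
	 * normal checksum-style error (ECKSUM) instead of an assertion
	 * failure deeper in the allocation path.
	 */
	if (fake_blkptr_verify(bp, nvdevs) != 0)
		return (ECKSUM);

	/* ... allocate buffers and issue the read as usual ... */
	return (0);
}
```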

@tcaputi
Contributor Author

tcaputi commented Jun 16, 2021

Thanks Brian. I'm gonna do some more debugging tonight and see what I can figure out. Is that patch in the latest release?

@kernelOfTruth
Contributor

It just got merged to master, so it's not in 2.0.4; I've just suggested it for inclusion in 2.0.5:

#12182

@tcaputi
Contributor Author

tcaputi commented Jun 17, 2021

Got it. I'll cherry-pick it manually tonight and see if it helps. Thanks.

@stale

stale bot commented Jun 18, 2022

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Jun 18, 2022
@behlendorf behlendorf removed Status: Stale No recent activity for issue Status: Triage Needed New issue which needs to be triaged labels Jun 24, 2022
@behlendorf
Contributor

This particular error path was addressed, as were a few others, but there's no doubt there are more we'll need to improve as they're discovered. We can open new issues as needed; closing.
