
Bad block pointer and poor user experience for recovering #12245

Closed
tcaputi opened this issue Jun 16, 2021 · 6 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@tcaputi
Contributor

tcaputi commented Jun 16, 2021

System information

Type                  Version/Name
Distribution Name     Ubuntu
Distribution Version  20.04
Linux Kernel          5.4.0-74-generic
Architecture          x86_64
ZFS Version           9a15da17 + 2 extra cherry-picked commits for dropbear support

Describe the problem you're observing

My personal server has a bad block pointer. The server hosts several personal projects of mine, including backup services for devices on my network, a Minecraft server, a number of websites, and a service that downloads and categorizes tweets in a MySQL database for research projects. The Twitter service is the most intensive workload on the server.

The code is hitting zfs_panic_recover(), so it seems that we somehow wrote out a correct checksum for this bad block pointer, which is very concerning. See the logs for more details.
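
For context, the warnings in the logs below come from per-field sanity checks on the block pointer during pool traversal; in OpenZFS these are reported through zfs_panic_recover(), which panics unless the zfs_recover module parameter is set (on Linux, under /sys/module/zfs/parameters/). A minimal standalone sketch of that style of check, using a simplified hypothetical blkptr layout and illustrative limits rather than the real on-disk blkptr_t and ZFS constants:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified block pointer -- not the real ZFS blkptr_t. */
typedef struct {
	uint8_t  type;		/* object type id */
	uint8_t  checksum;	/* checksum algorithm id */
	uint64_t dva_vdev[3];	/* vdev id of each DVA */
	uint64_t lsize;		/* logical size in bytes */
} fake_blkptr_t;

/* Illustrative limits, not the real ZFS constants. */
#define FAKE_MAX_TYPE		54
#define FAKE_MAX_CHECKSUM	14
#define FAKE_MAX_BLOCKSIZE	(16ULL << 20)	/* 16 MiB */

/* Count out-of-range fields instead of panicking on the first one. */
static int
fake_blkptr_verify(const fake_blkptr_t *bp, uint64_t nvdevs)
{
	int errors = 0;

	if (bp->type >= FAKE_MAX_TYPE) {
		fprintf(stderr, "blkptr has invalid TYPE %u\n", bp->type);
		errors++;
	}
	if (bp->checksum >= FAKE_MAX_CHECKSUM) {
		fprintf(stderr, "blkptr has invalid CHECKSUM %u\n",
		    bp->checksum);
		errors++;
	}
	for (int d = 0; d < 3; d++) {
		if (bp->dva_vdev[d] >= nvdevs) {
			fprintf(stderr, "blkptr DVA %d has invalid VDEV"
			    " %llu\n", d,
			    (unsigned long long)bp->dva_vdev[d]);
			errors++;
		}
	}
	if (bp->lsize > FAKE_MAX_BLOCKSIZE) {
		fprintf(stderr, "blkptr has invalid LSIZE %llu\n",
		    (unsigned long long)bp->lsize);
		errors++;
	}
	return (errors);
}

int
main(void)
{
	/* Field values loosely modeled on the corrupted pointer below. */
	fake_blkptr_t bad = {
		.type = 175,
		.checksum = 21,
		.dva_vdev = { 10679131, 7885259, 10410665 },
		.lsize = 36028797018963967ULL,	/* absurdly large size */
	};
	return (fake_blkptr_verify(&bad, 4) == 0 ? 0 : 1);
}
```

That pattern matches what the logs below show: a burst of per-field warnings (because zfs_recover was set), followed by the VERIFY3 panic once the bogus size reaches zio_buf_alloc().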

In addition to the inconvenience of corrupted data, the experience of attempting to recover from this situation was not great:

  • The systemd services hang indefinitely without a timeout.
  • It is not obvious to the user that they can try setting zfs_recover = 1 to attempt to resolve the issue, and it's not clear what this will actually do.
  • In my case, my server panics anyway once this flag is set, because one of the bad values in the block pointer is an absurdly large allocated size.
  • We don't print much helpful information anywhere about the actual bp that's corrupted. It is hard to know what data has been lost, or even to find it in the pool for debugging purposes.

Describe how to reproduce the problem

Unfortunately I don't know what caused this, but it does look like a code problem. We saw this at Datto several times in production and never fully got to the bottom of it; it was simply too infrequent to debug, and hard to identify when the problem was actually introduced.

Include any warning/errors/backtraces from the system logs

Backtraces from dmesg (note the WARNING messages):

[ 1092.230029] INFO: task zpool:58350 blocked for more than 120 seconds.
[ 1092.230034]       Tainted: P           OE     5.4.0-74-generic #83-Ubuntu
[ 1092.230036] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1092.230038] zpool           D    0 58350      1 0x00000000
[ 1092.230040] Call Trace:
[ 1092.230047]  __schedule+0x2e3/0x740
[ 1092.230050]  schedule+0x42/0xb0
[ 1092.230051]  schedule_preempt_disabled+0xe/0x10
[ 1092.230053]  __mutex_lock.isra.0+0x182/0x4f0
[ 1092.230054]  ? _cond_resched+0x19/0x30
[ 1092.230056]  __mutex_lock_slowpath+0x13/0x20
[ 1092.230057]  mutex_lock+0x2e/0x40
[ 1092.230107]  spa_open_common+0x61/0x4d0 [zfs]
[ 1092.230154]  spa_get_stats+0x57/0x580 [zfs]
[ 1092.230192]  zfs_ioc_pool_stats+0x39/0x90 [zfs]
[ 1092.230229]  zfsdev_ioctl_common+0x657/0x810 [zfs]
[ 1092.230232]  ? __kmalloc_node+0x267/0x330
[ 1092.230268]  zfsdev_ioctl+0x55/0xe0 [zfs]
[ 1092.230271]  do_vfs_ioctl+0x407/0x670
[ 1092.230274]  ? do_user_addr_fault+0x216/0x450
[ 1092.230275]  ksys_ioctl+0x67/0x90
[ 1092.230276]  __x64_sys_ioctl+0x1a/0x20
[ 1092.230279]  do_syscall_64+0x57/0x190
[ 1092.230280]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1092.230282] RIP: 0033:0x7f9ae346d37b
[ 1092.230285] Code: Bad RIP value.
[ 1092.230286] RSP: 002b:00007fffc2b33938 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 1092.230288] RAX: ffffffffffffffda RBX: 0000557baa298740 RCX: 00007f9ae346d37b
[ 1092.230288] RDX: 00007fffc2b33960 RSI: 0000000000005a05 RDI: 0000000000000003
[ 1092.230289] RBP: 00007fffc2b36f50 R08: 0000557baa298890 R09: 00007f9ae3541eb0
[ 1092.230289] R10: 00007f9ae3723508 R11: 0000000000000202 R12: 00007fffc2b33960
[ 1092.230290] R13: 0000557baa296470 R14: 0000000000000000 R15: 00007fffc2b36f64
[ 1180.442357] WARNING: pool: blkptr at 000000001daeb35d has invalid TYPE 175
[ 1180.442360] WARNING: pool: blkptr at 000000001daeb35d has invalid CHECKSUM 21
[ 1180.442361] WARNING: pool: blkptr at 000000001daeb35d DVA 0 has invalid VDEV 10679131
[ 1180.442362] WARNING: pool: blkptr at 000000001daeb35d DVA 1 has invalid VDEV 7885259
[ 1180.442363] WARNING: pool: blkptr at 000000001daeb35d DVA 2 has invalid VDEV 10410665
[ 1180.442371] VERIFY3(c < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT) failed (36028797018963967 < 32768)
[ 1180.442376] PANIC at zio.c:293:zio_buf_alloc()
[ 1180.442378] Showing stack for process 44732
[ 1180.442381] CPU: 8 PID: 44732 Comm: zpool Tainted: P           OE     5.4.0-74-generic #83-Ubuntu
[ 1180.442381] Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 3402 01/13/2021
[ 1180.442382] Call Trace:
[ 1180.442390]  dump_stack+0x6d/0x8b
[ 1180.442399]  spl_dumpstack+0x29/0x2b [spl]
[ 1180.442403]  spl_panic+0xd4/0xfc [spl]
[ 1180.442406]  ? __wake_up_common_lock+0x8a/0xc0
[ 1180.442408]  ? _cond_resched+0x19/0x30
[ 1180.442412]  ? kmem_cache_alloc+0x178/0x230
[ 1180.442416]  ? spl_kmem_cache_alloc+0x82/0x830 [spl]
[ 1180.442419]  ? spl_kmem_cache_alloc+0xa1/0x830 [spl]
[ 1180.442476]  ? zio_taskq_member.isra.0.constprop.0+0x60/0x60 [zfs]
[ 1180.442525]  ? spa_taskq_dispatch_ent+0x65/0xc0 [zfs]
[ 1180.442573]  zio_buf_alloc+0x5b/0x60 [zfs]
[ 1180.442620]  abd_alloc_linear+0x6e/0xf0 [zfs]
[ 1180.442664]  abd_alloc+0x343/0x4a0 [zfs]
[ 1180.442696]  ? arc_space_consume+0x54/0xe0 [zfs]
[ 1180.442728]  arc_get_data_abd.isra.0+0x4f/0x60 [zfs]
[ 1180.442759]  arc_hdr_alloc_abd+0x55/0xb0 [zfs]
[ 1180.442791]  arc_hdr_alloc+0xd1/0x1c0 [zfs]
[ 1180.442823]  arc_read+0x559/0x11a0 [zfs]
[ 1180.442825]  ? __kmalloc_node+0x213/0x330
[ 1180.442863]  traverse_prefetch_metadata+0xc2/0x100 [zfs]
[ 1180.442898]  traverse_visitbp+0x433/0xa30 [zfs]
[ 1180.442900]  ? __kmalloc_node+0x213/0x330
[ 1180.442934]  traverse_visitbp+0x4a3/0xa30 [zfs]
[ 1180.442969]  traverse_dnode+0xba/0x1f0 [zfs]
[ 1180.443002]  traverse_visitbp+0x8ed/0xa30 [zfs]
[ 1180.443034]  ? arc_buf_destroy_impl+0x14e/0x300 [zfs]
[ 1180.443068]  traverse_impl+0x1e3/0x480 [zfs]
[ 1180.443102]  traverse_dataset_resume+0x4c/0x60 [zfs]
[ 1180.443146]  ? spa_sync+0xff0/0xff0 [zfs]
[ 1180.443180]  traverse_pool+0x184/0x1c0 [zfs]
[ 1180.443220]  ? spa_sync+0xff0/0xff0 [zfs]
[ 1180.443260]  spa_load+0x124e/0x14e0 [zfs]
[ 1180.443299]  spa_load_best+0x57/0x2d0 [zfs]
[ 1180.443336]  spa_import+0x1eb/0x820 [zfs]
[ 1180.443341]  ? nvpair_value_common.part.0+0xc2/0x140 [znvpair]
[ 1180.443390]  zfs_ioc_pool_import+0x138/0x150 [zfs]
[ 1180.443436]  zfsdev_ioctl_common+0x657/0x810 [zfs]
[ 1180.443438]  ? __kmalloc_node+0x267/0x330
[ 1180.443484]  zfsdev_ioctl+0x55/0xe0 [zfs]
[ 1180.443487]  do_vfs_ioctl+0x407/0x670
[ 1180.443490]  ? do_user_addr_fault+0x216/0x450
[ 1180.443492]  ksys_ioctl+0x67/0x90
[ 1180.443494]  __x64_sys_ioctl+0x1a/0x20
[ 1180.443496]  do_syscall_64+0x57/0x190
[ 1180.443498]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
@tcaputi tcaputi added Status: Triage Needed New issue which needs to be triaged Type: Defect Incorrect behavior (e.g. crash, hang) labels Jun 16, 2021
@behlendorf
Contributor

behlendorf commented Jun 16, 2021

@tcaputi if your pool is still damaged I'd suggest applying the patch from #12054. While the original motivation behind that change was to resolve a deadlock when removing L2ARC devices, the fix happens to move some block pointer verification code into arc_read(). That should let arc_read() detect the damage in many cases and handle it like any other checksum error instead of asserting.

Of course this doesn't address any of your points about this not being particularly easy to resolve, nor does it explain how the block pointer was damaged in the first place. But it's a start.
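
A rough sketch of that approach (hedged; this is modeled on the behavior described above, not the actual diff from #12054), reusing the hypothetical fake_blkptr_verify() from the sketch earlier in the thread: validate the bp at the top of the read path and turn a failure into an ordinary error return, rather than letting the bogus size reach the buffer allocator and trip a VERIFY.

```c
#include <errno.h>

/*
 * Sketch only -- not the actual arc_read() change. Assumes the
 * hypothetical fake_blkptr_t / fake_blkptr_verify() from the sketch
 * earlier in this thread.
 */
static int
fake_arc_read(const fake_blkptr_t *bp, uint64_t nvdevs)
{
	/*
	 * Reject a damaged block pointer up front so the caller sees a
	 * normal checksum-style error (ECKSUM) instead of an assertion
	 * failure deeper in the allocation path.
	 */
	if (fake_blkptr_verify(bp, nvdevs) != 0)
		return (ECKSUM);

	/* ... allocate buffers and issue the read as usual ... */
	return (0);
}
```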

@tcaputi
Contributor Author

tcaputi commented Jun 16, 2021

Thanks Brian. I'm gonna do some more debugging tonight and see what I can figure out. Is that patch in the latest release?

@kernelOfTruth
Contributor

It just got merged to master, so it's not in 2.0.4; I've just suggested it for inclusion in 2.0.5:

#12182

@tcaputi
Contributor Author

tcaputi commented Jun 17, 2021

Got it. I'll cherry-pick it manually tonight and see if it helps. Thanks.

@stale

stale bot commented Jun 18, 2022

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Jun 18, 2022
@behlendorf behlendorf removed Status: Stale No recent activity for issue Status: Triage Needed New issue which needs to be triaged labels Jun 24, 2022
@behlendorf
Contributor

This particular error path was addressed, as were a few others, but there's no doubt there are more we'll need to improve as they're discovered. We can open new issues as needed; closing.
