"zpool import" hangs #8518

Closed
freywald opened this issue Mar 19, 2019 · 4 comments

@freywald

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Ubuntu |
| Distribution Version | Disco Dingo dev |
| Linux Kernel | 5.0.0-7-generic |
| Architecture | x86_64 |
| ZFS Version | 0.7.12-1ubuntu5 |
| SPL Version | 0.7.12-1ubuntu3 |

Describe the problem you're observing

Import does not work anymore. Task hangs.

sudo zpool import POOL1
sudo zpool import POOL2
sudo zpool import POOL3
sudo zpool import POOL4

Describe how to reproduce the problem

I don't know. Yesterday I scrubbed all 8 disks (mirrored, 4 pools in total). I installed the newest version of Ubuntu, 19.04 (the previous version was 18.10), and ran an extended SMART check. On import, zfs complained that the host ID had changed, so I had to use import -f. After that everything was fine. I then rebooted, and now the import hangs. I'm not sure how to proceed from here.

I read issue #6244, where behlendorf commented that version 0.8.0 will include tweaks. I'm not sure whether I need to try that or whether there is another way to get it running again.

Include any warning/errors/backtraces from the system logs

[  243.014080] INFO: task zpool:2913 blocked for more than 120 seconds.
[  243.014088]       Tainted: P           OE     5.0.0-7-generic #8-Ubuntu
[  243.014090] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.014094] zpool           D    0  2913   2912 0x00000004
[  243.014099] Call Trace:
[  243.014111]  __schedule+0x2d0/0x840
[  243.014115]  ? __switch_to_asm+0x40/0x70
[  243.014120]  schedule+0x2c/0x70
[  243.014124]  schedule_timeout+0x258/0x360
[  243.014131]  wait_for_completion+0xb7/0x140
[  243.014135]  ? wake_up_q+0x80/0x80
[  243.014145]  __floppy_read_block_0+0x138/0x190 [floppy]
[  243.014153]  ? floppy_cmos_show+0x30/0x30 [floppy]
[  243.014161]  floppy_revalidate+0xf8/0x230 [floppy]
[  243.014166]  check_disk_change+0x62/0x70
[  243.014173]  floppy_open+0x2ae/0x380 [floppy]
[  243.014178]  __blkdev_get+0xe5/0x550
[  243.014182]  ? bd_acquire+0xd0/0xd0
[  243.014185]  blkdev_get+0x10c/0x330
[  243.014190]  ? bd_acquire+0xd0/0xd0
[  243.014193]  blkdev_open+0x92/0x100
[  243.014197]  do_dentry_open+0x138/0x360
[  243.014201]  vfs_open+0x2d/0x30
[  243.014205]  path_openat+0x2d4/0x16d0
[  243.014208]  ? filename_lookup.part.60+0xe0/0x170
[  243.014213]  ? strncpy_from_user+0x56/0x1b0
[  243.014217]  do_filp_open+0x93/0x100
[  243.014220]  ? strncpy_from_user+0x56/0x1b0
[  243.014225]  ? __alloc_fd+0x46/0x140
[  243.014229]  do_sys_open+0x177/0x280
[  243.014233]  ? _cond_resched+0x19/0x30
[  243.014238]  __x64_sys_openat+0x20/0x30
[  243.014242]  do_syscall_64+0x5a/0x110
[  243.014246]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  243.014250] RIP: 0033:0x7f7fe943b4db
[  243.014259] Code: Bad RIP value.
[  243.014261] RSP: 002b:00007ffd579bb4b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[  243.014265] RAX: ffffffffffffffda RBX: 000055e5dd9e77e0 RCX: 00007f7fe943b4db
[  243.014267] RDX: 0000000000080000 RSI: 000055e5dd8e1800 RDI: 00000000ffffff9c
[  243.014268] RBP: 000055e5dd8df670 R08: 0000000000000000 R09: 0000000000000000
[  243.014270] R10: 0000000000000000 R11: 0000000000000246 R12: 000000005c9146a4
[  243.014272] R13: 00007f7fe94b43c4 R14: 00000000dc9146a4 R15: 00007f7fe94a3425
sudo cat /proc/2913/stack 
[sudo] password for lycia: 
[<0>] __floppy_read_block_0+0x138/0x190 [floppy]
[<0>] floppy_revalidate+0xf8/0x230 [floppy]
[<0>] check_disk_change+0x62/0x70
[<0>] floppy_open+0x2ae/0x380 [floppy]
[<0>] __blkdev_get+0xe5/0x550
[<0>] blkdev_get+0x10c/0x330
[<0>] blkdev_open+0x92/0x100
[<0>] do_dentry_open+0x138/0x360
[<0>] vfs_open+0x2d/0x30
[<0>] path_openat+0x2d4/0x16d0
[<0>] do_filp_open+0x93/0x100
[<0>] do_sys_open+0x177/0x280
[<0>] __x64_sys_openat+0x20/0x30
[<0>] do_syscall_64+0x5a/0x110
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0>] 0xffffffffffffffff
sudo cat /proc/2913/status
Name:	zpool
Umask:	0022
State:	D (disk sleep)
...
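
The diagnosis above (a task stuck in state D, then reading its kernel stack) can be sketched as a small helper. This is a minimal sketch, not something from the original report: the function name `list_blocked_tasks` and its optional root-directory argument are hypothetical, and reading `/proc/<pid>/stack` normally requires root.

```shell
# List tasks in uninterruptible sleep (state "D") and, where readable,
# print their kernel stacks.
# The optional first argument is a hypothetical override of the proc
# root, added only so the function can be tried against sample files.
list_blocked_tasks() {
    proc_root="${1:-/proc}"
    for status in "$proc_root"/[0-9]*/status; do
        # Skip unmatched globs and tasks that exited in the meantime
        [ -r "$status" ] || continue
        # The State line looks like: "State:  D (disk sleep)"
        state=$(awk '/^State:/ {print $2; exit}' "$status" 2>/dev/null)
        [ "$state" = "D" ] || continue
        dir=${status%/status}
        pid=${dir##*/}
        name=$(awk '/^Name:/ {print $2; exit}' "$status" 2>/dev/null)
        echo "$pid $name"
        # The kernel stack is usually readable by root only
        if [ -r "$dir/stack" ]; then
            sed 's/^/    /' "$dir/stack"
        fi
    done
}
```

Run as root with no argument, it scans the real `/proc`. A hung `zpool` would then show up with its PID, followed by a stack like the one above if the kernel exposes it.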
@freywald
Author

I forgot to mention that I use LUKS disk encryption, but never mind. I rebooted and disabled my floppy drive, and the import works now. I then re-enabled it, and the problem has not come back.

I'm not sure it won't happen again.

@pzakha
Contributor

pzakha commented Dec 3, 2019

> related to linux 5.0 scsi-mq used as default

@kpande Do you have more details on this? We've upgraded to the 5.0 kernel and see the same issue intermittently with the floppy drive. Removing it from the vmx config seems to fix the issue; however, I'd like to better understand whether there is a way to work around this without having to log into ESX.
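
As a first step from inside the guest, one can check whether the `floppy` driver is loaded at all. The sketch below is only an illustration: `module_loaded` and its optional file argument are hypothetical names. Whether unloading the module helps once a task is already blocked in the driver is not established in this thread; the unload is likely to fail while the device is held open.

```shell
# Succeed if the named module is listed in a /proc/modules-style file.
# The optional second argument is a hypothetical override of the file
# path, added only so the function can be tried against sample input.
module_loaded() {
    mod="$1"
    modules_file="${2:-/proc/modules}"
    # The first field of each line in /proc/modules is the module name
    awk -v m="$mod" '$1 == m {found = 1} END {exit !found}' "$modules_file"
}

# Usage (unloading needs root and fails if the module is in use):
#   if module_loaded floppy; then sudo modprobe -r floppy; fi
```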

@pzakha
Contributor

pzakha commented Dec 4, 2019

I do not believe this is a ZFS issue, so we should probably close this bug.

Disabling the floppy drive (or blacklisting it as we did in our case) does fix the issue. This problem is not seen on the 4.15 kernel (used by default on Ubuntu 18.04) and is specific to the 5.0 kernel.
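
For reference, blacklisting the driver on Ubuntu could look roughly like the sketch below. The exact steps used in our case are not recorded in this thread, so the function name `blacklist_floppy`, the file name `blacklist-floppy.conf`, and the optional directory argument are all assumptions made for illustration.

```shell
# Write a modprobe.d fragment that keeps the floppy driver from being
# auto-loaded. The target directory defaults to /etc/modprobe.d; the
# file name is an arbitrary choice and the directory argument is a
# hypothetical override for trying this outside /etc.
blacklist_floppy() {
    conf_dir="${1:-/etc/modprobe.d}"
    conf_file="$conf_dir/blacklist-floppy.conf"
    {
        echo "# Keep the floppy driver from being auto-loaded"
        echo "blacklist floppy"
    } > "$conf_file" && echo "$conf_file"
}

# Usage on Ubuntu (as root), then rebuild the initramfs and reboot:
#   blacklist_floppy && update-initramfs -u
```

Note that a `blacklist` line only prevents automatic loading by alias; an explicit `modprobe floppy` would still load the module.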

This is likely related to: https://lkml.org/lkml/2018/11/23/84

@pzakha pzakha closed this as completed Dec 4, 2019