Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZFS/zpool hang on zfs destroying a ZVOL with a snapshot #1948

Closed
niekbergboer opened this issue Dec 10, 2013 · 9 comments
Closed

ZFS/zpool hang on zfs destroying a ZVOL with a snapshot #1948

niekbergboer opened this issue Dec 10, 2013 · 9 comments

Comments

@niekbergboer
Copy link

Scenario:

Debian Jessie system kernel Linux batalix 3.11-2-amd64 #1 SMP Debian 3.11.8-1 (2013-11-13) x86_64 GNU/Linux . Running ZoL 0.6.2.

I moved a ZVOL by snapshotting it, zfs sending it elsewhere, after which I deleted the ZVOL from the source pool:

$ sudo zfs destroy -r -v vmvol/sb
will destroy vmvol/sb@move

... after which nothing happens anymore. zpool and zfs commands hang, although I can access files on already-mounted ZFS datasets. After a while, the following appears in dmesg:

[102003.731621] INFO: task spl_system_task:1447 blocked for more than 120 seconds.
[102003.731624] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[102003.731625] spl_system_task D ffff880310994360 0 1447 2 0x00000000
[102003.731629] ffff880310994040 0000000000000246 0000000000014300 ffff88030e8dffd8
[102003.731631] 0000000000014300 ffff88030e8dffd8 ffff88030e0d3468 ffff88030e0d3440
[102003.731633] ffff88030e0d3470 0000000000000000 0000000000000002 0000000000000000
[102003.731635] Call Trace:
[102003.731649] [] ? cv_wait_common+0xe5/0x1a0 [spl]
[102003.731654] [] ? wake_up_atomic_t+0x30/0x30
[102003.731667] [] ? traverse_prefetcher+0x8b/0x140 [zfs]
[102003.731677] [] ? traverse_visitbp+0x2d7/0x6d0 [zfs]
[102003.731684] [] ? arc_read+0x549/0x8d0 [zfs]
[102003.731693] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102003.731702] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102003.731711] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102003.731720] [] ? traverse_dnode+0x78/0x130 [zfs]
[102003.731729] [] ? traverse_visitbp+0x504/0x6d0 [zfs]
[102003.731738] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102003.731747] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102003.731755] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102003.731764] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102003.731773] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102003.731781] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102003.731790] [] ? traverse_dnode+0x78/0x130 [zfs]
[102003.731798] [] ? traverse_visitbp+0x5bd/0x6d0 [zfs]
[102003.731802] [] ? xen_end_context_switch+0x9/0x20
[102003.731804] [] ? __switch_to+0x125/0x490
[102003.731813] [] ? traverse_prefetch_thread+0x86/0xc0 [zfs]
[102003.731822] [] ? dmu_recv_end+0x210/0x210 [zfs]
[102003.731826] [] ? taskq_thread+0x22c/0x4a0 [spl]
[102003.731829] [] ? wake_up_state+0x10/0x10
[102003.731833] [] ? taskq_cancel_id+0x1e0/0x1e0 [spl]
[102003.731835] [] ? kthread+0xaf/0xc0
[102003.731838] [] ? kthread_create_on_node+0x110/0x110
[102003.731841] [] ? ret_from_fork+0x7c/0xb0
[102003.731844] [] ? kthread_create_on_node+0x110/0x110

@niekbergboer
Copy link
Author

After a little while more, considerably more noise appears:

[102123.730810] INFO: task spl_system_task:1447 blocked for more than 120 seconds.
[102123.730814] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[102123.730815] spl_system_task D ffff880310994360 0 1447 2 0x00000000
[102123.730819] ffff880310994040 0000000000000246 0000000000014300 ffff88030e8dffd8
[102123.730821] 0000000000014300 ffff88030e8dffd8 ffff88030e0d3468 ffff88030e0d3440
[102123.730823] ffff88030e0d3470 0000000000000000 0000000000000002 0000000000000000
[102123.730825] Call Trace:
[102123.730838] [] ? cv_wait_common+0xe5/0x1a0 [spl]
[102123.730843] [] ? wake_up_atomic_t+0x30/0x30
[102123.730857] [] ? traverse_prefetcher+0x8b/0x140 [zfs]
[102123.730866] [] ? traverse_visitbp+0x2d7/0x6d0 [zfs]
[102123.730873] [] ? arc_read+0x549/0x8d0 [zfs]
[102123.730883] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102123.730892] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102123.730902] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102123.730911] [] ? traverse_dnode+0x78/0x130 [zfs]
[102123.730920] [] ? traverse_visitbp+0x504/0x6d0 [zfs]
[102123.730929] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102123.730938] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102123.730946] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102123.730955] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102123.730964] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102123.730972] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102123.730981] [] ? traverse_dnode+0x78/0x130 [zfs]
[102123.730989] [] ? traverse_visitbp+0x5bd/0x6d0 [zfs]
[102123.730992] [] ? xen_end_context_switch+0x9/0x20
[102123.730995] [] ? __switch_to+0x125/0x490
[102123.731004] [] ? traverse_prefetch_thread+0x86/0xc0 [zfs]
[102123.731013] [] ? dmu_recv_end+0x210/0x210 [zfs]
[102123.731017] [] ? taskq_thread+0x22c/0x4a0 [spl]
[102123.731020] [] ? wake_up_state+0x10/0x10
[102123.731023] [] ? taskq_cancel_id+0x1e0/0x1e0 [spl]
[102123.731026] [] ? kthread+0xaf/0xc0
[102123.731028] [] ? kthread_create_on_node+0x110/0x110
[102123.731032] [] ? ret_from_fork+0x7c/0xb0
[102123.731034] [] ? kthread_create_on_node+0x110/0x110
[102123.731091] INFO: task zfs:19108 blocked for more than 120 seconds.
[102123.731092] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[102123.731093] zfs D ffff88025eb003e0 0 19108 19107 0x00000004
[102123.731095] ffff88025eb000c0 0000000000000282 0000000000014300 ffff8802a2d25fd8
[102123.731097] 0000000000014300 ffff8802a2d25fd8 ffff88025eb000c0 ffff8802a2d25d80
[102123.731099] ffff88022de145e8 ffff88022de145f0 ffffffff00000000 ffff88022de145f8
[102123.731101] Call Trace:
[102123.731105] [] ? rwsem_down_write_failed+0xfd/0x1d0
[102123.731115] [] ? dnode_rele+0x3a/0x70 [zfs]
[102123.731119] [] ? call_rwsem_down_write_failed+0x13/0x20
[102123.731122] [] ? down_write+0x24/0x26
[102123.731132] [] ? dsl_dataset_make_exclusive+0x93/0xc0 [zfs]
[102123.731143] [] ? dsl_dataset_destroy+0x78/0x4d0 [zfs]
[102123.731145] [] ? mutex_lock+0x9/0x25
[102123.731148] [] ? _raw_spin_unlock_irqrestore+0xe/0x10
[102123.731158] [] ? dsl_dataset_tryown+0x36/0x130 [zfs]
[102123.731167] [] ? dmu_objset_destroy+0x31/0x40 [zfs]
[102123.731178] [] ? zfs_ioc_destroy+0x2c/0x70 [zfs]
[102123.731187] [] ? zfsdev_ioctl+0xef/0x190 [zfs]
[102123.731191] [] ? do_vfs_ioctl+0x2d4/0x4b0
[102123.731193] [] ? finish_task_switch+0x48/0xd0
[102123.731194] [] ? SyS_ioctl+0x80/0xa0
[102123.731197] [] ? system_call_fastpath+0x16/0x1b
[102123.731199] INFO: task zpool:19203 blocked for more than 120 seconds.
[102123.731200] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[102123.731201] zpool D ffff880286aaeaa0 0 19203 19202 0x00000004
[102123.731203] ffff880286aae780 0000000000000286 0000000000014300 ffff8802adf25fd8
[102123.731205] 0000000000014300 ffff8802adf25fd8 ffff88022de14610 ffff88022de145b0
[102123.731207] ffff88022de14618 0000000000000000 0000000000000002 ffff88022de14400
[102123.731208] Call Trace:
[102123.731212] [] ? cv_wait_common+0xe5/0x1a0 [spl]
[102123.731215] [] ? wake_up_atomic_t+0x30/0x30
[102123.731225] [] ? dsl_dataset_hold_ref+0x102/0x1d0 [zfs]
[102123.731235] [] ? dsl_dataset_stats+0x1e3/0x300 [zfs]
[102123.731245] [] ? dmu_objset_stats+0x15/0x50 [zfs]
[102123.731255] [] ? zfs_ioc_objset_stats_impl+0x55/0xe0 [zfs]
[102123.731264] [] ? zfs_ioc_objset_stats+0x2f/0x50 [zfs]
[102123.731272] [] ? zfs_ioc_dataset_list_next+0x17b/0x1b0 [zfs]
[102123.731281] [] ? zfsdev_ioctl+0xef/0x190 [zfs]
[102123.731283] [] ? do_vfs_ioctl+0x2d4/0x4b0
[102123.731286] [] ? vfs_read+0xee/0x160
[102123.731288] [] ? SyS_ioctl+0x80/0xa0
[102123.731290] [] ? SyS_read+0x43/0xa0
[102123.731293] [] ? system_call_fastpath+0x16/0x1b
[102243.729992] INFO: task spl_system_task:1447 blocked for more than 120 seconds.
[102243.730005] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[102243.730007] spl_system_task D ffff880310994360 0 1447 2 0x00000000
[102243.730011] ffff880310994040 0000000000000246 0000000000014300 ffff88030e8dffd8
[102243.730013] 0000000000014300 ffff88030e8dffd8 ffff88030e0d3468 ffff88030e0d3440
[102243.730015] ffff88030e0d3470 0000000000000000 0000000000000002 0000000000000000
[102243.730017] Call Trace:
[102243.730030] [] ? cv_wait_common+0xe5/0x1a0 [spl]
[102243.730035] [] ? wake_up_atomic_t+0x30/0x30
[102243.730048] [] ? traverse_prefetcher+0x8b/0x140 [zfs]
[102243.730057] [] ? traverse_visitbp+0x2d7/0x6d0 [zfs]
[102243.730064] [] ? arc_read+0x549/0x8d0 [zfs]
[102243.730074] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102243.730083] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102243.730092] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102243.730101] [] ? traverse_dnode+0x78/0x130 [zfs]
[102243.730110] [] ? traverse_visitbp+0x504/0x6d0 [zfs]
[102243.730119] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102243.730128] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102243.730136] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102243.730145] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102243.730153] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102243.730162] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102243.730171] [] ? traverse_dnode+0x78/0x130 [zfs]
[102243.730179] [] ? traverse_visitbp+0x5bd/0x6d0 [zfs]
[102243.730183] [] ? xen_end_context_switch+0x9/0x20
[102243.730186] [] ? __switch_to+0x125/0x490
[102243.730195] [] ? traverse_prefetch_thread+0x86/0xc0 [zfs]
[102243.730204] [] ? dmu_recv_end+0x210/0x210 [zfs]
[102243.730208] [] ? taskq_thread+0x22c/0x4a0 [spl]
[102243.730211] [] ? wake_up_state+0x10/0x10
[102243.730214] [] ? taskq_cancel_id+0x1e0/0x1e0 [spl]
[102243.730217] [] ? kthread+0xaf/0xc0
[102243.730220] [] ? kthread_create_on_node+0x110/0x110
[102243.730223] [] ? ret_from_fork+0x7c/0xb0
[102243.730225] [] ? kthread_create_on_node+0x110/0x110
[102243.730281] INFO: task zfs:19108 blocked for more than 120 seconds.
[102243.730281] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[102243.730282] zfs D ffff88025eb003e0 0 19108 19107 0x00000004
[102243.730285] ffff88025eb000c0 0000000000000282 0000000000014300 ffff8802a2d25fd8
[102243.730286] 0000000000014300 ffff8802a2d25fd8 ffff88025eb000c0 ffff8802a2d25d80
[102243.730288] ffff88022de145e8 ffff88022de145f0 ffffffff00000000 ffff88022de145f8
[102243.730290] Call Trace:
[102243.730294] [] ? rwsem_down_write_failed+0xfd/0x1d0
[102243.730304] [] ? dnode_rele+0x3a/0x70 [zfs]
[102243.730308] [] ? call_rwsem_down_write_failed+0x13/0x20
[102243.730311] [] ? down_write+0x24/0x26
[102243.730322] [] ? dsl_dataset_make_exclusive+0x93/0xc0 [zfs]
[102243.730332] [] ? dsl_dataset_destroy+0x78/0x4d0 [zfs]
[102243.730335] [] ? mutex_lock+0x9/0x25
[102243.730337] [] ? _raw_spin_unlock_irqrestore+0xe/0x10
[102243.730347] [] ? dsl_dataset_tryown+0x36/0x130 [zfs]
[102243.730356] [] ? dmu_objset_destroy+0x31/0x40 [zfs]
[102243.730367] [] ? zfs_ioc_destroy+0x2c/0x70 [zfs]
[102243.730376] [] ? zfsdev_ioctl+0xef/0x190 [zfs]
[102243.730379] [] ? do_vfs_ioctl+0x2d4/0x4b0
[102243.730381] [] ? finish_task_switch+0x48/0xd0
[102243.730383] [] ? SyS_ioctl+0x80/0xa0
[102243.730386] [] ? system_call_fastpath+0x16/0x1b
[102243.730388] INFO: task zpool:19203 blocked for more than 120 seconds.
[102243.730388] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[102243.730389] zpool D ffff880286aaeaa0 0 19203 19202 0x00000004
[102243.730391] ffff880286aae780 0000000000000286 0000000000014300 ffff8802adf25fd8
[102243.730393] 0000000000014300 ffff8802adf25fd8 ffff88022de14610 ffff88022de145b0
[102243.730395] ffff88022de14618 0000000000000000 0000000000000002 ffff88022de14400
[102243.730397] Call Trace:
[102243.730400] [] ? cv_wait_common+0xe5/0x1a0 [spl]
[102243.730403] [] ? wake_up_atomic_t+0x30/0x30
[102243.730413] [] ? dsl_dataset_hold_ref+0x102/0x1d0 [zfs]
[102243.730423] [] ? dsl_dataset_stats+0x1e3/0x300 [zfs]
[102243.730433] [] ? dmu_objset_stats+0x15/0x50 [zfs]
[102243.730443] [] ? zfs_ioc_objset_stats_impl+0x55/0xe0 [zfs]
[102243.730451] [] ? zfs_ioc_objset_stats+0x2f/0x50 [zfs]
[102243.730460] [] ? zfs_ioc_dataset_list_next+0x17b/0x1b0 [zfs]
[102243.730469] [] ? zfsdev_ioctl+0xef/0x190 [zfs]
[102243.730471] [] ? do_vfs_ioctl+0x2d4/0x4b0
[102243.730474] [] ? vfs_read+0xee/0x160
[102243.730475] [] ? SyS_ioctl+0x80/0xa0
[102243.730478] [] ? SyS_read+0x43/0xa0
[102243.730480] [] ? system_call_fastpath+0x16/0x1b
[102243.730483] INFO: task zfs:19305 blocked for more than 120 seconds.
[102243.730483] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[102243.730484] zfs D ffff88031091f360 0 19305 19304 0x00000004
[102243.730486] ffff88031091f040 0000000000000286 0000000000014300 ffff880132e99fd8
[102243.730488] 0000000000014300 ffff880132e99fd8 ffff88022de14610 ffff88022de145b0
[102243.730490] ffff88022de14618 0000000000000000 0000000000000002 ffff88022de14400
[102243.730491] Call Trace:
[102243.730495] [] ? cv_wait_common+0xe5/0x1a0 [spl]
[102243.730497] [] ? wake_up_atomic_t+0x30/0x30
[102243.730507] [] ? dsl_dataset_hold_ref+0x102/0x1d0 [zfs]
[102243.730517] [] ? dsl_dataset_stats+0x1e3/0x300 [zfs]
[102243.730519] [] ? mutex_lock+0x9/0x25
[102243.730529] [] ? dnode_rele+0x3a/0x70 [zfs]
[102243.730531] [] ? kfree+0x131/0x230
[102243.730541] [] ? dmu_objset_stats+0x15/0x50 [zfs]
[102243.730550] [] ? zfs_ioc_objset_stats_impl+0x55/0xe0 [zfs]
[102243.730558] [] ? zfs_ioc_objset_stats+0x2f/0x50 [zfs]
[102243.730567] [] ? zfs_ioc_dataset_list_next+0x17b/0x1b0 [zfs]
[102243.730575] [] ? zfsdev_ioctl+0xef/0x190 [zfs]
[102243.730577] [] ? do_vfs_ioctl+0x2d4/0x4b0
[102243.730580] [] ? vfs_read+0xee/0x160
[102243.730581] [] ? SyS_ioctl+0x80/0xa0
[102243.730583] [] ? SyS_read+0x43/0xa0
[102243.730586] [] ? system_call_fastpath+0x16/0x1b
[102363.729201] INFO: task spl_system_task:1447 blocked for more than 120 seconds.
[102363.729204] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[102363.729206] spl_system_task D ffff880310994360 0 1447 2 0x00000000
[102363.729209] ffff880310994040 0000000000000246 0000000000014300 ffff88030e8dffd8
[102363.729212] 0000000000014300 ffff88030e8dffd8 ffff88030e0d3468 ffff88030e0d3440
[102363.729213] ffff88030e0d3470 0000000000000000 0000000000000002 0000000000000000
[102363.729216] Call Trace:
[102363.729229] [] ? cv_wait_common+0xe5/0x1a0 [spl]
[102363.729234] [] ? wake_up_atomic_t+0x30/0x30
[102363.729248] [] ? traverse_prefetcher+0x8b/0x140 [zfs]
[102363.729258] [] ? traverse_visitbp+0x2d7/0x6d0 [zfs]
[102363.729266] [] ? arc_read+0x549/0x8d0 [zfs]
[102363.729276] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102363.729285] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102363.729294] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102363.729303] [] ? traverse_dnode+0x78/0x130 [zfs]
[102363.729312] [] ? traverse_visitbp+0x504/0x6d0 [zfs]
[102363.729321] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102363.729330] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102363.729338] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102363.729347] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102363.729355] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102363.729364] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102363.729373] [] ? traverse_dnode+0x78/0x130 [zfs]
[102363.729382] [] ? traverse_visitbp+0x5bd/0x6d0 [zfs]
[102363.729385] [] ? xen_end_context_switch+0x9/0x20
[102363.729389] [] ? __switch_to+0x125/0x490
[102363.729397] [] ? traverse_prefetch_thread+0x86/0xc0 [zfs]
[102363.729406] [] ? dmu_recv_end+0x210/0x210 [zfs]
[102363.729410] [] ? taskq_thread+0x22c/0x4a0 [spl]
[102363.729414] [] ? wake_up_state+0x10/0x10
[102363.729418] [] ? taskq_cancel_id+0x1e0/0x1e0 [spl]
[102363.729421] [] ? kthread+0xaf/0xc0
[102363.729423] [] ? kthread_create_on_node+0x110/0x110
[102363.729427] [] ? ret_from_fork+0x7c/0xb0
[102363.729429] [] ? kthread_create_on_node+0x110/0x110
[102363.729486] INFO: task zfs:19108 blocked for more than 120 seconds.
[102363.729486] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[102363.729488] zfs D ffff88025eb003e0 0 19108 19107 0x00000004
[102363.729490] ffff88025eb000c0 0000000000000282 0000000000014300 ffff8802a2d25fd8
[102363.729492] 0000000000014300 ffff8802a2d25fd8 ffff88025eb000c0 ffff8802a2d25d80
[102363.729494] ffff88022de145e8 ffff88022de145f0 ffffffff00000000 ffff88022de145f8
[102363.729495] Call Trace:
[102363.729499] [] ? rwsem_down_write_failed+0xfd/0x1d0
[102363.729510] [] ? dnode_rele+0x3a/0x70 [zfs]
[102363.729514] [] ? call_rwsem_down_write_failed+0x13/0x20
[102363.729517] [] ? down_write+0x24/0x26
[102363.729527] [] ? dsl_dataset_make_exclusive+0x93/0xc0 [zfs]
[102363.729538] [] ? dsl_dataset_destroy+0x78/0x4d0 [zfs]
[102363.729540] [] ? mutex_lock+0x9/0x25
[102363.729543] [] ? _raw_spin_unlock_irqrestore+0xe/0x10
[102363.729553] [] ? dsl_dataset_tryown+0x36/0x130 [zfs]
[102363.729562] [] ? dmu_objset_destroy+0x31/0x40 [zfs]
[102363.729573] [] ? zfs_ioc_destroy+0x2c/0x70 [zfs]
[102363.729582] [] ? zfsdev_ioctl+0xef/0x190 [zfs]
[102363.729585] [] ? do_vfs_ioctl+0x2d4/0x4b0
[102363.729588] [] ? finish_task_switch+0x48/0xd0
[102363.729589] [] ? SyS_ioctl+0x80/0xa0
[102363.729592] [] ? system_call_fastpath+0x16/0x1b

@niekbergboer
Copy link
Author

I have no idea as to whether this is relevant, but the Debian system at hand runs as dom0 under Xen version 4.1.4 (Debian 4.1.4-4).

The reason for mentioning this is that I see a couple of xen_context_switch related lines in the above kernel messages.

@behlendorf
Copy link
Contributor

There's a decent chance this was caused by a stack overrun. The stack usage for this call path was already reduced in master by commit a168788.

@niekbergboer
Copy link
Author

I can give that a try; can I patch that into a clean 0.6.2, or does it depend on additional changes?

@behlendorf
Copy link
Contributor

@niekbergboer It should apply cleanly and it's safe the cherry pick.

@ryao
Copy link
Contributor

ryao commented May 16, 2014

@behlendorf I reproduced this today on my system. I do not think a stack overrun is the cause.

I did a replication stream to a new pool, killed it mid-way, destroyed the new pool and then tried destroying a dataset that was obsolete before restarting the stream. Something during this triggered the following:

[  843.106235] INFO: task spl_system_task:2300 blocked for more than 120 seconds.
[  843.106241]       Tainted: P           O 3.14.2 #1
[  843.106244] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  843.106247] spl_system_task D ffff881032a0f0f0     0  2300      2 0x00000000
[  843.106255]  ffff8810330e55f0 0000000000000002 000000000000a000 ffff881032a0f0f0
[  843.106263]  ffff8810330e5fd8 ffff881032a0f0f0 ffff8810330e5538 ffffffff814124c0
[  843.106270]  ffff880d141c3d98 ffff8810330e5568 ffffffffa0372893 ffff880d141c3d98
[  843.106277] Call Trace:
[  843.106288]  [<ffffffff814124c0>] ? mutex_lock+0x20/0x3d
[  843.106350]  [<ffffffffa0372893>] ? zio_wait_for_children+0x53/0x60 [zfs]
[  843.106392]  [<ffffffffa0373134>] ? zio_vdev_io_done+0x64/0x150 [zfs]
[  843.106399]  [<ffffffff8107ab9d>] ? get_parent_ip+0xd/0x50
[  843.106405]  [<ffffffff8141790b>] ? preempt_count_sub+0x6b/0xf0
[  843.106410]  [<ffffffff814135f8>] ? _raw_spin_unlock_irqrestore+0x28/0x60
[  843.106417]  [<ffffffff81410819>] schedule+0x29/0x70
[  843.106431]  [<ffffffffa0187e95>] cv_wait_common+0xb5/0xf0 [spl]
[  843.106438]  [<ffffffff8108b040>] ? abort_exclusive_wait+0xb0/0xb0
[  843.106448]  [<ffffffffa0187ee5>] __cv_wait+0x15/0x20 [spl]
[  843.106476]  [<ffffffffa02f735b>] traverse_prefetcher+0x9b/0x160 [zfs]
[  843.106503]  [<ffffffffa02f77ed>] traverse_visitbp+0x2cd/0x740 [zfs]
[  843.106509]  [<ffffffff8116022c>] ? __kmalloc+0x7c/0x1e0
[  843.106518]  [<ffffffffa0181bbb>] ? kmem_alloc_debug+0xcb/0x110 [spl]
[  843.106527]  [<ffffffffa0181bbb>] ? kmem_alloc_debug+0xcb/0x110 [spl]
[  843.106553]  [<ffffffffa02f795b>] traverse_visitbp+0x43b/0x740 [zfs]
[  843.106579]  [<ffffffffa02f795b>] traverse_visitbp+0x43b/0x740 [zfs]
[  843.106606]  [<ffffffffa02f795b>] traverse_visitbp+0x43b/0x740 [zfs]
[  843.106632]  [<ffffffffa02f82d5>] traverse_dnode+0x75/0x120 [zfs]
[  843.106657]  [<ffffffffa02f7a8c>] traverse_visitbp+0x56c/0x740 [zfs]
[  843.106684]  [<ffffffffa02f795b>] traverse_visitbp+0x43b/0x740 [zfs]
[  843.106710]  [<ffffffffa02f795b>] traverse_visitbp+0x43b/0x740 [zfs]
[  843.106736]  [<ffffffffa02f795b>] traverse_visitbp+0x43b/0x740 [zfs]
[  843.106762]  [<ffffffffa02f795b>] traverse_visitbp+0x43b/0x740 [zfs]
[  843.106787]  [<ffffffffa02f795b>] traverse_visitbp+0x43b/0x740 [zfs]
[  843.106813]  [<ffffffffa02f795b>] traverse_visitbp+0x43b/0x740 [zfs]
[  843.106840]  [<ffffffffa02f82d5>] traverse_dnode+0x75/0x120 [zfs]
[  843.106865]  [<ffffffffa02f7b51>] traverse_visitbp+0x631/0x740 [zfs]
[  843.106891]  [<ffffffffa02f8223>] traverse_prefetch_thread+0x83/0xc0 [zfs]
[  843.106918]  [<ffffffffa02f72c0>] ? prefetch_dnode_metadata+0xc0/0xc0 [zfs]
[  843.106927]  [<ffffffffa0184c8e>] taskq_thread+0x1ae/0x350 [spl]
[  843.106933]  [<ffffffff8107c4b0>] ? wake_up_process+0x40/0x40
[  843.106943]  [<ffffffffa0184ae0>] ? taskq_cancel_id+0x120/0x120 [spl]
[  843.106948]  [<ffffffff8106cde4>] kthread+0xe4/0x100
[  843.106953]  [<ffffffff8106cd00>] ? kthread_create_on_node+0x190/0x190
[  843.106959]  [<ffffffff8141bdac>] ret_from_fork+0x7c/0xb0
[  843.106963]  [<ffffffff8106cd00>] ? kthread_create_on_node+0x190/0x190

@ryao
Copy link
Contributor

ryao commented May 16, 2014

I seem to be able to reproduce this reliably when trying to migrate to a new root pool. FreeBSD has a possible fix:

freebsd/freebsd-src@4995789cde5

I am testing it on my system now.

@ryao
Copy link
Contributor

ryao commented May 16, 2014

That FreeBSD patch does not fix this problem.

@behlendorf
Copy link
Contributor

Closing, this should be resolved in 0.6.5.x.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants