ZFS/zpool hang on zfs destroying a ZVOL with a snapshot #1948
After a little while more, considerably more noise appears: `[102123.730810] INFO: task spl_system_task:1447 blocked for more than 120 seconds.`
I have no idea whether this is relevant, but the Debian system at hand runs as dom0 under Xen 4.1.4 (Debian 4.1.4-4). I mention this because I see a couple of xen_context_switch-related lines in the kernel messages above.
There's a decent chance this was caused by a stack overrun. The stack usage for this call path was already reduced in master by commit a168788. |
I can give that a try; can I patch that into a clean 0.6.2, or does it depend on additional changes? |
@niekbergboer It should apply cleanly, and it's safe to cherry-pick.
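For reference, applying a single upstream commit on top of a release can be sketched with git. The commit hash a168788 comes from the comment above; the repository URL and the release tag name are assumptions, not confirmed in this thread:

```shell
# Sketch only: repository URL and tag name 'zfs-0.6.2' are assumed.
git clone https://github.com/zfsonlinux/zfs.git
cd zfs
git checkout -b stack-fix zfs-0.6.2   # start from the assumed 0.6.2 release tag
git cherry-pick a168788               # apply the stack-usage fix on top of it
```

After rebuilding and reloading the modules, the reporter could retry the destroy with the reduced-stack code path in place.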
@behlendorf I reproduced this today on my system. I do not think a stack overrun is the cause. I did a replication stream to a new pool, killed it mid-way, destroyed the new pool and then tried destroying a dataset that was obsolete before restarting the stream. Something during this triggered the following:
I seem to be able to reproduce this reliably when trying to migrate to a new root pool. FreeBSD has a possible fix: freebsd/freebsd-src@4995789cde5. I am testing it on my system now.
That FreeBSD patch does not fix this problem. |
Closing, this should be resolved in 0.6.5.x. |
Scenario:
A Debian Jessie system running kernel `Linux batalix 3.11-2-amd64 #1 SMP Debian 3.11.8-1 (2013-11-13) x86_64 GNU/Linux`, with ZoL 0.6.2.
I moved a ZVOL by snapshotting it and `zfs send`ing it elsewhere, after which I deleted the ZVOL from the source pool:
$ sudo zfs destroy -r -v vmvol/sb
will destroy vmvol/sb@move
... after which nothing happens anymore. All subsequent `zpool` and `zfs` commands hang, although I can still access files on already-mounted ZFS datasets. After a while, the following appears in dmesg:
[102003.731621] INFO: task spl_system_task:1447 blocked for more than 120 seconds.
[102003.731624] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[102003.731625] spl_system_task D ffff880310994360 0 1447 2 0x00000000
[102003.731629] ffff880310994040 0000000000000246 0000000000014300 ffff88030e8dffd8
[102003.731631] 0000000000014300 ffff88030e8dffd8 ffff88030e0d3468 ffff88030e0d3440
[102003.731633] ffff88030e0d3470 0000000000000000 0000000000000002 0000000000000000
[102003.731635] Call Trace:
[102003.731649] [] ? cv_wait_common+0xe5/0x1a0 [spl]
[102003.731654] [] ? wake_up_atomic_t+0x30/0x30
[102003.731667] [] ? traverse_prefetcher+0x8b/0x140 [zfs]
[102003.731677] [] ? traverse_visitbp+0x2d7/0x6d0 [zfs]
[102003.731684] [] ? arc_read+0x549/0x8d0 [zfs]
[102003.731693] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102003.731702] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102003.731711] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102003.731720] [] ? traverse_dnode+0x78/0x130 [zfs]
[102003.731729] [] ? traverse_visitbp+0x504/0x6d0 [zfs]
[102003.731738] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102003.731747] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102003.731755] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102003.731764] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102003.731773] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102003.731781] [] ? traverse_visitbp+0x414/0x6d0 [zfs]
[102003.731790] [] ? traverse_dnode+0x78/0x130 [zfs]
[102003.731798] [] ? traverse_visitbp+0x5bd/0x6d0 [zfs]
[102003.731802] [] ? xen_end_context_switch+0x9/0x20
[102003.731804] [] ? __switch_to+0x125/0x490
[102003.731813] [] ? traverse_prefetch_thread+0x86/0xc0 [zfs]
[102003.731822] [] ? dmu_recv_end+0x210/0x210 [zfs]
[102003.731826] [] ? taskq_thread+0x22c/0x4a0 [spl]
[102003.731829] [] ? wake_up_state+0x10/0x10
[102003.731833] [] ? taskq_cancel_id+0x1e0/0x1e0 [spl]
[102003.731835] [] ? kthread+0xaf/0xc0
[102003.731838] [] ? kthread_create_on_node+0x110/0x110
[102003.731841] [] ? ret_from_fork+0x7c/0xb0
[102003.731844] [] ? kthread_create_on_node+0x110/0x110
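The migration sequence from the scenario above can be sketched as a few commands. The pool, dataset, and snapshot names (`vmvol/sb`, `@move`) are taken from the report; the destination pool name `backup` is an illustrative assumption, since the report only says the stream was sent "elsewhere":

```shell
# Names 'vmvol/sb' and '@move' are from the report; 'backup' is assumed.
sudo zfs snapshot vmvol/sb@move                            # snapshot the ZVOL
sudo zfs send vmvol/sb@move | sudo zfs receive backup/sb   # replicate it to the other pool
sudo zfs destroy -r -v vmvol/sb                            # the recursive destroy that hangs
```

The hang strikes during the final recursive destroy, which must also remove the `@move` snapshot; the stack trace shows the prefetch thread blocked deep inside `traverse_visitbp` while walking the dataset.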