0.6.5.1 still has problems with large rsync's and xattrs #3822

rgmiller · 2015-09-23T16:50:26Z

My CentOS 7 server running 0.6.5.1 is still having issues with large rsync jobs. The server itself doesn't lock-up (I can still log in and look around), but the rsync jobs hang and once that happens other I/O to the ZFS pools (such as a simple 'ls') also hangs. I'll paste the stack traces from 'dmesg' below.

This is possibly related to issue #3808. It's the same server running the same BackupPC jobs at least. The stack trace is different though, likely because I upgraded from 0.6.5 to 0.6.5.1.

rgmiller · 2015-09-23T16:51:51Z

dmesg output:

[19857.719842] perf interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[22898.322632] INFO: task spl_dynamic_tas:114 blocked for more than 120 seconds.
[22898.322713] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[22898.322790] spl_dynamic_tas D ffff88041e313680 0 114 2 0x00000000
[22898.322799] ffff8800368e7b70 0000000000000046 ffff880404586660 ffff8800368e7fd8
[22898.322806] ffff8800368e7fd8 ffff8800368e7fd8 ffff880404586660 ffff8800368e7cd8
[22898.322812] ffff8800368e7ce0 7fffffffffffffff ffff880404586660 ffff880404661b00
[22898.322819] Call Trace:
[22898.322834] [] schedule+0x29/0x70
[22898.322841] [] schedule_timeout+0x209/0x2d0
[22898.322852] [] ? check_preempt_curr+0x85/0xa0
[22898.322859] [] ? ttwu_do_wakeup+0x19/0xd0
[22898.322867] [] wait_for_completion+0x116/0x170
[22898.322872] [] ? wake_up_state+0x20/0x20
[22898.322919] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[22898.322926] [] kthread_create_on_node+0xa8/0x140
[22898.322936] [] ? string.isra.6+0x3b/0xf0
[22898.322954] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[22898.322977] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[22898.322995] [] spl_kthread_create+0x9c/0xf0 [spl]
[22898.323014] [] taskq_thread_create+0x6b/0x110 [spl]
[22898.323033] [] taskq_thread_spawn_task+0x12/0x30 [spl]
[22898.323051] [] taskq_thread+0x21e/0x420 [spl]
[22898.323056] [] ? wake_up_state+0x20/0x20
[22898.323074] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[22898.323079] [] kthread+0xcf/0xe0
[22898.323086] [] ? kthread_create_on_node+0x140/0x140
[22898.323094] [] ret_from_fork+0x58/0x90
[22898.323099] [] ? kthread_create_on_node+0x140/0x140
[23018.206899] INFO: task kthreadd:2 blocked for more than 120 seconds.
[23018.206971] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23018.207048] kthreadd D ffff88041e293680 0 2 0 0x00000000
[23018.207057] ffff88040806f550 0000000000000046 ffff880408058b60 ffff88040806ffd8
[23018.207064] ffff88040806ffd8 ffff88040806ffd8 ffff880408058b60 ffffffffa022d6e0
[23018.207070] ffffffffa022d7a0 ffffffffa022d6e8 ffffffffa022d7c8 0000000000000000
[23018.207076] Call Trace:
[23018.207125] [] schedule+0x29/0x70
[23018.207149] [] cv_wait_common+0x125/0x150 [spl]
[23018.207158] [] ? wake_up_bit+0x30/0x30
[23018.207179] [] __cv_wait+0x15/0x20 [spl]
[23018.207224] [] arc_get_data_buf.isra.21+0x3e7/0x3f0 [zfs]
[23018.207263] [] arc_buf_alloc+0x10f/0x190 [zfs]
[23018.207305] [] dbuf_fix_old_data.isra.6+0xc7/0x130 [zfs]
[23018.207346] [] dbuf_dirty+0x7c8/0x980 [zfs]
[23018.207388] [] dmu_buf_will_dirty+0x56/0xa0 [zfs]
[23018.207463] [] zap_get_leaf_byblk+0x132/0x2c0 [zfs]
[23018.207504] [] ? dmu_buf_rele+0xe/0x10 [zfs]
[23018.207573] [] ? zap_idx_to_blk+0x103/0x180 [zfs]
[23018.207639] [] zap_deref_leaf+0x7a/0xa0 [zfs]
[23018.207704] [] fzap_remove+0x3f/0xb0 [zfs]
[23018.207770] [] ? zap_name_alloc+0x73/0xd0 [zfs]
[23018.207849] [] zap_remove_norm+0x17b/0x1e0 [zfs]
[23018.207987] [] zap_remove+0x13/0x20 [zfs]
[23018.208069] [] zap_remove_int+0x54/0x80 [zfs]
[23018.208151] [] zfs_rmnode+0x224/0x350 [zfs]
[23018.208161] [] ? mutex_lock+0x12/0x2f
[23018.208239] [] zfs_zinactive+0x168/0x180 [zfs]
[23018.208320] [] zfs_inactive+0x67/0x240 [zfs]
[23018.208330] [] ? truncate_pagecache+0x59/0x60
[23018.208407] [] zpl_evict_inode+0x43/0x60 [zfs]
[23018.208426] [] evict+0xa7/0x170
[23018.208434] [] dispose_list+0x3e/0x50
[23018.208439] [] prune_icache_sb+0x163/0x320
[23018.208447] [] prune_super+0xd6/0x1a0
[23018.208456] [] shrink_slab+0x165/0x300
[23018.208464] [] ? vmpressure+0x21/0x90
[23018.208472] [] do_try_to_free_pages+0x3c2/0x4e0
[23018.208479] [] try_to_free_pages+0xfc/0x180
[23018.208489] [] __alloc_pages_nodemask+0x7fd/0xb90
[23018.208502] [] ? __account_cfs_rq_runtime+0x160/0x160
[23018.208512] [] copy_process.part.25+0x134/0x14d0
[23018.208519] [] ? sched_clock_cpu+0xb5/0x100
[23018.208525] [] ? kthread_create_on_node+0x140/0x140
[23018.208535] [] do_fork+0xbc/0x350
[23018.208542] [] kernel_thread+0x26/0x30
[23018.208548] [] kthreadd+0x2b2/0x2f0
[23018.208555] [] ? kthread_create_on_cpu+0x60/0x60
[23018.208564] [] ret_from_fork+0x58/0x90
[23018.208573] [] ? kthread_create_on_cpu+0x60/0x60
[23018.208583] INFO: task kswapd0:46 blocked for more than 120 seconds.
[23018.208660] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23018.208752] kswapd0 D ffff88041e293680 0 46 2 0x00000000
[23018.208759] ffff880404467690 0000000000000046 ffff880407d84fa0 ffff880404467fd8
[23018.208766] ffff880404467fd8 ffff880404467fd8 ffff880407d84fa0 ffffffffa022d6e0
[23018.208774] ffffffffa022d7a0 ffffffffa022d6e8 ffffffffa022d7c8 0000000000000000
[23018.208781] Call Trace:
[23018.208805] [] schedule+0x29/0x70
[23018.208837] [] cv_wait_common+0x125/0x150 [spl]
[23018.208844] [] ? wake_up_bit+0x30/0x30
[23018.208867] [] __cv_wait+0x15/0x20 [spl]
[23018.208919] [] arc_get_data_buf.isra.21+0x3e7/0x3f0 [zfs]
[23018.208972] [] arc_buf_alloc+0x10f/0x190 [zfs]
[23018.209026] [] arc_read+0x1c5/0xad0 [zfs]
[23018.209081] [] ? dbuf_rele_and_unlock+0x460/0x460 [zfs]
[23018.209137] [] dbuf_read+0x2ad/0x930 [zfs]
[23018.209200] [] dmu_buf_hold+0x50/0x80 [zfs]
[23018.209284] [] zap_get_leaf_byblk+0x5c/0x2c0 [zfs]
[23018.209339] [] ? dmu_buf_rele+0xe/0x10 [zfs]
[23018.209422] [] ? zap_idx_to_blk+0x103/0x180 [zfs]
[23018.209505] [] zap_deref_leaf+0x7a/0xa0 [zfs]
[23018.209588] [] fzap_remove+0x3f/0xb0 [zfs]
[23018.209671] [] ? zap_name_alloc+0x73/0xd0 [zfs]
[23018.209753] [] zap_remove_norm+0x17b/0x1e0 [zfs]
[23018.209834] [] zap_remove+0x13/0x20 [zfs]
[23018.209926] [] zap_remove_int+0x54/0x80 [zfs]
[23018.210006] [] zfs_rmnode+0x224/0x350 [zfs]
[23018.210016] [] ? mutex_lock+0x12/0x2f
[23018.210092] [] zfs_zinactive+0x168/0x180 [zfs]
[23018.210172] [] zfs_inactive+0x67/0x240 [zfs]
[23018.210181] [] ? truncate_pagecache+0x59/0x60
[23018.210257] [] zpl_evict_inode+0x43/0x60 [zfs]
[23018.210273] [] evict+0xa7/0x170
[23018.210281] [] dispose_list+0x3e/0x50
[23018.210287] [] prune_icache_sb+0x163/0x320
[23018.210293] [] prune_super+0xd6/0x1a0
[23018.210301] [] shrink_slab+0x165/0x300
[23018.210310] [] ? vmpressure+0x21/0x90
[23018.210317] [] balance_pgdat+0x4b1/0x5e0
[23018.210325] [] kswapd+0x173/0x450
[23018.210332] [] ? wake_up_bit+0x30/0x30
[23018.210339] [] ? balance_pgdat+0x5e0/0x5e0
[23018.210347] [] kthread+0xcf/0xe0
[23018.210353] [] ? kthread_create_on_node+0x140/0x140
[23018.210361] [] ret_from_fork+0x58/0x90
[23018.210367] [] ? kthread_create_on_node+0x140/0x140
[23018.210375] INFO: task spl_dynamic_tas:114 blocked for more than 120 seconds.
[23018.210460] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23018.210553] spl_dynamic_tas D ffff88041e313680 0 114 2 0x00000000
[23018.210560] ffff8800368e7b70 0000000000000046 ffff880404586660 ffff8800368e7fd8
[23018.210567] ffff8800368e7fd8 ffff8800368e7fd8 ffff880404586660 ffff8800368e7cd8
[23018.210573] ffff8800368e7ce0 7fffffffffffffff ffff880404586660 ffff880404661b00
[23018.210582] Call Trace:
[23018.210590] [] schedule+0x29/0x70
[23018.210597] [] schedule_timeout+0x209/0x2d0
[23018.210608] [] ? check_preempt_curr+0x85/0xa0
[23018.210615] [] ? ttwu_do_wakeup+0x19/0xd0
[23018.210625] [] wait_for_completion+0x116/0x170
[23018.210631] [] ? wake_up_state+0x20/0x20
[23018.210656] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[23018.210664] [] kthread_create_on_node+0xa8/0x140
[23018.210676] [] ? string.isra.6+0x3b/0xf0
[23018.210696] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[23018.210722] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[23018.210742] [] spl_kthread_create+0x9c/0xf0 [spl]
[23018.210765] [] taskq_thread_create+0x6b/0x110 [spl]
[23018.210785] [] taskq_thread_spawn_task+0x12/0x30 [spl]
[23018.210807] [] taskq_thread+0x21e/0x420 [spl]
[23018.210820] [] ? wake_up_state+0x20/0x20
[23018.210842] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[23018.210848] [] kthread+0xcf/0xe0
[23018.210854] [] ? kthread_create_on_node+0x140/0x140
[23018.210862] [] ret_from_fork+0x58/0x90
[23018.210870] [] ? kthread_create_on_node+0x140/0x140
[23018.210876] INFO: task arc_prune:130 blocked for more than 120 seconds.
[23018.210954] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23018.211046] arc_prune D ffff88041e393680 0 130 2 0x00000000
[23018.211052] ffff88040183b788 0000000000000046 ffff8804044c0b60 ffff88040183bfd8
[23018.211059] ffff88040183bfd8 ffff88040183bfd8 ffff8804044c0b60 ffffffffa022d6e0
[23018.211066] ffffffffa022d7a0 ffffffffa022d6e8 ffffffffa022d7c8 0000000000000000
[23018.211074] Call Trace:
[23018.211097] [] schedule+0x29/0x70
[23018.211120] [] cv_wait_common+0x125/0x150 [spl]
[23018.211127] [] ? wake_up_bit+0x30/0x30
[23018.211149] [] __cv_wait+0x15/0x20 [spl]
[23018.211198] [] arc_get_data_buf.isra.21+0x3e7/0x3f0 [zfs]
[23018.211253] [] arc_buf_alloc+0x10f/0x190 [zfs]
[23018.211306] [] arc_read+0x1c5/0xad0 [zfs]
[23018.211361] [] ? dbuf_rele_and_unlock+0x460/0x460 [zfs]
[23018.211417] [] dbuf_read+0x2ad/0x930 [zfs]
[23018.211480] [] dmu_buf_hold+0x50/0x80 [zfs]
[23018.211564] [] zap_get_leaf_byblk+0x5c/0x2c0 [zfs]
[23018.211619] [] ? dmu_buf_rele+0xe/0x10 [zfs]
[23018.211701] [] ? zap_idx_to_blk+0x103/0x180 [zfs]
[23018.211785] [] zap_deref_leaf+0x7a/0xa0 [zfs]
[23018.211868] [] fzap_remove+0x3f/0xb0 [zfs]
[23018.211956] [] ? zap_name_alloc+0x73/0xd0 [zfs]
[23018.212039] [] zap_remove_norm+0x17b/0x1e0 [zfs]
[23018.212122] [] zap_remove+0x13/0x20 [zfs]
[23018.212204] [] zap_remove_int+0x54/0x80 [zfs]
[23018.212284] [] zfs_rmnode+0x224/0x350 [zfs]
[23018.212294] [] ? mutex_lock+0x12/0x2f
[23018.212370] [] zfs_zinactive+0x168/0x180 [zfs]
[23018.212449] [] zfs_inactive+0x67/0x240 [zfs]
[23018.212458] [] ? truncate_pagecache+0x59/0x60
[23018.212534] [] zpl_evict_inode+0x43/0x60 [zfs]
[23018.212550] [] evict+0xa7/0x170
[23018.212558] [] dispose_list+0x3e/0x50
[23018.212563] [] prune_icache_sb+0x163/0x320
[23018.212570] [] prune_super+0xd6/0x1a0
[23018.212648] [] zfs_sb_prune+0xa2/0xc0 [zfs]
[23018.212725] [] zpl_prune_sb+0x35/0x50 [zfs]
[23018.212778] [] arc_prune_task+0x1f/0x50 [zfs]
[23018.212801] [] taskq_thread+0x21e/0x420 [spl]
[23018.212815] [] ? wake_up_state+0x20/0x20
[23018.212836] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[23018.212842] [] kthread+0xcf/0xe0
[23018.212849] [] ? kthread_create_on_node+0x140/0x140
[23018.212858] [] ret_from_fork+0x58/0x90
[23018.212865] [] ? kthread_create_on_node+0x140/0x140
[23018.212871] INFO: task arc_reclaim:131 blocked for more than 120 seconds.
[23018.212952] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23018.213045] arc_reclaim D ffff88041e213680 0 131 2 0x00000000
[23018.213051] ffff88040183fdb8 0000000000000046 ffff8804044c0000 ffff88040183ffd8
[23018.213060] ffff88040183ffd8 ffff88040183ffd8 ffff8804044c0000 ffff8804042cdf00
[23018.213066] 0000000000000000 0000000000000cb5 ffff8804042cdfc8 0000000000000cb7
[23018.213073] Call Trace:
[23018.213081] [] schedule+0x29/0x70
[23018.213102] [] taskq_wait_outstanding+0x9d/0x110 [spl]
[23018.213111] [] ? wake_up_bit+0x30/0x30
[23018.213160] [] arc_kmem_reap_now+0xf3/0x100 [zfs]
[23018.213213] [] arc_reclaim_thread+0x1f4/0x230 [zfs]
[23018.213266] [] ? arc_shrink+0xc0/0xc0 [zfs]
[23018.213289] [] thread_generic_wrapper+0x71/0x80 [spl]
[23018.213309] [] ? __thread_exit+0x20/0x20 [spl]
[23018.213317] [] kthread+0xcf/0xe0
[23018.213324] [] ? kthread_create_on_node+0x140/0x140
[23018.213332] [] ret_from_fork+0x58/0x90
[23018.213338] [] ? kthread_create_on_node+0x140/0x140
[23018.213365] INFO: task txg_sync:1074 blocked for more than 120 seconds.
[23018.213444] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23018.213536] txg_sync D ffff88041e393680 0 1074 2 0x00000000
[23018.213542] ffff8803fca37548 0000000000000046 ffff88040477f1c0 ffff8803fca37fd8
[23018.213551] ffff8803fca37fd8 ffff8803fca37fd8 ffff88040477f1c0 ffffffffa022d6e0
[23018.213557] ffffffffa022d7a0 ffffffffa022d6e8 ffffffffa022d7c8 0000000000000000
[23018.213564] Call Trace:
[23018.213587] [] schedule+0x29/0x70
[23018.213609] [] cv_wait_common+0x125/0x150 [spl]
[23018.213618] [] ? wake_up_bit+0x30/0x30
[23018.213643] [] __cv_wait+0x15/0x20 [spl]
[23018.213694] [] arc_get_data_buf.isra.21+0x3e7/0x3f0 [zfs]
[23018.213747] [] arc_buf_alloc+0x10f/0x190 [zfs]
[23018.213800] [] arc_read+0x1c5/0xad0 [zfs]
[23018.213860] [] ? dbuf_read+0x69f/0x930 [zfs]
[23018.213915] [] ? __dbuf_hold_impl+0x240/0x500 [zfs]
[23018.213971] [] dbuf_prefetch+0x1de/0x2e0 [zfs]
[23018.213982] [] ? dequeue_task_fair+0x40e/0x620
[23018.214043] [] dmu_zfetch_dofetch.isra.6+0x114/0x180 [zfs]
[23018.214107] [] dmu_zfetch+0x4ef/0xe00 [zfs]
[23018.214163] [] dbuf_read+0x7bc/0x930 [zfs]
[23018.214232] [] dnode_hold_impl+0xca/0x600 [zfs]
[23018.214296] [] dnode_hold+0x19/0x20 [zfs]
[23018.214352] [] dmu_buf_try_add_ref+0xb0/0x1c0 [zfs]
[23018.214413] [] dmu_objset_userquota_get_ids+0x9d/0x530 [zfs]
[23018.214478] [] dnode_sync+0xef/0x9e0 [zfs]
[23018.214534] [] ? dbuf_sync_list+0x9d/0xd0 [zfs]
[23018.214599] [] ? dnode_sync+0x305/0x9e0 [zfs]
[23018.214609] [] ? mutex_lock+0x12/0x2f
[23018.214668] [] dmu_objset_sync_dnodes+0x9c/0xc0 [zfs]
[23018.214729] [] dmu_objset_sync+0x1d0/0x320 [zfs]
[23018.214789] [] ? recordsize_changed_cb+0x20/0x20 [zfs]
[23018.214857] [] ? dmu_objset_sync+0x320/0x320 [zfs]
[23018.214925] [] dsl_dataset_sync+0x4e/0x90 [zfs]
[23018.214994] [] dsl_pool_sync+0xa3/0x420 [zfs]
[23018.215075] [] spa_sync+0x388/0xb70 [zfs]
[23018.215085] [] ? autoremove_wake_function+0x2b/0x40
[23018.215163] [] txg_sync_thread+0x3cc/0x640 [zfs]
[23018.215246] [] ? txg_fini+0x2a0/0x2a0 [zfs]
[23018.215270] [] thread_generic_wrapper+0x71/0x80 [spl]
[23018.215290] [] ? __thread_exit+0x20/0x20 [spl]
[23018.215299] [] kthread+0xcf/0xe0
[23018.215305] [] ? kthread_create_on_node+0x140/0x140
[23018.215314] [] ret_from_fork+0x58/0x90
[23018.215320] [] ? kthread_create_on_node+0x140/0x140
[23018.215347] INFO: task slimserver.pl:3674 blocked for more than 120 seconds.
[23018.215433] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23018.215525] slimserver.pl D ffff88041e293680 0 3674 1 0x00000080
[23018.215531] ffff8803d8943530 0000000000000082 ffff8800c55538e0 ffff8803d8943fd8
[23018.215538] ffff8803d8943fd8 ffff8803d8943fd8 ffff8800c55538e0 ffffffffa022d6e0
[23018.215545] ffffffffa022d7a0 ffffffffa022d6e8 ffffffffa022d7c8 0000000000000000
[23018.215551] Call Trace:
[23018.215577] [] schedule+0x29/0x70
[23018.215599] [] cv_wait_common+0x125/0x150 [spl]
[23018.215608] [] ? wake_up_bit+0x30/0x30
[23018.215629] [] __cv_wait+0x15/0x20 [spl]
[23018.215679] [] arc_get_data_buf.isra.21+0x3e7/0x3f0 [zfs]
[23018.215733] [] arc_read+0x335/0xad0 [zfs]
[23018.215788] [] ? dbuf_rele_and_unlock+0x460/0x460 [zfs]
[23018.215852] [] dbuf_read+0x2ad/0x930 [zfs]
[23018.215892] [] __dbuf_hold_impl+0x478/0x500 [zfs]
[23018.215931] [] dbuf_hold_impl+0x73/0xa0 [zfs]
[23018.215968] [] dbuf_hold+0x2f/0x60 [zfs]
[23018.216011] [] dmu_buf_hold_noread+0x87/0x140 [zfs]
[23018.216053] [] dmu_buf_hold+0x2f/0x80 [zfs]
[23018.216119] [] zap_lockdir+0x58/0x900 [zfs]
[23018.216127] [] ? __remove_hrtimer+0x46/0xa0
[23018.216133] [] ? hrtimer_try_to_cancel+0x57/0x100
[23018.216195] [] zap_lookup_norm+0x5b/0x1d0 [zfs]
[23018.216255] [] zap_lookup+0x33/0x40 [zfs]
[23018.216317] [] zfs_dirent_lock+0x4d3/0x590 [zfs]
[23018.216377] [] zfs_dirlook+0x90/0x330 [zfs]
[23018.216385] [] ? kmem_cache_alloc+0x1ba/0x1d0
[23018.216445] [] zfs_lookup+0x2f1/0x340 [zfs]
[23018.216506] [] zpl_lookup+0xae/0x130 [zfs]
[23018.216516] [] lookup_real+0x1d/0x50
[23018.216522] [] __lookup_hash+0x42/0x60
[23018.216529] [] lookup_slow+0x42/0xa7
[23018.216534] [] link_path_walk+0x80f/0x8b0
[23018.216542] [] ? type_attribute_bounds_av.isra.10+0x66/0x2e0
[23018.216551] [] ? _raw_spin_unlock_bh+0x1b/0x40
[23018.216556] [] path_lookupat+0x6b/0x7a0
[23018.216562] [] ? context_struct_compute_av+0x34d/0x470
[23018.216567] [] ? kmem_cache_alloc+0x35/0x1d0
[23018.216571] [] ? getname_flags+0x4f/0x1a0
[23018.216578] [] filename_lookup+0x2b/0xc0
[23018.216583] [] user_path_at_empty+0x67/0xc0
[23018.216591] [] ? __posix_lock_file+0x226/0x560
[23018.216596] [] user_path_at+0x11/0x20
[23018.216602] [] vfs_fstatat+0x63/0xc0
[23018.216609] [] SYSC_newstat+0x2e/0x60
[23018.216616] [] ? __audit_syscall_exit+0x1e6/0x280
[23018.216621] [] SyS_newstat+0xe/0x10
[23018.216628] [

rgmiller · 2015-09-23T17:09:14Z

Just realized not all of my stack traces were appearing in the comment above. Here's the rest of the output:

[23018.216637] INFO: task BackupPC_nightl:11059 blocked for more than 120 seconds.
[23018.216700] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23018.216768] BackupPC_nightl D ffff88041e293680 0 11059 3608 0x00000080
[23018.216773] ffff880194ed7598 0000000000000082 ffff880404b40b60 ffff880194ed7fd8
[23018.216781] ffff880194ed7fd8 ffff880194ed7fd8 ffff880404b40b60 ffffffffa022d6e0
[23018.216786] ffffffffa022d7a0 ffffffffa022d6e8 ffffffffa022d7c8 0000000000000000
[23018.216792] Call Trace:
[23018.216820] [] schedule+0x29/0x70
[23018.216839] [] cv_wait_common+0x125/0x150 [spl]
[23018.216845] [] ? wake_up_bit+0x30/0x30
[23018.216862] [] __cv_wait+0x15/0x20 [spl]
[23018.216894] [] arc_get_data_buf.isra.21+0x3e7/0x3f0 [zfs]
[23018.216928] [] arc_buf_alloc+0x10f/0x190 [zfs]
[23018.216960] [] arc_read+0x1c5/0xad0 [zfs]
[23018.216996] [] ? dbuf_create+0x36c/0x4e0 [zfs]
[23018.217013] [] ? spl_kmem_cache_alloc+0xa1/0x160 [spl]
[23018.217019] [] ? mutex_lock+0x12/0x2f
[23018.217054] [] ? dbuf_rele_and_unlock+0x460/0x460 [zfs]
[23018.217089] [] dbuf_read+0x2ad/0x930 [zfs]
[23018.217129] [] dmu_buf_hold_array_by_dnode+0x123/0x4d0 [zfs]
[23018.217172] [] dmu_read_uio_dnode+0x52/0x100 [zfs]
[23018.217212] [] dmu_read_uio_dbuf+0x47/0x60 [zfs]
[23018.217273] [] zfs_read+0x13d/0x400 [zfs]
[23018.217333] [] zpl_read_common_iovec.constprop.10+0x80/0xd0 [zfs]
[23018.217391] [] zpl_read_common+0x40/0x60 [zfs]
[23018.217446] [] __zpl_xattr_get+0x225/0x2a0 [zfs]
[23018.217500] [] zpl_xattr_get+0x75/0x120 [zfs]
[23018.217553] [] zpl_xattr_security_get+0x50/0x70 [zfs]
[23018.217561] [] generic_getxattr+0x4c/0x70
[23018.217568] [] inode_doinit_with_dentry+0x16f/0x680
[23018.217625] [] ? rrw_exit+0x68/0x170 [zfs]
[23018.217632] [] selinux_d_instantiate+0x1c/0x20
[23018.217639] [] security_d_instantiate+0x1b/0x30
[23018.217645] [] d_instantiate+0x50/0x70
[23018.217651] [] d_splice_alias+0x44/0x120
[23018.217705] [] zpl_lookup+0x11c/0x130 [zfs]
[23018.217712] [] lookup_real+0x1d/0x50
[23018.217718] [] __lookup_hash+0x42/0x60
[23018.217724] [] lookup_slow+0x42/0xa7
[23018.217729] [] path_lookupat+0x773/0x7a0
[23018.217735] [] ? kmem_cache_alloc+0x35/0x1d0
[23018.217739] [] ? getname_flags+0x4f/0x1a0
[23018.217744] [] filename_lookup+0x2b/0xc0
[23018.217748] [] user_path_at_empty+0x67/0xc0
[23018.217756] [] ? from_kgid_munged+0x12/0x20
[23018.217761] [] ? cp_new_stat+0x14f/0x180
[23018.217766] [] user_path_at+0x11/0x20
[23018.217771] [] vfs_fstatat+0x63/0xc0
[23018.217777] [] SYSC_newstat+0x2e/0x60
[23018.217783] [] ? __audit_syscall_exit+0x1e6/0x280
[23018.217788] [] SyS_newstat+0xe/0x10
[23018.217795] [] system_call_fastpath+0x16/0x1b
[23018.217809] INFO: task BackupPC_nightl:11060 blocked for more than 120 seconds.
[23018.217873] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23018.217939] BackupPC_nightl D ffff88041e393680 0 11060 3608 0x00000080
[23018.217944] ffff8802f7d3f4d8 0000000000000086 ffff880404b44fa0 ffff8802f7d3ffd8
[23018.217950] ffff8802f7d3ffd8 ffff8802f7d3ffd8 ffff880404b44fa0 ffffffffa022d6e0
[23018.217955] ffffffffa022d7a0 ffffffffa022d6e8 ffffffffa022d7c8 0000000000000000
[23018.217960] Call Trace:
[23018.217981] [] schedule+0x29/0x70
[23018.217999] [] cv_wait_common+0x125/0x150 [spl]
[23018.218005] [] ? wake_up_bit+0x30/0x30
[23018.218022] [] __cv_wait+0x15/0x20 [spl]
[23018.218054] [] arc_get_data_buf.isra.21+0x3e7/0x3f0 [zfs]
[23018.218087] [] arc_read+0x335/0xad0 [zfs]
[23018.218123] [] ? dbuf_rele_and_unlock+0x460/0x460 [zfs]
[23018.218162] [] dbuf_read+0x2ad/0x930 [zfs]
[23018.218206] [] dmu_buf_hold+0x50/0x80 [zfs]
[23018.218269] [] zap_lockdir+0x58/0x900 [zfs]
[23018.218335] [] ? sa_lookup+0x9c/0xc0 [zfs]
[23018.218408] [] zap_lookup_norm+0x5b/0x1d0 [zfs]
[23018.218481] [] zap_lookup+0x33/0x40 [zfs]
[23018.218552] [] zfs_dirent_lock+0x4d3/0x590 [zfs]
[23018.218624] [] zfs_dirlook+0x90/0x330 [zfs]
[23018.218694] [] zfs_lookup+0x2f1/0x340 [zfs]
[23018.218763] [] __zpl_xattr_get+0x108/0x2a0 [zfs]
[23018.218832] [] zpl_xattr_get+0x75/0x120 [zfs]
[23018.218908] [] zpl_xattr_security_get+0x50/0x70 [zfs]
[23018.218918] [] generic_getxattr+0x4c/0x70
[23018.218925] [] inode_doinit_with_dentry+0x16f/0x680
[23018.218992] [] ? rrw_exit+0x68/0x170 [zfs]
[23018.219002] [] selinux_d_instantiate+0x1c/0x20
[23018.219008] [] security_d_instantiate+0x1b/0x30
[23018.219015] [] d_instantiate+0x50/0x70
[23018.219021] [] d_splice_alias+0x44/0x120
[23018.219089] [] zpl_lookup+0x11c/0x130 [zfs]
[23018.219098] [] lookup_real+0x1d/0x50
[23018.219105] [] __lookup_hash+0x42/0x60
[23018.219112] [] lookup_slow+0x42/0xa7
[23018.219117] [] path_lookupat+0x773/0x7a0
[23018.219124] [] ? ttwu_do_wakeup+0x19/0xd0
[23018.219132] [] ? kmem_cache_alloc+0x35/0x1d0
[23018.219137] [] ? getname_flags+0x4f/0x1a0
[23018.219142] [] filename_lookup+0x2b/0xc0
[23018.219147] [] user_path_at_empty+0x67/0xc0
[23018.219155] [] ? from_kgid_munged+0x12/0x20
[23018.219163] [] ? cp_new_stat+0x14f/0x180
[23018.219168] [] user_path_at+0x11/0x20
[23018.219174] [] vfs_fstatat+0x63/0xc0
[23018.219179] [] SYSC_newstat+0x2e/0x60
[23018.219186] [] ? __audit_syscall_exit+0x1e6/0x280
[23018.219194] [] SyS_newstat+0xe/0x10
[23018.219201] [] system_call_fastpath+0x16/0x1b

dweeezil · 2015-09-23T17:23:47Z

Yes, it's the same problem as #3808.

behlendorf · 2015-09-23T18:10:22Z

Closing as duplicate of #3808 to avoid confusion.

As described in the comment above arc_reclaim_thread() it's critical that the reclaim thread be careful about blocking. Just like it must never wait on a hash lock, it must never wait on a task which can in turn wait on the CV in arc_get_data_buf(). This will deadlock, see issue openzfs#3822 for full backtraces showing the problem. To resolve this issue arc_kmem_reap_now() has been updated to use the asynchronous arc prune function. This means that arc_prune_async() may now be called while there are still outstanding arc_prune_tasks. However, this isn't a problem because arc_prune_async() already keeps a reference count preventing multiple outstanding tasks per registered consumer. Functionally, this behavior is the same as the counterpart illumos function dnlc_reduce_cache(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#3808 Issue openzfs#3822

As described in the comment above arc_reclaim_thread() it's critical that the reclaim thread be careful about blocking. Just like it must never wait on a hash lock, it must never wait on a task which can in turn wait on the CV in arc_get_data_buf(). This will deadlock, see issue #3822 for full backtraces showing the problem. To resolve this issue arc_kmem_reap_now() has been updated to use the asynchronous arc prune function. This means that arc_prune_async() may now be called while there are still outstanding arc_prune_tasks. However, this isn't a problem because arc_prune_async() already keeps a reference count preventing multiple outstanding tasks per registered consumer. Functionally, this behavior is the same as the counterpart illumos function dnlc_reduce_cache(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Issue #3808 Issue #3834 Issue #3822

rgmiller mentioned this issue Sep 23, 2015

Deadlock due to filesystem re-entry during zfs_evict_inode() in 0.6.5 #3808

Closed

rgmiller changed the title ~~0.6.5.1 still have problems with large rsync's and xattrs~~ 0.6.5.1 still has problems with large rsync's and xattrs Sep 23, 2015

behlendorf closed this as completed Sep 23, 2015

behlendorf mentioned this issue Sep 23, 2015

Avoid blocking in arc_reclaim_thread() #3826

Closed

behlendorf added this to the 0.6.5.2 milestone Sep 25, 2015

behlendorf added the Bug - Major label Sep 25, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

0.6.5.1 still has problems with large rsync's and xattrs #3822

0.6.5.1 still has problems with large rsync's and xattrs #3822

rgmiller commented Sep 23, 2015

rgmiller commented Sep 23, 2015

rgmiller commented Sep 23, 2015

dweeezil commented Sep 23, 2015

behlendorf commented Sep 23, 2015

0.6.5.1 still has problems with large rsync's and xattrs #3822

0.6.5.1 still has problems with large rsync's and xattrs #3822

Comments

rgmiller commented Sep 23, 2015

rgmiller commented Sep 23, 2015

rgmiller commented Sep 23, 2015

dweeezil commented Sep 23, 2015

behlendorf commented Sep 23, 2015