
0.6.5.1 still has problems with large rsync's and xattrs #3822

Closed
rgmiller opened this issue Sep 23, 2015 · 4 comments

@rgmiller

My CentOS 7 server running 0.6.5.1 is still having issues with large rsync jobs. The server itself doesn't lock up (I can still log in and look around), but the rsync jobs hang, and once that happens other I/O to the ZFS pools (such as a simple 'ls') also hangs. I'll paste the stack traces from 'dmesg' below.

This is possibly related to issue #3808; it's the same server running the same BackupPC jobs. The stack trace is different, though, likely because I upgraded from 0.6.5 to 0.6.5.1.

@rgmiller
Author

dmesg output:

[19857.719842] perf interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[22898.322632] INFO: task spl_dynamic_tas:114 blocked for more than 120 seconds.
[22898.322713] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[22898.322790] spl_dynamic_tas D ffff88041e313680 0 114 2 0x00000000
[22898.322799] ffff8800368e7b70 0000000000000046 ffff880404586660 ffff8800368e7fd8
[22898.322806] ffff8800368e7fd8 ffff8800368e7fd8 ffff880404586660 ffff8800368e7cd8
[22898.322812] ffff8800368e7ce0 7fffffffffffffff ffff880404586660 ffff880404661b00
[22898.322819] Call Trace:
[22898.322834] [] schedule+0x29/0x70
[22898.322841] [] schedule_timeout+0x209/0x2d0
[22898.322852] [] ? check_preempt_curr+0x85/0xa0
[22898.322859] [] ? ttwu_do_wakeup+0x19/0xd0
[22898.322867] [] wait_for_completion+0x116/0x170
[22898.322872] [] ? wake_up_state+0x20/0x20
[22898.322919] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[22898.322926] [] kthread_create_on_node+0xa8/0x140
[22898.322936] [] ? string.isra.6+0x3b/0xf0
[22898.322954] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[22898.322977] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[22898.322995] [] spl_kthread_create+0x9c/0xf0 [spl]
[22898.323014] [] taskq_thread_create+0x6b/0x110 [spl]
[22898.323033] [] taskq_thread_spawn_task+0x12/0x30 [spl]
[22898.323051] [] taskq_thread+0x21e/0x420 [spl]
[22898.323056] [] ? wake_up_state+0x20/0x20
[22898.323074] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[22898.323079] [] kthread+0xcf/0xe0
[22898.323086] [] ? kthread_create_on_node+0x140/0x140
[22898.323094] [] ret_from_fork+0x58/0x90
[22898.323099] [] ? kthread_create_on_node+0x140/0x140
[23018.206899] INFO: task kthreadd:2 blocked for more than 120 seconds.
[23018.206971] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23018.207048] kthreadd D ffff88041e293680 0 2 0 0x00000000
[23018.207057] ffff88040806f550 0000000000000046 ffff880408058b60 ffff88040806ffd8
[23018.207064] ffff88040806ffd8 ffff88040806ffd8 ffff880408058b60 ffffffffa022d6e0
[23018.207070] ffffffffa022d7a0 ffffffffa022d6e8 ffffffffa022d7c8 0000000000000000
[23018.207076] Call Trace:
[23018.207125] [] schedule+0x29/0x70
[23018.207149] [] cv_wait_common+0x125/0x150 [spl]
[23018.207158] [] ? wake_up_bit+0x30/0x30
[23018.207179] [] __cv_wait+0x15/0x20 [spl]
[23018.207224] [] arc_get_data_buf.isra.21+0x3e7/0x3f0 [zfs]
[23018.207263] [] arc_buf_alloc+0x10f/0x190 [zfs]
[23018.207305] [] dbuf_fix_old_data.isra.6+0xc7/0x130 [zfs]
[23018.207346] [] dbuf_dirty+0x7c8/0x980 [zfs]
[23018.207388] [] dmu_buf_will_dirty+0x56/0xa0 [zfs]
[23018.207463] [] zap_get_leaf_byblk+0x132/0x2c0 [zfs]
[23018.207504] [] ? dmu_buf_rele+0xe/0x10 [zfs]
[23018.207573] [] ? zap_idx_to_blk+0x103/0x180 [zfs]
[23018.207639] [] zap_deref_leaf+0x7a/0xa0 [zfs]
[23018.207704] [] fzap_remove+0x3f/0xb0 [zfs]
[23018.207770] [] ? zap_name_alloc+0x73/0xd0 [zfs]
[23018.207849] [] zap_remove_norm+0x17b/0x1e0 [zfs]
[23018.207987] [] zap_remove+0x13/0x20 [zfs]
[23018.208069] [] zap_remove_int+0x54/0x80 [zfs]
[23018.208151] [] zfs_rmnode+0x224/0x350 [zfs]
[23018.208161] [] ? mutex_lock+0x12/0x2f
[23018.208239] [] zfs_zinactive+0x168/0x180 [zfs]
[23018.208320] [] zfs_inactive+0x67/0x240 [zfs]
[23018.208330] [] ? truncate_pagecache+0x59/0x60
[23018.208407] [] zpl_evict_inode+0x43/0x60 [zfs]
[23018.208426] [] evict+0xa7/0x170
[23018.208434] [] dispose_list+0x3e/0x50
[23018.208439] [] prune_icache_sb+0x163/0x320
[23018.208447] [] prune_super+0xd6/0x1a0
[23018.208456] [] shrink_slab+0x165/0x300
[23018.208464] [] ? vmpressure+0x21/0x90
[23018.208472] [] do_try_to_free_pages+0x3c2/0x4e0
[23018.208479] [] try_to_free_pages+0xfc/0x180
[23018.208489] [] __alloc_pages_nodemask+0x7fd/0xb90
[23018.208502] [] ? __account_cfs_rq_runtime+0x160/0x160
[23018.208512] [] copy_process.part.25+0x134/0x14d0
[23018.208519] [] ? sched_clock_cpu+0xb5/0x100
[23018.208525] [] ? kthread_create_on_node+0x140/0x140
[23018.208535] [] do_fork+0xbc/0x350
[23018.208542] [] kernel_thread+0x26/0x30
[23018.208548] [] kthreadd+0x2b2/0x2f0
[23018.208555] [] ? kthread_create_on_cpu+0x60/0x60
[23018.208564] [] ret_from_fork+0x58/0x90
[23018.208573] [] ? kthread_create_on_cpu+0x60/0x60
[23018.208583] INFO: task kswapd0:46 blocked for more than 120 seconds.
[23018.208660] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23018.208752] kswapd0 D ffff88041e293680 0 46 2 0x00000000
[23018.208759] ffff880404467690 0000000000000046 ffff880407d84fa0 ffff880404467fd8
[23018.208766] ffff880404467fd8 ffff880404467fd8 ffff880407d84fa0 ffffffffa022d6e0
[23018.208774] ffffffffa022d7a0 ffffffffa022d6e8 ffffffffa022d7c8 0000000000000000
[23018.208781] Call Trace:
[23018.208805] [] schedule+0x29/0x70
[23018.208837] [] cv_wait_common+0x125/0x150 [spl]
[23018.208844] [] ? wake_up_bit+0x30/0x30
[23018.208867] [] __cv_wait+0x15/0x20 [spl]
[23018.208919] [] arc_get_data_buf.isra.21+0x3e7/0x3f0 [zfs]
[23018.208972] [] arc_buf_alloc+0x10f/0x190 [zfs]
[23018.209026] [] arc_read+0x1c5/0xad0 [zfs]
[23018.209081] [] ? dbuf_rele_and_unlock+0x460/0x460 [zfs]
[23018.209137] [] dbuf_read+0x2ad/0x930 [zfs]
[23018.209200] [] dmu_buf_hold+0x50/0x80 [zfs]
[23018.209284] [] zap_get_leaf_byblk+0x5c/0x2c0 [zfs]
[23018.209339] [] ? dmu_buf_rele+0xe/0x10 [zfs]
[23018.209422] [] ? zap_idx_to_blk+0x103/0x180 [zfs]
[23018.209505] [] zap_deref_leaf+0x7a/0xa0 [zfs]
[23018.209588] [] fzap_remove+0x3f/0xb0 [zfs]
[23018.209671] [] ? zap_name_alloc+0x73/0xd0 [zfs]
[23018.209753] [] zap_remove_norm+0x17b/0x1e0 [zfs]
[23018.209834] [] zap_remove+0x13/0x20 [zfs]
[23018.209926] [] zap_remove_int+0x54/0x80 [zfs]
[23018.210006] [] zfs_rmnode+0x224/0x350 [zfs]
[23018.210016] [] ? mutex_lock+0x12/0x2f
[23018.210092] [] zfs_zinactive+0x168/0x180 [zfs]
[23018.210172] [] zfs_inactive+0x67/0x240 [zfs]
[23018.210181] [] ? truncate_pagecache+0x59/0x60
[23018.210257] [] zpl_evict_inode+0x43/0x60 [zfs]
[23018.210273] [] evict+0xa7/0x170
[23018.210281] [] dispose_list+0x3e/0x50
[23018.210287] [] prune_icache_sb+0x163/0x320
[23018.210293] [] prune_super+0xd6/0x1a0
[23018.210301] [] shrink_slab+0x165/0x300
[23018.210310] [] ? vmpressure+0x21/0x90
[23018.210317] [] balance_pgdat+0x4b1/0x5e0
[23018.210325] [] kswapd+0x173/0x450
[23018.210332] [] ? wake_up_bit+0x30/0x30
[23018.210339] [] ? balance_pgdat+0x5e0/0x5e0
[23018.210347] [] kthread+0xcf/0xe0
[23018.210353] [] ? kthread_create_on_node+0x140/0x140
[23018.210361] [] ret_from_fork+0x58/0x90
[23018.210367] [] ? kthread_create_on_node+0x140/0x140
[23018.210375] INFO: task spl_dynamic_tas:114 blocked for more than 120 seconds.
[23018.210460] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23018.210553] spl_dynamic_tas D ffff88041e313680 0 114 2 0x00000000
[23018.210560] ffff8800368e7b70 0000000000000046 ffff880404586660 ffff8800368e7fd8
[23018.210567] ffff8800368e7fd8 ffff8800368e7fd8 ffff880404586660 ffff8800368e7cd8
[23018.210573] ffff8800368e7ce0 7fffffffffffffff ffff880404586660 ffff880404661b00
[23018.210582] Call Trace:
[23018.210590] [] schedule+0x29/0x70
[23018.210597] [] schedule_timeout+0x209/0x2d0
[23018.210608] [] ? check_preempt_curr+0x85/0xa0
[23018.210615] [] ? ttwu_do_wakeup+0x19/0xd0
[23018.210625] [] wait_for_completion+0x116/0x170
[23018.210631] [] ? wake_up_state+0x20/0x20
[23018.210656] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[23018.210664] [] kthread_create_on_node+0xa8/0x140
[23018.210676] [] ? string.isra.6+0x3b/0xf0
[23018.210696] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[23018.210722] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[23018.210742] [] spl_kthread_create+0x9c/0xf0 [spl]
[23018.210765] [] taskq_thread_create+0x6b/0x110 [spl]
[23018.210785] [] taskq_thread_spawn_task+0x12/0x30 [spl]
[23018.210807] [] taskq_thread+0x21e/0x420 [spl]
[23018.210820] [] ? wake_up_state+0x20/0x20
[23018.210842] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[23018.210848] [] kthread+0xcf/0xe0
[23018.210854] [] ? kthread_create_on_node+0x140/0x140
[23018.210862] [] ret_from_fork+0x58/0x90
[23018.210870] [] ? kthread_create_on_node+0x140/0x140
[23018.210876] INFO: task arc_prune:130 blocked for more than 120 seconds.
[23018.210954] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23018.211046] arc_prune D ffff88041e393680 0 130 2 0x00000000
[23018.211052] ffff88040183b788 0000000000000046 ffff8804044c0b60 ffff88040183bfd8
[23018.211059] ffff88040183bfd8 ffff88040183bfd8 ffff8804044c0b60 ffffffffa022d6e0
[23018.211066] ffffffffa022d7a0 ffffffffa022d6e8 ffffffffa022d7c8 0000000000000000
[23018.211074] Call Trace:
[23018.211097] [] schedule+0x29/0x70
[23018.211120] [] cv_wait_common+0x125/0x150 [spl]
[23018.211127] [] ? wake_up_bit+0x30/0x30
[23018.211149] [] __cv_wait+0x15/0x20 [spl]
[23018.211198] [] arc_get_data_buf.isra.21+0x3e7/0x3f0 [zfs]
[23018.211253] [] arc_buf_alloc+0x10f/0x190 [zfs]
[23018.211306] [] arc_read+0x1c5/0xad0 [zfs]
[23018.211361] [] ? dbuf_rele_and_unlock+0x460/0x460 [zfs]
[23018.211417] [] dbuf_read+0x2ad/0x930 [zfs]
[23018.211480] [] dmu_buf_hold+0x50/0x80 [zfs]
[23018.211564] [] zap_get_leaf_byblk+0x5c/0x2c0 [zfs]
[23018.211619] [] ? dmu_buf_rele+0xe/0x10 [zfs]
[23018.211701] [] ? zap_idx_to_blk+0x103/0x180 [zfs]
[23018.211785] [] zap_deref_leaf+0x7a/0xa0 [zfs]
[23018.211868] [] fzap_remove+0x3f/0xb0 [zfs]
[23018.211956] [] ? zap_name_alloc+0x73/0xd0 [zfs]
[23018.212039] [] zap_remove_norm+0x17b/0x1e0 [zfs]
[23018.212122] [] zap_remove+0x13/0x20 [zfs]
[23018.212204] [] zap_remove_int+0x54/0x80 [zfs]
[23018.212284] [] zfs_rmnode+0x224/0x350 [zfs]
[23018.212294] [] ? mutex_lock+0x12/0x2f
[23018.212370] [] zfs_zinactive+0x168/0x180 [zfs]
[23018.212449] [] zfs_inactive+0x67/0x240 [zfs]
[23018.212458] [] ? truncate_pagecache+0x59/0x60
[23018.212534] [] zpl_evict_inode+0x43/0x60 [zfs]
[23018.212550] [] evict+0xa7/0x170
[23018.212558] [] dispose_list+0x3e/0x50
[23018.212563] [] prune_icache_sb+0x163/0x320
[23018.212570] [] prune_super+0xd6/0x1a0
[23018.212648] [] zfs_sb_prune+0xa2/0xc0 [zfs]
[23018.212725] [] zpl_prune_sb+0x35/0x50 [zfs]
[23018.212778] [] arc_prune_task+0x1f/0x50 [zfs]
[23018.212801] [] taskq_thread+0x21e/0x420 [spl]
[23018.212815] [] ? wake_up_state+0x20/0x20
[23018.212836] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[23018.212842] [] kthread+0xcf/0xe0
[23018.212849] [] ? kthread_create_on_node+0x140/0x140
[23018.212858] [] ret_from_fork+0x58/0x90
[23018.212865] [] ? kthread_create_on_node+0x140/0x140
[23018.212871] INFO: task arc_reclaim:131 blocked for more than 120 seconds.
[23018.212952] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23018.213045] arc_reclaim D ffff88041e213680 0 131 2 0x00000000
[23018.213051] ffff88040183fdb8 0000000000000046 ffff8804044c0000 ffff88040183ffd8
[23018.213060] ffff88040183ffd8 ffff88040183ffd8 ffff8804044c0000 ffff8804042cdf00
[23018.213066] 0000000000000000 0000000000000cb5 ffff8804042cdfc8 0000000000000cb7
[23018.213073] Call Trace:
[23018.213081] [] schedule+0x29/0x70
[23018.213102] [] taskq_wait_outstanding+0x9d/0x110 [spl]
[23018.213111] [] ? wake_up_bit+0x30/0x30
[23018.213160] [] arc_kmem_reap_now+0xf3/0x100 [zfs]
[23018.213213] [] arc_reclaim_thread+0x1f4/0x230 [zfs]
[23018.213266] [] ? arc_shrink+0xc0/0xc0 [zfs]
[23018.213289] [] thread_generic_wrapper+0x71/0x80 [spl]
[23018.213309] [] ? __thread_exit+0x20/0x20 [spl]
[23018.213317] [] kthread+0xcf/0xe0
[23018.213324] [] ? kthread_create_on_node+0x140/0x140
[23018.213332] [] ret_from_fork+0x58/0x90
[23018.213338] [] ? kthread_create_on_node+0x140/0x140
[23018.213365] INFO: task txg_sync:1074 blocked for more than 120 seconds.
[23018.213444] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23018.213536] txg_sync D ffff88041e393680 0 1074 2 0x00000000
[23018.213542] ffff8803fca37548 0000000000000046 ffff88040477f1c0 ffff8803fca37fd8
[23018.213551] ffff8803fca37fd8 ffff8803fca37fd8 ffff88040477f1c0 ffffffffa022d6e0
[23018.213557] ffffffffa022d7a0 ffffffffa022d6e8 ffffffffa022d7c8 0000000000000000
[23018.213564] Call Trace:
[23018.213587] [] schedule+0x29/0x70
[23018.213609] [] cv_wait_common+0x125/0x150 [spl]
[23018.213618] [] ? wake_up_bit+0x30/0x30
[23018.213643] [] __cv_wait+0x15/0x20 [spl]
[23018.213694] [] arc_get_data_buf.isra.21+0x3e7/0x3f0 [zfs]
[23018.213747] [] arc_buf_alloc+0x10f/0x190 [zfs]
[23018.213800] [] arc_read+0x1c5/0xad0 [zfs]
[23018.213860] [] ? dbuf_read+0x69f/0x930 [zfs]
[23018.213915] [] ? __dbuf_hold_impl+0x240/0x500 [zfs]
[23018.213971] [] dbuf_prefetch+0x1de/0x2e0 [zfs]
[23018.213982] [] ? dequeue_task_fair+0x40e/0x620
[23018.214043] [] dmu_zfetch_dofetch.isra.6+0x114/0x180 [zfs]
[23018.214107] [] dmu_zfetch+0x4ef/0xe00 [zfs]
[23018.214163] [] dbuf_read+0x7bc/0x930 [zfs]
[23018.214232] [] dnode_hold_impl+0xca/0x600 [zfs]
[23018.214296] [] dnode_hold+0x19/0x20 [zfs]
[23018.214352] [] dmu_buf_try_add_ref+0xb0/0x1c0 [zfs]
[23018.214413] [] dmu_objset_userquota_get_ids+0x9d/0x530 [zfs]
[23018.214478] [] dnode_sync+0xef/0x9e0 [zfs]
[23018.214534] [] ? dbuf_sync_list+0x9d/0xd0 [zfs]
[23018.214599] [] ? dnode_sync+0x305/0x9e0 [zfs]
[23018.214609] [] ? mutex_lock+0x12/0x2f
[23018.214668] [] dmu_objset_sync_dnodes+0x9c/0xc0 [zfs]
[23018.214729] [] dmu_objset_sync+0x1d0/0x320 [zfs]
[23018.214789] [] ? recordsize_changed_cb+0x20/0x20 [zfs]
[23018.214857] [] ? dmu_objset_sync+0x320/0x320 [zfs]
[23018.214925] [] dsl_dataset_sync+0x4e/0x90 [zfs]
[23018.214994] [] dsl_pool_sync+0xa3/0x420 [zfs]
[23018.215075] [] spa_sync+0x388/0xb70 [zfs]
[23018.215085] [] ? autoremove_wake_function+0x2b/0x40
[23018.215163] [] txg_sync_thread+0x3cc/0x640 [zfs]
[23018.215246] [] ? txg_fini+0x2a0/0x2a0 [zfs]
[23018.215270] [] thread_generic_wrapper+0x71/0x80 [spl]
[23018.215290] [] ? __thread_exit+0x20/0x20 [spl]
[23018.215299] [] kthread+0xcf/0xe0
[23018.215305] [] ? kthread_create_on_node+0x140/0x140
[23018.215314] [] ret_from_fork+0x58/0x90
[23018.215320] [] ? kthread_create_on_node+0x140/0x140
[23018.215347] INFO: task slimserver.pl:3674 blocked for more than 120 seconds.
[23018.215433] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23018.215525] slimserver.pl D ffff88041e293680 0 3674 1 0x00000080
[23018.215531] ffff8803d8943530 0000000000000082 ffff8800c55538e0 ffff8803d8943fd8
[23018.215538] ffff8803d8943fd8 ffff8803d8943fd8 ffff8800c55538e0 ffffffffa022d6e0
[23018.215545] ffffffffa022d7a0 ffffffffa022d6e8 ffffffffa022d7c8 0000000000000000
[23018.215551] Call Trace:
[23018.215577] [] schedule+0x29/0x70
[23018.215599] [] cv_wait_common+0x125/0x150 [spl]
[23018.215608] [] ? wake_up_bit+0x30/0x30
[23018.215629] [] __cv_wait+0x15/0x20 [spl]
[23018.215679] [] arc_get_data_buf.isra.21+0x3e7/0x3f0 [zfs]
[23018.215733] [] arc_read+0x335/0xad0 [zfs]
[23018.215788] [] ? dbuf_rele_and_unlock+0x460/0x460 [zfs]
[23018.215852] [] dbuf_read+0x2ad/0x930 [zfs]
[23018.215892] [] __dbuf_hold_impl+0x478/0x500 [zfs]
[23018.215931] [] dbuf_hold_impl+0x73/0xa0 [zfs]
[23018.215968] [] dbuf_hold+0x2f/0x60 [zfs]
[23018.216011] [] dmu_buf_hold_noread+0x87/0x140 [zfs]
[23018.216053] [] dmu_buf_hold+0x2f/0x80 [zfs]
[23018.216119] [] zap_lockdir+0x58/0x900 [zfs]
[23018.216127] [] ? __remove_hrtimer+0x46/0xa0
[23018.216133] [] ? hrtimer_try_to_cancel+0x57/0x100
[23018.216195] [] zap_lookup_norm+0x5b/0x1d0 [zfs]
[23018.216255] [] zap_lookup+0x33/0x40 [zfs]
[23018.216317] [] zfs_dirent_lock+0x4d3/0x590 [zfs]
[23018.216377] [] zfs_dirlook+0x90/0x330 [zfs]
[23018.216385] [] ? kmem_cache_alloc+0x1ba/0x1d0
[23018.216445] [] zfs_lookup+0x2f1/0x340 [zfs]
[23018.216506] [] zpl_lookup+0xae/0x130 [zfs]
[23018.216516] [] lookup_real+0x1d/0x50
[23018.216522] [] __lookup_hash+0x42/0x60
[23018.216529] [] lookup_slow+0x42/0xa7
[23018.216534] [] link_path_walk+0x80f/0x8b0
[23018.216542] [] ? type_attribute_bounds_av.isra.10+0x66/0x2e0
[23018.216551] [] ? _raw_spin_unlock_bh+0x1b/0x40
[23018.216556] [] path_lookupat+0x6b/0x7a0
[23018.216562] [] ? context_struct_compute_av+0x34d/0x470
[23018.216567] [] ? kmem_cache_alloc+0x35/0x1d0
[23018.216571] [] ? getname_flags+0x4f/0x1a0
[23018.216578] [] filename_lookup+0x2b/0xc0
[23018.216583] [] user_path_at_empty+0x67/0xc0
[23018.216591] [] ? __posix_lock_file+0x226/0x560
[23018.216596] [] user_path_at+0x11/0x20
[23018.216602] [] vfs_fstatat+0x63/0xc0
[23018.216609] [] SYSC_newstat+0x2e/0x60
[23018.216616] [] ? __audit_syscall_exit+0x1e6/0x280
[23018.216621] [] SyS_newstat+0xe/0x10
[23018.216628] [

@rgmiller rgmiller changed the title 0.6.5.1 still have problems with large rsync's and xattrs 0.6.5.1 still has problems with large rsync's and xattrs Sep 23, 2015
@rgmiller
Author

Just realized not all of my stack traces were appearing in the comment above. Here's the rest of the output:

[23018.216637] INFO: task BackupPC_nightl:11059 blocked for more than 120 seconds.
[23018.216700] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23018.216768] BackupPC_nightl D ffff88041e293680 0 11059 3608 0x00000080
[23018.216773] ffff880194ed7598 0000000000000082 ffff880404b40b60 ffff880194ed7fd8
[23018.216781] ffff880194ed7fd8 ffff880194ed7fd8 ffff880404b40b60 ffffffffa022d6e0
[23018.216786] ffffffffa022d7a0 ffffffffa022d6e8 ffffffffa022d7c8 0000000000000000
[23018.216792] Call Trace:
[23018.216820] [] schedule+0x29/0x70
[23018.216839] [] cv_wait_common+0x125/0x150 [spl]
[23018.216845] [] ? wake_up_bit+0x30/0x30
[23018.216862] [] __cv_wait+0x15/0x20 [spl]
[23018.216894] [] arc_get_data_buf.isra.21+0x3e7/0x3f0 [zfs]
[23018.216928] [] arc_buf_alloc+0x10f/0x190 [zfs]
[23018.216960] [] arc_read+0x1c5/0xad0 [zfs]
[23018.216996] [] ? dbuf_create+0x36c/0x4e0 [zfs]
[23018.217013] [] ? spl_kmem_cache_alloc+0xa1/0x160 [spl]
[23018.217019] [] ? mutex_lock+0x12/0x2f
[23018.217054] [] ? dbuf_rele_and_unlock+0x460/0x460 [zfs]
[23018.217089] [] dbuf_read+0x2ad/0x930 [zfs]
[23018.217129] [] dmu_buf_hold_array_by_dnode+0x123/0x4d0 [zfs]
[23018.217172] [] dmu_read_uio_dnode+0x52/0x100 [zfs]
[23018.217212] [] dmu_read_uio_dbuf+0x47/0x60 [zfs]
[23018.217273] [] zfs_read+0x13d/0x400 [zfs]
[23018.217333] [] zpl_read_common_iovec.constprop.10+0x80/0xd0 [zfs]
[23018.217391] [] zpl_read_common+0x40/0x60 [zfs]
[23018.217446] [] __zpl_xattr_get+0x225/0x2a0 [zfs]
[23018.217500] [] zpl_xattr_get+0x75/0x120 [zfs]
[23018.217553] [] zpl_xattr_security_get+0x50/0x70 [zfs]
[23018.217561] [] generic_getxattr+0x4c/0x70
[23018.217568] [] inode_doinit_with_dentry+0x16f/0x680
[23018.217625] [] ? rrw_exit+0x68/0x170 [zfs]
[23018.217632] [] selinux_d_instantiate+0x1c/0x20
[23018.217639] [] security_d_instantiate+0x1b/0x30
[23018.217645] [] d_instantiate+0x50/0x70
[23018.217651] [] d_splice_alias+0x44/0x120
[23018.217705] [] zpl_lookup+0x11c/0x130 [zfs]
[23018.217712] [] lookup_real+0x1d/0x50
[23018.217718] [] __lookup_hash+0x42/0x60
[23018.217724] [] lookup_slow+0x42/0xa7
[23018.217729] [] path_lookupat+0x773/0x7a0
[23018.217735] [] ? kmem_cache_alloc+0x35/0x1d0
[23018.217739] [] ? getname_flags+0x4f/0x1a0
[23018.217744] [] filename_lookup+0x2b/0xc0
[23018.217748] [] user_path_at_empty+0x67/0xc0
[23018.217756] [] ? from_kgid_munged+0x12/0x20
[23018.217761] [] ? cp_new_stat+0x14f/0x180
[23018.217766] [] user_path_at+0x11/0x20
[23018.217771] [] vfs_fstatat+0x63/0xc0
[23018.217777] [] SYSC_newstat+0x2e/0x60
[23018.217783] [] ? __audit_syscall_exit+0x1e6/0x280
[23018.217788] [] SyS_newstat+0xe/0x10
[23018.217795] [] system_call_fastpath+0x16/0x1b
[23018.217809] INFO: task BackupPC_nightl:11060 blocked for more than 120 seconds.
[23018.217873] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[23018.217939] BackupPC_nightl D ffff88041e393680 0 11060 3608 0x00000080
[23018.217944] ffff8802f7d3f4d8 0000000000000086 ffff880404b44fa0 ffff8802f7d3ffd8
[23018.217950] ffff8802f7d3ffd8 ffff8802f7d3ffd8 ffff880404b44fa0 ffffffffa022d6e0
[23018.217955] ffffffffa022d7a0 ffffffffa022d6e8 ffffffffa022d7c8 0000000000000000
[23018.217960] Call Trace:
[23018.217981] [] schedule+0x29/0x70
[23018.217999] [] cv_wait_common+0x125/0x150 [spl]
[23018.218005] [] ? wake_up_bit+0x30/0x30
[23018.218022] [] __cv_wait+0x15/0x20 [spl]
[23018.218054] [] arc_get_data_buf.isra.21+0x3e7/0x3f0 [zfs]
[23018.218087] [] arc_read+0x335/0xad0 [zfs]
[23018.218123] [] ? dbuf_rele_and_unlock+0x460/0x460 [zfs]
[23018.218162] [] dbuf_read+0x2ad/0x930 [zfs]
[23018.218206] [] dmu_buf_hold+0x50/0x80 [zfs]
[23018.218269] [] zap_lockdir+0x58/0x900 [zfs]
[23018.218335] [] ? sa_lookup+0x9c/0xc0 [zfs]
[23018.218408] [] zap_lookup_norm+0x5b/0x1d0 [zfs]
[23018.218481] [] zap_lookup+0x33/0x40 [zfs]
[23018.218552] [] zfs_dirent_lock+0x4d3/0x590 [zfs]
[23018.218624] [] zfs_dirlook+0x90/0x330 [zfs]
[23018.218694] [] zfs_lookup+0x2f1/0x340 [zfs]
[23018.218763] [] __zpl_xattr_get+0x108/0x2a0 [zfs]
[23018.218832] [] zpl_xattr_get+0x75/0x120 [zfs]
[23018.218908] [] zpl_xattr_security_get+0x50/0x70 [zfs]
[23018.218918] [] generic_getxattr+0x4c/0x70
[23018.218925] [] inode_doinit_with_dentry+0x16f/0x680
[23018.218992] [] ? rrw_exit+0x68/0x170 [zfs]
[23018.219002] [] selinux_d_instantiate+0x1c/0x20
[23018.219008] [] security_d_instantiate+0x1b/0x30
[23018.219015] [] d_instantiate+0x50/0x70
[23018.219021] [] d_splice_alias+0x44/0x120
[23018.219089] [] zpl_lookup+0x11c/0x130 [zfs]
[23018.219098] [] lookup_real+0x1d/0x50
[23018.219105] [] __lookup_hash+0x42/0x60
[23018.219112] [] lookup_slow+0x42/0xa7
[23018.219117] [] path_lookupat+0x773/0x7a0
[23018.219124] [] ? ttwu_do_wakeup+0x19/0xd0
[23018.219132] [] ? kmem_cache_alloc+0x35/0x1d0
[23018.219137] [] ? getname_flags+0x4f/0x1a0
[23018.219142] [] filename_lookup+0x2b/0xc0
[23018.219147] [] user_path_at_empty+0x67/0xc0
[23018.219155] [] ? from_kgid_munged+0x12/0x20
[23018.219163] [] ? cp_new_stat+0x14f/0x180
[23018.219168] [] user_path_at+0x11/0x20
[23018.219174] [] vfs_fstatat+0x63/0xc0
[23018.219179] [] SYSC_newstat+0x2e/0x60
[23018.219186] [] ? __audit_syscall_exit+0x1e6/0x280
[23018.219194] [] SyS_newstat+0xe/0x10
[23018.219201] [] system_call_fastpath+0x16/0x1b

@dweeezil
Contributor

Yes, it's the same problem as #3808.

@behlendorf
Contributor

Closing as duplicate of #3808 to avoid confusion.

behlendorf added a commit to behlendorf/zfs that referenced this issue Sep 23, 2015
As described in the comment above arc_reclaim_thread() it's critical
that the reclaim thread be careful about blocking.  Just like it must
never wait on a hash lock, it must never wait on a task which can in
turn wait on the CV in arc_get_data_buf().  This will deadlock, see
issue openzfs#3822 for full backtraces showing the problem.

To resolve this issue arc_kmem_reap_now() has been updated to use the
asynchronous arc prune function.  This means that arc_prune_async()
may now be called while there are still outstanding arc_prune_tasks.
However, this isn't a problem because arc_prune_async() already
keeps a reference count preventing multiple outstanding tasks per
registered consumer.  Functionally, this behavior is the same as
the counterpart illumos function dnlc_reduce_cache().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#3808
Issue openzfs#3822
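The commit message above describes two ideas: the reclaim path must dispatch prune work asynchronously rather than wait on it (since the prune task can itself block on the ARC condition variable, closing a deadlock cycle), and a per-consumer reference count keeps repeated async calls from piling up outstanding tasks. A toy Python sketch of that pattern follows; this is not ZFS code, and the names (`PruneConsumer`, `prune_async`, `prune_task`) are illustrative stand-ins for `arc_prune_async()` and its registered consumers:

```python
import threading

class PruneConsumer:
    """Models one registered arc_prune consumer (e.g. a mounted filesystem).

    `in_flight` plays the role of the reference count arc_prune_async()
    keeps per consumer: a new prune task is dispatched only when the
    previous one has finished, so repeated calls never accumulate
    outstanding tasks.
    """
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.in_flight = 0     # outstanding prune tasks (0 or 1)
        self.prunes_run = 0    # how many prune tasks actually ran

def prune_task(consumer):
    # Stand-in for the prune work; drops the "reference" when done.
    consumer.prunes_run += 1
    with consumer.lock:
        consumer.in_flight -= 1

def prune_async(consumer):
    """Dispatch a prune task WITHOUT waiting on it.

    The caller (modeling arc_kmem_reap_now) returns immediately instead
    of blocking on a task that could itself block on the reclaim CV.
    Returns the Thread if one was dispatched, else None.
    """
    with consumer.lock:
        if consumer.in_flight > 0:
            return None        # task already outstanding; skip this call
        consumer.in_flight += 1
    t = threading.Thread(target=prune_task, args=(consumer,))
    t.start()
    return t

consumer = PruneConsumer("tank")
# Several back-to-back reclaim passes: each returns immediately, and the
# in_flight guard bounds how many tasks are ever outstanding at once.
threads = [t for t in (prune_async(consumer) for _ in range(5)) if t is not None]
for t in threads:
    t.join()
```

The key property mirrored here is that `prune_async()` never joins the task it spawns, so the dispatching thread can never be part of a wait cycle with it; deduplication is what makes fire-and-forget safe to call repeatedly under memory pressure.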
@behlendorf behlendorf added this to the 0.6.5.2 milestone Sep 25, 2015
behlendorf added a commit that referenced this issue Sep 25, 2015
As described in the comment above arc_reclaim_thread() it's critical
that the reclaim thread be careful about blocking.  Just like it must
never wait on a hash lock, it must never wait on a task which can in
turn wait on the CV in arc_get_data_buf().  This will deadlock, see
issue #3822 for full backtraces showing the problem.

To resolve this issue arc_kmem_reap_now() has been updated to use the
asynchronous arc prune function.  This means that arc_prune_async()
may now be called while there are still outstanding arc_prune_tasks.
However, this isn't a problem because arc_prune_async() already
keeps a reference count preventing multiple outstanding tasks per
registered consumer.  Functionally, this behavior is the same as
the counterpart illumos function dnlc_reduce_cache().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Issue #3808
Issue #3834
Issue #3822
behlendorf added a commit that referenced this issue Sep 30, 2015
(same commit message as above)