Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

txg_sync, zfs blocked for more than 120s on debian jessie/zfs 0.6.5.2-2 #3903

Closed
Pentium100MHz opened this issue Oct 9, 2015 · 2 comments
Closed
Labels
Status: Inactive Not being actively updated

Comments

@Pentium100MHz
Copy link

Hi,
I sometimes get this error message in dmesg:

[143865.393582] INFO: task txg_sync:810 blocked for more than 120 seconds.
[143865.393782]       Tainted: P           O  3.16.0-4-amd64 #1
[143865.393931] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[143865.394077] txg_sync        D ffff880ffabb7ac8     0   810      2 0x00000000
[143865.394082]  ffff880ffabb7670 0000000000000046 0000000000012f00 ffff880fee017fd8
[143865.394085]  0000000000012f00 ffff880ffabb7670 ffff88103fc937b0 ffff880127cb6bf0
[143865.394088]  ffff880127cb6c30 0000000000000001 ffff880ffaf9a000 0000000000000000
[143865.394091] Call Trace:
[143865.394097]  [<ffffffff8150e0d9>] ? io_schedule+0x99/0x120
[143865.394111]  [<ffffffffa0544760>] ? cv_wait_common+0x90/0x100 [spl]
[143865.394116]  [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
[143865.394134]  [<ffffffffa08e779b>] ? zio_wait+0x10b/0x1e0 [zfs]
[143865.394153]  [<ffffffffa088fa69>] ? spa_sync+0x719/0xb30 [zfs]
[143865.394172]  [<ffffffffa08a1131>] ? txg_sync_thread+0x3d1/0x680 [zfs]
[143865.394190]  [<ffffffffa08a0d60>] ? txg_quiesce_thread+0x3e0/0x3e0 [zfs]
[143865.394195]  [<ffffffffa053fdeb>] ? thread_generic_wrapper+0x6b/0x80 [spl]
[143865.394200]  [<ffffffffa053fd80>] ? __thread_exit+0x20/0x20 [spl]
[143865.394204]  [<ffffffff81087f7d>] ? kthread+0xbd/0xe0
[143865.394207]  [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
[143865.394210]  [<ffffffff81511598>] ? ret_from_fork+0x58/0x90
[143865.394213]  [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
[143865.394264] INFO: task zfs:11021 blocked for more than 120 seconds.
[143865.394428]       Tainted: P           O  3.16.0-4-amd64 #1
[143865.394577] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[143865.394762] zfs             D ffff880ddd0927e8     0 11021  11015 0x00000000
[143865.394766]  ffff880ddd092390 0000000000000086 0000000000012f00 ffff88004cad3fd8
[143865.394769]  0000000000012f00 ffff880ddd092390 ffff880ffd80f800 000000000027bd3a
[143865.394772]  ffff880ffd80f8c8 0000000000000000 ffff880ffa126300 0000000000000017
[143865.394775] Call Trace:
[143865.394783]  [<ffffffffa0540a35>] ? taskq_wait_id+0x45/0x80 [spl]
[143865.394787]  [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
[143865.394805]  [<ffffffffa0857e90>] ? dmu_send_impl+0x5a0/0x5a0 [zfs]
[143865.394828]  [<ffffffffa089089e>] ? spa_taskq_dispatch_sync+0x7e/0xa0 [zfs]
[143865.394845]  [<ffffffffa085710f>] ? dump_bytes+0x3f/0x50 [zfs]
[143865.394862]  [<ffffffffa085788d>] ? dump_write+0x25d/0x2c0 [zfs]
[143865.394880]  [<ffffffffa0859798>] ? backup_cb+0x238/0x850 [zfs]
[143865.394896]  [<ffffffffa085bd97>] ? traverse_visitbp+0x4b7/0x7f0 [zfs]
[143865.394913]  [<ffffffffa085bbe2>] ? traverse_visitbp+0x302/0x7f0 [zfs]
[143865.394932]  [<ffffffffa085bbe2>] ? traverse_visitbp+0x302/0x7f0 [zfs]
[143865.394965]  [<ffffffffa085bbe2>] ? traverse_visitbp+0x302/0x7f0 [zfs]
[143865.394992]  [<ffffffffa085c861>] ? traverse_dnode+0x71/0xe0 [zfs]
[143865.395018]  [<ffffffffa085bfd6>] ? traverse_visitbp+0x6f6/0x7f0 [zfs]
[143865.395045]  [<ffffffffa085bbe2>] ? traverse_visitbp+0x302/0x7f0 [zfs]
[143865.395071]  [<ffffffffa085bbe2>] ? traverse_visitbp+0x302/0x7f0 [zfs]
[143865.395097]  [<ffffffffa085bbe2>] ? traverse_visitbp+0x302/0x7f0 [zfs]
[143865.395124]  [<ffffffffa085bbe2>] ? traverse_visitbp+0x302/0x7f0 [zfs]
[143865.395151]  [<ffffffffa085bbe2>] ? traverse_visitbp+0x302/0x7f0 [zfs]
[143865.395179]  [<ffffffffa085bbe2>] ? traverse_visitbp+0x302/0x7f0 [zfs]
[143865.395207]  [<ffffffffa085c861>] ? traverse_dnode+0x71/0xe0 [zfs]
[143865.395233]  [<ffffffffa085beaa>] ? traverse_visitbp+0x5ca/0x7f0 [zfs]
[143865.395261]  [<ffffffffa085c250>] ? traverse_impl+0x180/0x3f0 [zfs]
[143865.395295]  [<ffffffffa085c50e>] ? traverse_dataset+0x4e/0x60 [zfs]
[143865.395312]  [<ffffffffa0859560>] ? dmu_recv_end_check+0x220/0x220 [zfs]
[143865.395329]  [<ffffffffa0857cdc>] ? dmu_send_impl+0x3ec/0x5a0 [zfs]
[143865.395363]  [<ffffffffa085a081>] ? dmu_send_obj+0x171/0x1f0 [zfs]
[143865.395395]  [<ffffffffa08c3275>] ? zfs_ioc_send+0xc5/0x2c0 [zfs]
[143865.395435]  [<ffffffffa08c7079>] ? zfsdev_ioctl+0x489/0x4c0 [zfs]
[143865.395441]  [<ffffffff811ba4ef>] ? do_vfs_ioctl+0x2cf/0x4b0
[143865.395446]  [<ffffffff811ba751>] ? SyS_ioctl+0x81/0xa0
[143865.395451]  [<ffffffff8151164d>] ? system_call_fast_compare_end+0x10/0x15
[143865.395457]  [<ffffffff8151164d>] ? system_call_fast_compare_end+0x10/0x15

Iostat shows the pool SSDs are not overloaded, there is enough free space (66% utilized). The CPU is also not very loaded.

Versions:
Debian 8.2
zfs 0.6.5.2-2
spl 0.6.5-1

@dasjoe
Copy link
Contributor

dasjoe commented Oct 16, 2015

I see possibly related traces from ZoL 0.6.5.2 on up-to-date Ubuntu 14.04 during a scrub with minor other load:

[101962.669448] INFO: task txg_sync:7966 blocked for more than 120 seconds.
[101962.669494]       Tainted: P           OE  3.19.0-30-generic #34~14.04.1-Ubuntu
[101962.669532] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[101962.669572] txg_sync        D ffff8800738731c8     0  7966      2 0x00000000
[101962.669577]  ffff8800738731c8 ffff881031df0000 0000000000013e80 ffff880073873fd8
[101962.669580]  0000000000013e80 ffff881038e94260 ffff881031df0000 ffff8800738731d8
[101962.669582]  ffff880d1a6ffba0 ffffffffc05c8890 ffff880d1a6ffba8 ffffffffc05c88b8
[101962.669586] Call Trace:
[101962.669597]  [<ffffffff817b2a99>] schedule+0x29/0x70
[101962.669607]  [<ffffffffc0370d15>] cv_wait_common+0xe5/0x120 [spl]
[101962.669612]  [<ffffffff810b4e30>] ? prepare_to_wait_event+0x110/0x110
[101962.669618]  [<ffffffffc0370d65>] __cv_wait+0x15/0x20 [spl]
[101962.669640]  [<ffffffffc04b26ac>] arc_read+0xbc/0xa50 [zfs]
[101962.669644]  [<ffffffff811d09ec>] ? __kmalloc_node+0xac/0x2e0
[101962.669679]  [<ffffffffc055caff>] ? zio_taskq_dispatch+0x8f/0xa0 [zfs]
[101962.669684]  [<ffffffffc03699c7>] ? spl_kmem_alloc+0xd7/0x190 [spl]
[101962.669689]  [<ffffffffc03699c7>] ? spl_kmem_alloc+0xd7/0x190 [spl]
[101962.669702]  [<ffffffffc04b2570>] ? arc_buf_remove_ref+0x140/0x140 [zfs]
[101962.669724]  [<ffffffffc04eefc6>] dsl_scan_visitbp+0x266/0xbf0 [zfs]
[101962.669744]  [<ffffffffc04ee8ca>] ? dsl_scan_prefetch.isra.5+0xba/0xc0 [zfs]
[101962.669764]  [<ffffffffc04ef3e4>] dsl_scan_visitbp+0x684/0xbf0 [zfs]
[101962.669783]  [<ffffffffc04ef0e1>] dsl_scan_visitbp+0x381/0xbf0 [zfs]
[101962.669802]  [<ffffffffc04ef0e1>] dsl_scan_visitbp+0x381/0xbf0 [zfs]
[101962.669820]  [<ffffffffc04ef0e1>] dsl_scan_visitbp+0x381/0xbf0 [zfs]
[101962.669839]  [<ffffffffc04ef0e1>] dsl_scan_visitbp+0x381/0xbf0 [zfs]
[101962.669857]  [<ffffffffc04ef0e1>] dsl_scan_visitbp+0x381/0xbf0 [zfs]
[101962.669876]  [<ffffffffc04ef0e1>] dsl_scan_visitbp+0x381/0xbf0 [zfs]
[101962.669879]  [<ffffffff811d09ec>] ? __kmalloc_node+0xac/0x2e0
[101962.669901]  [<ffffffffc04ef5db>] dsl_scan_visitbp+0x87b/0xbf0 [zfs]
[101962.669926]  [<ffffffffc04efa23>] dsl_scan_visitds+0xd3/0x4b0 [zfs]
[101962.669933]  [<ffffffffc0371bf0>] ? tsd_hash_search.isra.1+0x90/0xa0 [spl]
[101962.669957]  [<ffffffffc04f0ea8>] dsl_scan_sync+0x2a8/0xc50 [zfs]
[101962.669989]  [<ffffffffc055cdb1>] ? zio_destroy+0xd1/0xe0 [zfs]
[101962.670015]  [<ffffffffc0504e35>] spa_sync+0x3d5/0xb50 [zfs]
[101962.670018]  [<ffffffff810b4668>] ? __wake_up_common+0x58/0x90
[101962.670023]  [<ffffffff8101df59>] ? read_tsc+0x9/0x10
[101962.670051]  [<ffffffffc051732f>] txg_sync_thread+0x3bf/0x630 [zfs]
[101962.670080]  [<ffffffffc0516f70>] ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
[101962.670087]  [<ffffffffc036bfd1>] thread_generic_wrapper+0x71/0x80 [spl]
[101962.670092]  [<ffffffffc036bf60>] ? __thread_exit+0x20/0x20 [spl]
[101962.670097]  [<ffffffff81093822>] kthread+0xd2/0xf0
[101962.670115]  [<ffffffffc04ba8b8>] ? dbuf_read+0x2a8/0x950 [zfs]
[101962.670145]  [<ffffffffc052b8b1>] ? zap_lockdir+0x61/0x920 [zfs]
[101962.670149]  [<ffffffff81093750>] ? kthread_create_on_node+0x1c0/0x1c0
[101962.670152]  [<ffffffff817b6d98>] ret_from_fork+0x58/0x90
[101962.670155]  [<ffffffff81093750>] ? kthread_create_on_node+0x1c0/0x1c0
[101962.670255] INFO: task htop:7341 blocked for more than 120 seconds.
[101962.670293]       Tainted: P           OE  3.19.0-30-generic #34~14.04.1-Ubuntu
[101962.670335] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[101962.670379] htop            D ffff880fa8e4b708     0  7341   1402 0x00000000
[101962.670383]  ffff880fa8e4b708 ffff880fa6edb750 0000000000013e80 ffff880fa8e4bfd8
[101962.670386]  0000000000013e80 ffff881038e96ea0 ffff880fa6edb750 ffff88067bf7a228
[101962.670388]  ffff88067bf7a208 ffff88067bf7a20c ffff880fa6edb750 00000000ffffffff
[101962.670391] Call Trace:
[101962.670395]  [<ffffffff817b2fa9>] schedule_preempt_disabled+0x29/0x70
[101962.670398]  [<ffffffff817b4c95>] __mutex_lock_slowpath+0x95/0x100
[101962.670402]  [<ffffffff811cb891>] ? set_track+0x61/0x1a0
[101962.670404]  [<ffffffff817b4d23>] mutex_lock+0x23/0x37
[101962.670426]  [<ffffffffc04d3182>] dmu_zfetch+0x302/0xe80 [zfs]
[101962.670445]  [<ffffffffc04baded>] dbuf_read+0x7dd/0x950 [zfs]
[101962.670467]  [<ffffffffc04d4eba>] dnode_hold_impl+0xca/0x600 [zfs]
[101962.670489]  [<ffffffffc04d5409>] dnode_hold+0x19/0x20 [zfs]
[101962.670508]  [<ffffffffc04c39a3>] dmu_read+0x33/0x180 [zfs]
[101962.670533]  [<ffffffffc04fb361>] ? rrw_enter_read_impl+0xc1/0x170 [zfs]
[101962.670564]  [<ffffffffc054a51f>] zfs_getpage+0x11f/0x1e0 [zfs]
[101962.670594]  [<ffffffffc0566230>] ? zpl_writepages+0x180/0x180 [zfs]
[101962.670624]  [<ffffffffc056627c>] zpl_readpage+0x4c/0x90 [zfs]
[101962.670628]  [<ffffffff8118263a>] read_cache_pages+0xba/0x120
[101962.670659]  [<ffffffffc05660ae>] zpl_readpages+0x1e/0x20 [zfs]
[101962.670662]  [<ffffffff8118282e>] __do_page_cache_readahead+0x18e/0x220
[101962.670667]  [<ffffffff81178777>] filemap_fault+0x397/0x430
[101962.670671]  [<ffffffff811a3ac6>] ? do_set_pte+0xf6/0x120
[101962.670674]  [<ffffffff811a1339>] __do_fault+0x39/0x90
[101962.670676]  [<ffffffff811a3cc7>] do_read_fault.isra.55+0x1d7/0x2f0
[101962.670680]  [<ffffffff811a5884>] handle_mm_fault+0x604/0x10e0
[101962.670684]  [<ffffffff812014b6>] ? core_sys_select+0x236/0x2e0
[101962.670689]  [<ffffffff81061e94>] __do_page_fault+0x1c4/0x5b0
[101962.670694]  [<ffffffff810a5318>] ? __enqueue_entity+0x78/0x80
[101962.670697]  [<ffffffff810a725d>] ? set_next_entity+0x9d/0xb0
[101962.670700]  [<ffffffff810af6ab>] ? pick_next_task_fair+0x61b/0x880
[101962.670704]  [<ffffffff8101df59>] ? read_tsc+0x9/0x10
[101962.670710]  [<ffffffff810e2b0e>] ? ktime_get_ts64+0x4e/0xf0
[101962.670712]  [<ffffffff812007fc>] ? poll_select_copy_remaining+0xdc/0x130
[101962.670715]  [<ffffffff810622b1>] do_page_fault+0x31/0x70
[101962.670719]  [<ffffffff817b8fa8>] page_fault+0x28/0x30
[101962.670728] INFO: task watch:20777 blocked for more than 120 seconds.
[101962.670766]       Tainted: P           OE  3.19.0-30-generic #34~14.04.1-Ubuntu
[101962.670807] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[101962.670849] watch           D ffff8816b4eff708     0 20777  11910 0x00000000
[101962.670852]  ffff8816b4eff708 ffff88131ef04d70 0000000000013e80 ffff8816b4efffd8
[101962.670855]  0000000000013e80 ffff881038e96ea0 ffff88131ef04d70 ffff88067bf7a228
[101962.670858]  ffff88067bf7a208 ffff88067bf7a20c ffff88131ef04d70 00000000ffffffff
[101962.670861] Call Trace:
[101962.670865]  [<ffffffff817b2fa9>] schedule_preempt_disabled+0x29/0x70
[101962.670868]  [<ffffffff817b4c95>] __mutex_lock_slowpath+0x95/0x100
[101962.670871]  [<ffffffff811cb891>] ? set_track+0x61/0x1a0
[101962.670874]  [<ffffffff817b4d23>] mutex_lock+0x23/0x37
[101962.670896]  [<ffffffffc04d3182>] dmu_zfetch+0x302/0xe80 [zfs]
[101962.670914]  [<ffffffffc04baded>] dbuf_read+0x7dd/0x950 [zfs]
[101962.670936]  [<ffffffffc04d4eba>] dnode_hold_impl+0xca/0x600 [zfs]
[101962.670960]  [<ffffffffc04d5409>] dnode_hold+0x19/0x20 [zfs]
[101962.670980]  [<ffffffffc04c39a3>] dmu_read+0x33/0x180 [zfs]
[101962.671005]  [<ffffffffc04fb361>] ? rrw_enter_read_impl+0xc1/0x170 [zfs]
[101962.671036]  [<ffffffffc054a51f>] zfs_getpage+0x11f/0x1e0 [zfs]
[101962.671067]  [<ffffffffc0566230>] ? zpl_writepages+0x180/0x180 [zfs]
[101962.671097]  [<ffffffffc056627c>] zpl_readpage+0x4c/0x90 [zfs]
[101962.671100]  [<ffffffff8118263a>] read_cache_pages+0xba/0x120
[101962.671130]  [<ffffffffc05660ae>] zpl_readpages+0x1e/0x20 [zfs]
[101962.671133]  [<ffffffff8118282e>] __do_page_cache_readahead+0x18e/0x220
[101962.671135]  [<ffffffff81178777>] filemap_fault+0x397/0x430
[101962.671138]  [<ffffffff811a3ac6>] ? do_set_pte+0xf6/0x120
[101962.671140]  [<ffffffff811a1339>] __do_fault+0x39/0x90
[101962.671143]  [<ffffffff811a3cc7>] do_read_fault.isra.55+0x1d7/0x2f0
[101962.671146]  [<ffffffff811cf193>] ? kmem_cache_alloc+0x1c3/0x220
[101962.671149]  [<ffffffff811a5884>] handle_mm_fault+0x604/0x10e0
[101962.671175]  [<ffffffffc04fb68c>] ? rrw_exit+0x6c/0x180 [zfs]
[101962.671178]  [<ffffffff81061e94>] __do_page_fault+0x1c4/0x5b0
[101962.671184]  [<ffffffff8110b1a2>] ? from_kgid_munged+0x12/0x20
[101962.671188]  [<ffffffff811f175f>] ? SYSC_newstat+0x2f/0x40
[101962.671190]  [<ffffffff810622b1>] do_page_fault+0x31/0x70
[101962.671193]  [<ffffffff817b8fa8>] page_fault+0x28/0x30
[101962.671196] INFO: task watch:22521 blocked for more than 120 seconds.
[101962.671234]       Tainted: P           OE  3.19.0-30-generic #34~14.04.1-Ubuntu
[101962.671275] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[101962.671317] watch           D ffff8816ba96f708     0 22521  21243 0x00000000
[101962.671320]  ffff8816ba96f708 ffff881b63769620 0000000000013e80 ffff8816ba96ffd8
[101962.671323]  0000000000013e80 ffff881038e91620 ffff881b63769620 ffff88067bf7a228
[101962.671325]  ffff88067bf7a208 ffff88067bf7a20c ffff881b63769620 00000000ffffffff
[101962.671328] Call Trace:
[101962.671332]  [<ffffffff817b2fa9>] schedule_preempt_disabled+0x29/0x70
[101962.671335]  [<ffffffff817b4c95>] __mutex_lock_slowpath+0x95/0x100
[101962.671337]  [<ffffffff811cb891>] ? set_track+0x61/0x1a0
[101962.671340]  [<ffffffff817b4d23>] mutex_lock+0x23/0x37
[101962.671362]  [<ffffffffc04d3182>] dmu_zfetch+0x302/0xe80 [zfs]
[101962.671380]  [<ffffffffc04baded>] dbuf_read+0x7dd/0x950 [zfs]
[101962.671402]  [<ffffffffc04d4eba>] dnode_hold_impl+0xca/0x600 [zfs]
[101962.671423]  [<ffffffffc04d5409>] dnode_hold+0x19/0x20 [zfs]
[101962.671442]  [<ffffffffc04c39a3>] dmu_read+0x33/0x180 [zfs]
[101962.671467]  [<ffffffffc04fb361>] ? rrw_enter_read_impl+0xc1/0x170 [zfs]
[101962.671499]  [<ffffffffc054a51f>] zfs_getpage+0x11f/0x1e0 [zfs]
[101962.671530]  [<ffffffffc0566230>] ? zpl_writepages+0x180/0x180 [zfs]
[101962.671560]  [<ffffffffc056627c>] zpl_readpage+0x4c/0x90 [zfs]
[101962.671563]  [<ffffffff8118263a>] read_cache_pages+0xba/0x120
[101962.671593]  [<ffffffffc05660ae>] zpl_readpages+0x1e/0x20 [zfs]
[101962.671596]  [<ffffffff8118282e>] __do_page_cache_readahead+0x18e/0x220
[101962.671599]  [<ffffffff81178777>] filemap_fault+0x397/0x430
[101962.671601]  [<ffffffff811a3ac6>] ? do_set_pte+0xf6/0x120
[101962.671604]  [<ffffffff811a1339>] __do_fault+0x39/0x90
[101962.671606]  [<ffffffff811a3cc7>] do_read_fault.isra.55+0x1d7/0x2f0
[101962.671609]  [<ffffffff811a5884>] handle_mm_fault+0x604/0x10e0
[101962.671635]  [<ffffffffc04fb68c>] ? rrw_exit+0x6c/0x180 [zfs]
[101962.671638]  [<ffffffff81061e94>] __do_page_fault+0x1c4/0x5b0
[101962.671641]  [<ffffffff8110b1a2>] ? from_kgid_munged+0x12/0x20
[101962.671644]  [<ffffffff811f175f>] ? SYSC_newstat+0x2f/0x40
[101962.671646]  [<ffffffff810622b1>] do_page_fault+0x31/0x70
[101962.671649]  [<ffffffff817b8fa8>] page_fault+0x28/0x30
[102082.763099] INFO: task txg_sync:6176 blocked for more than 120 seconds.
[102082.763137]       Tainted: P           OE  3.19.0-30-generic #34~14.04.1-Ubuntu
[102082.763173] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[102082.763213] txg_sync        D ffff88201f94fb48     0  6176      2 0x00000000
[102082.763218]  ffff88201f94fb48 ffff882034e04260 0000000000013e80 ffff88201f94ffd8
[102082.763221]  0000000000013e80 ffff8820314f5880 ffff882034e04260 ffff880bd3f67448
[102082.763224]  ffff88207fd74778 ffff880bd3f67448 ffff880bd3f67488 ffff880bd3f67470
[102082.763227] Call Trace:
[102082.763236]  [<ffffffff817b2da0>] io_schedule+0xa0/0x130
[102082.763247]  [<ffffffffc0370ccf>] cv_wait_common+0x9f/0x120 [spl]
[102082.763252]  [<ffffffff810b4e30>] ? prepare_to_wait_event+0x110/0x110
[102082.763258]  [<ffffffffc0370da8>] __cv_wait_io+0x18/0x20 [spl]
[102082.763296]  [<ffffffffc0560663>] zio_wait+0x123/0x210 [zfs]
[102082.763318]  [<ffffffffc04e9f71>] dsl_pool_sync+0xb1/0x470 [zfs]
[102082.763343]  [<ffffffffc05104c0>] ? spa_lookup+0x60/0x60 [zfs]
[102082.763367]  [<ffffffffc0504dd8>] spa_sync+0x378/0xb50 [zfs]
[102082.763370]  [<ffffffff810b4668>] ? __wake_up_common+0x58/0x90
[102082.763376]  [<ffffffff8101df59>] ? read_tsc+0x9/0x10
[102082.763401]  [<ffffffffc051732f>] txg_sync_thread+0x3bf/0x630 [zfs]
[102082.763426]  [<ffffffffc0516f70>] ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
[102082.763431]  [<ffffffffc036bfd1>] thread_generic_wrapper+0x71/0x80 [spl]
[102082.763437]  [<ffffffffc036bf60>] ? __thread_exit+0x20/0x20 [spl]
[102082.763441]  [<ffffffff81093822>] kthread+0xd2/0xf0
[102082.763444]  [<ffffffff81093750>] ? kthread_create_on_node+0x1c0/0x1c0
[102082.763447]  [<ffffffff817b6d98>] ret_from_fork+0x58/0x90
[102082.763450]  [<ffffffff81093750>] ? kthread_create_on_node+0x1c0/0x1c0
[102082.763463] INFO: task txg_sync:7966 blocked for more than 120 seconds.
[102082.763497]       Tainted: P           OE  3.19.0-30-generic #34~14.04.1-Ubuntu
[102082.763532] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[102082.763569] txg_sync        D ffff8800738731c8     0  7966      2 0x00000000
[102082.763572]  ffff8800738731c8 ffff881031df0000 0000000000013e80 ffff880073873fd8
[102082.763574]  0000000000013e80 ffff881038e94260 ffff881031df0000 ffff8800738731d8
[102082.763577]  ffff880d1a6ffba0 ffffffffc05c8890 ffff880d1a6ffba8 ffffffffc05c88b8
[102082.763580] Call Trace:
[102082.763585]  [<ffffffff817b2a99>] schedule+0x29/0x70
[102082.763591]  [<ffffffffc0370d15>] cv_wait_common+0xe5/0x120 [spl]
[102082.763594]  [<ffffffff810b4e30>] ? prepare_to_wait_event+0x110/0x110
[102082.763600]  [<ffffffffc0370d65>] __cv_wait+0x15/0x20 [spl]
[102082.763614]  [<ffffffffc04b26ac>] arc_read+0xbc/0xa50 [zfs]
[102082.763619]  [<ffffffff811d09ec>] ? __kmalloc_node+0xac/0x2e0
[102082.763648]  [<ffffffffc055caff>] ? zio_taskq_dispatch+0x8f/0xa0 [zfs]
[102082.763653]  [<ffffffffc03699c7>] ? spl_kmem_alloc+0xd7/0x190 [spl]
[102082.763658]  [<ffffffffc03699c7>] ? spl_kmem_alloc+0xd7/0x190 [spl]
[102082.763671]  [<ffffffffc04b2570>] ? arc_buf_remove_ref+0x140/0x140 [zfs]
[102082.763693]  [<ffffffffc04eefc6>] dsl_scan_visitbp+0x266/0xbf0 [zfs]
[102082.763714]  [<ffffffffc04ee8ca>] ? dsl_scan_prefetch.isra.5+0xba/0xc0 [zfs]
[102082.763733]  [<ffffffffc04ef3e4>] dsl_scan_visitbp+0x684/0xbf0 [zfs]
[102082.763752]  [<ffffffffc04ef0e1>] dsl_scan_visitbp+0x381/0xbf0 [zfs]
[102082.763771]  [<ffffffffc04ef0e1>] dsl_scan_visitbp+0x381/0xbf0 [zfs]
[102082.763790]  [<ffffffffc04ef0e1>] dsl_scan_visitbp+0x381/0xbf0 [zfs]
[102082.763808]  [<ffffffffc04ef0e1>] dsl_scan_visitbp+0x381/0xbf0 [zfs]
[102082.763827]  [<ffffffffc04ef0e1>] dsl_scan_visitbp+0x381/0xbf0 [zfs]
[102082.763845]  [<ffffffffc04ef0e1>] dsl_scan_visitbp+0x381/0xbf0 [zfs]
[102082.763848]  [<ffffffff811d09ec>] ? __kmalloc_node+0xac/0x2e0
[102082.763866]  [<ffffffffc04ef5db>] dsl_scan_visitbp+0x87b/0xbf0 [zfs]
[102082.763884]  [<ffffffffc04efa23>] dsl_scan_visitds+0xd3/0x4b0 [zfs]
[102082.763891]  [<ffffffffc0371bf0>] ? tsd_hash_search.isra.1+0x90/0xa0 [spl]
[102082.763909]  [<ffffffffc04f0ea8>] dsl_scan_sync+0x2a8/0xc50 [zfs]
[102082.763937]  [<ffffffffc055cdb1>] ? zio_destroy+0xd1/0xe0 [zfs]
[102082.763961]  [<ffffffffc0504e35>] spa_sync+0x3d5/0xb50 [zfs]
[102082.763964]  [<ffffffff810b4668>] ? __wake_up_common+0x58/0x90
[102082.763967]  [<ffffffff8101df59>] ? read_tsc+0x9/0x10
[102082.763992]  [<ffffffffc051732f>] txg_sync_thread+0x3bf/0x630 [zfs]
[102082.764016]  [<ffffffffc0516f70>] ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
[102082.764022]  [<ffffffffc036bfd1>] thread_generic_wrapper+0x71/0x80 [spl]
[102082.764027]  [<ffffffffc036bf60>] ? __thread_exit+0x20/0x20 [spl]
[102082.764030]  [<ffffffff81093822>] kthread+0xd2/0xf0
[102082.764044]  [<ffffffffc04ba8b8>] ? dbuf_read+0x2a8/0x950 [zfs]
[102082.764070]  [<ffffffffc052b8b1>] ? zap_lockdir+0x61/0x920 [zfs]
[102082.764074]  [<ffffffff81093750>] ? kthread_create_on_node+0x1c0/0x1c0
[102082.764077]  [<ffffffff817b6d98>] ret_from_fork+0x58/0x90
[102082.764079]  [<ffffffff81093750>] ? kthread_create_on_node+0x1c0/0x1c0
[102082.764182] INFO: task htop:7341 blocked for more than 120 seconds.
[102082.764214]       Tainted: P           OE  3.19.0-30-generic #34~14.04.1-Ubuntu
[102082.764250] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[102082.764287] htop            D ffff880fa8e4b708     0  7341   1402 0x00000000
[102082.764290]  ffff880fa8e4b708 ffff880fa6edb750 0000000000013e80 ffff880fa8e4bfd8
[102082.764293]  0000000000013e80 ffff881038e96ea0 ffff880fa6edb750 ffff88067bf7a228
[102082.764296]  ffff88067bf7a208 ffff88067bf7a20c ffff880fa6edb750 00000000ffffffff
[102082.764299] Call Trace:
[102082.764302]  [<ffffffff817b2fa9>] schedule_preempt_disabled+0x29/0x70
[102082.764306]  [<ffffffff817b4c95>] __mutex_lock_slowpath+0x95/0x100
[102082.764309]  [<ffffffff811cb891>] ? set_track+0x61/0x1a0
[102082.764311]  [<ffffffff817b4d23>] mutex_lock+0x23/0x37
[102082.764331]  [<ffffffffc04d3182>] dmu_zfetch+0x302/0xe80 [zfs]
[102082.764346]  [<ffffffffc04baded>] dbuf_read+0x7dd/0x950 [zfs]
[102082.764365]  [<ffffffffc04d4eba>] dnode_hold_impl+0xca/0x600 [zfs]
[102082.764383]  [<ffffffffc04d5409>] dnode_hold+0x19/0x20 [zfs]
[102082.764399]  [<ffffffffc04c39a3>] dmu_read+0x33/0x180 [zfs]
[102082.764421]  [<ffffffffc04fb361>] ? rrw_enter_read_impl+0xc1/0x170 [zfs]
[102082.764450]  [<ffffffffc054a51f>] zfs_getpage+0x11f/0x1e0 [zfs]
[102082.764479]  [<ffffffffc0566230>] ? zpl_writepages+0x180/0x180 [zfs]
[102082.764505]  [<ffffffffc056627c>] zpl_readpage+0x4c/0x90 [zfs]
[102082.764509]  [<ffffffff8118263a>] read_cache_pages+0xba/0x120
[102082.764535]  [<ffffffffc05660ae>] zpl_readpages+0x1e/0x20 [zfs]
[102082.764538]  [<ffffffff8118282e>] __do_page_cache_readahead+0x18e/0x220
[102082.764541]  [<ffffffff81178777>] filemap_fault+0x397/0x430
[102082.764546]  [<ffffffff811a3ac6>] ? do_set_pte+0xf6/0x120
[102082.764548]  [<ffffffff811a1339>] __do_fault+0x39/0x90
[102082.764550]  [<ffffffff811a3cc7>] do_read_fault.isra.55+0x1d7/0x2f0
[102082.764554]  [<ffffffff811a5884>] handle_mm_fault+0x604/0x10e0
[102082.764558]  [<ffffffff812014b6>] ? core_sys_select+0x236/0x2e0
[102082.764562]  [<ffffffff81061e94>] __do_page_fault+0x1c4/0x5b0
[102082.764567]  [<ffffffff810a5318>] ? __enqueue_entity+0x78/0x80
[102082.764570]  [<ffffffff810a725d>] ? set_next_entity+0x9d/0xb0
[102082.764574]  [<ffffffff810af6ab>] ? pick_next_task_fair+0x61b/0x880
[102082.764577]  [<ffffffff8101df59>] ? read_tsc+0x9/0x10
[102082.764582]  [<ffffffff810e2b0e>] ? ktime_get_ts64+0x4e/0xf0
[102082.764585]  [<ffffffff812007fc>] ? poll_select_copy_remaining+0xdc/0x130
[102082.764587]  [<ffffffff810622b1>] do_page_fault+0x31/0x70
[102082.764590]  [<ffffffff817b8fa8>] page_fault+0x28/0x30
[102082.764598] INFO: task watch:20777 blocked for more than 120 seconds.
[102082.764632]       Tainted: P           OE  3.19.0-30-generic #34~14.04.1-Ubuntu
[102082.764667] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[102082.764704] watch           D ffff8816b4eff708     0 20777  11910 0x00000000
[102082.764707]  ffff8816b4eff708 ffff88131ef04d70 0000000000013e80 ffff8816b4efffd8
[102082.764710]  0000000000013e80 ffff881038e96ea0 ffff88131ef04d70 ffff88067bf7a228
[102082.764713]  ffff88067bf7a208 ffff88067bf7a20c ffff88131ef04d70 00000000ffffffff
[102082.764716] Call Trace:
[102082.764719]  [<ffffffff817b2fa9>] schedule_preempt_disabled+0x29/0x70
[102082.764721]  [<ffffffff817b4c95>] __mutex_lock_slowpath+0x95/0x100
[102082.764724]  [<ffffffff811cb891>] ? set_track+0x61/0x1a0
[102082.764727]  [<ffffffff817b4d23>] mutex_lock+0x23/0x37
[102082.764746]  [<ffffffffc04d3182>] dmu_zfetch+0x302/0xe80 [zfs]
[102082.764762]  [<ffffffffc04baded>] dbuf_read+0x7dd/0x950 [zfs]
[102082.764780]  [<ffffffffc04d4eba>] dnode_hold_impl+0xca/0x600 [zfs]
[102082.764797]  [<ffffffffc04d5409>] dnode_hold+0x19/0x20 [zfs]
[102082.764813]  [<ffffffffc04c39a3>] dmu_read+0x33/0x180 [zfs]
[102082.764835]  [<ffffffffc04fb361>] ? rrw_enter_read_impl+0xc1/0x170 [zfs]
[102082.764863]  [<ffffffffc054a51f>] zfs_getpage+0x11f/0x1e0 [zfs]
[102082.764890]  [<ffffffffc0566230>] ? zpl_writepages+0x180/0x180 [zfs]
[102082.764916]  [<ffffffffc056627c>] zpl_readpage+0x4c/0x90 [zfs]
[102082.764919]  [<ffffffff8118263a>] read_cache_pages+0xba/0x120
[102082.764945]  [<ffffffffc05660ae>] zpl_readpages+0x1e/0x20 [zfs]
[102082.764947]  [<ffffffff8118282e>] __do_page_cache_readahead+0x18e/0x220
[102082.764950]  [<ffffffff81178777>] filemap_fault+0x397/0x430
[102082.764952]  [<ffffffff811a3ac6>] ? do_set_pte+0xf6/0x120
[102082.764955]  [<ffffffff811a1339>] __do_fault+0x39/0x90
[102082.764957]  [<ffffffff811a3cc7>] do_read_fault.isra.55+0x1d7/0x2f0
[102082.764961]  [<ffffffff811cf193>] ? kmem_cache_alloc+0x1c3/0x220
[102082.764963]  [<ffffffff811a5884>] handle_mm_fault+0x604/0x10e0
[102082.764985]  [<ffffffffc04fb68c>] ? rrw_exit+0x6c/0x180 [zfs]
[102082.764989]  [<ffffffff81061e94>] __do_page_fault+0x1c4/0x5b0
[102082.764994]  [<ffffffff8110b1a2>] ? from_kgid_munged+0x12/0x20
[102082.764997]  [<ffffffff811f175f>] ? SYSC_newstat+0x2f/0x40
[102082.765000]  [<ffffffff810622b1>] do_page_fault+0x31/0x70
[102082.765002]  [<ffffffff817b8fa8>] page_fault+0x28/0x30
[102082.765004] INFO: task watch:22521 blocked for more than 120 seconds.
[102082.765038]       Tainted: P           OE  3.19.0-30-generic #34~14.04.1-Ubuntu
[102082.765073] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[102082.765110] watch           D ffff8816ba96f708     0 22521  21243 0x00000000
[102082.765112]  ffff8816ba96f708 ffff881b63769620 0000000000013e80 ffff8816ba96ffd8
[102082.765115]  0000000000013e80 ffff881038e91620 ffff881b63769620 ffff88067bf7a228
[102082.765117]  ffff88067bf7a208 ffff88067bf7a20c ffff881b63769620 00000000ffffffff
[102082.765120] Call Trace:
[102082.765123]  [<ffffffff817b2fa9>] schedule_preempt_disabled+0x29/0x70
[102082.765126]  [<ffffffff817b4c95>] __mutex_lock_slowpath+0x95/0x100
[102082.765129]  [<ffffffff811cb891>] ? set_track+0x61/0x1a0
[102082.765131]  [<ffffffff817b4d23>] mutex_lock+0x23/0x37
[102082.765150]  [<ffffffffc04d3182>] dmu_zfetch+0x302/0xe80 [zfs]
[102082.765165]  [<ffffffffc04baded>] dbuf_read+0x7dd/0x950 [zfs]
[102082.765184]  [<ffffffffc04d4eba>] dnode_hold_impl+0xca/0x600 [zfs]
[102082.765201]  [<ffffffffc04d5409>] dnode_hold+0x19/0x20 [zfs]
[102082.765217]  [<ffffffffc04c39a3>] dmu_read+0x33/0x180 [zfs]
[102082.765238]  [<ffffffffc04fb361>] ? rrw_enter_read_impl+0xc1/0x170 [zfs]
[102082.765267]  [<ffffffffc054a51f>] zfs_getpage+0x11f/0x1e0 [zfs]
[102082.765295]  [<ffffffffc0566230>] ? zpl_writepages+0x180/0x180 [zfs]
[102082.765322]  [<ffffffffc056627c>] zpl_readpage+0x4c/0x90 [zfs]
[102082.765325]  [<ffffffff8118263a>] read_cache_pages+0xba/0x120
[102082.765351]  [<ffffffffc05660ae>] zpl_readpages+0x1e/0x20 [zfs]
[102082.765353]  [<ffffffff8118282e>] __do_page_cache_readahead+0x18e/0x220
[102082.765357]  [<ffffffff81178777>] filemap_fault+0x397/0x430
[102082.765359]  [<ffffffff811a3ac6>] ? do_set_pte+0xf6/0x120
[102082.765361]  [<ffffffff811a1339>] __do_fault+0x39/0x90
[102082.765364]  [<ffffffff811a3cc7>] do_read_fault.isra.55+0x1d7/0x2f0
[102082.765366]  [<ffffffff811a5884>] handle_mm_fault+0x604/0x10e0
[102082.765388]  [<ffffffffc04fb68c>] ? rrw_exit+0x6c/0x180 [zfs]
[102082.765391]  [<ffffffff81061e94>] __do_page_fault+0x1c4/0x5b0
[102082.765394]  [<ffffffff8110b1a2>] ? from_kgid_munged+0x12/0x20
[102082.765397]  [<ffffffff811f175f>] ? SYSC_newstat+0x2f/0x40
[102082.765400]  [<ffffffff810622b1>] do_page_fault+0x31/0x70
[102082.765402]  [<ffffffff817b8fa8>] page_fault+0x28/0x30
[102082.765413] INFO: task cron:17910 blocked for more than 120 seconds.
[102082.765444]       Tainted: P           OE  3.19.0-30-generic #34~14.04.1-Ubuntu
[102082.765478] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[102082.765515] cron            D ffff880c506c7738     0 17910  12284 0x00000000
[102082.765518]  ffff880c506c7738 ffff880fb58f6390 0000000000013e80 ffff880c506c7fd8
[102082.765521]  0000000000013e80 ffff881038ee6390 ffff880fb58f6390 000000003f803480
[102082.765524]  ffff88067bf7a208 ffff88067bf7a20c ffff880fb58f6390 00000000ffffffff
[102082.765527] Call Trace:
[102082.765530]  [<ffffffff817b2fa9>] schedule_preempt_disabled+0x29/0x70
[102082.765533]  [<ffffffff817b4c95>] __mutex_lock_slowpath+0x95/0x100
[102082.765536]  [<ffffffff811cb891>] ? set_track+0x61/0x1a0
[102082.765538]  [<ffffffff817b4d23>] mutex_lock+0x23/0x37
[102082.765558]  [<ffffffffc04d3182>] dmu_zfetch+0x302/0xe80 [zfs]
[102082.765572]  [<ffffffffc04baded>] dbuf_read+0x7dd/0x950 [zfs]
[102082.765590]  [<ffffffffc04d4eba>] dnode_hold_impl+0xca/0x600 [zfs]
[102082.765607]  [<ffffffffc04d5409>] dnode_hold+0x19/0x20 [zfs]
[102082.765622]  [<ffffffffc04c39a3>] dmu_read+0x33/0x180 [zfs]
[102082.765644]  [<ffffffffc04fb361>] ? rrw_enter_read_impl+0xc1/0x170 [zfs]
[102082.765671]  [<ffffffffc054a51f>] zfs_getpage+0x11f/0x1e0 [zfs]
[102082.765699]  [<ffffffffc0566230>] ? zpl_writepages+0x180/0x180 [zfs]
[102082.765726]  [<ffffffffc056627c>] zpl_readpage+0x4c/0x90 [zfs]
[102082.765728]  [<ffffffff8118263a>] read_cache_pages+0xba/0x120
[102082.765754]  [<ffffffffc05660ae>] zpl_readpages+0x1e/0x20 [zfs]
[102082.765757]  [<ffffffff8118282e>] __do_page_cache_readahead+0x18e/0x220
[102082.765760]  [<ffffffff81178777>] filemap_fault+0x397/0x430
[102082.765763]  [<ffffffff811a1339>] __do_fault+0x39/0x90
[102082.765765]  [<ffffffff811a3e72>] do_cow_fault+0x92/0x220
[102082.765768]  [<ffffffff811a56f4>] handle_mm_fault+0x474/0x10e0
[102082.765771]  [<ffffffff811aaf3f>] ? vma_set_page_prot+0x3f/0x60
[102082.765773]  [<ffffffff811ac101>] ? mmap_region+0x1b1/0x620
[102082.765776]  [<ffffffff81061e94>] __do_page_fault+0x1c4/0x5b0
[102082.765780]  [<ffffffff81192139>] ? vm_mmap_pgoff+0x99/0xc0
[102082.765782]  [<ffffffff810622b1>] do_page_fault+0x31/0x70
[102082.765784]  [<ffffffff817b8fa8>] page_fault+0x28/0x30

@dweeezil
Copy link
Contributor

Dynamic taskqs have been implicated in these types of hangs. To that end, they've been disabled by default in the recently tagged 0.6.5.3 version of SPL. @dasjoe In the mean time, if you can reproduce this fairly reliably, could you try setting spl_taskq_thread_dynamic=0 at module load time (it won't quite have the desired effect it it's set after the fact) and see if the problem goes away.

My current theory is that a deadlock condition similar to that which precipitated openzfs/spl@076821e can occur when dynamic taskqs are enabled. These and related hangs appear to be waiting for zio which never completes and I think it's never completing because it's never launched due to dynamic taskq. I'll also note that even though Illumos does support dynamic taskqs, it does not use them for zfs taskqs.

I'm going to try to add some kstats to allow visibility to the taskqs which ought to help find the root cause of these problems. According to my theory, we'll see a taskq with some pending sequential tasks on which the queue's currently-running task depends for its exit condition.

Another interesting experiment would be to disable dynamic taskq behavior for ``taskq_dispatch_ent()` which is used for for the zio operations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Inactive Not being actively updated
Projects
None yet
Development

No branches or pull requests

5 participants
@behlendorf @dasjoe @dweeezil @Pentium100MHz and others