task txg_quiesce:4004 blocked for more than 120 seconds #10370

Closed
sayap opened this issue May 26, 2020 · 4 comments

Comments

@sayap

sayap commented May 26, 2020

System information

Type                  Version/Name
Distribution Name     Debian
Distribution Version  9
Linux Kernel          4.9.144-3.1
Architecture          amd64
ZFS Version           0.8.1
SPL Version           0.8.1

Describe the problem you're observing

We have been using ZFS on one of our MySQL clusters (running Percona XtraDB Cluster) since Aug 2019, and it worked fine for months. Then, on Feb 19th 2020, the mysqld process hung on one of the nodes and eventually killed itself. In dmesg, we saw the message "task txg_quiesce:4004 blocked for more than 120 seconds", along with several other blocked mysqld tasks.

Describe how to reproduce the problem

The problem happened only once. MySQL traffic was relatively high at the time because of an unoptimized cron job that ran heavy SELECTs using a sub-optimal index (high reads) and batch UPDATEs (high writes) in a loop.

We disabled the cron job after the incident, and the problem has not recurred in the 3+ months since.

Include any warning/errors/backtraces from the system logs

In top -H -p <mysql pid>, the txg_quiesce thread was in D (uninterruptible sleep) state:

  4004 root      20   0       0      0      0 D   0.0  0.0   5:01.07 txg_quiesce
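txg_quiesce is a ZFS kernel thread, not one of mysqld's threads. For that reason, it can help to scan every thread on the system for D state rather than looking at a single process. The minimal sketch below does this by reading /proc/<pid>/task/<tid>/stat, where field 3 is the state letter. It is an illustration only, not a tool used in this thread, and the function names are made up for the example.

```python
import os


def parse_stat(stat_line):
    """Split a /proc/<pid>/task/<tid>/stat line into (tid, comm, state).

    The comm field is wrapped in parentheses and may itself contain spaces
    or ')' characters, so we split on the *last* ')' instead of whitespace.
    """
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    tid = int(stat_line[:open_paren].strip())
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return tid, comm, state


def find_d_state_threads(proc_root="/proc"):
    """Return (pid, tid, comm) for every thread in uninterruptible sleep ('D')."""
    blocked = []
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip /proc/self, /proc/sys, etc.
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    _, comm, state = parse_stat(f.read())
            except (OSError, ValueError):
                continue  # thread exited or its stat line was unreadable
            if state == "D":
                blocked.append((int(pid), int(tid), comm))
    return blocked


if __name__ == "__main__":
    for pid, tid, comm in find_d_state_threads():
        print(f"{pid:>8} {tid:>8} {comm}")
```

If run during a hang like this one, it would be expected to list txg_quiesce together with the stuck mysqld threads.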

From dmesg (all logged in the same second):

INFO: task txg_quiesce:4004 blocked for more than 120 seconds.
      Tainted: P           O    4.9.0-8-amd64 #1 Debian 4.9.144-3.1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
txg_quiesce     D    0  4004      2 0x00000000
 ffff95fb576df9c0 0000000000000000 ffff96b9401ec0c0 ffff953bbe9d8980
 ffff953b5b17a400 ffffa7996a227d68 ffffffff998144b9 ffff9479db70e480
 00ffa7996a227d28 ffff953bbe9d8980 ffffa7996a227d88 ffff96b9401ec0c0
Call Trace:
 [<ffffffff998144b9>] ? __schedule+0x239/0x6f0
 [<ffffffff998149a2>] ? schedule+0x32/0x80
 [<ffffffffc1b872af>] ? cv_wait_common+0x11f/0x140 [spl]
 [<ffffffff992bd350>] ? prepare_to_wait_event+0xf0/0xf0
 [<ffffffffc1658ab6>] ? txg_quiesce_thread+0x2a6/0x390 [zfs]
 [<ffffffffc1658810>] ? txg_do_callbacks+0x30/0x30 [zfs]
 [<ffffffffc1b8e5b0>] ? __thread_exit+0x20/0x20 [spl]
 [<ffffffffc1b8e61f>] ? thread_generic_wrapper+0x6f/0x80 [spl]
 [<ffffffff9929a5d9>] ? kthread+0xd9/0xf0
 [<ffffffff99819364>] ? __switch_to_asm+0x34/0x70
 [<ffffffff9929a500>] ? kthread_park+0x60/0x60
 [<ffffffff998193f7>] ? ret_from_fork+0x57/0x70

INFO: task mysqld:321513 blocked for more than 120 seconds.
      Tainted: P           O    4.9.0-8-amd64 #1 Debian 4.9.144-3.1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mysqld          D    0 321513 304666 0x00000000
 ffff95fb576df9c0 0000000000000000 ffff953700fc69c0 ffff947bbfb18980
 ffff947b4954a840 ffffa799bee0f980 ffffffff998144b9 ffffa799bee0f930
 00ffa799bee0f940 ffff947bbfb18980 80abe237812dc2f9 ffff953700fc69c0
Call Trace:
 [<ffffffff998144b9>] ? __schedule+0x239/0x6f0
 [<ffffffff998149a2>] ? schedule+0x32/0x80
 [<ffffffff99817330>] ? rwsem_down_read_failed+0xf0/0x150
 [<ffffffff99542f24>] ? call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff99816bdc>] ? down_read+0x1c/0x30
 [<ffffffffc15fdb7a>] ? dmu_zfetch+0x9a/0x560 [zfs]
 [<ffffffffc15e8724>] ? dmu_buf_hold_array_by_dnode+0x414/0x470 [zfs]
 [<ffffffffc15e9c30>] ? dmu_read_uio_dnode+0x50/0xf0 [zfs]
 [<ffffffffc16a3562>] ? rangelock_enter+0x292/0x540 [zfs]
 [<ffffffffc15e9d14>] ? dmu_read_uio_dbuf+0x44/0x60 [zfs]
 [<ffffffffc16a9e45>] ? zfs_read+0x135/0x460 [zfs]
 [<ffffffffc16cf12b>] ? zpl_read_common_iovec+0x9b/0xe0 [zfs]
 [<ffffffffc16cf522>] ? zpl_iter_read+0x102/0x170 [zfs]
 [<ffffffff9945bcee>] ? aio_read+0xde/0x120
 [<ffffffff992fd415>] ? do_futex+0x2c5/0xb60
 [<ffffffff993e8b7c>] ? kmem_cache_alloc+0xbc/0x530
 [<ffffffff9945ce01>] ? do_io_submit+0x4d1/0x620
 [<ffffffff99203b7d>] ? do_syscall_64+0x8d/0xf0
 [<ffffffff9981924e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6

INFO: task mysqld:321519 blocked for more than 120 seconds.
      Tainted: P           O    4.9.0-8-amd64 #1 Debian 4.9.144-3.1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mysqld          D    0 321519 304666 0x00000000
 ffff95fb576df9c0 0000000000000000 ffff9537a2380900 ffff95fbbeb58980
 ffff95fb5b17e580 ffffa79b12bcf800 ffffffff998144b9 0000000000000286
 00ff953ec342f9d0 ffff95fbbeb58980 ffffffff9981673e ffff9537a2380900
Call Trace:
 [<ffffffff998144b9>] ? __schedule+0x239/0x6f0
 [<ffffffff9981673e>] ? mutex_lock+0xe/0x30
 [<ffffffff998149a2>] ? schedule+0x32/0x80
 [<ffffffff99817330>] ? rwsem_down_read_failed+0xf0/0x150
 [<ffffffffc15dd6d3>] ? dbuf_rele_and_unlock+0x283/0x5f0 [zfs]
 [<ffffffff99542f24>] ? call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff99816bdc>] ? down_read+0x1c/0x30
 [<ffffffffc15fdb7a>] ? dmu_zfetch+0x9a/0x560 [zfs]
 [<ffffffffc15e8724>] ? dmu_buf_hold_array_by_dnode+0x414/0x470 [zfs]
 [<ffffffffc15e9e16>] ? dmu_write_uio_dnode+0x56/0x140 [zfs]
 [<ffffffffc1658636>] ? txg_rele_to_quiesce+0x26/0x40 [zfs]
 [<ffffffffc15e9f4c>] ? dmu_write_uio_dbuf+0x4c/0x70 [zfs]
 [<ffffffffc16b22e6>] ? zfs_write+0xc46/0xdd0 [zfs]
 [<ffffffff9981673e>] ? mutex_lock+0xe/0x30
 [<ffffffff992b3142>] ? enqueue_task_fair+0x82/0x940
 [<ffffffff9922f8a5>] ? sched_clock+0x5/0x10
 [<ffffffff992a49fe>] ? check_preempt_curr+0x4e/0x90
 [<ffffffff992a5674>] ? try_to_wake_up+0x54/0x3c0
 [<ffffffffc16cf21b>] ? zpl_write_common_iovec+0xab/0x100 [zfs]
 [<ffffffffc16cf3e9>] ? zpl_iter_write+0xf9/0x130 [zfs]
 [<ffffffff9945bbbb>] ? aio_write+0xfb/0x150
 [<ffffffff993e8b7c>] ? kmem_cache_alloc+0xbc/0x530
 [<ffffffff9945cbe9>] ? do_io_submit+0x2b9/0x620
 [<ffffffffc16b143c>] ? zfs_fsync+0x8c/0xe0 [zfs]
 [<ffffffff99203b7d>] ? do_syscall_64+0x8d/0xf0
 [<ffffffff9981924e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6

INFO: task mysqld:187144 blocked for more than 120 seconds.
      Tainted: P           O    4.9.0-8-amd64 #1 Debian 4.9.144-3.1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mysqld          D    0 187144 304666 0x00000000
 ffff95fb576df9c0 0000000000000000 ffff96474f310740 ffff947bbfcd8980
 ffff947b5bef85c0 ffffa799e1987a70 ffffffff998144b9 0000000000ae9c99
 001fb145ad54b23b ffff947bbfcd8980 ffffa799e1987b40 ffff96474f310740
Call Trace:
 [<ffffffff998144b9>] ? __schedule+0x239/0x6f0
 [<ffffffff998149a2>] ? schedule+0x32/0x80
 [<ffffffff99817330>] ? rwsem_down_read_failed+0xf0/0x150
 [<ffffffff99542f24>] ? call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff99816bdc>] ? down_read+0x1c/0x30
 [<ffffffffc15fdb7a>] ? dmu_zfetch+0x9a/0x560 [zfs]
 [<ffffffffc15e8724>] ? dmu_buf_hold_array_by_dnode+0x414/0x470 [zfs]
 [<ffffffffc15e9c30>] ? dmu_read_uio_dnode+0x50/0xf0 [zfs]
 [<ffffffffc16a3562>] ? rangelock_enter+0x292/0x540 [zfs]
 [<ffffffffc15e9d14>] ? dmu_read_uio_dbuf+0x44/0x60 [zfs]
 [<ffffffffc16a9e45>] ? zfs_read+0x135/0x460 [zfs]
 [<ffffffffc16cf12b>] ? zpl_read_common_iovec+0x9b/0xe0 [zfs]
 [<ffffffffc16cf522>] ? zpl_iter_read+0x102/0x170 [zfs]
 [<ffffffff9940aacd>] ? new_sync_read+0xdd/0x130
 [<ffffffff9940b261>] ? vfs_read+0x91/0x130
 [<ffffffff9940c8f0>] ? SyS_pread64+0x90/0xb0
 [<ffffffff99203b7d>] ? do_syscall_64+0x8d/0xf0
 [<ffffffff9981924e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6

INFO: task mysqld:199650 blocked for more than 120 seconds.
      Tainted: P           O    4.9.0-8-amd64 #1 Debian 4.9.144-3.1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mysqld          D    0 199650 304666 0x00000000
 ffff95fb576df9c0 0000000000000000 ffff95d188b4a600 ffff953bbe6d8980
 ffff953b5b120100 ffffa7998934fa70 ffffffff998144b9 0000000000ae9c6d
 0046221d124651ea ffff953bbe6d8980 ffffa7998934fb40 ffff95d188b4a600
Call Trace:
 [<ffffffff998144b9>] ? __schedule+0x239/0x6f0
 [<ffffffff998149a2>] ? schedule+0x32/0x80
 [<ffffffff99817330>] ? rwsem_down_read_failed+0xf0/0x150
 [<ffffffff99542f24>] ? call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff99816bdc>] ? down_read+0x1c/0x30
 [<ffffffffc15fdb7a>] ? dmu_zfetch+0x9a/0x560 [zfs]
 [<ffffffffc15e8724>] ? dmu_buf_hold_array_by_dnode+0x414/0x470 [zfs]
 [<ffffffffc15e9c30>] ? dmu_read_uio_dnode+0x50/0xf0 [zfs]
 [<ffffffffc16a3562>] ? rangelock_enter+0x292/0x540 [zfs]
 [<ffffffffc15e9d14>] ? dmu_read_uio_dbuf+0x44/0x60 [zfs]
 [<ffffffffc16a9e45>] ? zfs_read+0x135/0x460 [zfs]
 [<ffffffffc16cf12b>] ? zpl_read_common_iovec+0x9b/0xe0 [zfs]
 [<ffffffffc16cf522>] ? zpl_iter_read+0x102/0x170 [zfs]
 [<ffffffff9940aacd>] ? new_sync_read+0xdd/0x130
 [<ffffffff9940b261>] ? vfs_read+0x91/0x130
 [<ffffffff9940c8f0>] ? SyS_pread64+0x90/0xb0
 [<ffffffff99203b7d>] ? do_syscall_64+0x8d/0xf0
 [<ffffffff9981924e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6

INFO: task mysqld:199698 blocked for more than 120 seconds.
      Tainted: P           O    4.9.0-8-amd64 #1 Debian 4.9.144-3.1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mysqld          D    0 199698 304666 0x00000000
 ffff95fb576df9c0 0000000000000000 ffff95390e32e340 ffff947bbfbd8980
 ffff947b5bef04c0 ffffa79989c3ba70 ffffffff998144b9 0000000000ae9c6d
 00fcea487af05f1b ffff947bbfbd8980 ffffa79989c3bb40 ffff95390e32e340
Call Trace:
 [<ffffffff998144b9>] ? __schedule+0x239/0x6f0
 [<ffffffff998149a2>] ? schedule+0x32/0x80
 [<ffffffff99817330>] ? rwsem_down_read_failed+0xf0/0x150
 [<ffffffff99542f24>] ? call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff99816bdc>] ? down_read+0x1c/0x30
 [<ffffffffc15fdb7a>] ? dmu_zfetch+0x9a/0x560 [zfs]
 [<ffffffffc15e8724>] ? dmu_buf_hold_array_by_dnode+0x414/0x470 [zfs]
 [<ffffffffc15e9c30>] ? dmu_read_uio_dnode+0x50/0xf0 [zfs]
 [<ffffffffc16a3562>] ? rangelock_enter+0x292/0x540 [zfs]
 [<ffffffffc15e9d14>] ? dmu_read_uio_dbuf+0x44/0x60 [zfs]
 [<ffffffffc16a9e45>] ? zfs_read+0x135/0x460 [zfs]
 [<ffffffffc16cf12b>] ? zpl_read_common_iovec+0x9b/0xe0 [zfs]
 [<ffffffffc16cf522>] ? zpl_iter_read+0x102/0x170 [zfs]
 [<ffffffff9940aacd>] ? new_sync_read+0xdd/0x130
 [<ffffffff9940b261>] ? vfs_read+0x91/0x130
 [<ffffffff9940c8f0>] ? SyS_pread64+0x90/0xb0
 [<ffffffff99203b7d>] ? do_syscall_64+0x8d/0xf0
 [<ffffffff9981924e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6

INFO: task mysqld:201071 blocked for more than 120 seconds.
      Tainted: P           O    4.9.0-8-amd64 #1 Debian 4.9.144-3.1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mysqld          D    0 201071 304666 0x00000000
 ffff95fb576df9c0 ffff95fb576df9c0 ffff94402d636680 ffff953bbed58980
 ffff9539bf6ea2c0 ffffa799ab39ba70 ffffffff998144b9 0000000000ae9c6d
 00281a34d5ad96c9 ffff953bbed58980 ffffa799ab39bb40 ffff94402d636680
Call Trace:
 [<ffffffff998144b9>] ? __schedule+0x239/0x6f0
 [<ffffffff998149a2>] ? schedule+0x32/0x80
 [<ffffffff99817330>] ? rwsem_down_read_failed+0xf0/0x150
 [<ffffffff99542f24>] ? call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff99816bdc>] ? down_read+0x1c/0x30
 [<ffffffffc15fdb7a>] ? dmu_zfetch+0x9a/0x560 [zfs]
 [<ffffffffc15e8724>] ? dmu_buf_hold_array_by_dnode+0x414/0x470 [zfs]
 [<ffffffffc15e9c30>] ? dmu_read_uio_dnode+0x50/0xf0 [zfs]
 [<ffffffffc16a3562>] ? rangelock_enter+0x292/0x540 [zfs]
 [<ffffffffc15e9d14>] ? dmu_read_uio_dbuf+0x44/0x60 [zfs]
 [<ffffffffc16a9e45>] ? zfs_read+0x135/0x460 [zfs]
 [<ffffffffc16cf12b>] ? zpl_read_common_iovec+0x9b/0xe0 [zfs]
 [<ffffffffc16cf522>] ? zpl_iter_read+0x102/0x170 [zfs]
 [<ffffffff9940aacd>] ? new_sync_read+0xdd/0x130
 [<ffffffff9940b261>] ? vfs_read+0x91/0x130
 [<ffffffff9940c8f0>] ? SyS_pread64+0x90/0xb0
 [<ffffffff99203b7d>] ? do_syscall_64+0x8d/0xf0
 [<ffffffff9981924e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6

INFO: task mysqld:203083 blocked for more than 120 seconds.
      Tainted: P           O    4.9.0-8-amd64 #1 Debian 4.9.144-3.1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mysqld          D    0 203083 304666 0x00000000
 ffff95fb576df9c0 0000000000000000 ffff953a5a5627c0 ffff953bbe858980
 ffff953b5b14c280 ffffa79761fbba70 ffffffff998144b9 0000000000ae9c6d
 003bc76da643a8c4 ffff953bbe858980 ffffa79761fbbb40 ffff953a5a5627c0
Call Trace:
 [<ffffffff998144b9>] ? __schedule+0x239/0x6f0
 [<ffffffff998149a2>] ? schedule+0x32/0x80
 [<ffffffff99817330>] ? rwsem_down_read_failed+0xf0/0x150
 [<ffffffff99542f24>] ? call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff99816bdc>] ? down_read+0x1c/0x30
 [<ffffffffc15fdb7a>] ? dmu_zfetch+0x9a/0x560 [zfs]
 [<ffffffffc15e8724>] ? dmu_buf_hold_array_by_dnode+0x414/0x470 [zfs]
 [<ffffffffc15e9c30>] ? dmu_read_uio_dnode+0x50/0xf0 [zfs]
 [<ffffffffc16a3562>] ? rangelock_enter+0x292/0x540 [zfs]
 [<ffffffffc15e9d14>] ? dmu_read_uio_dbuf+0x44/0x60 [zfs]
 [<ffffffffc16a9e45>] ? zfs_read+0x135/0x460 [zfs]
 [<ffffffffc16cf12b>] ? zpl_read_common_iovec+0x9b/0xe0 [zfs]
 [<ffffffffc16cf522>] ? zpl_iter_read+0x102/0x170 [zfs]
 [<ffffffff9940aacd>] ? new_sync_read+0xdd/0x130
 [<ffffffff9940b261>] ? vfs_read+0x91/0x130
 [<ffffffff9940c8f0>] ? SyS_pread64+0x90/0xb0
 [<ffffffff99203b7d>] ? do_syscall_64+0x8d/0xf0
 [<ffffffff9981924e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6

INFO: task mysqld:204097 blocked for more than 120 seconds.
      Tainted: P           O    4.9.0-8-amd64 #1 Debian 4.9.144-3.1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mysqld          D    0 204097 304666 0x00000000
 ffff95fb576df9c0 0000000000000000 ffff9400598b0000 ffff953bbed18980
 ffff953b5b1e4740 ffffa79763a8fa70 ffffffff998144b9 0000000000ae9c73
 00ee2afa50a5e8d1 ffff953bbed18980 ffffa79763a8fb40 ffff9400598b0000
Call Trace:
 [<ffffffff998144b9>] ? __schedule+0x239/0x6f0
 [<ffffffff998149a2>] ? schedule+0x32/0x80
 [<ffffffff99817330>] ? rwsem_down_read_failed+0xf0/0x150
 [<ffffffff99542f24>] ? call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff99816bdc>] ? down_read+0x1c/0x30
 [<ffffffffc15fdb7a>] ? dmu_zfetch+0x9a/0x560 [zfs]
 [<ffffffffc15e8724>] ? dmu_buf_hold_array_by_dnode+0x414/0x470 [zfs]
 [<ffffffffc15e9c30>] ? dmu_read_uio_dnode+0x50/0xf0 [zfs]
 [<ffffffffc16a3562>] ? rangelock_enter+0x292/0x540 [zfs]
 [<ffffffffc15e9d14>] ? dmu_read_uio_dbuf+0x44/0x60 [zfs]
 [<ffffffffc16a9e45>] ? zfs_read+0x135/0x460 [zfs]
 [<ffffffffc16cf12b>] ? zpl_read_common_iovec+0x9b/0xe0 [zfs]
 [<ffffffffc16cf522>] ? zpl_iter_read+0x102/0x170 [zfs]
 [<ffffffff9940aacd>] ? new_sync_read+0xdd/0x130
 [<ffffffff9940b261>] ? vfs_read+0x91/0x130
 [<ffffffff9940c8f0>] ? SyS_pread64+0x90/0xb0
 [<ffffffff99203b7d>] ? do_syscall_64+0x8d/0xf0
 [<ffffffff9981924e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6

INFO: task mysqld:206855 blocked for more than 120 seconds.
      Tainted: P           O    4.9.0-8-amd64 #1 Debian 4.9.144-3.1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mysqld          D    0 206855 304666 0x00000000
 ffff95fb576df9c0 0000000000000000 ffff957555704140 ffff95fbbed18980
 ffff95fb459961c0 ffffa799b095ba70 ffffffff998144b9 0000000000ae9c6d
 00a5eeb00a2feeaf ffff95fbbed18980 ffffa799b095bb40 ffff957555704140
Call Trace:
 [<ffffffff998144b9>] ? __schedule+0x239/0x6f0
 [<ffffffff998149a2>] ? schedule+0x32/0x80
 [<ffffffff99817330>] ? rwsem_down_read_failed+0xf0/0x150
 [<ffffffff99542f24>] ? call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff99816bdc>] ? down_read+0x1c/0x30
 [<ffffffffc15fdb7a>] ? dmu_zfetch+0x9a/0x560 [zfs]
 [<ffffffffc15e8724>] ? dmu_buf_hold_array_by_dnode+0x414/0x470 [zfs]
 [<ffffffffc15e9c30>] ? dmu_read_uio_dnode+0x50/0xf0 [zfs]
 [<ffffffffc16a3562>] ? rangelock_enter+0x292/0x540 [zfs]
 [<ffffffffc15e9d14>] ? dmu_read_uio_dbuf+0x44/0x60 [zfs]
 [<ffffffffc16a9e45>] ? zfs_read+0x135/0x460 [zfs]
 [<ffffffffc16cf12b>] ? zpl_read_common_iovec+0x9b/0xe0 [zfs]
 [<ffffffffc16cf522>] ? zpl_iter_read+0x102/0x170 [zfs]
 [<ffffffff9940aacd>] ? new_sync_read+0xdd/0x130
 [<ffffffff9940b261>] ? vfs_read+0x91/0x130
 [<ffffffff9940c8f0>] ? SyS_pread64+0x90/0xb0
 [<ffffffff99203b7d>] ? do_syscall_64+0x8d/0xf0
 [<ffffffff9981924e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
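With ten near-identical traces, it can be faster to count which functions the blocked tasks share than to read each one. The minimal sketch below assumes the dmesg format shown above. It groups frames under each "INFO: task NAME:PID blocked" header and counts how many tasks include each function. The helpers are hypothetical and not part of any existing tool. Frames marked "?" are ones the kernel unwinder could not confirm, so counts involving them are hints only.

```python
import re
import sys
from collections import Counter

# e.g. "INFO: task txg_quiesce:4004 blocked for more than 120 seconds."
TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# e.g. " [<ffffffffc15fdb7a>] ? dmu_zfetch+0x9a/0x560 [zfs]"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def parse_hung_tasks(log_text):
    """Return a list of {"name", "pid", "frames": [(func, module or None), ...]}."""
    tasks = []
    current = None
    for line in log_text.splitlines():
        task = TASK_RE.search(line)
        if task:
            current = {"name": task.group(1), "pid": int(task.group(2)), "frames": []}
            tasks.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            current["frames"].append((frame.group(1), frame.group(2)))
    return tasks


def frame_frequency(tasks):
    """Count how many tasks contain each function anywhere in their stack."""
    counts = Counter()
    for task in tasks:
        counts.update({func for func, _module in task["frames"]})
    return counts


if __name__ == "__main__":
    tasks = parse_hung_tasks(sys.stdin.read())
    for func, n in frame_frequency(tasks).most_common(10):
        print(f"{n:3d}/{len(tasks)}  {func}")
```

For example, save it under a hypothetical name such as hung_tasks.py and run dmesg | python3 hung_tasks.py. In the log above, every mysqld trace contains down_read and dmu_zfetch. That is the kind of shared pattern this count makes easy to spot.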

Other information

The trace looks similar to one posted in a comment on #7924.

Also, we are using Samsung SM863a SSDs in a 10 x 2 striped mirror setup. The datasets are configured like this:

zfs set compression=lz4 ssd
zfs set xattr=sa ssd
zfs set atime=off ssd

zfs create ssd/mysql

zfs create ssd/mysql/data
zfs set recordsize=16k ssd/mysql/data
zfs set primarycache=metadata ssd/mysql/data

zfs create ssd/mysql/log
zfs set recordsize=128k ssd/mysql/log

zfs create ssd/mysql/tmp
zfs set recordsize=128k ssd/mysql/tmp
zfs set sync=disabled ssd/mysql/tmp
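To confirm that the pool still matches these settings, one option is to compare zfs get output against the commands above. The minimal sketch below assumes Python 3.7+ and the pool name ssd. It calls zfs get -H -p -r -o name,property,value, whose scripted mode prints tab-separated lines with sizes in bytes. The helper names and the EXPECTED table are illustrative: they are transcribed from the commands above and do not come from any ZFS tool.

```python
import subprocess

PROPS = ["compression", "xattr", "atime", "recordsize", "primarycache", "sync"]

# Transcribed from the zfs set commands above; "-p" reports recordsize in bytes.
EXPECTED = {
    "ssd/mysql/data": {"recordsize": "16384", "primarycache": "metadata"},
    "ssd/mysql/log": {"recordsize": "131072"},
    "ssd/mysql/tmp": {"recordsize": "131072", "sync": "disabled"},
}
# Set on the pool root dataset and inherited by every child.
INHERITED = {"compression": "lz4", "xattr": "sa", "atime": "off"}


def parse_zfs_get(output):
    """Turn 'name<TAB>property<TAB>value' lines into {dataset: {prop: value}}."""
    props = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, prop, value = line.split("\t")
        props.setdefault(name, {})[prop] = value
    return props


def find_mismatches(props):
    """Return (dataset, property, expected, actual) for every difference."""
    mismatches = []
    for dataset, wanted in EXPECTED.items():
        merged = {**INHERITED, **wanted}
        actual = props.get(dataset, {})
        for prop, value in sorted(merged.items()):
            if actual.get(prop) != value:
                mismatches.append((dataset, prop, value, actual.get(prop)))
    return mismatches


def read_props(pool="ssd"):
    """Run zfs get recursively on the pool and parse its scripted output."""
    result = subprocess.run(
        ["zfs", "get", "-H", "-p", "-r", "-o", "name,property,value",
         ",".join(PROPS), pool],
        check=True, capture_output=True, text=True,
    )
    return parse_zfs_get(result.stdout)


if __name__ == "__main__":
    for dataset, prop, want, got in find_mismatches(read_props()):
        print(f"{dataset}: {prop} is {got!r}, expected {want!r}")
```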

We deliberately created the pool with the controversial ashift=9 (512-byte sectors) to get a much better compression ratio, after load testing showed no performance loss. There is more context in https://www.reddit.com/r/zfs/comments/cl3gr4/confused_about_conventional_wisdom_on_running/
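The compression gain from ashift=9 comes down to rounding. ZFS allocates each compressed block in whole sectors of 2^ashift bytes, so with ashift=9 a block is rounded up to a multiple of 512 bytes instead of 4096. The back-of-the-envelope sketch below assumes a hypothetical 16K record that lz4 shrinks to 5000 bytes. It ignores metadata, other allocation overheads, and the fact that ZFS stores a block uncompressed when compression saves too little.

```python
def allocated_bytes(psize, ashift):
    """Round a compressed block size up to whole 2**ashift-byte sectors."""
    sector = 1 << ashift
    return -(-psize // sector) * sector  # ceiling division, then back to bytes


def compression_ratio(lsize, psize, ashift):
    """Logical size divided by the space actually allocated on disk."""
    return lsize / allocated_bytes(psize, ashift)


if __name__ == "__main__":
    lsize = 16 * 1024  # recordsize=16k, as on ssd/mysql/data
    psize = 5000       # hypothetical size of one record after lz4
    for ashift in (9, 12):
        print(f"ashift={ashift}: allocated={allocated_bytes(psize, ashift)} "
              f"ratio={compression_ratio(lsize, psize, ashift):.2f}")
```

With these numbers it prints allocated=5120 ratio=3.20 for ashift=9 and allocated=8192 ratio=2.00 for ashift=12. Rounding waste is proportionally larger for small records, so small recordsizes such as 16k have the most to gain from 512-byte sectors.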

@adamdmoss
Contributor

If it helps, I've only seen this on failing disks. The one time I saw this and thought my disk wasn't failing, I was wrong. ;)

(Alternatively it could be a controller stall of some sort...)

Either way, I'd suggest starting by looking for a problem at a layer below ZFS.
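One way to act on this is to look for a single disk or controller path that is slow or has stopped completing I/O. The minimal sketch below assumes the classic /proc/diskstats layout: major, minor and device name, followed by reads completed, reads merged, sectors read, ms reading, writes completed, writes merged, sectors written, ms writing and I/Os in progress. It computes the average read and write latency per device over an interval, which is the same arithmetic iostat -x uses. It also flags devices that have I/O in flight but completed nothing during the interval. The function names and the stall heuristic are illustrative and were not suggested in this thread.

```python
import time


def parse_diskstats(text):
    """Parse /proc/diskstats into {device: counters}; extra newer fields are ignored."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue
        v = [int(x) for x in fields[3:14]]
        stats[fields[2]] = {
            "reads": v[0], "read_ms": v[3],
            "writes": v[4], "write_ms": v[7],
            "in_flight": v[8],
        }
    return stats


def compare(before, after):
    """Per-device average latency between two snapshots, plus a stall hint."""
    report = {}
    for dev, a in after.items():
        b = before.get(dev)
        if b is None:
            continue  # device appeared between snapshots
        d_reads = a["reads"] - b["reads"]
        d_writes = a["writes"] - b["writes"]
        report[dev] = {
            "r_await_ms": (a["read_ms"] - b["read_ms"]) / d_reads if d_reads else 0.0,
            "w_await_ms": (a["write_ms"] - b["write_ms"]) / d_writes if d_writes else 0.0,
            "in_flight": a["in_flight"],
            # I/O is outstanding, yet nothing completed during the interval.
            "possibly_stalled": a["in_flight"] > 0 and d_reads == 0 and d_writes == 0,
        }
    return report


def sample(interval=5.0, path="/proc/diskstats"):
    """Take two snapshots `interval` seconds apart and compare them."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return compare(before, after)


if __name__ == "__main__":
    for dev, r in sorted(sample().items()):
        print(dev, r)
```

Any single SSD whose latency stands out from its mirror partner, or that shows up as possibly stalled, would be a natural thing to cross-check with smartctl.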

@sayap
Author

sayap commented May 27, 2020

Thanks @adamdmoss for the suggestion. There was nothing in dmesg or zpool events; I will check smartctl.

@sayap
Author

sayap commented May 27, 2020

The trace also looks similar to #10186, which likewise has mysqld as the victim.

@sayap
Author

sayap commented Feb 1, 2021

This is most likely related to #11527: here too, txg_quiesce got stuck in D state under heavy I/O. Closing this.
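Here is one reading of the traces, hedged because the thread never confirmed a root cause. txg_quiesce_thread is waiting in cv_wait_common for every writer holding the open txg to finish. Meanwhile, mysqld:321519 is inside zfs_write, blocked on a down_read in dmu_zfetch. If a writer that holds the txg cannot make progress, the quiesce thread cannot either. The toy model below illustrates only that dependency. It is a simplified Python analogy, not ZFS code: a Condition-protected hold count stands in for the txg, and a plain Lock stands in for the zfetch rwsem. Every name in it is hypothetical.

```python
import threading


class OpenTxgModel:
    """Toy hold count for an open txg (an analogy, NOT the ZFS implementation)."""

    def __init__(self):
        self._cv = threading.Condition()
        self._holds = 0

    def hold(self):
        """A writer joins the open txg (loosely like assigning a tx)."""
        with self._cv:
            self._holds += 1

    def release(self):
        """A writer is done with the txg (loosely like committing the tx)."""
        with self._cv:
            self._holds -= 1
            if self._holds == 0:
                self._cv.notify_all()

    def quiesce(self, timeout=None):
        """Wait until all holds are released; False means we timed out."""
        with self._cv:
            return self._cv.wait_for(lambda: self._holds == 0, timeout=timeout)


def writer(txg, lock, held):
    txg.hold()
    held.set()          # signal that the hold is taken
    try:
        with lock:      # stands in for the rwsem taken in dmu_zfetch
            pass        # ... copy user data ...
    finally:
        txg.release()


def demo():
    txg = OpenTxgModel()
    lock = threading.Lock()
    held = threading.Event()
    lock.acquire()      # some other, unseen party holds the lock
    t = threading.Thread(target=writer, args=(txg, lock, held))
    t.start()
    held.wait()
    stuck = not txg.quiesce(timeout=0.5)  # writer holds the txg, so quiesce waits
    lock.release()                        # once the lock is freed ...
    t.join()
    done = txg.quiesce(timeout=1.0)       # ... quiesce completes
    return stuck, done


if __name__ == "__main__":
    print(demo())
```

In the model, the quiesce step is only as fast as the slowest writer holding the txg. This is consistent with txg_quiesce showing up as blocked alongside the mysqld threads, but it says nothing about who held the lock in the real incident.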

sayap closed this as completed Feb 1, 2021