System information
Describe the problem you're observing
We started using zfs on one of our mysql clusters (with percona-xtradb-cluster) in Aug 2019, and it worked fine for months. Then, on Feb 19th 2020, the mysqld process hung on one of the nodes and eventually killed itself. In dmesg, we saw the message "task txg_quiesce:4004 blocked for more than 120 seconds", along with a number of other blocked mysqld tasks.
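For anyone triaging a similar hang, here is a minimal sketch of how the hung-task warnings could be pulled out of saved dmesg output. The function name `find_hung_tasks` and the regular expression are illustrative assumptions, not part of any ZFS or kernel tooling; the sample line is the message quoted above.

```python
import re

# Assumed pattern for the kernel hung-task warning, modeled on the message
# quoted in this report: "task <name>:<pid> blocked for more than <N> seconds".
HUNG_TASK_RE = re.compile(
    r"task (?P<name>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(dmesg_text):
    """Return a (task name, pid, seconds) tuple for each hung-task warning found."""
    hits = []
    for line in dmesg_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            hits.append(
                (match.group("name"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits


# Sample input: the message text quoted in this report.
sample = "task txg_quiesce:4004 blocked for more than 120 seconds"
print(find_hung_tasks(sample))  # [('txg_quiesce', 4004, 120)]
```

Feeding it the full dmesg text gives a quick list of which tasks were blocked, which makes it easier to compare against traces in other reports.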
Describe how to reproduce the problem
The problem happened only once, when the mysql traffic was relatively high because of an unoptimized cron job that ran heavy SELECTs against a sub-optimal index (high reads) and batch UPDATEs (high writes) in a loop.
We disabled the cron job after the incident, and the problem hasn't happened again in the 3+ months since.
Include any warning/errors/backtraces from the system logs
From `top -H -p <mysql pid>`, the `txg_quiesce` thread was in D state:
From dmesg (all logged in the same second):
Other information
The trace looks similar to #7924 (comment)
Also, we are using a bunch of sm863a disks in a 10 x 2 striped mirror setup. The datasets look like this:
We did consciously create the pool with the controversial `ashift=9` to get a much better compression ratio, after seeing no performance loss in load testing. Some more context in https://www.reddit.com/r/zfs/comments/cl3gr4/confused_about_conventional_wisdom_on_running/