ZFS writers go into d-state #7924
Comments
We have 3k IOPS per disk and we are getting about 1,500 read and write combined per disk. If I hit that limit, will it hang the box? This is a log server, so it will peak with load.
Wouldn't I see that? If it is a bug... what next?
Dang, will do. It's been freezing about every 2 days, so I should be able to grab it. Do I need all the stacks, or just the ones for the writers in D-state?
This would be my guess as well, given the bug description and stack trace above. The IO pipeline appears to be waiting on the disks, which is what prompted the console message. @kpande, by default no, we don't need all of the stacks. The most relevant ones are typically dumped to the system logs automatically; if we need more, it's easy to ask.
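For anyone needing to grab those stacks, this is roughly how D-state stacks are usually collected on a stock CentOS 7 kernel (a sketch, not taken from this report; the PID is just one of the stuck z_wr_iss tasks, e.g. 1645 from the trace below):

# Kernel stack of a single stuck task.
cat /proc/1645/stack
# Or dump every task's stack into the kernel ring buffer via sysrq
# (requires kernel.sysrq to permit it), then save it off.
echo t > /proc/sysrq-trigger
dmesg > /tmp/all-task-stacks.txt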
Does the system ever come back from these situations, or does it hang indefinitely? I had a problem once on VirtualBox where it would just decide the guest had no disks and the guest would hang. If it never comes back, it could be an EBS bug. I've been setting
The instance type we are using is r4.8xlarge. Once it hangs, it doesn't come back until we start bouncing services or the instance itself.
@kpande What version of zfs are you running? We also know we have a thing, I can't remember what it's called, but it's something like burst forgiveness, where it allows us to go over the limits. Some of the people on the team said that we would see if we were bursting. We haven't seen a burst; we run at half our limit. Our instance size is quite large: it has 32 procs and 256 GB RAM. I only have 4 EBS drives, which I am in the process of scaling out a little, going to 8 drives. I thought I put the zpool iostat in, but I am running consistently, day and night, 6-7k of each read and write, and about 1.5k IOPS per disk of each read and write. The drives are running at 90+% busy, which is why I am rolling out the larger set. I have seen this several times. I will see if I can get a new kernel. We have been compiling the zfs modules; I have heard that the RPMs are pretty safe to use? Thanks for your help. We need to code freeze by Oct 1st and would hate to lose the ZFS footprint here.
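For reference, the per-disk operation rates being described can be watched straight from ZFS and compared against the provisioned EBS IOPS; a minimal sketch (the pool name is a placeholder):

# Per-vdev bandwidth and operations, refreshed every 5 seconds.
# Replace "tank" with the actual pool name.
zpool iostat -v tank 5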
Oh, I should add that when the z_wr_iss threads go into D-state there is no I/O to the pool.
We had this happen again. This is the info that was captured. I was also thinking back to other times I have seen this and wondered whether cache flushing might cause it.
Since we have a pretty massive infrastructure, we don't have the ability to roll outside of "standards" in AWS. We have to run standard CentOS 7.5.
I did see that cache flushing is on. In my ZFS history I have run into almost every bug with that.
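For anyone checking the same thing, cache flushing is controlled by a ZFS module parameter; a short sketch of how to inspect and, cautiously, change it at runtime:

# 0 (the default) means ZFS issues cache flushes to the disks.
cat /sys/module/zfs/parameters/zfs_nocacheflush
# Setting 1 skips the flushes; only do this if the backing storage
# has a non-volatile write cache or is otherwise durable on write.
echo 1 > /sys/module/zfs/parameters/zfs_nocacheflush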
Still hitting this bug almost every day. Ideas?
Hi, NFS 4.1 server; kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Pool layout: 3 x raidz1 (each vdev is comprised of 4 x 5000 GB EBS volumes); the accompanying arcstat, zpool list, and pool property output was truncated.
@kpande Why did you ask "are you hitting any metadata memory limits?" We have lots of small files. We are at 75% on demand metadata, but I can't see a limit?
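For what it's worth, the metadata limit being asked about is visible in the ARC kstats; a minimal sketch, assuming a stock 0.7.x install:

# Current ARC metadata usage versus the limit (bytes).
grep -E '^arc_meta_(used|limit|max)' /proc/spl/kstat/zfs/arcstats
# The tunable itself; 0 means "use the default", which on 0.7.x is a
# percentage of the ARC (75%) rather than a fixed byte count.
cat /sys/module/zfs/parameters/zfs_arc_meta_limit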
Some further information for the original issue. We resized the instance to an r5.12xlarge in AWS just over 24 hours ago. We grew the L2ARC to ~300 GB. We ran into another crash in the system similar to what we had seen before, with the exception that this time we actually had something written into /var/log/messages about it:
We moved to a new instance type with NVMe, hoping that might help. I have now moved on to a new error: we are hung on txg_quiesce. I did notice that txg_sync sits in D-state most of the time, and then quiesce will hang every few days. This looks like it is from a known bug, but we are fairly current (0.7.11). Here is the txg_quiesce:
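For what it's worth, the per-pool TXG kstat shows which stage each transaction group is stuck in, and the stacks of the txg threads are usually the interesting part; a rough sketch (the pool name is a placeholder):

# Recent transaction group history; the state column shows whether each
# txg is open, quiescing, or syncing. Replace "tank" with the pool name.
cat /proc/spl/kstat/zfs/tank/txgs
# Kernel stacks of the txg threads (one pair per imported pool).
for p in $(pgrep -x txg_sync; pgrep -x txg_quiesce); do
    echo "== PID $p =="
    cat /proc/$p/stack
done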
I would like to point out why I would want raidz on EBS: I want to be able to correct for bad checksums. That was actually what I thought my error might be. I believe all checksum errors show up as events, though.
@gmelikov Are you saying that a box could potentially hang due to a poor metadata-to-data ratio? All of my ZFS experience has been on FreeBSD and Solaris, and there are very few things that can hang a box, usually only driver or physical hardware issues. Even in those cases you can usually recover.
We have adjusted instance size and type. We have gone Nitro. We are tuning arc_size and zfs_nocacheflush, and have turned off prefetch. We stayed up for 2 days with an instance with 48 vCPUs and 192 GB RAM. We have the calling traces from go-carbon. Still hanging on txg_quiesce: INFO: task txg_quiesce:12980 blocked for more than 120 seconds.
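In case anyone wants to retry those tunables, they can be made persistent through module options; a sketch with placeholder values (not recommendations, and zfs_arc_max here stands in for the "arc_size" tuning mentioned above):

# /etc/modprobe.d/zfs.conf -- applied the next time the zfs module loads.
options zfs zfs_arc_max=103079215104    # example: cap the ARC at 96 GiB
options zfs zfs_prefetch_disable=1      # turn off file-level prefetch
options zfs zfs_nocacheflush=1          # skip cache flushes (see caveat above)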
I have exactly the same problem. Raspberry Pi 4, 8 GiB RAM, Ubuntu 20.04 64-bit, latest patches. It works fine, but any sustained (moderately) heavy load - like 30-40 MiB/sec for half an hour - causes this error. Since my Docker files are also on ZFS, it basically locks the system up. Any ideas/hopes?
I have an almost identical system configuration to bogorad's, and I see the exact same problem as well, FWIW. It works amazingly well most of the time, but when I set up a filesystem on my raidz pool for Time Machine backups via smbd, the first big backup (200+ GB) locks up ZFS with all tasks blocked and the "Tainted: P C OE" message. The traceback for each hung task in each case ends with schedule() -> __schedule() -> __switch_to().
Funny thing - after a couple of updates it just stabilized. I've been downloading heavy torrents, and also re-mixing stuff with mkvmerge (20-60 gigs in and out), and all is good. Knock on wood ;)
Luckily the desk I'm sitting at is solid wood, so I just literally did that. 👍 I also just did
Thanks for the report.
I also found a PPA that someone used to build the latest zfs kmod (0.8.4), which showed some promise, but I'm still getting lockups after about 250 GB of data transfer. It's actually quite consistent about the point at which it fails. I wonder if it's a counter or buffer-reuse issue.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions. |
I have run into this issue several times and only with virtual disks. This one is with 4 EBS disks on AWS.
This has been hanging the instance tight every few days.
I have also seen this several times with VMware and native Xen but was not able to capture this quality of data.
System information
[root@ip-10-1-10-62 etc]# modinfo zfs
filename: /lib/modules/3.10.0-862.11.6.el7.x86_64/extra/zfs.ko
version: 0.7.9-1
license: CDDL
author: OpenZFS on Linux
description: ZFS
retpoline: Y
rhelversion: 7.5
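(For completeness, the running versions can also be confirmed without modinfo; a small sketch of the equivalent checks on this kind of install:)

cat /etc/redhat-release          # distribution release
uname -r                         # running kernel
cat /sys/module/zfs/version      # loaded ZFS module version
cat /sys/module/spl/version      # loaded SPL module version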
Describe the problem you're observing
ZFS I/O hangs in D-state
[422180.591613] INFO: task z_wr_iss:1645 blocked for more than 120 seconds.
[422180.597666] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[422180.605080] z_wr_iss D ffff89c033f5bf40 0 1645 2 0x00000000
[422180.612627] Call Trace:
[422180.616032] [] ? mutex_lock+0x12/0x2f
[422180.621871] [] schedule+0x29/0x70
[422180.627354] [] rwsem_down_write_failed+0x225/0x3a0
[422180.634310] [] call_rwsem_down_write_failed+0x17/0x30
[422180.641826] [] ? spl_kmem_free+0x35/0x40 [spl]
[422180.648809] [] down_write+0x2d/0x3d
[422180.654544] [] dbuf_write_ready+0x1e7/0x2f0 [zfs]
[422180.661888] [] arc_write_ready+0xac/0x300 [zfs]
[422180.668970] [] ? mutex_lock+0x12/0x2f
[422180.675001] [] zio_ready+0x65/0x3d0 [zfs]
[422180.681564] [] ? tsd_get_by_thread+0x2e/0x50 [spl]
[422180.688748] [] ? taskq_member+0x18/0x30 [spl]
[422180.695932] [] zio_execute+0xa2/0x100 [zfs]
[422180.702588] [] taskq_thread+0x2ac/0x4f0 [spl]
[422180.709225] [] ? wake_up_state+0x20/0x20
[422180.715461] [] ? zio_taskq_member.isra.7.constprop.10+0x80/0x80 [zfs]
[422180.724316] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[422180.731705] [] kthread+0xd1/0xe0
[422180.737123] [] ? insert_kthread_work+0x40/0x40
[422180.744223] [] ret_from_fork_nospec_begin+0x21/0x21
[422180.751584] [] ? insert_kthread_work+0x40/0x40
root 1636 6.8 0.0 0 0 ? D< Sep13 484:46 [z_wr_iss]
root 1637 6.8 0.0 0 0 ? D< Sep13 484:32 [z_wr_iss]
root 1638 6.8 0.0 0 0 ? D< Sep13 484:20 [z_wr_iss]
root 1639 6.8 0.0 0 0 ? D< Sep13 484:33 [z_wr_iss]
root 1640 6.8 0.0 0 0 ? D< Sep13 484:06 [z_wr_iss]
root 1641 6.8 0.0 0 0 ? D< Sep13 484:24 [z_wr_iss]
root 1642 6.8 0.0 0 0 ? D< Sep13 484:33 [z_wr_iss]
root 1643 6.8 0.0 0 0 ? D< Sep13 484:15 [z_wr_iss]
root 1644 6.8 0.0 0 0 ? D< Sep13 484:26 [z_wr_iss]
root 1645 6.8 0.0 0 0 ? D< Sep13 484:34 [z_wr_iss]
root 1646 6.8 0.0 0 0 ? D< Sep13 484:23 [z_wr_iss]
root 1647 6.8 0.0 0 0 ? D< Sep13 484:41 [z_wr_iss]
root 1648 6.8 0.0 0 0 ? D< Sep13 484:27 [z_wr_iss]
root 1649 6.8 0.0 0 0 ? D< Sep13 484:36 [z_wr_iss]
root 1650 6.8 0.0 0 0 ? D< Sep13 484:27 [z_wr_iss]
root 1651 6.8 0.0 0 0 ? D< Sep13 484:29 [z_wr_iss]
root 1652 6.8 0.0 0 0 ? D< Sep13 484:05 [z_wr_iss]
root 1653 6.8 0.0 0 0 ? D< Sep13 484:20 [z_wr_iss]
root 1654 6.8 0.0 0 0 ? D< Sep13 484:20 [z_wr_iss]
root 1655 6.8 0.0 0 0 ? D< Sep13 484:28 [z_wr_iss]
root 1656 6.8 0.0 0 0 ? D< Sep13 484:32 [z_wr_iss]
root 1657 6.8 0.0 0 0 ? D< Sep13 484:18 [z_wr_iss]
root 1658 6.8 0.0 0 0 ? D< Sep13 484:23 [z_wr_iss]
root 1659 6.8 0.0 0 0 ? D< Sep13 484:36 [z_wr_iss]
Describe how to reproduce the problem
This does seem to happen at close to the same amount of CPU time. I know it doesn't make sense, but it seems like I have seen this at 480-something of total CPU time.
It happens every couple of days on this log server. I have seen similar behavior in very write-centric workloads on virtual disks.
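(A rough stand-in for a write-heavy log workload, in case someone wants to try reproducing this; fio must be installed, and the block size, file size, thread count, and path here are made up rather than taken from this report.)

fio --name=logwrite --directory=/tank/logs --rw=write --bs=128k \
    --size=4g --numjobs=16 --time_based --runtime=1800 --group_reporting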