l2arc_feed stuck at 100% CPU #3259
Comments
@odoucet could you please run your tests with
that would be the other option left to test and run with. I've already suspected from the reports that there must be some underlying issue with L2ARC (which, by the way, is also mentioned on the FreeBSD and other mailing lists: performance or locking behavior is occasionally better without an L2ARC device than with one). Reducing lock contention and tweaking L2ARC access seems to be the next logical step to tackle (which has partly already been done with recent ZFS changes). I'm just curious: do you have any form of compression enabled? If not, how does that change the success rate with/without L2ARC?
Have any ZFS parameters been tuned? Eg:
Some people have tried increasing this in the hope that the L2ARC warms up faster, but increasing it can make the L2ARC feed thread burn a CPU as it tries and fails to find fresh buffers to cache.
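For reference (the exact parameter quoted in the comment above isn't reproduced here), the L2ARC feed tunables are ordinary ZFS module parameters and can be inspected and changed at runtime; l2arc_write_max below is shown purely as an assumed example, and the value is illustrative:

```sh
# Inspect the current L2ARC feed tunables (one file per parameter)
grep . /sys/module/zfs/parameters/l2arc_*

# Raise the per-interval write limit at runtime (example value, not a recommendation)
echo $((64 * 1024 * 1024)) > /sys/module/zfs/parameters/l2arc_write_max
```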
No parameters tuned except zfs_arc_min / zfs_arc_max.
I'll test with secondarycache=metadata.
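For anyone following along, restricting the L2ARC to metadata is a dataset property (inherited pool-wide if set at the top level); the pool name below is a placeholder:

```sh
# Cache only metadata (not file data) in the L2ARC for this pool
zfs set secondarycache=metadata tank

# Verify the setting
zfs get secondarycache tank
```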
@kernelOfTruth the test finished successfully with secondarycache=metadata:
So only a 1.06% performance difference; I think we can say performance is equal. There were a few minutes where the system appeared less responsive, but it always kept working (no big stall like with secondarycache=all). For the last hour of the test, here is the number of times each process appeared to be using more than 90% of one core (5s sampling interval):
L2ARC stats:
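The per-process counts mentioned above came from a monitoring script that isn't shown here; a minimal sketch of that kind of sampling (assuming a 5-second interval and a 90% threshold, and using ps's averaged %CPU as an approximation) could look like this:

```sh
# Every 5 seconds, log any process reported above 90% of one core
while sleep 5; do
    ps -eo pcpu,comm --no-headers | awk '$1 > 90' | sed "s/^/$(date +%T) /"
done >> high_cpu.log
```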
Another benchmark with secondarycache=metadata:
So this time I have a single process doing the hard work, meaning just reading metadata. I was able to catch what happened when the system hung for a few minutes. The find task is blocked with this stack:
I also have my monitoring process blocked with this stack trace:
arcstats shows that the l2_abort_lowmem counter incremented during the run.
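That counter, along with the rest of the ARC/L2ARC statistics, is exposed through the SPL kstat interface and can be watched directly while the benchmark runs:

```sh
# l2_abort_lowmem counts L2ARC feed cycles aborted due to memory pressure
grep l2_abort_lowmem /proc/spl/kstat/zfs/arcstats

# Watch it grow during the test
watch -n 5 'grep l2_abort_lowmem /proc/spl/kstat/zfs/arcstats'
```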
Removing the cache drives leads to other stalls:
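For context, cache (L2ARC) devices can be detached from a live pool at any time without data loss; the pool and device names below are placeholders:

```sh
# Identify the cache vdevs
zpool status tank

# Remove an L2ARC device from the pool (safe: cache contents are disposable)
zpool remove tank sdx
```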
Lowering the ARC size from 160G to 100G (on a 196GB RAM server) seems to do the trick: no more stalls. Will try some additional testing with L2ARC back on.
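A note on how the ARC cap is usually changed (the 100G figure matches the comment; the module-parameter route below is the standard method, not necessarily exactly what was done here):

```sh
# Runtime change: cap the ARC at 100 GiB
echo $((100 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max

# Persistent change across reboots
echo "options zfs zfs_arc_max=107374182400" >> /etc/modprobe.d/zfs.conf
```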
It's unfortunate that the ARC can't be that huge. During those loads, is most of the work being spent in kernel/system space instead of user space?

So the benchmark succeeded: how is the system's responsiveness? Did it take significantly longer, or about the same time (compared to e.g. the 160G ARC)?

Memory pressure appears significantly higher with L2ARC, and more so with more than one cache drive (room for improvements?). There must be a sweet spot for the ARC size relative to performance (e.g. 65% of RAM for the ARC, maybe lower with L2ARC), so it's not clear whether the default ARC size of 50% of RAM is a sane & safe default or rather a conservative value to prevent locking, swapping, or latency issues. Do you have any stats on whether throughput was negatively affected by the smaller ARC?

Latency and responsiveness of the system during those heavy-load scenarios might be improved by raising the value of /proc/sys/vm/min_free_kbytes [1].

[1] http://blog.nitrous.io/2014/03/10/stability-and-a-linux-oom-killer-bug.html

Have you tried raising /sys/module/zfs/parameters/l2arc_write_max with a smaller ARC, and how does that affect performance and stability?
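A sketch of the two knobs mentioned above; the values are purely illustrative, not recommendations:

```sh
# Keep more memory free for the kernel's atomic/emergency allocations
# (reduces the chance of allocation stalls under heavy I/O)
echo 1048576 > /proc/sys/vm/min_free_kbytes    # 1 GiB

# Allow the L2ARC feed thread to write more per feed interval
echo $((128 * 1024 * 1024)) > /sys/module/zfs/parameters/l2arc_write_max
```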
95% is iowait, 5% is system, <1% is user. Total CPU usage is ~50%.
The system stays perfectly responsive with the smaller ARC. But note that the 100G ARC used up to 188GB of memory!
Performance is... better!
(All with 0.6.3-trunk)
My benchmark emulates reading all files on 16 descriptors, and it was faster, so I think I can say throughput was better with the smaller ARC. Oh, and yes, I will definitely use the 0.6.4 version for the new tests :)
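On the "100G ARC used up to 188GB of memory" point, the gap between the configured cap and actual usage can be seen by comparing the ARC kstats with the overall memory picture; a quick check along these lines:

```sh
# Configured maximum vs. current ARC size (bytes)
awk '$1 == "c_max" || $1 == "size" {print $1, $3}' /proc/spl/kstat/zfs/arcstats

# Overall memory usage for comparison
free -g
```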
Did/do the read results get faster after several passes and an L2ARC? Thanks for providing those stats, glad responsiveness improved. That ARC overgrowth is disturbing 😲 FYI:
Could you please try again with a large ARC (e.g. 160G), with/without L2ARC, and the patch provided, and see whether you notice any changes? I'm sure @dweeezil would appreciate it, since it appears related ;) Thank you
Yes, the second pass is faster. For a metadata-only read (time find /vol | wc) it is ~50% faster on the second pass.
Closing as stale. If it's still an issue, feel free to reopen.
The problem I reported on several pull requests (#3190 (comment) and #3216 (comment)) was a system hang when running a 100% read pattern with 16 threads. After some digging, I found the problem is located in l2arc_feed, so I think it's time to create a separate bug report and stop polluting other pull requests / bug reports.
System:
196GB RAM - dual Intel Xeon E5-2430 CPUs
11 mirror vdevs (22 drives) + mirrored log (2 SSD drives) + L2ARC (2 drives)
CentOS 6.5 - Kernel 3.10.73 (elrepo)
Zpool:
ZFS tuning
Benchmark
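The exact benchmark script is not reproduced here; as a rough sketch, the "read all files on 16 descriptors" workload described in the comments is approximately equivalent to something like this (hypothetical, not the actual tool used):

```sh
# Stream every file on the volume through 16 concurrent readers
find /vol -type f -print0 | xargs -0 -P 16 -n 64 cat > /dev/null
```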
Results
System hung after ~1 hour
System hung after ~2 hours
System hung after ~1 hour
System hung after ~2 hours
System hung after ~2 hours
For all these results, the stack trace is the same:
All processes accessing the zpool dump this message (seen on arc_adapt, 'cat', 'zfs', etc.)
Various information is available in this gist:
https://gist.github.com/odoucet/6d18e4a91ad1fe90d181
including arcstats, slabinfo and top
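For reference, those snapshots correspond to the standard collection points on a ZFS-on-Linux system:

```sh
cat /proc/spl/kstat/zfs/arcstats   # ARC / L2ARC counters
cat /proc/slabinfo                 # kernel slab allocator usage
top -b -n 1                        # one batch-mode process snapshot
```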
"Why are you saying this comes from l2arc_feed" ?
The 'top' command running when the system hung shows that the l2arc_feed process was using 99.9% CPU. So I ran a new test on the same SPL+ZFS version (trunk + pull #3216), but with
and ... the benchmark ran fine (finished in 4h06min, the first time I succeeded in finishing it).