-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ZFS IO can stall for a second or more under memory pressure #3616
Comments
I was trying a bit too hard to retain the pre-ca0bf58 code structure. I've worked up a patch which adopts the illumos structure in dweeezil/zfs@9970711. |
@dweeezil Excellent ! Thanks a bunch, let's see if this could help @sempervictus and @DeHackEd with their problems, too |
I'm going to run some metadata-intensive tests in tight memory environments to make sure the arc size doesn't blow up. EDIT: @siebenmann, your analysis certainly seems reasonable. There was so much involved with that patch that I don't remember exactly what I was thinking when working on |
@siebenmann nice analysis on this issue. I'd suggest we focus on finalizing #3533 which in my mind is the second half of the ca0bf58 changes. It brings ZoL ARC back in sync with the illumos ARC in all the core areas leaving just a few places where we needed to diverge. Specifically, it includes this fix which nicely explains why it resolved @sempervictus's issue. |
…no_grow 5376 arc_kmem_reap_now() should not result in clearing arc_no_grow Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5376 illumos/illumos-gate@2ec99e3 Porting Notes: The good news is that many of the recent changes made upstream to the ARC tackled issues previously observed by ZoL with similar solutions. The bad news is those solution weren't identical to the ones we applied. This patch is designed to split the difference and apply as much of the upstream work as possible. * The arc_available_memory() function was removed previous in ZoL but due to the upstream changes it makes sense to add it back. This function has been customized for Linux so that it can be used to determine a low memory. This provides the same basic functionality as the illumos version allowing us to minimize changes through the rest of the code base. The exact mechanism used to detect a low memory state remains unchanged so this change isn't a significant as it might first appear. * This patch includes the long standing fix for arc_shrink() which was originally proposed in openzfs#2167. Since there were related changes to this function it made sense to include that work. * The arc_init() function has been re-factored. As before it sets sane default values for the ARC but then calls arc_tuning_update() to apply user specific tuning made via module options. The arc_tuning_update() function is then called periodically by the arc_reclaim_thread() to apply changes to the tunings made during normal operation. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#3616 Closes openzfs#2167
…no_grow 5376 arc_kmem_reap_now() should not result in clearing arc_no_grow Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5376 illumos/illumos-gate@2ec99e3 Porting Notes: The good news is that many of the recent changes made upstream to the ARC tackled issues previously observed by ZoL with similar solutions. The bad news is those solution weren't identical to the ones we applied. This patch is designed to split the difference and apply as much of the upstream work as possible. * The arc_available_memory() function was removed previous in ZoL but due to the upstream changes it makes sense to add it back. This function has been customized for Linux so that it can be used to determine a low memory. This provides the same basic functionality as the illumos version allowing us to minimize changes through the rest of the code base. The exact mechanism used to detect a low memory state remains unchanged so this change isn't a significant as it might first appear. * This patch includes the long standing fix for arc_shrink() which was originally proposed in openzfs#2167. Since there were related changes to this function it made sense to include that work. * The arc_init() function has been re-factored. As before it sets sane default values for the ARC but then calls arc_tuning_update() to apply user specific tuning made via module options. The arc_tuning_update() function is then called periodically by the arc_reclaim_thread() to apply changes to the tunings made during normal operation. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#3616 Closes openzfs#2167
In arc_get_data_buf, if arc_is_overflowing() is true we signal the reclaim thread and wait for arc_reclaim_waiters_cv. On my machine with git tip ZFS on Linux, I'm periodically seeing multiple processes stalling in this wait for a second or more; this has the effect of drastically slowing down their IO.
This CV is primarily signalled from two places. In arc_evict_state_impl(), it is signalled on !arc_is_overflowing(); in arc_adapt_thread() (which winds up calling arc_evict_state_impl()) it is signalled on 'arc_size <= arc_c || arc_evicted = 0'. However, these two conditions are not quite the same because arc_is_overflowing() adds an overflow block to arc_c. So if arc_evict_state_impl() evicts some space but not the full overflow amount, it will not signal the CV but we'll wind up with arc_evicted > 0, arc_adapt_thread() may not signal the CV either because arc_size is now a bit over arc_c, and arc_adapt_thread() will then wind up sleeping for a second before it tries again and maybe signals the CV this time around. In the mean time, arc_get_data_buf() callers are waiting.
This issue was introduced in ca0bf58, a port of the upstream 'Illumos 5497 - lock contention on arcs_mtx'. The issue is not present in the upstream Illumos code because there the entire cv_timedwait_sig() section is also in the 'arc_size <= arc_c || arc_evicted == 0' conditional, so arc_adapt_thread() always signals if it's going to wait (and otherwise it immediately tries to shrink again).
(I don't know if the best fix is to make the code work like the Illumos code or to change the condition in arc_adapt_thread() to also be '!arc_is_overflowing()' or something like that. The Illumos code clearly pushes ARC shrinking more, since it will keep trying to evict things until nothing is evicted.)
The text was updated successfully, but these errors were encountered: