Avoid blocking in arc_reclaim_thread() #3826

Closed
wants to merge 1 commit into from

behlendorf
Contributor

As described in the comment above arc_reclaim_thread(), it's critical
that the reclaim thread be careful about blocking. Just as it must
never wait on a hash lock, it must never wait on a task which can in
turn wait on the CV in arc_get_data_buf(). Doing so will deadlock; see
issue #3822 for full backtraces showing the problem.
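The hash-lock half of this rule is commonly handled with try-locks. Below is a minimal sketch, assuming POSIX threads, of an eviction pass that only calls pthread_mutex_trylock() on per-entry locks and skips contended entries instead of sleeping. The names entry_t and evict_pass() are hypothetical and are not the ZFS implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical cache entry guarded by its own "hash lock". */
typedef struct entry {
	pthread_mutex_t lock;
	int evicted;
} entry_t;

/*
 * One eviction pass over `n` entries.  The reclaim thread never sleeps
 * on an entry's lock: if pthread_mutex_trylock() reports the lock is
 * held, the entry is skipped and counted in *missed so a later pass can
 * retry it.  Returns the number of entries evicted in this pass.
 */
static size_t
evict_pass(entry_t *entries, size_t n, size_t *missed)
{
	size_t evicted = 0;

	*missed = 0;
	for (size_t i = 0; i < n; i++) {
		if (pthread_mutex_trylock(&entries[i].lock) != 0) {
			(*missed)++;	/* contended: skip, never block */
			continue;
		}
		if (!entries[i].evicted) {
			entries[i].evicted = 1;
			evicted++;
		}
		pthread_mutex_unlock(&entries[i].lock);
	}
	return (evicted);
}
```

Skipped entries are not lost; they are simply left for the next pass, so contention costs some eviction progress rather than a potential deadlock.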

To resolve this issue, arc_kmem_reap_now() has been updated to use the
asynchronous ARC prune function. This means that arc_prune_async()
may now be called while there are still outstanding arc_prune_tasks.
However, this isn't a problem because arc_prune_async() already
keeps a reference count preventing multiple outstanding tasks per
registered consumer. Functionally, this behavior is the same as that
of the counterpart illumos function, dnlc_reduce_cache().
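Below is a minimal sketch of the reference-count scheme described above, assuming C11 atomics. prune_consumer_t, prune_async(), and prune_task_done() are hypothetical stand-ins for arc_prune_async() and arc_prune_task; the task queue is replaced by a counter, and a compare-and-swap stands in for whatever locking the real code uses. A consumer's count is 1 while it is merely registered, so a second request made while a task is still in flight is dropped and the caller never waits.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical registered prune consumer.  refcnt is 1 for the
 * registration itself; a value of 2 means a prune task is outstanding.
 */
typedef struct prune_consumer {
	atomic_int refcnt;
	long adjust;		/* amount the consumer is asked to prune */
	int dispatched;		/* stands in for task-queue dispatches */
} prune_consumer_t;

/*
 * Asynchronous prune request: take a reference and "queue" a task unless
 * one is already in flight for this consumer.  It never waits for the
 * task, so it is safe to call from a thread that must not block.
 */
static void
prune_async(prune_consumer_t *pc, long adjust)
{
	int expected = 1;

	/* Only the 1 -> 2 transition dispatches; a caller seeing 2 skips. */
	if (!atomic_compare_exchange_strong(&pc->refcnt, &expected, 2))
		return;
	pc->adjust = adjust;
	pc->dispatched++;	/* a real implementation would queue a task */
}

/* End of the (simulated) prune task: drop the task's reference. */
static void
prune_task_done(prune_consumer_t *pc)
{
	/* ... the consumer's prune callback would run before this ... */
	atomic_fetch_sub(&pc->refcnt, 1);
}
```

Because a duplicate request is dropped rather than queued, repeated calls from the reclaim thread cost nothing extra while a prune is already running.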

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3808
Issue #3822

@behlendorf
Contributor Author

@dweeezil, can you review this?

@dweeezil
Contributor

@behlendorf I'll look it over this evening.

@dweeezil
Contributor

@behlendorf This looks OK for its intended purpose. My only question is whether it's OK that the shrinker callback (__arc_shrinker_func) will also gain the asynchronous behavior, so it will likely unblock arc_get_data_buf() in the overflowing case before its reaping has actually completed. My thinking is that this is OK, with the only possible side effect being a bit of additional ARC overflow.

@behlendorf
Contributor Author

@dweeezil Good point. This should be fine since spurious wakeups in arc_get_data_buf() are allowed as long as they are fairly rare. As you said, the only side effect should be a little ARC overflow, which will be quickly corrected. Thanks for looking at this.
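Below is a minimal sketch of the trade-off discussed here, assuming POSIX threads and assuming (as the reply implies) that a woken allocator proceeds after a single wait rather than looping until the overflow is gone. cache_t, cache_alloc(), cache_wake_waiters(), and alloc_thread() are hypothetical names, not ARC code. An early broadcast, such as one sent before asynchronous pruning has freed anything, lets the allocation go through and leaves the cache slightly over its limit.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>	/* sched_yield(), used by callers that spin */
#include <stdbool.h>

/* Hypothetical cache accounting; not the real ARC structures. */
typedef struct cache {
	pthread_mutex_t lock;
	pthread_cond_t room_cv;	/* broadcast by the reclaimer */
	long size;
	long limit;
	int waiters;		/* threads currently blocked in alloc */
} cache_t;

/* Caller must hold c->lock. */
static bool
cache_overflowing(const cache_t *c)
{
	return (c->size > c->limit);
}

/*
 * Allocation path.  If the cache is over its limit, wait once for the
 * reclaimer, then allocate whether or not the overflow was corrected.
 * A spurious or early wakeup therefore lets the cache grow a little past
 * its limit instead of stalling the caller.
 */
static void
cache_alloc(cache_t *c, long bytes)
{
	pthread_mutex_lock(&c->lock);
	if (cache_overflowing(c)) {
		c->waiters++;
		pthread_cond_wait(&c->room_cv, &c->lock);
		c->waiters--;
	}
	c->size += bytes;
	pthread_mutex_unlock(&c->lock);
}

/* Reclaimer side: wake all allocators, possibly before any space is freed. */
static void
cache_wake_waiters(cache_t *c)
{
	pthread_mutex_lock(&c->lock);
	pthread_cond_broadcast(&c->room_cv);
	pthread_mutex_unlock(&c->lock);
}

/* Thread entry point used to demonstrate a blocked allocator. */
static void *
alloc_thread(void *arg)
{
	cache_alloc(arg, 10);
	return (NULL);
}
```

A waiter that re-checked the predicate in a while loop would never overshoot, but it would stay blocked until reclaim had actually freed enough space.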

@behlendorf
Contributor Author

Merged as:

ef5b2e1 Avoid blocking in arc_reclaim_thread()

behlendorf closed this Sep 30, 2015
behlendorf deleted the issue-3822 branch April 19, 2021 19:36
2 participants