Remove `WaitMap` dependency #1183
Conversation
@konstin - I haven't done any benchmarking, but this does consistently not-deadlock for me.
(Sorry for pushing this branch, I misread the updated Graphite docs.)
This removes the deadlock! Benchmarks in the upstack PR are looking good. My main concern is the deadlock warning in the `DashMap::entry` API.
```rust
Value::Waiting(notify) => {
    let notify = notify.clone();
    drop(entry);
    notify.notified().await;
```
Can we assert that on drop, nobody is waiting anymore?
Can you say more?
If there is a task waiting on some request but there is no task providing it (because we're done, we're dropping the once map), this sounds like a bug.
Oh! I see now. Yes, on `Drop` of the `OnceMap`. I agree it might be a bug... but I could also see that maybe it isn't? Maybe the caller has finished what they needed and didn't need to wait for everything to finish. And it is perhaps hard to distinguish between "there is no task providing it" and "there is a task providing it, but it hasn't done so yet." I defer to y'all here because I don't have enough context on how this is used. But I absolutely agree that if you can assert that any `Waiting` values are a bug, then it might be nice to assert it on `Drop`. (Although note that a panic during `Drop` is an instant abort.)
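A hypothetical, std-only sketch of what such a check on `Drop` could look like (the `Value` enum and map shape here are simplified stand-ins for the PR's types, not its actual code; it logs rather than asserts, since a panic during `Drop` is an instant abort):

```rust
use std::collections::HashMap;

// Stand-ins for the PR's Value::Waiting(Arc<Notify>) / Value::Filled(Arc<T>).
enum Value {
    Waiting,
    Filled(&'static str),
}

struct OnceMapCheck {
    items: HashMap<String, Value>,
}

impl OnceMapCheck {
    // Count entries that still have waiters and no produced value.
    fn pending(&self) -> usize {
        self.items
            .values()
            .filter(|v| matches!(v, Value::Waiting))
            .count()
    }
}

impl Drop for OnceMapCheck {
    fn drop(&mut self) {
        // Log instead of panicking: a panic in Drop aborts the process.
        let pending = self.pending();
        if pending > 0 {
            eprintln!("bug? {pending} entry(ies) still Waiting at drop");
        }
    }
}

fn main() {
    let mut items = HashMap::new();
    items.insert("numpy".to_string(), Value::Filled("resolved"));
    items.insert("scipy".to_string(), Value::Waiting);
    let map = OnceMapCheck { items };
    println!("pending before drop: {}", map.pending());
}
```

Whether a leftover `Waiting` entry is a real bug or a benign early exit is exactly the open question above; a log line keeps the check observable without committing to abort-on-drop semantics.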
```rust
@@ -406,6 +405,9 @@ impl<'a, Provider: ResolverProvider> Resolver<'a, Provider> {
if self.index.distributions.register_owned(dist.package_id()) {
    priorities.add(dist.name().clone());
    request_sink.unbounded_send(Request::Dist(dist))?;

    // Yield, to allow subscribers to continue.
```
I'd add here that we specifically need this because the channel is sync.
crates/once-map/src/lib.rs (Outdated)
```rust
let mut lock = self.started.lock().unwrap();
if lock.contains(key) {
    return false;
let entry = self.items.entry(key.to_owned());
```
The docs for `entry` say:

> Locking behaviour: May deadlock if called when holding any sort of reference into the map.

This assumes we never actually call this function from two threads at the same time. This is the same call that `WaitMap` previously deadlocked in.

If we know that the map will stay on the main thread anyway, we can use `Cell<FxHashMap<K, V>>` instead of `DashMap`. If we want to share the index between threads (I don't know that - it depends on how we want to structure our async code), we'll need something like `RwLock<FxHashMap<K, V>>`. `DashMap` is a `Box<[RwLock<HashMap<K, V, S>>]>` internally (https://docs.rs/dashmap/latest/src/dashmap/lib.rs.html#88-92) and our lock times are minimal, so I expect no perf difference. `DashMap` is missing a `get_or_insert() -> bool`, an "atomic" compare-and-swap option, that both we and `WaitMap` would need to be correct.
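The "insert if absent, report whether we inserted" primitive described above can be sketched with std's `HashMap::entry` (a std-only, single-threaded sketch for illustration, not DashMap's API; the `register` name and `bool` placeholder value are invented here):

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

// Returns true if this call created the entry, i.e. the caller "won"
// and is now responsible for producing the value; false if someone
// else already registered the key.
fn register(map: &mut HashMap<String, bool>, key: &str) -> bool {
    match map.entry(key.to_owned()) {
        Entry::Occupied(_) => false,
        Entry::Vacant(entry) => {
            entry.insert(false); // placeholder, standing in for "Waiting"
            true
        }
    }
}

fn main() {
    let mut map = HashMap::new();
    assert!(register(&mut map, "numpy")); // first caller wins
    assert!(!register(&mut map, "numpy")); // later callers should wait
    println!("ok");
}
```

The point of doing this through a single `entry` call is that the check and the insert happen under one lock, so no second caller can sneak in between them.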
I don't think the warning is about calling `entry` from two different threads simultaneously. That's normal and should be fine; otherwise it'd be a pretty poor concurrent hashmap. The warning is, AIUI, about calling `entry` (or `get`) while you have a reference from a previous call in hand in the same thread. When multiple threads are calling it, one will (hopefully) eventually make progress, drop the reference, and unblock the other thread. But if you do `let entry = self.items.entry(foo);` and then `self.items.get(foo)` while `entry` is still alive, then that `get` call seems likely to block waiting for `entry` to drop. Which, of course, will never happen because the thread is blocked on `get`.
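The same-thread hazard can be sketched by analogy with a plain `std::sync::Mutex`, which is likewise not re-entrant (this is an illustration, not DashMap itself; `try_lock` is used so the example fails observably instead of deadlocking forever):

```rust
use std::sync::Mutex;

fn main() {
    let items = Mutex::new(vec!["numpy"]);

    // Analogous to holding the result of `entry(foo)`: the guard keeps
    // the lock held for as long as it is alive.
    let guard = items.lock().unwrap();

    // Analogous to calling `get(foo)` on the same thread while the entry
    // reference is still alive: a blocking `lock()` here would never
    // return. `try_lock` makes the conflict visible instead.
    assert!(items.try_lock().is_err());

    // Once the first reference is dropped, the lock is available again.
    drop(guard);
    assert!(items.try_lock().is_ok());
    println!("ok");
}
```

Across threads this resolves itself when the holder eventually drops its guard; within one thread there is nobody left to drop it, which is the deadlock the DashMap docs warn about.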
Thanks for clarifying, this makes much more sense! I was indeed assuming that dashmap was behaving rather poorly, but this makes more sense and works for us if we don't have any await points while we hold the entry.
Thinking about this longer, was the problem with waitmap maybe merely that there was some yielding while holding an entry?
@konstin - I spent a while investigating that, and trying to make changes to the resolver to solve it, but I ultimately couldn't figure out where it might be.
I also briefly looked at `waitmap`'s implementation and nothing jumped out at me immediately.
```rust
self.wait_map.insert(key, value);
if let Some(Value::Waiting(notify)) = self.items.insert(key, Value::Filled(Arc::new(value)))
{
    notify.notify_waiters();
```
Should we also yield here? Notifying is surprisingly sync, so the subscribers won't get notified until the next yield of the task that called done. (This would make the function async, but I think it's correct that all operations on the `OnceMap` should be async.)
> Notifying is surprisingly sync, so the subscribers won't get notified immediately until the next yielding of the task that called done.

Hmmm. Are you sure? We're using the multi-threaded runtime for tokio, right? If so, AIUI, other waiters could be notified and acting on it before `notify.notify_waiters()` even finishes.
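The same "notification is synchronous" shape shows up with std's `Condvar`, which can serve as a rough analogy here (std threads standing in for tokio tasks; this is an illustration of the semantics, not the PR's code): `notify_all` returns immediately, and the waiter only proceeds when the scheduler runs it. On a multi-threaded runtime that can be concurrently with the notifier; on a single thread it is necessarily after the notifier yields.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

fn main() {
    let pair = Arc::new((Mutex::new(false), Condvar::new()));
    let pair2 = Arc::clone(&pair);

    // A waiter blocked until the flag flips, like a task in
    // `notify.notified().await`.
    let waiter = thread::spawn(move || {
        let (lock, cvar) = &*pair2;
        let mut ready = lock.lock().unwrap();
        while !*ready {
            ready = cvar.wait(ready).unwrap();
        }
    });

    {
        let (lock, cvar) = &*pair;
        *lock.lock().unwrap() = true;
        // notify_all returns immediately; it only marks the waiter
        // runnable. When the waiter actually runs is up to the scheduler.
        cvar.notify_all();
    }

    waiter.join().unwrap();
    println!("waiter woke up");
}
```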
Yep, that's correct, but our receivers are all on the same thread at the moment, as far as I can see.
Is there a reason the waiters need to wake up sooner?
In #1163 I got a speedup from 30ms to 20ms for warm-cache Jupyter by inserting one `tokio::task::yield_now().await`; now I'm motivated to avoid these kinds of bottlenecks.
This is great. I like the use of `Arc` to simplify things here.
Thank you both so much for the close read, I needed it!
```rust
dashmap::mapref::entry::Entry::Occupied(_) => false,
dashmap::mapref::entry::Entry::Vacant(entry) => {
    entry.insert(Value::Waiting(Arc::new(Notify::new())));
    true
```
This could return some `Handle` object that we could use to enforce the invariant, right? E.g. `done` would consume a handle, which would have a reference to the key instead, and we'd be able to check if the handle was not used on drop.

> If this method returns `true`, you need to start a job and call [`OnceMap::done`] eventually or other tasks will hang.

Not sure how problematic that is.
I agree that would be a much nicer API. Will consider it in a future PR; the dataflow might be tedious for now.
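A hypothetical sketch of such a handle (the `JobHandle` name and fields are invented here, not the PR's API): `done` consumes the handle, and `Drop` flags a handle that was registered but never completed, which is exactly the case where other tasks would hang.

```rust
struct JobHandle {
    key: String,
    completed: bool,
}

impl JobHandle {
    fn new(key: &str) -> Self {
        JobHandle {
            key: key.to_owned(),
            completed: false,
        }
    }

    /// Consuming `done` marks the job as finished, so Drop stays quiet.
    fn done(mut self) {
        self.completed = true;
    }
}

impl Drop for JobHandle {
    fn drop(&mut self) {
        if !self.completed {
            // In the real map this would surface the invariant violation:
            // a job was registered but never completed, so waiters hang.
            eprintln!("bug: handle for {:?} dropped without done()", self.key);
        }
    }
}

fn main() {
    let handle = JobHandle::new("numpy");
    handle.done(); // consumed correctly, no warning on drop
    println!("ok");
}
```

Because `done` takes `self` by value, the type system guarantees a handle is completed at most once, and the `Drop` check catches the zero-times case at runtime.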
This reverts commit ea97c30185d08f8ce736b021f54e95859d54b086.

Force-pushed from 2ab1949 to 887fc3f.
Summary
This is an attempt at #1163, removing the `WaitMap` and gaining more granular control over the values that we hold over `await` boundaries.