Remove unnecessary uses of `DashMap` and `Arc` #3413
Conversation
```rust
    'a,
    Context: BuildContext + Send + Sync,
    InstalledPackages: InstalledPackagesProvider + Send + Sync,
> Resolver<'a, DefaultResolverProvider<'a, Context>, InstalledPackages>
```
😌
Love that!
Is the resolver running on just the main thread what we want? If we wanted it to make use of multiple threads in the future (even as an experiment to try?), then adding these …
@BurntSushi That's a good point. I'll mark this as a draft for now until we settle on our async architecture. I agree we probably don't pay much of a cost for using …
Actually, I'm going to take that back. Making any of the resolver or installer code multi-threaded would be very hard because of the …
I'm supportive of the change, though it would be good to get @BurntSushi's sign-off too before merging, since he chimed in with comments.
This LGTM. I'm overall in favor of simplifying the code as it exists under current constraints, and if we find ourselves needing to add this (or something like it) back, then it doesn't seem that horrible to do.
## Summary

This PR introduces parallelism to the resolver. Specifically, we can perform PubGrub resolution on a separate thread, while keeping all I/O on the tokio thread. We already have the infrastructure set up for this with the channel and `OnceMap`, which makes this change relatively simple. The big change needed to make this possible is removing the lifetimes on some of the types that need to be shared between the resolver and pubgrub thread.

A related PR, #1163, found that adding `yield_now` calls improved throughput. With optimal scheduling we might be able to get away with everything on the same thread here. However, in the ideal pipeline with perfect prefetching, the resolution and prefetching can run completely in parallel without depending on one another. While this would be very difficult to achieve, even with our current prefetching pattern we see a consistent performance improvement from parallelism.

This does also require reverting a few of the changes from #3413, but not all of them. The sharing is isolated to the resolver task.

## Test Plan

On smaller tasks performance is mixed, with ~2% improvements/regressions on both sides. However, on medium-large resolution tasks we see the benefits of parallelism, with improvements anywhere from 10-50%.
```
./scripts/requirements/jupyter.in
Benchmark 1: ./target/profiling/baseline (resolve-warm)
  Time (mean ± σ):      29.2 ms ±   1.8 ms    [User: 20.3 ms, System: 29.8 ms]
  Range (min … max):    26.4 ms …  36.0 ms    91 runs

Benchmark 2: ./target/profiling/parallel (resolve-warm)
  Time (mean ± σ):      25.5 ms ±   1.0 ms    [User: 19.5 ms, System: 25.5 ms]
  Range (min … max):    23.6 ms …  27.8 ms    99 runs

Summary
  ./target/profiling/parallel (resolve-warm) ran
    1.15 ± 0.08 times faster than ./target/profiling/baseline (resolve-warm)
```

```
./scripts/requirements/boto3.in
Benchmark 1: ./target/profiling/baseline (resolve-warm)
  Time (mean ± σ):     487.1 ms ±   6.2 ms    [User: 464.6 ms, System: 61.6 ms]
  Range (min … max):   480.0 ms … 497.3 ms    10 runs

Benchmark 2: ./target/profiling/parallel (resolve-warm)
  Time (mean ± σ):     430.8 ms ±   9.3 ms    [User: 529.0 ms, System: 77.2 ms]
  Range (min … max):   417.1 ms … 442.5 ms    10 runs

Summary
  ./target/profiling/parallel (resolve-warm) ran
    1.13 ± 0.03 times faster than ./target/profiling/baseline (resolve-warm)
```

```
./scripts/requirements/airflow.in
Benchmark 1: ./target/profiling/baseline (resolve-warm)
  Time (mean ± σ):     478.1 ms ±  18.8 ms    [User: 482.6 ms, System: 205.0 ms]
  Range (min … max):   454.7 ms … 508.9 ms    10 runs

Benchmark 2: ./target/profiling/parallel (resolve-warm)
  Time (mean ± σ):     308.7 ms ±  11.7 ms    [User: 428.5 ms, System: 209.5 ms]
  Range (min … max):   287.8 ms … 323.1 ms    10 runs

Summary
  ./target/profiling/parallel (resolve-warm) ran
    1.55 ± 0.08 times faster than ./target/profiling/baseline (resolve-warm)
```
Summary

All of the resolver code is run on the main thread, so a lot of the `Send` bounds and uses of `DashMap` and `Arc` are unnecessary. We could also switch to using single-threaded versions of `Mutex` and `Notify` in some places, but there isn't really a crate providing those that I would be comfortable using.

The `Arc` in `OnceMap` can't easily be removed because the uv-auth code uses the reqwest-middleware crate, which seems to add unnecessary `Send` bounds because of `async-trait`. We could duplicate the code and create a `OnceMapLocal` variant, but I don't feel that's worth it.