
Remove unnecessary uses of DashMap and Arc #3413

Merged 3 commits into astral-sh:main on May 7, 2024

Conversation

ibraheemdev (Member)

Summary

All of the resolver code is run on the main thread, so a lot of the `Send` bounds and uses of `DashMap` and `Arc` are unnecessary. We could also switch to using single-threaded versions of `Mutex` and `Notify` in some places, but there isn't really a crate providing those that I would be comfortable using.

The `Arc` in `OnceMap` can't easily be removed because the uv-auth code uses the reqwest-middleware crate, which seems to add unnecessary `Send` bounds because of async-trait. We could duplicate the code and create a `OnceMapLocal` variant, but I don't feel that's worth it.

```rust
        'a,
        Context: BuildContext + Send + Sync,
        InstalledPackages: InstalledPackagesProvider + Send + Sync,
    > Resolver<'a, DefaultResolverProvider<'a, Context>, InstalledPackages>
```
Member Author


😌

Member


Love that!

@BurntSushi
Member

> All of the resolver code is run on the main thread, so a lot of the Send bounds and uses of DashMap and Arc are unnecessary.

Is the resolver running on just the main thread what we want? If we wanted it to make use of multiple threads in the future (even as an experiment to try?), then adding these Send bounds back could be pretty annoying. (I'm not sure our uses of DashMap and Arc are costing us much, but maybe I'm wrong about that. Of course, if we don't need them, then I agree with this PR.)

@ibraheemdev ibraheemdev marked this pull request as draft May 6, 2024 17:02
@ibraheemdev
Member Author

@BurntSushi That's a good point. I'll mark this as a draft for now until we settle on our async architecture. I agree we probably don't pay much of a cost for using DashMap/Arc on a single thread.

@ibraheemdev ibraheemdev marked this pull request as ready for review May 6, 2024 17:28
@ibraheemdev
Member Author

Actually, I'm going to take that back. Making any of the resolver or installer code multi-threaded would be very hard because of the `'a` lifetime that proliferates throughout the codebase. The changes required to do that would be significantly larger than reverting this commit. I'd also be surprised to see any gain from those changes that we couldn't get by optimizing our use of the single-threaded scheduler, given our scale of I/O.


@charliermarsh charliermarsh left a comment


I'm supportive of the change, though it would be good to get @BurntSushi's sign-off too before merging, since he chimed in with comments.

@charliermarsh charliermarsh added the internal A refactor or improvement that is not user-facing label May 6, 2024

@BurntSushi BurntSushi left a comment


This LGTM. I'm overall in favor of simplifying the code as it exists under current constraints, and if we find ourselves needing to add this (or something like it) back, then it doesn't seem that horrible to do.

@charliermarsh charliermarsh merged commit 94cf604 into astral-sh:main May 7, 2024
43 checks passed
ibraheemdev added a commit that referenced this pull request May 17, 2024
## Summary

This PR introduces parallelism to the resolver. Specifically, we can
perform PubGrub resolution on a separate thread, while keeping all I/O
on the tokio thread. We already have the infrastructure set up for this
with the channel and `OnceMap`, which makes this change relatively
simple. The big change needed to make this possible is removing the
lifetimes on some of the types that need to be shared between the
resolver and pubgrub thread.

A related PR, #1163, found that
adding `yield_now` calls improved throughput. With optimal scheduling we
might be able to get away with everything on the same thread here.
However, in the ideal pipeline with perfect prefetching, the resolution
and prefetching can run completely in parallel without depending on one
another. While this would be very difficult to achieve, even with our
current prefetching pattern we see a consistent performance improvement
from parallelism.

This does also require reverting a few of the changes from
#3413, but not all of them. The
sharing is isolated to the resolver task.

## Test Plan

On smaller tasks performance is mixed with ~2% improvements/regressions
on both sides. However, on medium-large resolution tasks we see the
benefits of parallelism, with improvements anywhere from 10-50%.

```
./scripts/requirements/jupyter.in
Benchmark 1: ./target/profiling/baseline (resolve-warm)
  Time (mean ± σ):      29.2 ms ±   1.8 ms    [User: 20.3 ms, System: 29.8 ms]
  Range (min … max):    26.4 ms …  36.0 ms    91 runs
 
Benchmark 2: ./target/profiling/parallel (resolve-warm)
  Time (mean ± σ):      25.5 ms ±   1.0 ms    [User: 19.5 ms, System: 25.5 ms]
  Range (min … max):    23.6 ms …  27.8 ms    99 runs
 
Summary
  ./target/profiling/parallel (resolve-warm) ran
    1.15 ± 0.08 times faster than ./target/profiling/baseline (resolve-warm)
```
```
./scripts/requirements/boto3.in   
Benchmark 1: ./target/profiling/baseline (resolve-warm)
  Time (mean ± σ):     487.1 ms ±   6.2 ms    [User: 464.6 ms, System: 61.6 ms]
  Range (min … max):   480.0 ms … 497.3 ms    10 runs
 
Benchmark 2: ./target/profiling/parallel (resolve-warm)
  Time (mean ± σ):     430.8 ms ±   9.3 ms    [User: 529.0 ms, System: 77.2 ms]
  Range (min … max):   417.1 ms … 442.5 ms    10 runs
 
Summary
  ./target/profiling/parallel (resolve-warm) ran
    1.13 ± 0.03 times faster than ./target/profiling/baseline (resolve-warm)
```
```
./scripts/requirements/airflow.in 
Benchmark 1: ./target/profiling/baseline (resolve-warm)
  Time (mean ± σ):     478.1 ms ±  18.8 ms    [User: 482.6 ms, System: 205.0 ms]
  Range (min … max):   454.7 ms … 508.9 ms    10 runs
 
Benchmark 2: ./target/profiling/parallel (resolve-warm)
  Time (mean ± σ):     308.7 ms ±  11.7 ms    [User: 428.5 ms, System: 209.5 ms]
  Range (min … max):   287.8 ms … 323.1 ms    10 runs
 
Summary
  ./target/profiling/parallel (resolve-warm) ran
    1.55 ± 0.08 times faster than ./target/profiling/baseline (resolve-warm)
```