Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] per-module arenas #11152

Merged
merged 7 commits into from
Apr 26, 2024
Merged

[red-knot] per-module arenas #11152

merged 7 commits into from
Apr 26, 2024

Conversation

carljm
Copy link
Contributor

@carljm carljm commented Apr 25, 2024

I got this all working and solved the API lifetime issues without Arc, by means of a new set of XTypeRef structs.

The remaining potential performance issue is that anytime you hold on to any of the new XTypeRef structs, you lock a shard of the TypeStore::modules dashmap to writes (because you are holding a reference into it). So it will be important to minimize the use and scope of these type-refs. I think we can do this to some degree by caching type judgments using just type IDs. I also think for CLI use when we want to be highly parallel, we can be smart about ordering (check all module bodies first, then check function bodies when module level types are all populated) to minimize write contention. Also, if needed we can break up ModuleTypeStore, or use inner mutability and internal locking to have finer-grained locking within it.

I went with this version instead of rewriting to have the type arenas hold Arc to the types, because I am not totally convinced the Arc version will be better. With Arc every "read" turns into a write to the atomic reference count, which introduces overhead (which is really useless overhead for us, since ultimately we rely on the arenas for garbage collection). And so we will introduce contention on the atomic reference count even for reads of highly-used types. So for both versions we will have to be careful with our use of references. I think the Arc-free version is lower overhead and sets us up better for future optimization of the locking strategy, once we have more working code to optimize against.

Even if I turn out to be wrong about the above and eventually we decide to use Arc, I'd rather go with this for now and move on to type evaluation, and make the Arc change later when we can evaluate the effects better.

@carljm carljm marked this pull request as draft April 25, 2024 20:43
@carljm carljm requested a review from MichaReiser April 25, 2024 20:48
@carljm carljm force-pushed the cjm/red-knot/type-arenas branch from a7b304f to b087d97 Compare April 25, 2024 21:33
@carljm carljm changed the title [WIP] per-module arenas [red-knot] per-module arenas Apr 25, 2024
@carljm carljm force-pushed the cjm/red-knot/type-arenas branch from b087d97 to 35fdb40 Compare April 25, 2024 21:39
Copy link
Contributor

github-actions bot commented Apr 25, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+1 -0 violations, +0 -0 fixes in 1 projects; 43 projects unchanged)

python/typeshed (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select E,F,FA,I,PYI,RUF,UP,W

+ stubs/setuptools/pkg_resources/_vendored_packaging/specifiers.pyi:10:22: PYI001 Name of private `TypeVar` must start with `_`

Changes by rule (1 rules affected)

code total + violation - violation + fix - fix
PYI001 1 1 0 0 0

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@carljm carljm marked this pull request as ready for review April 25, 2024 22:50
@carljm carljm requested a review from BurntSushi April 25, 2024 22:50
@carljm
Copy link
Contributor Author

carljm commented Apr 25, 2024

Also requesting review from @BurntSushi, not necessarily to review all the code (though that would be awesome too!) but mostly just to double-check whether any of what I wrote in the summary sounds totally off base at a high level.

@carljm carljm requested a review from AlexWaygood April 25, 2024 22:55
@carljm
Copy link
Contributor Author

carljm commented Apr 25, 2024

And @AlexWaygood, mostly just to observe that I switched unions and intersections from HashSet to IndexSet, which provides both O(1) containment checks and de-duplication, and preserves ordering. So we have the capability to distinguish order, both for UX purposes (I wanted it even just for stable tests of the union/intersection display code!), and if needed for actual type system purposes (I don't think we should, but now we can, if you convince me otherwise!)

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

And so we will introduce contention on the atomic reference count even for reads of highly-used types.

I'm not sure I follow the reasoning here. Isn't the main contention the module lock that guards access to the type? And if so, isn't holding on to the lock for longer than necessary (for longer than incrementing a reference counter) causing worse contention?

I think my main worry is that it's non obvious for callers that they're holding on to a lock, making it very easy to hold on to a lock for too-long (I just call this very expensive function and pass a reference to a &FunctionType).

I wonder if Salsa solves some of this by making the accessors methods on the Id types.

let function: FunctionId = // somehow get a function id
let name = function.name(db); // smolstr::Clone is O(1)
let parameters  = function.parameters(db); // We could intern `Parameters` into an arena so that `Parameters` is just an Id.

for parameter in parameters.iter(db) {
	let name = parameter.name(db);
}

I must admit, it's a bit awkward dealing with so many ids and I don't know what the overhead of all the hash table lookups is.

But I'm happy to try this out and iterate on it later on.

crates/red_knot/src/types.rs Outdated Show resolved Hide resolved
@carljm
Copy link
Contributor Author

carljm commented Apr 26, 2024

Isn't the main contention the module lock that guards access to the type? And if so, isn't holding on to the lock for longer than necessary (for longer than incrementing a reference counter) causing worse contention?

I am really not sure how all this will play out in practice with our actual workload, so take all of this with a grain of salt! (Really my top-line motivation here was just to do the quickest thing that would work and move on, and re-evaluate once we can actually benchmark on real workloads and see how the contention plays out.)

Dashmap allows multiple simultaneous readers (they have to take a reader lock, but they don't contend with each other, only with writers). With the approach in this PR, there is no contention between readers, but readers may block writers for longer (however long a TypeRef is held). If we add Arc, then even readers have to contend over incrementing the atomic reference count (even if there is no writer). It may be that this is a non-issue in practice, I don't know. I just know that e.g. for CPython, which also does reference counting, contention over reference counts for hot shared objects is a huge performance issue if you try to run Python code concurrently and make reference counts atomic. And I know in general (just from searching on the topic) that Arc contention can be a big problem for scaling parallel Rust workloads too.

I think my main worry is that it's non obvious for callers that they're holding on to a lock, making it very easy to hold on to a lock for too-long (I just call this very expensive function and pass a reference to a &FunctionType).

Yes, I'm also concerned that this will be a foot-gun.

I wonder if Salsa solves some of this by making the accessors methods on the Id types.

Yes, I saw you did this with the Module type, and it's an alternative we could move towards if we need to. My main concerns about it are all the extra hashmap lookups if you need multiple things from a type all at once (as you mentioned), and the need to copy anything returned by these accessors (though this may not be an issue if it's mostly just small strings and other type IDs.)

@carljm
Copy link
Contributor Author

carljm commented Apr 26, 2024

My other issue with using Arc here is similar to your objection to using a GC crate: it's just overhead that we don't actually need. We don't need reference counting, because we are using arenas to deallocate the types all at once, and IDs to refer between them. So all the work to update the reference counts (and any contention it causes) is just waste.

(Not pure waste, because it does provide a solution to the problem of callers taking temporary references to the type data and knowing they will keep it alive, without blocking writers to the arena. But it still feels like a big hammer for that problem, which should be solvable with just references and lifetimes.)

@MichaReiser
Copy link
Member

On the Arc overhead. I agree, it's neglectagble for e.g building the symbol table but it can become a problem if the object is used often. I think we want to explore a non-std Arc to get an implementation without WeakRef support. That will help but obviously doesn't mitigate the perf overhead entirely.

@ibraheemdev
Copy link
Member

Another option is using a different concurrent hash-table library, like flurry, where holding a read guard doesn't block writers, just potentially block memory from being freed.

@carljm
Copy link
Contributor Author

carljm commented Apr 26, 2024

I think we want to explore a non-std Arc to get an implementation without WeakRef support. That will help but obviously doesn't mitigate the perf overhead entirely.

Where would the savings come from here? It looks to me like weakrefs only really have a cost if you use them. You could save a tiny bit of memory in ArcInner, I guess (one AtomicUsize), but otherwise I'm not seeing where not supporting weakrefs would make Arc itself faster (or reduce refcount contention.)

Another option is using a different concurrent hash-table library, like flurry, where holding a read guard doesn't block writers, just potentially block memory from being freed.

Ooh, thank you for pointing this out! I spent a few minutes yesterday looking for exactly this, but for some reason flurry didn't come up in my search. Definitely might consider this instead of DashMap for the type store.

@carljm carljm merged commit bfee36d into red-knot Apr 26, 2024
18 checks passed
@carljm carljm deleted the cjm/red-knot/type-arenas branch April 26, 2024 18:54
@carljm carljm added the internal An internal refactor or improvement label Apr 26, 2024
MichaReiser pushed a commit that referenced this pull request Apr 27, 2024
I got this all working and solved the API lifetime issues without Arc,
by means of a new set of `TypeRef` structs.

The remaining potential performance issue is that anytime you hold on to
any of the new `TypeRef` structs, you lock a shard of the
`TypeStore::modules` dashmap to writes (because you are holding a
reference into it). So it will be important to minimize the use and
scope of these type-refs. I think we can do this to some degree by
caching type judgments using just type IDs. I also think for CLI use
when we want to be highly parallel, we can be smart about ordering
(check all module bodies first, then check function bodies when module
level types are all populated) to minimize write contention. Also, if
needed we can break up `ModuleTypeStore`, or use inner mutability and
internal locking to have finer-grained locking within it.

I went with this version instead of rewriting to have the type arenas
hold Arc to the types, because I am not totally convinced the Arc
version will be better. With Arc every "read" turns into a write to the
atomic reference count, which introduces overhead (which is really
useless overhead for us, since ultimately we rely on the arenas for
garbage collection). And so we will introduce contention on the atomic
reference count even for reads of highly-used types. So for both
versions we will have to be careful with our use of references. I think
the Arc-free version is lower overhead and sets us up better for future
optimization of the locking strategy, once we have more working code to
optimize against.

Even if I turn out to be wrong about the above and eventually we decide
to use Arc, I'd rather go with this for now and move on to type
evaluation, and make the Arc change later when we can evaluate the
effects better.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal An internal refactor or improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants