-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kick off Red-knot #10849
Kick off Red-knot #10849
Conversation
CodSpeed Performance ReportMerging #10849 will not alter performanceComparing Summary
|
49dbe6b
to
34fa6fd
Compare
#[derive(Debug, Eq, PartialEq, Hash)] | ||
pub struct FileAstId<N: HasAstId> { | ||
ast_id: AstId, | ||
_marker: PhantomData<fn() -> N>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this marker?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rust requires for structs that each type parameter is used by at least one field. Now, AstId
isn't generic over N
.
We can work around this by using PhantomData
. PhantomData
has no runtime cost (it compiles down to a zero size type) and it's only purpose is to capture the type N
.
You can think of the pattern applied here as compile-time only generics similar to Java where the generic arguments are erased at runtime but we want them at compile time to catch typing errors (e.g. we want to prevent that you use a AstId
for an IfStmt
to load a FunctionDef
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing I find a bit odd here is the fn() -> N
; why is it not just PhantomData<N>
? It seems like putting the typevar in return position like that might be intended to make TypedAstId
contravariant in N
rather than variant? But since Rust doesn't have struct subtyping, I'm not really sure what that would even mean; I thought Rust only really has variance for lifetimes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is (and I was surprised by it) that Phantom<Data>
made the type non-copyable, NonEq
etc.
I guess the alternative would have been to implement all these types manually https://users.rust-lang.org/t/how-to-copy-phantomdata-of-un-clone-able-types/82229
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh weird! So basically it's just that Rust knows how to implement Eq
for fn() -> N
but not necessarily for N
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The module stuff looks great! Sorry for leaving so many comments; many of them can probably just turn into TODOs for now.
Self(smol_str::SmolStr::new(name)) | ||
} | ||
|
||
pub fn from_relative_path(path: &Path) -> Option<Self> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps somewhere in here we should validate that the extensions is .py
or .pyi
?
This actually makes me think that perhaps from_relative_path
should not exist at the ModuleName
layer but at a layer where it returns several pieces of information: a ModuleName
, a ModuleKind
(e.g. ::Python
or ::Stub
), and also maybe is_package
boolean (e.g. true for foo/__init__.py
, false for foo.py
). We will need all of those at some point; as implemented currently this from_relative_path
is a lossy operation, since it returns only one of those.
Perhaps those other two fields, and this method, belong on Module/ModuleData
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ModuleKind
and whether it is a package belongs on ModuleData
. ModuleName
is just the full qualified name of a package. The only reason I see for adding them to ModuleName
is if we want to support explicitly querying modules by their kind, but I don't think this is something we want. That's why I think from_relative_path
is fine. All it should do is to create the full qualified name from a relative import (it shouldn't perform any IO).
Perhaps somewhere in here we should validate that the extensions is .py or .pyi?
Yeah, early returning when it is not a py
or pyi
file makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I agree this data belongs on ModuleData
, not on ModuleName
, I wasn't suggesting to put it on ModuleName
.
What I was suggesting was that from_relative_path
should not be implemented like this as a ModuleName
constructor, because then we throw away information from the path that we will need. I think from_relative_path
should instead be implemented at a higher level where it returns all of the data that we can glean from the path, including the ModuleName
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see us throwing away any information, considering that this method doesn't do any Io. The idea here is that we get a full qualified name that we can then throw into the resolve function (that retrieves all information)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah, I was looking at this method in isolation; I see the only place it's actually used is in resolve_path
, where the very next step is to resolve that module name back to a path (and make sure it actually resolves to the right path). So I agree, we will get all that information from resolving, and that's where we should get it from.
I would still prefer for this to be a private method of ModuleResolver
rather than a public constructor of ModuleName
, because I think it's kind of important to ensure that it only be used in the context of resolve_path
, as just described. We don't want some code in the future constructing ModuleName
using ModuleName::from_relative_path
and assuming that means it has a correct module name to path correspondence. But this is more a future thing for robustness, not an important prototype consideration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I don't mind making it private and agree, the only two methods that need to be public are relative
and new
(or absolute).
Self(smol_str::SmolStr::new(name)) | ||
} | ||
|
||
pub fn relative(_dots: u32, name: &str, _to: &Path) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The algorithm here will be simpler if instead of taking a full Path
for _to
parameter, we just take a ModuleName
and is_package
boolean (or some structure that encompasses both, might just be Module
).
We need is_package
because in foo/bar/__init__.py
, from . import baz
means foo.bar.baz
, but from foo/bar.py
, from . import baz
means foo.baz
.
If we take a full Path
here, then we effectively have to re-implement (or call) from_relative_path
again as part of this method, but in the places we will likely call this from (resolving a relative import we find in the AST) we will already have all the data of the current module handy, so it will be wasteful to recalculate that from Path
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, it also avoids the need to check if the file has a .py
extension haha
The only challenge with this is that we need to analyze files that aren't modules (and may not have a module name):
- Jupyter notebooks
- Files that don't have a
py
orpyi
extension (Ruff supports configuring additional extensions that should be handled as python files)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know much about Jupyter notebooks, but I would assume they just can't have relative imports?
For files that don't have a py
or pyi
extension, I would assume we still treat them as modules as if they did? I'm not sure what the use cases for this is.
// src | ||
// parent | ||
// child | ||
// one.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
// one.py | |
// __init__.py | |
// one.py |
The link does not seem to be public. |
@JonathanPlasse I just opened it in a private session without any problems. Do you get an error that the link is invalid? Also, the link is a bit outdated. |
I can open it just fine 👍 |
It works for me too in private session. Sorry for the noise. |
crates/red_knot/src/types.rs
Outdated
/// Arena holding all known types | ||
#[derive(Default)] | ||
pub(crate) struct TypeEnvironment { | ||
types_by_id: IndexVec<TypeId, Type>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be interesting if we can come up with a more stable id for types than relying on inference order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's a good point, I thought about this. I don't think TypeId
(as in, the index into this IndexVec
) can be stable; that's not really compatible with lazy type evaluation. But we may want to have another, more stable identifier (based on fully qualified name?) that we use most places instead of TypeId
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could also abandon using the IndexVec
arena in this case? But we will have a lot of types...
|
This is needed because of the AST changes from `OnceCell` to `OnceLock`. If we decide we don't need AST to be Sync/Send, we can revert this along with those changes; just want green CI for now. Also adjust a URL in doc comments so `cargo docs` doesn't complain about it.
Review comments from 762fa0b
Just a reorganization per feedback from Micha that I agreed with. Puts all `impl` right after the corresponding `struct`, and moves iterators further down in the file, since they are implementation detail.
## Summary Indexing definitions of symbols is necessary for fast lazy type evaluation, so that when we see a reference to a name we can figure out the type of that symbol from its definition(s). We only want to do this indexing of definitions once per module, and then it needs to remain available in the cache thereafter for use by lazy type evaluation. Rust lifetimes won't let us use direct references to AST nodes owned by the cache. We could use unsafe code to strip the lifetimes from these references, with safety ensured by our cache invalidation: if we evict the AST from the cache, we must evict the symbol definitions also. But in order to serialize such a cache to disk, we would need (at least) an AST numbering scheme. This may still be something to look into in the future, for improved performance. For now, use `NodeKey`: indirect references to an AST node consisting of a `NodeKind` and a `TextRange`, which we can find again reasonably quickly in the AST. These are easy to serialize, have no lifetime problems, and don't require unsafe code. ## Test Plan Updated tests.
## Summary Now that symbol definitions don't hold direct references to AST nodes, and thus don't have a lifetime bound, there's no longer any reason to separate the "core SymbolTable" from the definitions. All of the symbol table, including definitions, can be cached together, and it all needs to be invalidated together if the module AST changes. So let's simplify this and have fewer structs. ## Test Plan Existing tests.
Fill out the representation of types to a few more cases (especially unions and intersections) before going too far with type evaluation.
I got this all working and solved the API lifetime issues without Arc, by means of a new set of `TypeRef` structs. The remaining potential performance issue is that anytime you hold on to any of the new `TypeRef` structs, you lock a shard of the `TypeStore::modules` dashmap to writes (because you are holding a reference into it). So it will be important to minimize the use and scope of these type-refs. I think we can do this to some degree by caching type judgments using just type IDs. I also think for CLI use when we want to be highly parallel, we can be smart about ordering (check all module bodies first, then check function bodies when module level types are all populated) to minimize write contention. Also, if needed we can break up `ModuleTypeStore`, or use inner mutability and internal locking to have finer-grained locking within it. I went with this version instead of rewriting to have the type arenas hold Arc to the types, because I am not totally convinced the Arc version will be better. With Arc every "read" turns into a write to the atomic reference count, which introduces overhead (which is really useless overhead for us, since ultimately we rely on the arenas for garbage collection). And so we will introduce contention on the atomic reference count even for reads of highly-used types. So for both versions we will have to be careful with our use of references. I think the Arc-free version is lower overhead and sets us up better for future optimization of the locking strategy, once we have more working code to optimize against. Even if I turn out to be wrong about the above and eventually we decide to use Arc, I'd rather go with this for now and move on to type evaluation, and make the Arc change later when we can evaluate the effects better.
This PR demonstrates resolving an import from one module to a class type from another module!
The beginning of multifile analysis. We'll eventually merge this with ruff but are using a dedicated crate to flesh out the basic infrastructure first.