[red-knot] Resolve symbols from `builtins.pyi` in the stdlib if they cannot be found in other scopes #12390
Haha, that's fun. This makes the red-knot benchmarks crash. I believe that's because the benchmarks just use the vendored typeshed stubs, and the benchmark code uses `class str(Sequence[str]): ...`, which we obviously can't cope with right now, so we panic 😆
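One plausible reason this definition is awkward (an assumption; the comment above doesn't spell it out) is that the base-class expression mentions the class being defined, so inferring the bases of `str` needs a type for `str` itself. A minimal Python sketch using only the standard `ast` module to spot that kind of self-reference in a stub. The helper name `self_referential_classes` and the two-line stub are hypothetical, and this says nothing about how red-knot is implemented:

```python
import ast

# A tiny, made-up stand-in for a builtins stub.
STUB_SOURCE = """
class object: ...
class str(Sequence[str]): ...
"""


def self_referential_classes(source: str) -> list[str]:
    """Return names of classes whose base expressions mention the class itself."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        # Collect every bare name used anywhere inside the base-class expressions.
        names_in_bases = {
            n.id
            for base in node.bases
            for n in ast.walk(base)
            if isinstance(n, ast.Name)
        }
        if node.name in names_in_bases:
            found.append(node.name)
    return found


print(self_referential_classes(STUB_SOURCE))
```

Here `object` has no bases and is not reported, which fits the later switch of the benchmark code from `str` to `object`.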
CodSpeed Performance Report

Merging #12390 will degrade performances by 97.48%
Looks great!
Sort-of expected, I think... not sure there's any way around that; this is just what we have to do, I think!
Wait, are those super-high regression numbers on the benchmarks because it was failing, or are those from after the benchmark fix?
It would be helpful for reviews if the summary could explain some of the newly introduced concepts.
We should also analyze the performance regression. The increase seems too big for the few builtins that we resolve.
I think we have to update our benchmarks first, for example by pre-parsing builtins.
This is neat! And awesome how few changes were required.
#12390 adds support for resolving types to classes in typeshed's `builtins.pyi` stub file. This causes red-knot to crash when attempting to execute this benchmark, as the `str` definition in typeshed is too complex for us to handle right now. `object` is a simpler type definition which we can resolve symbols to without crashing.
3ca68a0
to
0357af7
Compare
Definitely agree, but it's not obvious to me which concepts in the PR were not well explained in the summary?
I'm mainly interested in understanding newly introduced salsa queries and how they relate.
There's a cycle problem here, because we kind of need the
This would mean that our benchmarks wouldn't catch any performance regressions caused by upstream changes to typeshed's stubs for
I think we want more benchmarks. The ones we have today are intentionally narrow in scope so that they're very sensitive to overhead in the type inference machinery.
That makes sense. In that case, my instinct would be to update the benchmarks to use a custom typeshed directory with a minimal builtins stub, rather than using the vendored typeshed builtins stub.
I don't think we should use a fake builtins for the benchmarks.
68775e1
to
bc8aa77
Compare
Also this was wrong, it's easy enough to just create a
5600b49
to
d901769
Compare
bc8aa77
to
4239eb2
Compare
Hmm, it doesn't seem like the fixes I made to avoid redundant globals/builtins queries made a big dent in the perf regression here, so something I don't understand is still going on. Trying to dig into the CodSpeed data to understand what it could be.
Ok, after poring over the CodSpeed flame graphs for a while, my conclusion is that in the non-incremental benchmarks ( I also pored over the traces from locally linting the same files that the benchmark runs on, and I didn't see any issues in the traces: it looked to me like we are doing the work we expect to do.

One way we could potentially reduce this cost would be to semantic-index by scope instead of by file? But this might be over-indexing on the current example, where we use very little of a large file; in real-world large projects I expect the proportional cost of semantic indexing for stuff we don't use would be much, much lower.

At this point I am open to further exploration, but my inclination based on what I've seen is that this regression is accurate based on adding semantic index of a much larger file, and we should merge it and keep paying attention to the benchmarks as we go; once we are able to check a much larger real-world program, we should take a careful look at where the bottlenecks are.
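To make the "index by scope instead of by file" idea concrete, here is a rough Python sketch contrasting an eager whole-file index with one that indexes a class scope only when it is first queried. Everything in it (`index_whole_file`, `LazyScopeIndex`, the sample source) is hypothetical and only illustrates the trade-off; it is not red-knot's semantic index.

```python
import ast

# A made-up "large file" of which a caller only needs one scope.
SOURCE = """
class object:
    def __init__(self): ...
    def __eq__(self, other): ...

class str:
    def upper(self): ...
    def lower(self): ...
"""


def method_names(class_node: ast.ClassDef) -> set[str]:
    """Names of the functions defined directly in a class body."""
    return {item.name for item in class_node.body if isinstance(item, ast.FunctionDef)}


def index_whole_file(source: str) -> dict[str, set[str]]:
    """Eager: index every class scope up front, whether or not it is used."""
    return {
        node.name: method_names(node)
        for node in ast.parse(source).body
        if isinstance(node, ast.ClassDef)
    }


class LazyScopeIndex:
    """Lazy: index a class scope only the first time something asks for it."""

    def __init__(self, source: str) -> None:
        self._classes = {
            node.name: node
            for node in ast.parse(source).body
            if isinstance(node, ast.ClassDef)
        }
        self._cache: dict[str, set[str]] = {}
        self.indexed: list[str] = []  # records which scopes were actually indexed

    def symbols(self, scope_name: str) -> set[str]:
        if scope_name not in self._cache:
            self.indexed.append(scope_name)
            self._cache[scope_name] = method_names(self._classes[scope_name])
        return self._cache[scope_name]


lazy = LazyScopeIndex(SOURCE)
lazy.symbols("object")
print(sorted(index_whole_file(SOURCE)), lazy.indexed)
```

The eager version pays for `str` even when only `object` is used; the lazy version defers that cost, at the price of extra bookkeeping per scope.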
353c486
to
6df0ac3
Compare
ac32227
to
5afe2a2
Compare
6df0ac3
to
3d58005
Compare
Co-authored-by: Carl Meyer <carl@astral.sh>
3d58005
to
44977e3
Compare
(It doesn't look like I have permissions to acknowledge the perf regression on CodSpeed. Somebody else might have to do that for me -- or give me permission to do so ;)
Looks good! I think we should also add a test that first-party `builtins.py` doesn't override the builtin one.
I added that test in the latest push ;)
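For context, the runtime behaviour such a test mirrors can be checked in CPython itself. The sketch below (not the test added in this PR; the function name `import_builtins_with_shadow` is made up) writes a first-party `builtins.py` into a temporary directory, puts that directory first on `sys.path`, and then imports `builtins`:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_builtins_with_shadow() -> tuple[str, int]:
    """Import `builtins` while a first-party builtins.py sits first on sys.path."""
    with tempfile.TemporaryDirectory() as workspace:
        # A first-party file that would rebind `len` if it were ever picked up.
        Path(workspace, "builtins.py").write_text("len = 'shadowed'\n")
        sys.path.insert(0, workspace)
        try:
            module = importlib.import_module("builtins")
            # `len` below is still looked up in the interpreter's own builtins.
            return module.__spec__.origin, len("abc")
        finally:
            sys.path.remove(workspace)


origin, length = import_builtins_with_shadow()
print(origin, length)
```

The module's spec reports that it came from the interpreter rather than from the file on disk, and `len` keeps working as usual.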
Summary
This PR means that red-knot is now able to understand builtin symbols -- resolving them to symbols in a `builtins.pyi` stub file (either in a custom typeshed directory, if one was supplied, or to the vendored stubs we ship as part of the binary).

The first commit here moves some code around in the module resolver and adds a new public function exported by the module resolver, `resolve_builtins`. This is a thin wrapper around a new Salsa query, `resolve_builtins_query`. The query short-circuits most of the module resolution logic we do for other Python modules, because this is what Python does at runtime: builtin symbols are (nearly) always resolved to the builtins module shipped as part of the interpreter, even if a `builtins.py` file exists in the first-party workspace.

The second commit uses this new query exposed by the module resolver to obtain the builtins scope, and uses the builtins scope to resolve builtin symbols and infer the types of those symbols.
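The lookup order described here (other scopes first, the builtins scope as a last resort) can be sketched as a toy model in Python. The `Scope` class, `resolve_symbol` function and the string "types" are all hypothetical simplifications; red-knot's real implementation is in Rust on top of Salsa and is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    name: str
    # Maps a symbol name to a (toy) description of its inferred type.
    symbols: dict[str, str] = field(default_factory=dict)


def resolve_symbol(name: str, scopes: list[Scope], builtins_scope: Scope) -> str:
    """Look `name` up in the given scopes (innermost first), then in builtins."""
    for scope in scopes:
        if name in scope.symbols:
            return f"{scope.name}.{name}: {scope.symbols[name]}"
    # Only reached when no other scope defines the name.
    if name in builtins_scope.symbols:
        return f"{builtins_scope.name}.{name}: {builtins_scope.symbols[name]}"
    return f"{name}: Unbound"


builtins_scope = Scope("builtins", {"object": "class object", "len": "def len"})
function_scope = Scope("f", {"x": "int"})
module_scope = Scope("module", {"len": "int"})  # shadows the builtin `len`
chain = [function_scope, module_scope]

print(resolve_symbol("x", chain, builtins_scope))
print(resolve_symbol("len", chain, builtins_scope))
print(resolve_symbol("object", chain, builtins_scope))
```

In this model a module-level `len` still wins over the builtin one; only names missing from every other scope fall through to the builtins scope.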
Test Plan
New tests have been added to `red_knot_module_resolver` and `red_knot_python_semantic`.
Co-authored-by: Carl Meyer <carl@astral.sh>