Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Resolve symbols from builtins.pyi in the stdlib if they cannot be found in other scopes #12390

Merged
merged 9 commits into from
Jul 19, 2024

Conversation

AlexWaygood
Copy link
Member

Summary

This PR means that red-knot is now able to understand builtin symbols -- resolving them to symbols in a builtins.pyi stub file (either in a custom typeshed directory, if one was supplied, or to the vendored stubs we ship as part of the binary).

The first commit here moves some code around in the module resolver and adds a new public function exported by the module resolver, resolve_builtins. This is a thin wrapper around a new Salsa query, resolve_builtins_query. The query short-circuits most of the module resolution logic we do for other Python modules, because this is what Python does at runtime: builtin symbols are (nearly) always resolved to the builtins module shipped as part of the interpreter, even if a builtins.py file exists in the first-party workspace.

The second commit uses this new query exposed by the module resolver to obtain the builtins scope, and uses the builtins scope to resolve builtin symbols and infer the types of those symbols.

Test Plan

New tests have been added to red_knot_module_resolver and red_knot_python_semantic

Co-authored-by: Carl Meyer carl@astral.sh

@AlexWaygood AlexWaygood added the red-knot Multi-file analysis & type inference label Jul 18, 2024
@AlexWaygood
Copy link
Member Author

AlexWaygood commented Jul 18, 2024

Haha, that's fun. This makes the red-knot benchmarks crash. I belive that's because the benchmarks just use the vendored typeshed stubs, and the benchmark code uses str as a return annotation in one function, and the definition of str in typeshed's builtins.pyi file is

class str(Sequence[str]):
    ...

which we obviously can't cope with right now, so we panic 😆

Copy link
Contributor

github-actions bot commented Jul 18, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Copy link

codspeed-hq bot commented Jul 18, 2024

CodSpeed Performance Report

Merging #12390 will degrade performances by 97.48%

Comparing builtins-resolution (0f0f5b2) with main (1c7b840)

Summary

❌ 3 (👁 3) regressions
✅ 30 untouched benchmarks

Benchmarks breakdown

Benchmark main builtins-resolution Change
👁 red_knot_check_file[cold] 347.1 µs 13,754.9 µs -97.48%
👁 red_knot_check_file[incremental] 211.3 µs 423.7 µs -50.12%
👁 red_knot_check_file[without_parse] 260.2 µs 6,359.1 µs -95.91%

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
crates/ruff_benchmark/benches/red_knot.rs Outdated Show resolved Hide resolved
@AlexWaygood
Copy link
Member Author

Merging #12390 will degrade performances by 98.09%

Sort-of expected, I think... not sure there's any way around that; this is just what we have to do, I think!

@carljm
Copy link
Contributor

carljm commented Jul 18, 2024

Wait, are those super-high regression numbers on the benchmarks because it was failing, or are those from after the benchmark fix?

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful for reviews if the summary could explain some of the newly introduced concepts.

We should also a analyze the performance regression. The increase seems to big for the few builtins that we resolve

crates/red_knot_module_resolver/src/resolver.rs Outdated Show resolved Hide resolved
@MichaReiser
Copy link
Member

I think we have to update our benchmarks first, for example by pre-parsing builtins

@MichaReiser
Copy link
Member

This is neat! And awesome how few changes weren't required

AlexWaygood added a commit that referenced this pull request Jul 18, 2024
#12390 adds support for resolving types to classes in typeshed's `builtins.pyi` stub file. This causes redknot to crash when attempting to execute this benchmark, as the `str` definition in typeshed is too complex for us to handle right now. `object` is a simpler type definition which we can resolve symbols to without crashing.
@carljm carljm force-pushed the builtins-resolution branch from 3ca68a0 to 0357af7 Compare July 18, 2024 20:24
@carljm carljm changed the base branch from main to cjm/simplify-benchmark July 18, 2024 20:25
@carljm
Copy link
Contributor

carljm commented Jul 18, 2024

It would be helpful for reviews if the summary could explain some of the newly introduced concepts.

Definitely agree, but it's not obvious to me which concepts in the PR were not well explained in the summary?

@MichaReiser
Copy link
Member

It would be helpful for reviews if the summary could explain some of the newly introduced concepts.

Definitely agree, but it's not obvious to me which concepts in the PR were not well explained in the summary?

I'm mainly interested in understanding newly introduced salsa queries and how they relate

@carljm
Copy link
Contributor

carljm commented Jul 18, 2024

I think we have to update our benchmarks first, for example by pre-parsing builtins

There's a cycle problem here, because we kind of need the builtins_module function added in this PR in order to add pre-parsing of builtins to the benchmarks. I guess I can split builtins_module out into its own PR, along with the change to the "without_parse" benchmark.

Base automatically changed from cjm/simplify-benchmark to main July 18, 2024 21:04
@AlexWaygood
Copy link
Member Author

I think we have to update our benchmarks first, for example by pre-parsing builtins

This would mean that our benchmarks wouldn't catch any performance regressions caused by upstream changes to typeshed's stubs for builtins. Mypy has had several of these in the past; I think it would be very useful if our benchmarks automatically flagged any performance degradations as part of the CI run for an automated PR syncing our vendored typeshed stubs.

@MichaReiser
Copy link
Member

MichaReiser commented Jul 18, 2024

I think we have to update our benchmarks first, for example by pre-parsing builtins

This would mean that our benchmarks wouldn't catch any performance regressions caused by upstream changes to typeshed's stubs for builtins. Mypy has had several of these in the past; I think it would be very useful if our benchmarks automatically flagged any performance degradations as part of the CI run for an automated PR syncing our vendored typeshed stubs.

I think we want more benchmarks. The once we have today are intentionally narrow in scope so that they're very sensitive to overhead in the type inference machinery

@AlexWaygood
Copy link
Member Author

I think we want more benchmarks. The once we have today are intentionally narrow in scope so that they're very sensitive to overhead in the type inference machinery

That makes sense. In that case, my instinct would be to update the benchmarks to use a custom typeshed directory with a minimal builtins stub, rather than using the vendored typeshed builtins stub.

@carljm
Copy link
Contributor

carljm commented Jul 18, 2024

I think the without_parse red-knot benchmark should exclude parsing builtins also, and the other benchmarks can be left as-is (the "incremental" one should automatically exclude parsing builtins, since builtins won't have changed, and the "cold" one should include parsing builtins.) I'm already working on a PR for this.

I don't think we should use a fake builtins for the benchmarks.

@carljm
Copy link
Contributor

carljm commented Jul 18, 2024

There's a cycle problem here

Also this was wrong, it's easy enough to just create a VendoredPath and call vendored_path_to_file in the benchmark, we don't need to pull in anything from this PR.

@carljm
Copy link
Contributor

carljm commented Jul 18, 2024

#12395 adds builtins pre-parsing to the without-parse benchmark, and #12396 fixes what looks like reversed naming of benchmarks. This PR is rebased on both.

@carljm carljm changed the base branch from main to cjm/fix-benchmark-naming July 18, 2024 21:40
@carljm carljm force-pushed the cjm/fix-benchmark-naming branch from 5600b49 to d901769 Compare July 18, 2024 21:45
@carljm carljm force-pushed the builtins-resolution branch from bc8aa77 to 4239eb2 Compare July 18, 2024 21:46
@carljm
Copy link
Contributor

carljm commented Jul 18, 2024

Hmm, it doesn't seem like the fixes I made to avoid redundant globals/builtins queries made a big dent in the perf regression here, so something I don't understand is still going on. Trying to dig into the CodSpeed data to understand what it could be.

@carljm
Copy link
Contributor

carljm commented Jul 19, 2024

Ok, after poring over the CodSpeed flame graphs for a while, my conclusion is that in the non-incremental benchmarks (cold and without_parse) the main difference is that we are now paying for semantic indexing of a very large file (builtins.pyi) which is many orders of magnitude larger than any file the benchmark previously touched, and in the incremental benchmark we pay for deep validation of that semantic index and the other ingredients depending on it.

I also pored over the traces from locally linting the same files that the benchmark runs on, and I didn't see any issues in the traces: it looked to me like we are doing the work we expect to do.

One way we could potentially reduce this cost would be to semantic-index by scope instead of by file? But this might be over-indexing on the current example, where we use very little of a large file; in real-world large projects I expect the proportional cost of semantic indexing for stuff we don't use would be much, much lower.

At this point I am open to further exploration, but my inclination based on what I've seen is that this regression is accurate based on adding semantic index of a much larger file, and we should merge it and keep paying attention to the benchmarks as we go; once we are able to check a much larger real-world program, we should take a careful look at where the bottlenecks are.

@carljm carljm changed the base branch from cjm/fix-benchmark-naming to cjm/fix-incremental-bench July 19, 2024 01:09
@carljm carljm force-pushed the builtins-resolution branch from 353c486 to 6df0ac3 Compare July 19, 2024 01:09
@carljm carljm force-pushed the cjm/fix-incremental-bench branch from ac32227 to 5afe2a2 Compare July 19, 2024 14:59
Base automatically changed from cjm/fix-incremental-bench to main July 19, 2024 15:32
@carljm carljm force-pushed the builtins-resolution branch from 6df0ac3 to 3d58005 Compare July 19, 2024 15:41
@carljm carljm force-pushed the builtins-resolution branch from 3d58005 to 44977e3 Compare July 19, 2024 15:42
@AlexWaygood
Copy link
Member Author

AlexWaygood commented Jul 19, 2024

(It doesn't look like I have permissions to acknowledge the perf regression on CodSpeed. Somebody else might have to do that for me -- or give me permission to do so ;)

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I think we should also add a test that first-party builtins.py doesn't override the builtin one.

@AlexWaygood
Copy link
Member Author

Looks good! I think we should also add a test that first-party builtins.py doesn't override the builtin one.

I added that test in the latest push ;)

@AlexWaygood AlexWaygood merged commit d8cf8ac into main Jul 19, 2024
20 checks passed
@AlexWaygood AlexWaygood deleted the builtins-resolution branch July 19, 2024 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants