-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementing mypyc support pt. 2 #2431
Conversation
Pretty much marking a few tests as straight incompatible or marking certain functions with monkeypatching support. Also address the lack of 3.8 or higher support in src/black/parsing.py caused by my original changes. Finally tweak the compile configuration.
Typing the ast3 / ast27 variables as Any breaks mypyc. More details in the comment added in this commit. Many thanks goes to Jelle for helping out, with their insight I was able to find a workaround.
before .whl was 1.3M and now it is 1.2M
The `platform=linux` was causing mypy (and therefore mypyc) to believe any `if sys.platform == "win32":` branch would never be taken. That led to `Reached allegedly unreachable code!` crashes because of safety checks mypyc adds. There's not a strong reason for pinning the platform in `mypy.ini` so with the agreement of Jelle, it's gone now! I also disabled CliRunner's exception catching feature because while it does lead to nicer test failures when an exception goes unhandled, it also removes traceback information which makes debugging a pain. P.S. You can blame Windows Update for the slowness of this bugfix :p
Unreachable code has either led to mypyc being unable to analyze or compile Black, or has led to runtime failures (like the one the previous commit fixed - although mypy wouldn't be able to catch it).
Hardcoding the targets isn't great since we will probably forget to add paths to the list as new files are created. Inspired from mypy's setup for mypyc.
- early binding; plus - reordering of if checks to reduce work / hit the happy case more often; plus - a few more tiny mypyc-specific tweaks
Honestly at this point, I was tired of optimizing and only attempted basic and safe optimizations. I hope this actually makes a performance impact :p
Mostly just dataclasses nonsense + more pain from the TOML bare CR thing ... ugh
diff-shades depends on reformat_many being monkeypatchable so it can calculate files and black.Mode without copying and pasting a bunch of parsing, file discovery, and configuration code. On the other hand, I've been working on blackbench and upcoming version 21.8a1 is now compatible with the driver convert changes.
Nice work. FWIW - I'm totally down for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all your work on this! I'm really excited to see this PR.
src/black/strings.py
Outdated
@@ -142,6 +152,11 @@ def normalize_string_prefix(s: str, remove_u_prefix: bool = False) -> str: | |||
return f"{new_prefix}{match.group(2)}" | |||
|
|||
|
|||
@lru_cache(maxsize=256) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this really help? re.compile
already caches things internally I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the source and testing locally, I can confirm that both re and regex cache compiled patterns. Somehow I still observe a 8-9% speed up on that strings list literal file. Running this test script:
import time
from functools import lru_cache
import re
import regex
_cached_regex_compile = lru_cache(regex.compile)
# This is *roughly* how many times regex.compile was called formatting strings-list.py
loops = 410
pattern = r"\s*\t+\s*(\S)"
re.compile(pattern)
regex.compile(pattern)
t0 = time.perf_counter()
for i in range(1, loops + 1):
re.compile(pattern)
elapsed = time.perf_counter() - t0
print(f"{loops} cached compiles with re: {elapsed * 1000:.1f}ms")
t0 = time.perf_counter()
for i in range(1, loops + 1):
regex.compile(pattern)
elapsed = time.perf_counter() - t0
print(f"{loops} cached compiles with regex: {elapsed * 1000:.1f}ms")
t0 = time.perf_counter()
for i in range(1, loops + 1):
_cached_regex_compile(pattern)
elapsed = time.perf_counter() - t0
print(f"{loops} cached compiles with regex + lru_cache: {elapsed * 1000:.1f}ms")
I get:
- 410 cached compiles with re: 0.4ms
- 410 cached compiles with regex: 1.6ms
- 410 cached compiles with regex + lru_cache: 0.1ms
So it seems like this is the source of the slight speedup, but oddly enough this only accounts for ~half of the 8-9% speedup I saw for fmt-strings-list. Maybe the fact this script only repeatedly compiles one regex is why the delta isn't 1:1. For real life cases this speedup is probably minimal so I wouldn't mind reverting this to be less confusing :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, thanks for the extra information. I looked at re.compile
and it calls a helper function (_compile
) and does an isinstance
call before looking at the cache, and the cache key is a tuple type(pattern), pattern, flags
. I suppose that adds up to more overhead than whatever lru_cache does internally.
- I don't actually have to go on a rant about tomli and bare CR it turns out. - Subclassing KeyError isn't actually *that* important for one custom exception.
Andddd it's broken now because of PR #2557 - https://github.com/ichard26/black-mypyc-wheels/runs/4040805225?check_suite_focus=true#step:5:197 |
Still broken since the test suite is still dependent on CWD being something specific :( gonna fix this later |
May help unblock psf#2431
Whoops this isn't PyPy compatible ATM, will fix. |
It's finally happening 🎉, the initial work was done several years ago in GH-1009 by @ msullivan is now being followed through by yours truly. If this work doesn't find its way to PyPI eventually, I'll be disappointed 🙂
Performance
I wrote an entire report which can be read here: https://gist.github.com/ichard26/b996ccf410422b44fcd80fb158e05b0d, but the TL;DR is:
Windows numbers are to come later. Unfortunately I probably won't be able to get MacOS numbers (I own no MacOS running machine), but I'll see what I can do.
Stability
I've been running the test suite during the wheel build and as of latest of this branch it's all green. Though it should be noted that the arm64 and the arm portion of the universial2 wheels are completely untested since we don't have access to any M1 CI resources :(
Locally I've ran diff-shades (Jelle, it's basically mypy-primer but for black) against the CPython 3.8 manylinux wheel and no difference between interpreted and compiled Black was found (across the same project suite used by black-primer + blackbench 😉). I plan to also test Windows with diff-shades in the following days. I'll also get Cooper to test Intel MacOS + M1 MacOS for me, but I need to come up with better instructions before that.
Anyhow, please don't expect this to be fully tested. I plan to ask the wider community to test the experimental compiled wheels (and push any additional fixes) after this PR is merged.
Review notes
This PR contains a lot of things including QOL improvements, optimizations, compatibility changes, hacks, etc. While the diff isn't that painful, following the commit history should give you some context on why certain changes were made.
It should be noted that I plan to handle the merge of this PR myself since this project involves a lot of moving parts and I'd like to maintain control for the time being. Thanks!
Related issue: GH-366
Related repo: https://github.com/ichard26/black-mypyc-wheels
And if you're some passerby who really wants to try the experimental wheels, checkout the workflows here: https://github.com/ichard26/black-mypyc-wheels/actions.