Pluggable optimizer API #104584
Comments
Note that the above API is just the initial version to support our work on speeding up Python 3.13.
* Add test for long loops
* Clear ENTER_EXECUTOR when deopting code objects.
Added a new, experimental, tracing optimizer and interpreter (a.k.a. "tier 2"). This currently pessimizes, so don't use yet -- this is infrastructure so we can experiment with optimizing passes. To enable it, pass ``-Xuops`` or set ``PYTHONUOPS=1``. To get debug output, set ``PYTHONUOPSDEBUG=N`` where ``N`` is a debug level (0-4, where 0 is no debug output and 4 is excessively verbose). All of this code is likely to change dramatically before the 3.13 feature freeze. But this is a first step.
This effectively reverts bb578a0, restoring the original DEOPT_IF() macro in ceval_macros.h, and redefining it in the Tier 2 interpreter. We can get rid of the PREDICTED() macros there as well!
test_counter_optimizer() and test_long_loop() of test_capi now create a new function at each call. Otherwise, the optimizer counters are not the expected values when the test is run more than once.
This produces longer traces (superblocks?). Also improved debug output (uop names are now printed instead of numeric opcodes). This would be simpler if the numeric opcode values were generated by generate_cases.py, but that's another project. Refactored some code in generate_cases.py so the essential algorithm for cache effects is only run once. (Deciding which effects are used and what the total cache size is, regardless of what's used.)
* Check eval-breaker in ENTER_EXECUTOR.
* Make sure that frame->prev_instr is set before entering executor.
Instead of special-casing specific instructions, we add a few more special values to the 'size' field of expansions, so in the future we can automatically handle additional super-instructions in the generator.
This adds several unspecialized opcodes to superblocks: TO_BOOL, BINARY_SUBSCR, STORE_SUBSCR, UNPACK_SEQUENCE, LOAD_GLOBAL, LOAD_ATTR, COMPARE_OP, BINARY_OP. While we may not want that eventually, for now this helps find bugs. There is a rudimentary test checking for UNPACK_SEQUENCE. Once we're ready to undo this, that would be simple: just replace the call to variable_used_unspecialized with a call to variable_used (as shown in a comment). Or add individual opcodes to FORBIDDEN_NAMES_IN_UOPS.
Is documenting these planned for 3.13?
Since we are removing them in 3.14, probably not.
If possible, I would like to backport #120643 to Python 3.13, to solve a C99 compatibility issue.
A bunch of PRs were merged; a quick glance at the PR list suggests that this issue can be closed as completed. Is there further work to be done?
We discussed the optimizer API offline. In short: we should rip the API itself out, and just keep all of the code that does the actual optimizing. The API itself is constantly changing, poorly defined, and undocumented. There's a bunch of infrastructure required just to test the API (not the actual optimizations we perform), and it introduces indirection and artificial boundaries into some pretty performance-sensitive stuff. Nobody is using it that we're aware of, nobody we've talked to is planning on using it, and frankly we don't want anyone to start using it. So let's remove it.
Nitpick: I suggest opening a new issue to remove it.
We need an API for optimizers to be plugged into CPython.
The proposed model is client-server, where the VM is the client and the optimizer is the server.
The optimizer registers with the VM; the VM then calls the optimizer when hotspots are detected.
The API:
The semantics of a PyExecutorObject is that upon return from its execute function, the VM state will have advanced N instructions, where N is a non-negative integer.
Full discussion here: faster-cpython/ideas#380
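The client-server model and the executor contract described above can be illustrated with a toy simulation. This is only a sketch in pure Python, not CPython's C API: `ToyVM`, `set_optimizer`, `hot_threshold`, and the callable-based `Executor` and `Optimizer` types are all hypothetical names invented for illustration. The VM (the client) counts backward jumps, asks the registered optimizer (the server) for an executor once the loop is hot, and from then on relies only on the contract that each executor call advances the VM state by N >= 0 instructions.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical types: these model the idea only, not the real C structs.
Executor = Callable[[int], int]                  # pc -> N instructions advanced (N >= 0)
Optimizer = Callable[[int], Optional[Executor]]  # hot pc -> executor, or None to decline


class ToyVM:
    """The 'client': interprets a single loop and calls the optimizer on hotspots."""

    def __init__(self, loop_length: int, hot_threshold: int = 3) -> None:
        # Instructions are 0..loop_length-1; the last one jumps back to 0.
        self.loop_length = loop_length
        self.hot_threshold = hot_threshold
        self.optimizer: Optional[Optimizer] = None
        self.executors: Dict[int, Executor] = {}
        self.backedge_count = 0
        self.executed = 0        # total instructions the VM state has advanced
        self.executor_calls = 0

    def set_optimizer(self, optimizer: Optimizer) -> None:
        """The optimizer (the 'server') registers with the VM."""
        self.optimizer = optimizer

    def run(self, budget: int) -> None:
        pc = 0
        while self.executed < budget:
            executor = self.executors.get(pc)
            if executor is not None:
                n = executor(pc)
                if n < 0:
                    raise ValueError("executor must advance N >= 0 instructions")
                self.executor_calls += 1
                self.executed += n
                pc = (pc + n) % self.loop_length
                if n > 0:
                    continue
                # N == 0: the executor made no progress, so interpret normally.
            self.executed += 1
            if pc == self.loop_length - 1:       # the backward jump
                self.backedge_count += 1
                pc = 0
                hot = self.backedge_count >= self.hot_threshold
                if hot and self.optimizer is not None and 0 not in self.executors:
                    new_executor = self.optimizer(0)
                    if new_executor is not None:
                        self.executors[0] = new_executor
            else:
                pc += 1


requests: List[int] = []


def whole_loop_optimizer(hot_pc: int) -> Optional[Executor]:
    """Toy optimizer: returns an executor that 'runs' one full loop iteration."""
    requests.append(hot_pc)
    return lambda pc: 4   # advances exactly 4 instructions per call


vm = ToyVM(loop_length=4)
vm.set_optimizer(whole_loop_optimizer)
vm.run(budget=40)
print(vm.executed, vm.executor_calls, requests)  # 40 7 [0]
```

In this run the first three iterations (12 instructions) are interpreted, the optimizer is consulted once, and the remaining 28 instructions are covered by 7 executor calls. The VM never inspects what the executor does internally; it only depends on the "advanced N instructions" guarantee.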
This is not a replacement for PEP 523. That will need a PEP. We should get this working first, before we consider replacing PEP 523.
Linked PRs
* ENTER_EXECUTOR #106141
* -Xuops #106908
* frame->stacktop on optimizer error #108953
* ip_offset and simplify EXIT_TRACE #108961
* JUMP_BACKWARD #109347