Pluggable optimizer API #104584

Closed
markshannon opened this issue May 17, 2023 · 8 comments
Labels
3.13 (bugs and security fixes), interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage)

Comments

@markshannon
Member

markshannon commented May 17, 2023

We need an API for optimizers to be plugged in to CPython.

The proposed model is client-server, where the VM is the client and the optimizer is the server.
The optimizer registers with the VM, then the VM calls the optimizer when hotspots are detected.

The API:

typedef struct {
    OBJECT_HEADER;
    _PyInterpreterFrame *(*execute)(PyExecutorObject *self, _PyInterpreterFrame *frame, PyObject **stack_pointer);
    /* Data needed by the executor goes here, but is opaque to the VM */
} PyExecutorObject;

/* This would be nicer as an enum, but C doesn't define the size of enums */
#define PY_OPTIMIZE_FUNCTION_ENTRY 1
#define PY_OPTIMIZE_RESUME_AFTER_YIELD 2
#define PY_OPTIMIZE_BACK_EDGE 4
typedef uint32_t PyOptimizerCapabilities;

typedef struct {
    OBJECT_HEADER;
    PyExecutorObject *(*compile)(PyOptimizerObject* self, PyCodeObject *code, int offset);
    PyOptimizerCapabilities capabilities;
    float optimization_cost;
    float run_cost;
    /* Data needed by the compiler goes here, but is opaque to the VM */
} PyOptimizerObject;

void _Py_Executor_Replace(PyCodeObject *code, int offset, PyExecutorObject *executor);

int _Py_Optimizer_Register(PyOptimizerObject* optimizer);
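
As a rough illustration of how registration and the capability flags could fit together, here is a minimal, self-contained sketch. Only the three flag values and the `PyOptimizerCapabilities` typedef come from the proposal above; `ToyOptimizer`, `toy_registered`, `toy_optimizer_register` and `toy_should_optimize` are hypothetical stand-ins, and the single registration slot and the 0/-1 return convention are assumptions, not CPython behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values and typedef copied from the proposal above. */
#define PY_OPTIMIZE_FUNCTION_ENTRY 1
#define PY_OPTIMIZE_RESUME_AFTER_YIELD 2
#define PY_OPTIMIZE_BACK_EDGE 4
typedef uint32_t PyOptimizerCapabilities;

/* Hypothetical stand-in for PyOptimizerObject: only the field used here. */
typedef struct {
    PyOptimizerCapabilities capabilities;
} ToyOptimizer;

/* Assumption: the VM keeps a single registered optimizer. */
static ToyOptimizer *toy_registered = NULL;

/* Sketch of a register call. Assumed convention: 0 on success, -1 on error.
   An optimizer that declares no capability is rejected. */
static int toy_optimizer_register(ToyOptimizer *opt)
{
    if (opt == NULL || opt->capabilities == 0) {
        return -1;
    }
    toy_registered = opt;
    return 0;
}

/* VM side: should the registered optimizer be called for this hotspot?
   hotspot_kind must be exactly one of the PY_OPTIMIZE_* flags. */
static int toy_should_optimize(PyOptimizerCapabilities hotspot_kind)
{
    assert(hotspot_kind != 0 && (hotspot_kind & (hotspot_kind - 1)) == 0);
    return toy_registered != NULL
        && (toy_registered->capabilities & hotspot_kind) != 0;
}
```

Because the flags are distinct bits in a fixed-width `uint32_t`, an optimizer can advertise any combination of them with `|`, and the VM can test for one with `&`.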

The semantics of a PyExecutorObject is that upon return from its execute function, the VM state will have advanced N instructions, where N is a non-negative integer.
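
The "advanced N instructions" contract can be sketched with toy types. Everything below is hypothetical: `DemoFrame` stands in for `_PyInterpreterFrame` and holds only an instruction index, `DemoExecutor` mirrors the shape of the proposed `execute` slot without the stack pointer, and the sketch assumes straight-line code, so the change in index equals N.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for _PyInterpreterFrame: just an instruction index. */
typedef struct {
    int next_instr;
} DemoFrame;

/* Forward declaration so the function pointer can name its own struct type. */
typedef struct DemoExecutor DemoExecutor;
struct DemoExecutor {
    /* Same shape as the proposed execute slot, minus the stack pointer. */
    DemoFrame *(*execute)(DemoExecutor *self, DemoFrame *frame);
    int advance_by;  /* executor-private data, opaque to the VM */
};

/* An executor that advances the frame by a fixed, non-negative N. */
static DemoFrame *demo_execute(DemoExecutor *self, DemoFrame *frame)
{
    assert(self->advance_by >= 0);
    frame->next_instr += self->advance_by;
    return frame;
}

/* VM side: enter the executor and report how many instructions it covered.
   Assumes straight-line code, so the index difference equals N. */
static int demo_vm_enter_executor(DemoExecutor *ex, DemoFrame *frame)
{
    int before = frame->next_instr;
    DemoFrame *after = ex->execute(ex, frame);
    return after->next_instr - before;
}
```

N = 0 is permitted by the contract: an executor may return without doing any work, leaving the VM to continue from the same instruction.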

Full discussion here: faster-cpython/ideas#380

This is not a replacement for PEP 523. That will need a PEP. We should get this working first, before we consider replacing PEP 523.

Linked PRs

@markshannon markshannon added performance Performance or resource usage 3.13 bugs and security fixes labels May 17, 2023
@markshannon
Member Author

markshannon commented May 18, 2023

Note that the above API is just the initial version to support our work on speeding up Python 3.13.
It will probably need to be extended to support PyTorch Dynamo and other users of PEP 523 that cannot use PEP 669, but that is for another issue.

markshannon added a commit that referenced this issue Jun 19, 2023
* Add test for long loops

* Clear ENTER_EXECUTOR when deopting code objects.
gvanrossum added a commit that referenced this issue Jun 27, 2023
Added a new, experimental, tracing optimizer and interpreter (a.k.a. "tier 2"). This currently pessimizes, so don't use yet -- this is infrastructure so we can experiment with optimizing passes. To enable it, pass ``-Xuops`` or set ``PYTHONUOPS=1``. To get debug output, set ``PYTHONUOPSDEBUG=N`` where ``N`` is a debug level (0-4, where 0 is no debug output and 4 is excessively verbose).

All of this code is likely to change dramatically before the 3.13 feature freeze. But this is a first step.
gvanrossum added a commit that referenced this issue Jun 27, 2023
This effectively reverts bb578a0, restoring the original DEOPT_IF() macro in ceval_macros.h, and redefining it in the Tier 2 interpreter. We can get rid of the PREDICTED() macros there as well!
vstinner added a commit to vstinner/cpython that referenced this issue Jun 28, 2023
test_counter_optimizer() and test_long_loop() of test_capi now create
a new function at each call. Otherwise, the optimizer counters are
not the expected values when the test is run more than once.
vstinner added a commit that referenced this issue Jun 28, 2023
…6171)

test_counter_optimizer() and test_long_loop() of test_capi now create
a new function at each call. Otherwise, the optimizer counters are
not the expected values when the test is run more than once.
gvanrossum added a commit that referenced this issue Jun 28, 2023
This produces longer traces (superblocks?).

Also improved debug output (uop names are now printed instead of numeric opcodes). This would be simpler if the numeric opcode values were generated by generate_cases.py, but that's another project.

Refactored some code in generate_cases.py so the essential algorithm for cache effects is only run once. (Deciding which effects are used and what the total cache size is, regardless of what's used.)
markshannon added a commit that referenced this issue Jul 3, 2023
* Check eval-breaker in ENTER_EXECUTOR.

* Make sure that frame->prev_instr is set before entering executor.
gvanrossum added a commit that referenced this issue Jul 7, 2023
Instead of special-casing specific instructions,
we add a few more special values to the 'size' field of expansions,
so in the future we can automatically handle
additional super-instructions in the generator.
gvanrossum added a commit that referenced this issue Jul 7, 2023
This adds several unspecialized opcodes to superblocks:

TO_BOOL, BINARY_SUBSCR, STORE_SUBSCR,
UNPACK_SEQUENCE, LOAD_GLOBAL, LOAD_ATTR,
COMPARE_OP, BINARY_OP.

While we may not want that eventually, for now this helps find bugs.

There is a rudimentary test checking for UNPACK_SEQUENCE.

Once we're ready to undo this, that would be simple:
just replace the call to variable_used_unspecialized
with a call to variable_used (as shown in a comment).
Or add individual opcodes to FORBIDDEN_NAMES_IN_UOPS.
@iritkatriel iritkatriel added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 27, 2023
@encukou
Member

encukou commented Jun 20, 2024

Is documenting these planned for 3.13?

@markshannon
Member Author

Since we are removing them in 3.14, probably not.

@vstinner
Member

Since we are removing them in 3.14, probably not.

If possible, I would like to backport #120643 to Python 3.13, to solve a C99 compatibility issue.

Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
@erlend-aasland
Contributor

A bunch of PRs were merged; a quick glance at the PR list suggests that this issue can be closed as completed. Is there further work to be done?

@brandtbucher
Member

We discussed the optimizer API offline. In short: we should rip the API itself out, and just keep all of the code that does the actual optimizing.

The API itself is constantly changing, poorly defined, and undocumented. There's a bunch of infrastructure required just to test the API (not the actual optimizations we perform), and it introduces indirection and artificial boundaries into some pretty performance-sensitive stuff. Nobody is using it that we're aware of, nobody we've talked to is planning on using it, and frankly we don't want anyone to start using it. So let's remove it.

@vstinner
Member

vstinner commented Nov 8, 2024

Nitpick: I suggest opening a new issue to remove it.

@brandtbucher
Member

GH-126599

7 participants
@vstinner @encukou @iritkatriel @markshannon @erlend-aasland @brandtbucher and others