-
-
Notifications
You must be signed in to change notification settings - Fork 31k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The symbolic value lattice used in the tier 2 optimizer needs documentation and tests. #115816
Comments
You might also want to add the docs to https://github.com/faster-cpython/ideas/blob/main/3.13/redundancy_eliminator.md. |
EDITED -- Everything is true for [NOTE -- top and bottom should be swapped, see https://github.com/faster-cpython/ideas/pull/658] Regarding tests, we need to test that:
|
Before we can even start writing tests we'll need to agree on the proper API for the symbols. |
If it were up to me, the API should be focused on how we use it:
For construction, I'd propose:
For updates, we'd get:
Note that if a symbols is known to be NULL and then we learn it is NULL, it must become top. For inquiries:
An orthogonal question is whether a symbol owns the references to the type and constant value.
As a result the constants must be DECREF'ed when the abstract interpretation context is destroyed. If symbols own their constant, do sym_new_const() and sym_set_const() "steal" a reference from the caller, or does the caller (if it owns a reference) have to DECREF the constant too? We must take into account the error case too. While sym_set_const() cannot fail, sym_new_const() can fail (it can run out of space for new symbols). Possible approaches: (A) sym_new_const() only steals a reference when it succeeds // value has a new reference
sym = sym_new_const(ctx, value);
if (sym == NULL) {
DECREF(value);
goto error;
} (B) sym_new_const() always steals the reference (even when it fails) // value has a new reference
sym = sym_new_const(ctx, value);
if (sym == NULL) {
goto error;
} (C) sym_new_const() never steals the reference // value has a new reference
sym = sym_new_const(ctx, value);
Py_DECREF(value);
if (sym == NULL) {
goto error;
} By comparison, for sym_set_const(), the call can never fail, so the equivalent cases are (A), (B) sym_set_const() always steals the reference // value has a new reference
sym_set_const(sym, value); (C) sym_set_const() never steals the reference // value has a new reference
sym_set_const(sym, value);
Py_DECREF(value); Of these, (C) is perhaps the least surprising API design (it's how most C APIs work) but (B) requires writing slightly less code. (A) feels the most error-prone. I think it's important for sym_new_const() and sym_set_const() to be consistent, which also argues against (A). (It seems we currently don't need sym_set_const(), so maybe this argument is moot -- though ISTM that |
PS. Mark's lattice diagram has a node representing "not None". This might require a separate set of APIs (we currently apparently don't need to test for "not None" yet). |
We should ensure that if we test for a property that's represented lower in the lattice (e.g. "not But what to do with Also, according to Wikipedia, in type theory, Bottom (⊥) is the type that has no values, whereas Top (⊤) is the type that encompasses all values. Maybe we should switch terminology? |
From Mark's list of tests I deduce that he wants the symbol representing the type without values to return false/NULL for all inquiries. IIRC I ran into this in the context of static type checking too. This feels pragmatic, as it is likely to cause the optimizer to make a conservative decision. It probably doesn't matter too much: if we ever encounter such a symbol we've reached an error condition or unreachable code, so the optimizer might as well insert |
So I've been thinking about how to export the symbols API in a way that we can easily write tests in Python. My idea is to export A test I don't see in Mark's list:
|
I agree. That should be true of all APIs.
Yes, it is confusing, which is why we need clear docs. |
We will want to reuse a memory buffer (probably per-thread) when optimizing, so the optimizer will need to initialize the context from a buffer. The test can take a |
Regarding |
Further thoughts on It is useful to be able to query a symbol, extract a value and get a meaningful result. |
|
Yes. Let's use "unknown" for "top" and "contradiction" for "bottom". It will hopefully save some confusion.
Whoever calculates it. Although they could transfer ownership to the executor if we want to add constants to executors.
It would be surprising if a value that was a constant is no longer a constant when we add more information. Which means that
Maybe, but it is shorter than |
Ah, so the current code in (I'm okay with the rest of your response.) |
Anyway, Mark designed it differently, writing the tests in C. That's fine too. |
…ability. (GH-115987) * Rename _Py_UOpsAbstractInterpContext to _Py_UOpsContext and _Py_UOpsSymType to _Py_UopsSymbol. * #define shortened form of _Py_uop_... names for improved readability.
Mark and I have discussed this some more. I am going to take over making all the tests pass. Some things we discussed:
|
Question: What's the better behavior for Bottom (Contradiction)? Should it have all properties (e.g. sym_is_null() and sym_is_not_null() both true) or none (both false)? Naive application of the rules would imply it has all properties (e.g. is null and not-null, matches any type, and is a constant), but that's not necessarily the most useful, and it breaks when we ask for its constant value. Right now I am coding things so that all properties are false for Bottom, it matches no types, and it has no constant. |
That sounds right. |
- Any `sym_set_...` call that attempts to set conflicting information cause the symbol to become `bottom` (contradiction). - All `sym_is...` and similar calls return false or NULL for `bottom`. - Everything's tested. - The tests still pass with `PYTHONUOPSOPTIMIZE=1`.
We have tests now, but I'm leaving this open because the API isn't yet documented. |
…intainability. (pythonGH-115987) * Rename _Py_UOpsAbstractInterpContext to _Py_UOpsContext and _Py_UOpsSymType to _Py_UopsSymbol. * #define shortened form of _Py_uop_... names for improved readability.
…6028) - Any `sym_set_...` call that attempts to set conflicting information cause the symbol to become `bottom` (contradiction). - All `sym_is...` and similar calls return false or NULL for `bottom`. - Everything's tested. - The tests still pass with `PYTHONUOPSOPTIMIZE=1`.
…op prefix (python#116077) This was left behind by pythonGH-115987. Basically a lot of diffs like this: ``` - res = _Py_uop_sym_new_unknown(ctx); + res = sym_new_unknown(ctx); ```
…intainability. (pythonGH-115987) * Rename _Py_UOpsAbstractInterpContext to _Py_UOpsContext and _Py_UOpsSymType to _Py_UopsSymbol. * #define shortened form of _Py_uop_... names for improved readability.
…6028) - Any `sym_set_...` call that attempts to set conflicting information cause the symbol to become `bottom` (contradiction). - All `sym_is...` and similar calls return false or NULL for `bottom`. - Everything's tested. - The tests still pass with `PYTHONUOPSOPTIMIZE=1`.
…op prefix (python#116077) This was left behind by pythonGH-115987. Basically a lot of diffs like this: ``` - res = _Py_uop_sym_new_unknown(ctx); + res = sym_new_unknown(ctx); ```
…intainability. (pythonGH-115987) * Rename _Py_UOpsAbstractInterpContext to _Py_UOpsContext and _Py_UOpsSymType to _Py_UopsSymbol. * #define shortened form of _Py_uop_... names for improved readability.
…6028) - Any `sym_set_...` call that attempts to set conflicting information cause the symbol to become `bottom` (contradiction). - All `sym_is...` and similar calls return false or NULL for `bottom`. - Everything's tested. - The tests still pass with `PYTHONUOPSOPTIMIZE=1`.
…op prefix (python#116077) This was left behind by pythonGH-115987. Basically a lot of diffs like this: ``` - res = _Py_uop_sym_new_unknown(ctx); + res = sym_new_unknown(ctx); ```
The behavior of the symbolic values used in the tier 2 optimizer have a solid theoretical basis, but there is some subtlety involved.
The symbolic values and the lattice they form needs to be well documented and tested.
What are the symbolic values?
Each symbolic value represents not a real object, but knowledge about the set of objects that could be present at a given point in the code. This is powerful, but not intuitive.
Example:
At line 3, the symbolic value for
x
isnot None
, whereas the symbolic value forx
at line 0 isunknown
.However, any object referred to
x
is unchanged byfoo
. It is easy to think in terms of objects, but it is really a type system that we are representing.Why we need a lattice
Example:
and this trace:
The trace is a correct trace; it correctly represents one path through the function.
So what symbolic value does
x
have for line 5? It is bothNone
andnot None
.This is why the lattice needs a
top
value, even though it superficially makes no sense.If a value is
top
for a location, then that location is unreachable.Linked PRs
The text was updated successfully, but these errors were encountered: