-
-
Notifications
You must be signed in to change notification settings - Fork 31k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-89279: In ceval.c, redefine some macros for speed #32387
Conversation
In particular, Py_DECREF, Py_XDECREF and Py_IS_TYPE are redefined as macros that completely replace the inline functions of the same name. These three came out in the top four of functions that (in MSVC) somehow weren't inlined. (There was another, _Py_atomic_load_32bit_impl, but it is more difficult.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In C with well defined scopes, it's fine to reuse the same variable name in nested scopes ("op" in this case). It's just more annoying to debug.
C code:
#include <stdio.h>
int main()
{
int x = 1;
{
int x = 2;
{
int x = 3;
printf("x = %i\n", x);
}
printf("x = %i\n", x);
}
printf("x = %i\n", x);
return 0;
}
Output:
x = 3
x = 2
x = 1
And yet, in I've done the rest you've asked, and merged origin/main. |
PS. Any suggestion on how to handle the The compiler says there are 24 separate places where this fails to inline just in |
I took care of this, it was relatively straightforward since the invocations in ceval.c all go through macros that in the end call |
So that the assert() in the macro will be checked in debug mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Python/ceval.c changes LGTM, but I cannot review the 2 other modified files.
Python/ceval.c
Outdated
// bpo-45116: The MSVC compiler fails to inline these in PGO build, | ||
// and they're kind of important for performance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might explain that MSVC can inline them, but it's just an arbitrary limit which cannot be configured.
neonene wrote "PR25244 told me the amount of code in _PyEval_EvalFrameDefault() is over the limit of PGO."
I suggest something like:
// bpo-45116: The MSVC compiler fails to inline these in PGO build, | |
// and they're kind of important for performance. | |
// bpo-45116: The MSVC compiler does not inline these 3 static inline functions in PGO build | |
// in _PyEval_EvalFrameDefault(), because this function is over the limit of PGO, | |
// and these limits which cannot be configured. Define them as macro to make sure that | |
// there are always inlined by the preprocessor. |
Maybe tomorrow, MSVC PGO limits will change or become configurable, and it will become possible to remove these macros on more recent MSVC versions.
I would prefer to not redefine _Py_atomic_load_relaxed() in ceval.c. PEP 7 was updated to require C11 to build Python 3.11, but C11 without optional features: without atomatic functions/variables. Python still requires pycore_atomic.h with an implementation which depends on the C compiler and on the platform. Maybe tomorrow we will be able to use C11 atomic variables, but I don't think that it's possible right now :-( |
Oh, I see. The simple version of |
Using the techniques I developed in faster-cpython/ideas#321 I couldn't find any uses of Py_INCREF in |
When we take a inlining log at profiling time (PGInstrument):
|
Thanks for explaining what I have a new slight mystery. I write a script that scans build.log and prints the top-10 most common non-inlined functions (in my 'inline' branch):
Looking at the last one,
This seems rather silly, especially since we're obviously freeing something and there's a code path that allocates. I am at a loss where exactly |
Hm, address sanitizer doesn't like |
I'm running out of time today. I will apply @vstinner's improvement of the macros later this week. (If you create a comment with a suggestion I can do it quicker.) @sweeneyde I'm not sure that it matters much whether |
Sounds reasonable to me, and it would be nice to not have to worry about different casting between the function and the macro, but I don't fully understand what |
Me neither, but I'm guessing it might have been caused by a shortcut I took where instead of running |
A full run where I had restored the
so I conclude that I should keep that expanded. |
With
I conclude that the macro is still a good idea. So the only thing left on my to-do list for this PR is Victor's beatification of the macros. |
With this PR as is, I just see:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion to reformat macro on mutliple lines, now as a GitHub suggestion (and with _Py_DECREF_SPECIALIZED).
Co-authored-by: Victor Stinner <vstinner@python.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, even if I didn't understand everything. But I trust the ones who ran extensive tests on Python benchmarks with different compiler options.
There are many compiler warnings. Are they related to the change?
GitHub Actions / Address sanitizer
Python/ceval.c#L2408
function called through a non-compatible type
Co-authored-by: Dennis Sweeney <36520290+sweeneyde@users.noreply.github.com>
I took the liberty of fixing two obvious bugs. |
macOS failure is unrelated:
|
I now think that macrofying here is good only for stable APIs.
Maybe some pragma can fix this. I'll check the entries. |
I don’t see the problem with …DECREF_SPECIALIZED. It’s semantics are very straightforward. @sweeneyde WDYT? |
I think it would be sufficient to add a comment with something like "if this changes, the macro version in ceval.c should change accordingly". |
Though I do agree that _Py_DECREF_SPECIALIZED is not likely to change semantics, particularly not the non-debug version. |
Then, I'm fine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Thanks everyone for the help! |
In particular,
Py_DECREF
,Py_XDECREF
andPy_IS_TYPE
are redefined as macros that completely replace the inline functions of the same name. These three came out in the top four of functions that (in MSVC) somehow weren't inlined. (There was another,_Py_atomic_load_32bit_impl
, but it is more difficult.I got the top-N list using the recipe from this bpo message.
#89279