The DISPATCH() macro is not as efficient as it could be (move PyThreadState.use_tracing) #87926
Comments
The DISPATCH() macro has two failings.
|
I am afraid the "Speed up check for tracing in interpreter dispatch" change brought some backwards-incompatible changes: yappi/_yappi.c:1261:9: error: ‘PyThreadState’ {aka ‘struct _ts’} has no member named ‘use_tracing’; did you mean ‘tracing’? This is not mentioned in https://docs.python.org/3.10/whatsnew/3.10.html and I haven't noticed the use_tracing member being deprecated. I am confused. Should this have happened? |
Fedora packages affected (that we know of now): greenlet: https://bugzilla.redhat.com/show_bug.cgi?id=1957784 |
The code at yappi/_yappi.c:1261 sets an undocumented field on a CPython internal data structure. What did you believe that was supposed to do? use_tracing is not documented anywhere. We could add the field back and ignore it, but I doubt that would help you much. |
If there is no C-API function that supports your needs, feel free to suggest one. |
Disclaimer: I have not written the code, nor do I understand what it is trying to achieve. I merely collect the data and report the problems to the package maintainers. It just seems to me that a non-underscored (and hence public) member variable on a non-underscored (and hence public) structure should not suddenly go missing. That said, I am not familiar with the rules that define what part of the API falls under https://www.python.org/dev/peps/pep-0497/ |
scikit-learn: https://bugzilla.redhat.com/show_bug.cgi?id=1958976 gcc: sklearn/cluster/_k_means_fast.c The usage comes from https://github.com/cython/cython/blob/master/Cython/Utility/Profile.c |
PEP-0497 is rejected; the active one is PEP-387, which says "backwards incompatibility" means preexisting code ceases to comparatively function after a change. Unfortunately, not all of the C API is documented, so unless it's explicitly marked private, people will use it :( |
But what does "use it" mean? As I said, we could keep the field and ignore it, but that seems worse. |
I don't think the PEP meant to restrict individual struct member such as this. For example, we were able to switch from byte code to word code without violating the intended rules. Consider asking Brett and Benjamin for clarification. I would think that if a new function were introduced to provide a reliable way to determine whether tracing was enabled, that would suffice for external packages to have a minimally disruptive migration path. |
I understand that some projects manually call the profile and/or trace functions, and temporarily set use_tracing to 0 while calling these functions. Some projects restore use_tracing to the correct value (compute the efficient value), while others simply set use_tracing to 1. I see 3 use cases:
We can add 3 functions:
PyThreadState_EnableTracing(tstate) would do something like:
If we added these functions, I could then add an implementation for Python 3.9 and older to my https://github.com/pythoncapi/pythoncapi_compat project for backward compatibility. The problem is that some projects also temporarily increase ts->tracing. Since I would like to make PyThreadState opaque, I would prefer to hide this access behind a function call as well. Maybe we need an API to call profile and/or trace functions? -- According to the bugzilla compiler errors:
It has already been fixed: It uses:
It uses:
It uses "tstate->use_tracing = 0;".
It uses:
Simplified code:

static int __Pyx_TraceSetupAndCall(...)
{
    ...
    tstate->tracing++;
    tstate->use_tracing = 0;
}

int __Pyx_use_tracing = 0;
#define __Pyx_TraceCall(funcname, srcfile, firstlineno, nogil, goto_error) \
    if (nogil) { \
        if (CYTHON_TRACE_NOGIL) { \
            PyThreadState *tstate; \
            PyGILState_STATE state = PyGILState_Ensure(); \
            tstate = __Pyx_PyThreadState_Current; \
            if (unlikely(tstate->use_tracing) && !tstate->tracing && \
                    (tstate->c_profilefunc || (CYTHON_TRACE && tstate->c_tracefunc))) { \
                __Pyx_use_tracing = __Pyx_TraceSetupAndCall(&$frame_code_cname, &$frame_cname, tstate, funcname, srcfile, firstlineno); \
            } \
            PyGILState_Release(state); \
            if (unlikely(__Pyx_use_tracing < 0)) goto_error; \
        } \
    } else { \
        PyThreadState* tstate = PyThreadState_GET(); \
        if (unlikely(tstate->use_tracing) && !tstate->tracing && \
                (tstate->c_profilefunc || (CYTHON_TRACE && tstate->c_tracefunc))) { \
            __Pyx_use_tracing = __Pyx_TraceSetupAndCall(&$frame_code_cname, &$frame_cname, tstate, funcname, srcfile, firstlineno); \
            if (unlikely(__Pyx_use_tracing < 0)) goto_error; \
        } \
    } |
+1 for Victor's suggestions. It provides a reasonable way forward without locking in eval-loop implementation details that weren't intended to be public and frozen in time. |
A Cython issue report: cython/cython#4153 |
For the same reason that motivated this ticket, I think the functions should be inline functions. They should also take the current thread-state as argument, because that's probably known on the caller side already. I guess a macro would be fine, too. :) Cython previously used "use_tracing" directly because it needs to implement the exact same tracing/profiling behaviour as CPython, regardless of who called a Cython implemented function (Cython or CPython). Naming nit: Get/Is/UsesTracing? Also, given that a common use case seems to be "make sure tracing is disabled, do something, enable tracing if it was enabled", I think DisableTracing() should return the previous state. |
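The "disable, do something, restore the previous state" pattern described in the comment above could look roughly like the sketch below. It is self-contained and works on a mock structure: mock_tstate, mock_disable_tracing(), mock_restore_tracing(), mock_run_untraced() and mock_report_flag() are hypothetical names used only for illustration, not CPython API.

```c
/* Hypothetical stand-in for the two PyThreadState fields discussed here. */
typedef struct {
    int tracing;        /* > 0 while a trace/profile callback is running */
    int use_tracing;    /* nonzero if a trace or profile function is set */
} mock_tstate;

/* Disable tracing and hand the previous flag back to the caller. */
static inline int mock_disable_tracing(mock_tstate *ts)
{
    int previous = ts->use_tracing;
    ts->use_tracing = 0;
    return previous;
}

/* Restore whatever mock_disable_tracing() returned. */
static inline void mock_restore_tracing(mock_tstate *ts, int previous)
{
    ts->use_tracing = previous;
}

/* Run `work` with tracing switched off, then put the flag back. */
static inline int mock_run_untraced(mock_tstate *ts, int (*work)(mock_tstate *))
{
    int previous = mock_disable_tracing(ts);
    int result = work(ts);
    mock_restore_tracing(ts, previous);
    return result;
}

/* Sample `work` callback: reports the flag it observed while running. */
static int mock_report_flag(mock_tstate *ts)
{
    return ts->use_tracing;
}
```

Returning the previous value saves the caller from reading the field separately before disabling it, which is the point of the suggestion.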
I just noticed that new C-API functions are probably useless for Cython since I think it will have to maintain the CFrame stack, so as not to enable "use_tracing" for the (Python) caller but for the current (Cython) function. This then means that we own the current CFrame as well as its "use_tracing" field and don't need any help from CPython in order to change the state. I'm not sure if this is any different for other users of the "use_tracing" field. |
The commit 28d28e0 caused a performance regression on Windows which is currently blocking the Python 3.10.0 final release: bpo-45116. Moreover, this issue introduced an incompatible C API change which is not documented in What's New in Python 3.10, and it doesn't provide any solution for projects broken by these changes. So far, the following projects are known to be broken by the change:
Would it be possible to:
By the way, I'm also disappointed that nothing was done to enhance the situation for 4 months (since the first known projects were reported here in May). I raise the priority to release blocker to make more people aware of the situation. |
Also Numba is broken: https://bugzilla.redhat.com/show_bug.cgi?id=2005686 |
Cython 0.29.24 was released on July 13, 2021 with a fix (2 commits): The Cython master branch was fixed as well: see cython/cython#4153 |
IMO those failures are bugs in the projects listed not in CPython. Relying on the exact meaning, or even the existence of an undocumented field of a C struct is not, nor ever has been, safe. The code in the examples given above are using I propose adding back I'm minded to prefix all the names of all fields in all C structs that happen to be included by Python.h with "if_you_use_this_then_we_will_break_your_code_in_some_way_that_will_ruin_your_reputation_as_a_library_developer__maybe_not_tomorrow_maybe_not_next_year_but_someday" My attempts to avoid this happening again next year, and the year after that, and... |
No, actually not. It is using the field in the same way as CPython, simply because most of this code was originally copied from CPython, and we also copied the optimisation of not checking the other fields (for the obvious reason that it is an optimisation).
Cython 0.29.24 has already been adapted to this change and will use the new field in CPython 3.10b1+.
Any code that reads and /writes/ the field would probably also continue to work correctly, which is what older Cython versions did.
The thing is, new APIs can only be added to new CPython releases. Supporting features in older CPython versions (currently 2.7+) means that we always *have to* use the existing fields, and can only switch to new APIs by duplicating code based on a PY_VERSION_HEX preprocessor check. Even if a new low-latency profiling API was added in CPython 3.11, we'd have to wait until there is at least an alpha release that has it before enabling this code switch. And if the new API proves to be slower, we may end up keeping the old code around and adding a C compile-time configuration option for users to enable (or disable) its use. Cython has lots of those these days, mostly to support the different C-API capabilities of different Python implementations, e.g. to take advantage of the PyLong or PyUnicode internals if available, and use generic C-API calls if not. |
Is adding the field back an option at this point? It would mean that extensions compiled against the release candidates may not be binary compatible with the final release. My take is that use_tracing is an implementation- and version-dependent field, and that binary compatibility will be maintained for a specific release (e.g. 3.10) but that there's no assurance that it will be there in the next release -- though these things tend not to change. I also regard generated Cython code as only being valid for the releases that a specific Cython version supports. Code and APIs change slowly, but eventually they do change. |
Also, just to clarify, I opened PR 28498 to discuss the possibility of going ahead; I still don't want to move on without consensus. |
Also, I personally think there is absolutely no guarantee that Cython code generated for 3.9 should work for 3.10. The thread state is a private structure that has undocumented fields and is part of neither the stable API nor the limited API, so tstate->tracing disappearing is totally within the guarantees between Python versions. |
I discussed this particular instance with the Steering Council and the conclusion was that this field (use_tracing) is considered an implementation detail and therefore its removal is justified, so we won't be restoring it. I'm therefore closing PR 28498. Notice that this decision only affects this particular issue and should not be generalized to other fields or structures. We will try to determine, and open a discussion in the future about, what is considered public/private in these ambiguous cases and what users can expect regarding stability and backwards compatibility. |
I'm removing the release blocker as per the above; feel free to close if there is nothing else to discuss or act on here. |
I'll just note that a change in struct size does technically break ABI, since *arrays* of PyThreadState will break. So the size shouldn't be changed in RCs or point releases. (But since it's not part of stable ABI, it was OK to change it for 3.10.)
Please keep me in the loop; I'm working on summarizing my understanding of this (in a form that can be added to the docs if approved). |
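The point above about arrays can be illustrated with a small self-contained sketch. The two struct layouts below are hypothetical (they are not the real PyThreadState definitions); they only show that code compiled against different struct sizes disagrees about where array elements live.

```c
#include <stddef.h>

/* Two hypothetical layouts of the "same" struct in two different builds. */
typedef struct { void *frame; int tracing; } layout_old;
typedef struct { void *frame; int tracing; void *cframe; } layout_new;

/* Byte offset of element `index` in an array, as computed by code
 * compiled against each layout. */
static size_t offset_old(size_t index) { return index * sizeof(layout_old); }
static size_t offset_new(size_t index) { return index * sizeof(layout_new); }
```

In this mock, the offsets of frame and tracing within a single struct are the same in both layouts, so access through one pointer still agrees; only the stride between array elements differs.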
Not that it matters now, because we are not proceeding, but just to clarify why I deemed this acceptable: arrays of PyThreadState are extremely unlikely in extensions because we pass it by pointer and it is always manipulated by pointer. To place it in an array you either need to create one or copy one into an array, and I cannot see what the point would be, because the fields are mainly pointers that would become useless as the interpreter will not update anything. |
Also, I checked the DWARF tree of all existing wheels for 3.10 on PyPI (there aren't many) and none had anything that uses the size of the struct. |
I created PR 28527 to document PyThreadState.use_tracing removal and explain how to port existing code to Python 3.10. |
Analysis of use_tracing usage in 3rd party code. I see two main ways to add C API functions covering these use cases:
(*) greenlet

greenlet temporarily disables tracing in g_calltrace(), and then restores it, to call a "tracing" function:

It also saves and then restores the use_tracing value:

ts__g_switchstack_use_tracing = tstate->cframe->use_tracing;
(...)
tstate->cframe->use_tracing = ts__g_switchstack_use_tracing;

=> it can use PyThreadState_IsTracing(), PyThreadState_DisableTracing() and PyThreadState_ResetTracing().

These functions don't handle "tstate->tracing++;" and "tstate->tracing--;", which are also used by greenlet.

greenlet also saves and restores tstate->cframe:

(*) dipy

Code generated by Cython.

(*) smartcols

Code generated by Cython.

(*) yappi

yappi is a Python profiler. yappi sets use_tracing to 1 when it sets its profile function: "ts->c_profilefunc = _yapp_callback;". It sets use_tracing to 0 when it clears the profile function: "ts->c_profilefunc = NULL;". That's wrong: it ignores the trace function. PyEval_SetProfile() cannot be used because yappi works on a PyThreadState (ts).

Code: https://github.com/sumerc/yappi/blob/master/yappi/_yappi.c

It can use PyThreadState_DisableTracing() and PyThreadState_ResetTracing(). Maybe a PyThreadState_SetProfile(tstate, func) function would fit yappi's use case better.

(*) Cython

Cython defines 2 compatibility functions:
Code: https://github.com/cython/cython/blob/0.29.x/Cython/Utility/Profile.c The code is quite complicated. In short, it checks if tracing and/or profiling is enabled. If it is enabled, it temporarily disables tracing (use_tracing=0) while calling trace and profile functions. => it requires PyThreadState_IsTracing(), PyThreadState_DisableTracing() and PyThreadState_ResetTracing(). |
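The yappi problem noted in the analysis above (clearing the flag while a trace function may still be installed) can be avoided by recomputing the flag from both callbacks. The following is a minimal sketch on a mock structure; mock_tstate, mock_reset_tracing(), mock_set_profile() and mock_noop_callback() are hypothetical names, not CPython or yappi code.

```c
#include <stddef.h>

typedef int (*mock_callback)(void *arg);

/* Hypothetical stand-in for the relevant PyThreadState fields. */
typedef struct {
    mock_callback c_profilefunc;
    mock_callback c_tracefunc;
    int use_tracing;
} mock_tstate;

/* Recompute the flag from both callbacks instead of hard-coding 0 or 1. */
static void mock_reset_tracing(mock_tstate *ts)
{
    ts->use_tracing = (ts->c_tracefunc != NULL || ts->c_profilefunc != NULL);
}

/* Install or clear (func == NULL) the profile function, then refresh the flag. */
static void mock_set_profile(mock_tstate *ts, mock_callback func)
{
    ts->c_profilefunc = func;
    mock_reset_tracing(ts);
}

/* Placeholder callback used only to have a non-NULL function pointer. */
static int mock_noop_callback(void *arg)
{
    (void)arg;
    return 0;
}
```

With this shape, clearing the profile function leaves the flag set for as long as a trace function remains installed.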
Ah, I think the docs need to be clarified a bit. Here's what I was missing: The key thing to know here is that there are three state variables; Normally Disabling means setting There's also a fourth variable, Would it be reasonable to just put these APIs in pythoncapi_compat, instead of in the stdlib? (It would be yet one more selling point for people to start using that. :-) |
I created changes to use it:
|
The PyThreadState.cframe.use_tracing format changed again: the value is now set to 0 or 255. |
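One plausible reading of the 0-or-255 encoding, stated here as an assumption rather than something this thread confirms, is that such a flag can be OR-ed directly into an 8-bit opcode, so the dispatch path needs no separate branch for tracing. The sketch below uses hypothetical names (MOCK_DO_TRACING, mock_next_opcode) and is not CPython's actual dispatch code.

```c
#include <stdint.h>

/* Hypothetical opcode number reserved for the "tracing" handler. */
#define MOCK_DO_TRACING 255

/* With use_tracing == 0 the opcode is unchanged; with use_tracing == 255
 * every opcode is redirected to MOCK_DO_TRACING. */
static inline uint8_t mock_next_opcode(uint8_t opcode, uint8_t use_tracing)
{
    return (uint8_t)(opcode | use_tracing);
}
```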
Merged:
|
I created #29121 to add PyThreadState_SetProfile() and PyThreadState_SetTrace() functions. |
greenlet now uses PyThreadState_EnterTracing() and PyThreadState_LeaveTracing() rather than accessing use_tracing directly: python-greenlet/greenlet@9b49da5

On Python 3.10, it implements these functions with:

// bpo-43760 added PyThreadState_EnterTracing() to Python 3.11.0a2
#if PY_VERSION_HEX < 0x030B00A2 && !defined(PYPY_VERSION)
static inline void PyThreadState_EnterTracing(PyThreadState *tstate)
{
    tstate->tracing++;
#if PY_VERSION_HEX >= 0x030A00A1
    tstate->cframe->use_tracing = 0;
#else
    tstate->use_tracing = 0;
#endif
}
#endif

// bpo-43760 added PyThreadState_LeaveTracing() to Python 3.11.0a2
#if PY_VERSION_HEX < 0x030B00A2 && !defined(PYPY_VERSION)
static inline void PyThreadState_LeaveTracing(PyThreadState *tstate)
{
    tstate->tracing--;
    int use_tracing = (tstate->c_tracefunc != NULL
                       || tstate->c_profilefunc != NULL);
#if PY_VERSION_HEX >= 0x030A00A1
    tstate->cframe->use_tracing = use_tracing;
#else
    tstate->use_tracing = use_tracing;
#endif
}
#endif

This code was copied from my https://github.com/pythoncapi/pythoncapi_compat project. (I wrote the greenlet change.) |
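A caller would typically bracket a manual trace-function call with the enter/leave pair. The sketch below shows that shape on a mock structure so that it compiles without Python.h; mock_tstate, mock_enter_tracing(), mock_leave_tracing(), mock_call_trace() and mock_sample_trace() are hypothetical names that mirror the shim above and are not greenlet's actual code.

```c
#include <stddef.h>

typedef struct mock_tstate mock_tstate;
typedef int (*mock_tracefunc)(mock_tstate *ts);

/* Hypothetical stand-in for the relevant PyThreadState fields. */
struct mock_tstate {
    int tracing;
    int use_tracing;
    mock_tracefunc c_tracefunc;
    mock_tracefunc c_profilefunc;
};

static void mock_enter_tracing(mock_tstate *ts)
{
    ts->tracing++;
    ts->use_tracing = 0;
}

static void mock_leave_tracing(mock_tstate *ts)
{
    ts->tracing--;
    ts->use_tracing = (ts->c_tracefunc != NULL || ts->c_profilefunc != NULL);
}

/* Call the trace function, if any, without letting it trace itself. */
static int mock_call_trace(mock_tstate *ts)
{
    if (ts->c_tracefunc == NULL || ts->tracing) {
        return 0;
    }
    mock_enter_tracing(ts);
    int result = ts->c_tracefunc(ts);
    mock_leave_tracing(ts);
    return result;
}

/* Sample callback: returns 42 only if it ran with tracing suppressed. */
static int mock_sample_trace(mock_tstate *ts)
{
    return (ts->tracing == 1 && ts->use_tracing == 0) ? 42 : -1;
}
```

The early return when ts->tracing is nonzero corresponds to the "!tstate->tracing" guard seen in the Cython macro earlier in the thread: a callback that is already running is not re-entered.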
Is there anything left to do here? |
IMO we are done, and I close the issue. See the issue #84128 for making PyThreadState opaque.
This part was fixed early. The following comments were more about incompatible PyThreadState changes, how to migrate existing C extensions to Python 3.11, and how to design a new API which no longer accesses PyThreadState directly. In fact, that's already the topic of the issue #84128 that I created in 2020, so we can continue the discussion there. The main change related to PyThreadState was the use_tracing member, which was moved. I added PyThreadState_EnterTracing() and PyThreadState_LeaveTracing() functions to Python 3.11 for that, and projects already use them.
I abandoned this PR. @pablogsal added PyEval_SetProfileAllThreads() and PyEval_SetTraceAllThreads() functions to Python 3.12 (commit e34c82a) which should fit the use case, with a different design. |