-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C-API for signalling monitoring events #111997
Comments
Do you want to propose an API for that? |
PEP-669 seems to discuss a mechanism to modify the bytestream for the purpose of implementing monitoring. Would access to this mechanism also be provided as part of this API? I haven't seen too much mention of this feature elsewhere. Source: https://peps.python.org/pep-0669/#rationale
IMO it would be very useful for an API of this nature to expose low-level interpreter state to monitoring tools. For example, if we are able to see events like |
In order to map source/line/column information to an offset, we need a way to build a Lines 190 to 444 in d67f947
That's purely internal code. I don't think we need C-API support for this, but a Python implementation would be nice. Or a static method on code objects to build the byte string from a Python list of positions. Given the complexity of the format and the risk of future changes, I think it's in CPython's responsibility to generate this format from something user friendly.
/* source location information */
typedef struct {
int lineno;
int end_lineno;
int col_offset;
int end_col_offset;
} _PyCompilerSrcLocation; This looks like a function could help that accepts a sequence of tuples of these four values and maps it to a We'd probably still have to reimplement it in Cython in order to generate the line table string in "older" Python versions. But we could at least rely on it from CPython 3.13 on. |
@markshannon, could you please comment whether you consider this the right approach? To me, a line table seems required to generate source level monitoring events. |
@markshannon any comments? Given that Python 3.12 broke an entire C-API feature, I consider it a blocker for 3.13 to fix it. |
What C-API was broken? |
C extensions can already register callbacks and set events using the Python API, so I assume we are talking about adding an API for C extensions to fire events. There will be two parts to such an API:
Handling locationsApart from We also need a specification for the table mapping offsets to locations. The existing table format is quite fiddly, and can only be searched linearly, so designing a new table format might be a good idea. Firing events from C extension codeAny potential API has to be more complex than for We will need 16 bits of data for each event location, to track active tools and which tool/location pairs are disabled. We will need to initialize this data, and to specify which event location maps to which event. With that in mind, and using the typedef struct _PyMonitoringState {
uint8_t active;
uint8_t opaque;
} PyMonitoringState;
typedef struct _PyCodeLikeObject PyCodeLikeObject;
/* Returns a version, which should be stored and passed to the next call of _PyMonitioringEnterScope.
Arguments:
previous_version: Points to a per code-like object value. The value must be set to 0 before first call per code-like object.
state_array: The array of all PyMonitoringStates for this code-like object.
Should be initialized to all zeroes before first call.
event_types: Array of the event types describing the event type for each state in state array.
length: The length of the state and event arrays, which must be the same.
*/
void _PyMonitoringEnterScope(uint64_t *previous_version, PyMonitoringState *state_array, uint8_t event_types, uint32_t length);
int _PyMonitoring_FirePyStartEvent(PyCodeLikeObject *codelike, uint32_t offset, PyMonitoringState *state); If performance is going to be an issue, then the above could be implemented as inline functions. Example usage.Suppose we have a code-like object that that should fire two events when called, We would need to describe the events: static const uint8_t EVENTS = { PY_START, PY_RETURN }; And the code-like object would need an array for the state and a version number. struct MyCodeLike {
PyObject_HEAD
PyMonitoringState monitoring_state[2];
uint64_t version;
/* Other fields */
}; The C code for the function: void init_code(MyCodeLike *code)
{
code->monitoring_state = {0, 0};
code->version = 0;
}
PyObject *call_code(MyCodeLike *code)
{
_PyMonitoringEnterScope(&code->version, code->monitoring_state, EVENTS, 2);
...
_PyMonitoring_FirePyStartEvent(code, 0, &code->monitoring_state[0]);
...
_PyMonitoring_FirePyReturnEvent(code, 1, &code->monitoring_state[1]);
...
} |
@scoder, would such API work for you? |
Hmm, looks like I didn't receive any email notifications for this ticket although I created it, am subscribed to it, and got explicitly mentioned. No idea why. Sorry. I've re-subscribed, just in case. @markshannon, I've done some guesswork to fill in the gaps of your proposal. Please correct me if I misunderstood something.
Comments from me:
|
More or less, yes. It is changed when the set of events being monitored changes. Listener registrations do not change it.
The ID also serves as an index into the locations table.
Broadly, yes. There are up to 8 possible listeners for events, so if the state is non-zero it represents a bit vector of active listeners.
I'm assuming that the layout of the code-like struct is opaque to the VM, it is up to Cython/pybind11/mypyc, etc. how to lay it out, so we can't deduce the offset from the pointer to the monitoring state. Thus we need both.
From the (as yet unspecified) location table and the offset ID.
Would one location per code-like object be enough?
|
Probably not. If the API requires a "start scope", "work in scope", "end scope" kind of state keeping, then Cython's While it's probably possible for user code to juggle with different code-like objects for a single code scope, it would be nicer if the API kept code scope and file position independent from each other. |
What is |
It is defined by the third-party code (generator). |
(I got an answer offline) |
@iritkatriel Is this still a (deferred) release blocker? |
Regarding the interaction with the existing After that, I got assertion failures in CPython when generating LINE events in assert(args[0] == (PyObject *)_PyFrame_GetCode(frame->f_frame)); The old API depended on frames to provide line information, the new one doesn't. The effect is that event creators now need to know whether there are listeners of the old API (which they probably can't know), or they always need to set up frames just in case, which is quite an annoyance given the new shiny monitoring API. Any idea what we can do about this? |
We can remove the assert, nothing will crash. How did Cython work with profiling prior to 3.12? Did you create a fake frame? |
Yes. We could keep doing that, but a) frames aren't "officially" exposed from the C-API and b) as I understand it, the goal of creating the monitoring C-API was to allow non-Python non-frame code to interact with the tracing infrastructure, so requiring frames seems counter-productive. I now found that coverage.py was actually updated to work directly with
The exception originates here and seems to trigger on generally all code objects that do not have byte code: cpython/Python/instrumentation.c Lines 1971 to 1974 in de19694
cpython/Python/instrumentation.c Lines 2253 to 2259 in de19694
I'm afraid that we might still run into a lot of similar assumptions about what is traceable code. |
Changing the code from |
…f argument counts. (pythonGH-119179) (cherry picked from commit 70b07aa) Co-authored-by: scoder <stefan_ml@behnel.de>
Should we make |
What are the open issues here currently? |
…f argument counts. (python#119179)
I'm closing this as complete. Let's create new issues for any followup work. |
…ents (pythonGH-123822) (cherry picked from commit 77c68b4) Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
Feature or enhancement
Proposal:
Language implementations for the CPython runtime (Cython, JIT compilers, regular expression engines, template engines, etc.) need a way to signal PEP-669 monitoring events to the registered listeners.
We need a way to create events and inject them into the monitoring system. Since events have more than one signature, we might end up needing more than one C-API function for this.
We need a way to map 3D source code positions (file name, line, character) to 1D integer offsets. Code objects help, but branches might cross source file boundaries, so that's more tricky. For many use cases, a mapping between (line, character) positions and an integer offset would probably suffice, although templating languages usually provide some kind of include commands, as does Cython. There should be some help in the C-API for building up such a mapping.
The reason why I think we need CPython's help for mapping indices is that both sides, listeners and even producers, need to agree on the same mapping. Sadly, the events don't include line/character positions directly but only a single integer. So event producers need a way to produce a number that event listeners like coverage analysers can map back to a source code position.
Has this already been discussed elsewhere?
I have already discussed this feature proposal on Discourse
Links to previous discussion of this feature:
https://discuss.python.org/t/pep-669-low-impact-monitoring-for-cpython/13018/61?u=scoder
Linked PRs
The text was updated successfully, but these errors were encountered: