-
-
Notifications
You must be signed in to change notification settings - Fork 31.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-75459: Doc: C API: Improve object life cycle documentation #125962
base: main
Are you sure you want to change the base?
Conversation
I'm not familiar enough with CPython's internals to be super confident about these changes. I would appreciate it if a GC expert would carefully review this. Thanks! |
Hmm, I'm not sure if we can require graphviz for the docs. We'd have to consider installing it on the main docs server in addition to Read the Docs, and also make sure the docs can still build without it, for downstream redistributors who might only want to build with "vanilla" Sphinx and no extra extensions. Plus other developers would need an easy way to build the docs on their machines. cc @AA-Turner |
be22691
to
d804c6c
Compare
Maybe I should just commit the generated |
* Add "cyclic isolate" to the glossary. * Add a new "Object Life Cycle" page. * Illustrate the order of life cycle functions. * Document `PyObject_CallFinalizer` and `PyObject_CallFinalizerFromDealloc`. * `PyObject_Init` does not call `tp_init`. * `PyObject_New`: * also initializes the memory * does not call `tp_alloc`, `tp_new`, or `tp_init` * should not be used for GC-enabled objects * memory must be freed by `PyObject_Free` * `PyObject_GC_New` memory must be freed by `PyObject_GC_Del`. * Warn that garbage collector functions can be called from any thread. * `tp_finalize` and `tp_clear`: * Only called when there's a cyclic isolate. * Only one object in the cyclic isolate is finalized/cleared at a time. * Clearly warn that they might not be called. * They can optionally be manually called from `tp_dealloc` (via `PyObject_CallFinalizerFromDealloc` in the case of `tp_finalize`). * `tp_finalize`: * Reference `object.__del__`. * The finalizer can resurrect the object. * Suggest `PyErr_GetRaisedException` and `PyErr_SetRaisedException` instead of the deprecated `PyErr_Fetch` and `PyErr_Restore` functions. * Add links to `PyErr_GetRaisedException` and `PyErr_SetRaisedException`. * Suggest using `PyErr_WriteUnraisable` if an exception is raised during finalization. * Rename the example function from `local_finalize` to `foo_finalize` for consistency with the `tp_dealloc` documentation and as a hint that the name isn't special. * Minor wording and sylistic tweaks. * Warn that `tp_finalize` can be called during shutdown.
I committed the generated |
I think that requiring I would want to include a NEWS entry to say that A |
My main concern with documenting nitty-gritty details of the lifecycle is that we're technically documenting implementation details, which are subject to change (and we've been bad at updating these kind of things from version-to-version in past). I suggest the SVG go into the InternalDocs folder instead. It's also worth noting here that |
This reverts commit 361eaca.
I don't want to document any implementation details here, so I'm happy to remove what isn't necessary. It's hard to tell what is and isn't necessary because the end of an object's life is especially fraught with peril. I think that it is better to err on the side of over-documenting this topic than under-documenting. I wrote this PR because there were several things that I needed to know that the existing documentation didn't make clear:
If I understand correctly,
I thought that was already sufficiently explained, even before this PR. Can you explain what you think is lacking? |
First, thanks for doing this!
I don't, that limits our ability to modify the lifecycle in the future (especially because there's no good way to deprecate things here). I'll point this out when doing a more in-depth review though, I don't see anything particularly bad right now.
You're right, it's not, but I don't think we should limit ourselves to that in the future. It might be possible someday to automatically do this for untracked types as well. We should just document that all types, even GC, require
Basically, it's not documented which types need to have a Also, I don't think it should be documented that A few other notes:
|
I did a major rewrite to hopefully address all of the feedback (thanks for reviewing!). Please take another look. |
Thank you very much for taking this on, @rhansen. I often wanted this documented better, to answer questions like:
As you know, the API grew organically -- into quite a maze. I don't think anyone has all of this in their head, so I think we need a two-stage review process here:
I'm in stage 1 now: what I write here might not necessarily be correct, I'll need to re-check it all later.
I'd prefer that. Even if Graphviz is easy to install (and it not always is), it's an additional obstacle, unless you already have it.
IMO, we can do without the clickable parts, especially since the explanation below should have the same links. Would you be OK with leaving this, as a HTML-specific enhancement, to a later PR?
Here's my take on the graph.
(click for source)digraph G {
compound=true
graph [
fontname="svg"
fontsize=10.0
layout="dot"
ranksep=0.25
]
node [
fontname="Monospace"
fontsize=10.0
]
edge [
fontsize=10.0
]
subgraph cluster_type_call {
label="Type call"
shape=box
{
rank="same"
"tp_new" [href="typeobj.html#c.PyTypeObject.tp_new"]
"tp_alloc" [href="typeobj.html#c.PyTypeObject.tp_alloc"]
"tp_new" -> "tp_alloc" [label=" calls " style=bold arrowhead=open]
}
"tp_init" [href="typeobj.html#c.PyTypeObject.tp_init"]
"tp_new" -> "tp_init" [label=" (tp_init can be \n bypassed \n or repeated) "]
}
"reachable" [fontname="Times-Italic" shape=box label=" object \n is in use "]
{
rank="same"
"gc" [fontname="Times-Italic" shape=box label=" cyclic isolate \n detection "]
"tp_traverse" [href="typeobj.html#c.PyTypeObject.tp_traverse"]
"gc" -> "tp_traverse" [label=" calls " style=bold arrowhead=open]
}
"is_finalized" [label=" tp_finalize \n called \n before? " shape=diamond fontname="Times-Italic" ]
"tp_finalize" [href="typeobj.html#c.PyTypeObject.tp_finalize"]
"reachable" -> "gc"
"gc" -> "reachable" //[label=" still reachable "]
"gc" -> "is_finalized" [label=" unreachable "]
"is_finalized" -> "tp_finalize" [label="no"]
"is_finalized" -> "tp_clear" [label="yes"]
"tp_clear" [href="typeobj.html#c.PyTypeObject.tp_clear"]
"tp_dealloc" [href="typeobj.html#c.PyTypeObject.tp_dealloc"]
"tp_free" [href="typeobj.html#c.PyTypeObject.tp_free"]
"tp_init" -> "reachable" [ltail=cluster_type_call]
"tp_finalize" -> "tp_clear"
"tp_finalize" -> "reachable" [label=" resurrected "]
"tp_dealloc" -> "tp_finalize" [
style=dashed
arrowhead=open
href="lifecycle.html#c.PyObject_CallFinalizerFromDealloc"
label="recommended call\n(see note)"
]
"tp_clear" -> "tp_dealloc"
"garbage" [fontname="Times-Italic" shape=box label="uncollectable\n(gc.garbage)"]
"tp_clear" -> "garbage"
"tp_clear" -> "reachable" [label=" resurrected "]
"tp_finalize" -> "tp_dealloc"
"tp_dealloc" -> "tp_free" [label=" calls " style=bold arrowhead=open]
"reachable" -> "tp_dealloc" [label=" unused \n (refcount \n dropped \n to 0) "]
} |
Thank you for the thoughtful review @encukou!
👍
I actually kinda like the current accessibility solution—it reuses the "Explanation" text below the image as the overall image description (at least that's the intention), and hovering over each component in the image shows alt text specific to that component. I'm not sure how to recreate that if we switch to Is the reason you want to switch to |
Thank you! The new graph looks great at first glance. I'll get to a more thorough review this or next week.
The main issue is that it's HTML-only, so in PDF or EPUB it won't appear at all. |
I'll do a review sometime today :) Edit: Still working on the review, this is quite a big PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great PR! I have a lot of comments, though many of them are up for debate--if you have a different idea or don't want to change something, I'm happy to yield :)
Thank you so much for the thoughtful and thorough review, @ZeroIntensity! I'm a firm believer in code reviews, so I appreciate you taking the time to carefully go over everything. I'm going to mark this as draft until I have completed another pass to address your feedback. I'll mark it as ready to review once I'm satisfied that it is ready for another look. In the meantime, feel free to add more comments if you think of anything. |
I'll start the second pass today, but it will definitely take a few days for me to finish the whole review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay.
I've not been able to go through the entire PR, but I'm sending the comments I already wrote. I hope they make sense.
I'll make another pass later.
#life_events_graph { | ||
/* | ||
* Unfortunately these colors don't seem to be exposed in any of the theme's | ||
* variables, so they are manually copied here. | ||
*/ | ||
--svg-light-fgcolor: black; | ||
--svg-light-bgcolor: white; | ||
--svg-dark-fgcolor: rgba(255, 255, 255, 0.87); | ||
--svg-dark-bgcolor: #222; | ||
--svg-fgcolor: var(--svg-light-fgcolor); | ||
--svg-bgcolor: var(--svg-light-bgcolor); | ||
} | ||
@media (prefers-color-scheme: dark) { | ||
#life_events_graph { | ||
--svg-fgcolor: var(--svg-dark-fgcolor); | ||
--svg-bgcolor: var(--svg-dark-bgcolor); | ||
} | ||
} | ||
@media (prefers-color-scheme: light) { | ||
#life_events_graph { | ||
--svg-fgcolor: var(--svg-light-fgcolor); | ||
--svg-bgcolor: var(--svg-light-bgcolor); | ||
} | ||
} | ||
:root:has(#pydoctheme_dark_css[media="not all"]) #life_events_graph { | ||
--svg-fgcolor: var(--svg-light-fgcolor); | ||
--svg-bgcolor: var(--svg-light-bgcolor); | ||
} | ||
:root:has(#pydoctheme_dark_css[media="all"]) #life_events_graph { | ||
--svg-fgcolor: var(--svg-dark-fgcolor); | ||
--svg-bgcolor: var(--svg-dark-bgcolor); | ||
} | ||
#life_events_graph [stroke="black"] { | ||
stroke: var(--svg-fgcolor); | ||
} | ||
#life_events_graph :is(text, [fill="black"]) { | ||
fill: var(--svg-fgcolor); | ||
} | ||
#life_events_graph :is([fill="white"]) { | ||
fill: var(--svg-bgcolor); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be simplified using currentcolor
& transparent
:
#life_events_graph { | |
/* | |
* Unfortunately these colors don't seem to be exposed in any of the theme's | |
* variables, so they are manually copied here. | |
*/ | |
--svg-light-fgcolor: black; | |
--svg-light-bgcolor: white; | |
--svg-dark-fgcolor: rgba(255, 255, 255, 0.87); | |
--svg-dark-bgcolor: #222; | |
--svg-fgcolor: var(--svg-light-fgcolor); | |
--svg-bgcolor: var(--svg-light-bgcolor); | |
} | |
@media (prefers-color-scheme: dark) { | |
#life_events_graph { | |
--svg-fgcolor: var(--svg-dark-fgcolor); | |
--svg-bgcolor: var(--svg-dark-bgcolor); | |
} | |
} | |
@media (prefers-color-scheme: light) { | |
#life_events_graph { | |
--svg-fgcolor: var(--svg-light-fgcolor); | |
--svg-bgcolor: var(--svg-light-bgcolor); | |
} | |
} | |
:root:has(#pydoctheme_dark_css[media="not all"]) #life_events_graph { | |
--svg-fgcolor: var(--svg-light-fgcolor); | |
--svg-bgcolor: var(--svg-light-bgcolor); | |
} | |
:root:has(#pydoctheme_dark_css[media="all"]) #life_events_graph { | |
--svg-fgcolor: var(--svg-dark-fgcolor); | |
--svg-bgcolor: var(--svg-dark-bgcolor); | |
} | |
#life_events_graph [stroke="black"] { | |
stroke: var(--svg-fgcolor); | |
} | |
#life_events_graph :is(text, [fill="black"]) { | |
fill: var(--svg-fgcolor); | |
} | |
#life_events_graph :is([fill="white"]) { | |
fill: var(--svg-bgcolor); | |
} | |
#life_events_graph { | |
--svg-fgcolor: currentcolor; | |
--svg-bgcolor: transparent; | |
} | |
#life_events_graph a { | |
color: inherit; | |
} | |
#life_events_graph [stroke="black"] { | |
stroke: var(--svg-fgcolor); | |
} | |
#life_events_graph text, | |
#life_events_graph [fill="black"] { | |
fill: var(--svg-fgcolor); | |
} | |
#life_events_graph [fill="white"] { | |
fill: var(--svg-bgcolor); | |
} | |
#life_events_graph [fill="none"] { | |
/* On links, setting fill will make the entire shape clickable */ | |
fill: var(--svg-bgcolor); | |
} |
labeltooltip="start to tp_new: type call" | ||
tooltip="start to tp_new: type call" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove labeltooltip
and tooltip
in the whole diagram. They don't add any information, but they do make non-link text underlined on mouse hover.
labeltooltip="start to tp_new: type call" | |
tooltip="start to tp_new: type call" |
slot). Specifically, this function does **not** call the object's | ||
:meth:`!__init__` method. | ||
|
||
.. warning:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer downgrading these to note
. These functions are not dangerous.
IMO, from the name it would be more reasonable to think that this calls tp_new
or tp_init
(thus perhaps setting up some important invariants) than to assume it zeroes memory.
.. warning:: | |
.. note:: |
(And similar for all the warnings in this PR.)
object. | ||
#. :c:member:`~PyTypeObject.tp_init` initializes the newly created object. | ||
:c:member:`!tp_init` can be called again to re-initialize an object, if | ||
desired. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tp_new
can be skipped:
desired. | |
desired. The :c:member:`!tp_init` call can also be skipped entirely, | |
for example by Python code calling :py:meth:`~object.__new__`. |
This also needs an arrow in the diagram.
desired. | ||
|
||
* After :c:member:`!tp_init` completes, the object is ready to use. | ||
* After the last reference to an object is removed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* After the last reference to an object is removed: | |
* Some time after the last reference to an object is removed: |
*resurrected*, preventing its pending destruction. (Only | ||
:c:member:`!tp_finalize` is allowed to resurrect an object; | ||
:c:member:`~PyTypeObject.tp_clear` and | ||
:c:member:`~PyTypeObject.tp_dealloc` cannot.) Resurrecting an object may |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would make the text contradictory: if tp_finalize
may resurrect an object but tp_dealloc
may not, then tp_dealloc
may call PyObject_CallFinalizerFromDealloc
(which calls tp_finalize
).
AFAIK, any of these may resurrect the object.
cyclic isolate | ||
A subgroup of one or more objects that reference each other in a reference | ||
cycle, but are not referenced by objects outside the group. The goal of | ||
the garbage collector is to identify these groups and break the reference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the garbage collector is to identify these groups and break the reference | |
the :term:`cyclic garbage collector <garbage collection>` is to identify these groups and break the reference |
* :c:member:`~PyTypeObject.tp_dealloc` can optionally call | ||
:c:member:`~PyTypeObject.tp_finalize` via | ||
:c:func:`PyObject_CallFinalizerFromDealloc` if it wishes to reuse that | ||
code to help with object destruction. This is recommended because it | ||
guarantees that :c:member:`!tp_finalize` is always called before | ||
destruction. See the :c:member:`~PyTypeObject.tp_dealloc` documentation | ||
for example code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tp_clear
documentation also suggests that tp_dealloc
can call tp_clear
, to avoid code duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK:
For heap types (ones defined in Python code, and ones defined via PyType_From*
functions), tp_dealloc
defaults to a function called subtype_dealloc
, which does the whole dance: calls PyObject_CallFinalizerFromDealloc
and PyObject_GC_UnTrack
, __dict__
and delegates to a supertype's dealloc.
For these types, we should, strongly recommend to not implement your own tp_dealloc
.
tp_dealloc
is what's set when a user defines __del__
in Python code. That's why it's important to call it once, and always: users expect their custom code to run!
However, it's not that important to call in on types you control, where you know users can't provide __del__
-- that is, on immutable types (Py_TPFLAGS_IMMUTABLETYPE
).
Note that all static (i.e. non-heap) types are immutable.
So, the recommendation to call PyObject_CallFinalizerFromDealloc
is only applicable when you're defining a heap type with custom tp_dealloc
-- which should not be recommended.
If you're looking to write tp_finalize
as a C function, a better recommendation would, IMO, be to instead put the clearing code in tp_clear
, and call that from tp_dealloc
(making sure that your tp_clear
is idempotent). I see this PR already makes that recommendation.
is only called when no (strong) references exist, and is thus able to | ||
safely destroy the object itself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact it must destroy the object (unless it was resurrected).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for dropping the ball on my part--we're getting to a point where I'm willing to put the first checkmark on it!
I've noticed that a lot of information is repeated. It's good to be clear, but redudancy can be a bit frustrating on our end, because we need to update more places if we want to change something.
not initialized. Despite its name, this function is unrelated to the | ||
object's :meth:`~object.__init__` method (:c:member:`~PyTypeObject.tp_init` | ||
slot). Specifically, this function does **not** call the object's | ||
:meth:`!__init__` method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably worth mentioning that you should just use tp_alloc
and not touch this (say it's a "low-level routine").
:c:func:`PyObject_Init`. The caller will own the only reference to the | ||
object (i.e. its reference count will be one). | ||
|
||
Do not call this directly to allocate memory for an object; call the type's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, "do not" is a strong phrase. Nothing bad will happen if you do it, it's just not recommended.
@@ -136,6 +178,21 @@ rules: | |||
Releases memory allocated to an object using :c:macro:`PyObject_GC_New` or | |||
:c:macro:`PyObject_GC_NewVar`. | |||
|
|||
Do not call this directly to free an object's memory; call the type's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not a fan of "do not" here again. This "do not" is not nearly the severity of the case below, in which you would get a segfault.
:c:member:`~PyTypeObject.tp_finalize` function. Python currently does | ||
not finalize an object when the last reference to it is deleted, but | ||
this may change in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add something like "(unless PyObject_CallFinalizerFromDealloc
is used in the destructor)"
held by the object. Python currently does not clear an object in | ||
response to the deletion of the last reference, but this may change in | ||
the future. | ||
#. :c:member:`~PyTypeObject.tp_dealloc` is called to destroy the object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we want to say that it's good practice to just have tp_dealloc
call into the tp_clear
slot?
|
||
* The :c:member:`~PyTypeObject.tp_finalize` function is permitted to add a | ||
reference to the object if desired. If it does, the object is | ||
*resurrected*, preventing its pending destruction. (Only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is "resurrected" italicized?
*resurrected*, preventing its pending destruction. (Only | ||
:c:member:`!tp_finalize` is allowed to resurrect an object; | ||
:c:member:`~PyTypeObject.tp_clear` and | ||
:c:member:`~PyTypeObject.tp_dealloc` cannot.) Resurrecting an object may |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, tp_dealloc
can through PyObject_CallFinalizerFromDealloc
.
initialization by :c:member:`~PyTypeObject.tp_init`. To construct a | ||
fully initialized object, call *typeobj* instead. For example:: | ||
|
||
PyObject *foo = PyObject_CallNoArgs((PyObject *)&PyFoo_Type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth using an example of a real type here, primarily because not all types are available via PySomething_Type
.
PyObject *foo = PyObject_CallNoArgs((PyObject *)&PyFoo_Type); | |
PyObject *list_instance = PyObject_CallNoArgs((PyObject *)&PyList_Type); |
:c:member:`~PyTypeObject.tp_traverse` to identify :term:`cyclic isolates | ||
<cyclic isolate>`. | ||
* When the garbage collector discovers a :term:`cyclic isolate`, it | ||
finalizes one of the objects in the group by marking it as *finalized* and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like I've already asked this, but why is this italicized?
|
||
Listed below are the stages of life of a hypothetical :term:`cyclic isolate` | ||
that continues to exist after each member object is finalized or cleared. It | ||
is a bug if a cyclic isolate progresses through all of these stages; it should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"memory leak" instead of "bug" is more informative:
is a bug if a cyclic isolate progresses through all of these stages; it should | |
is a memory leak if a cyclic isolate progresses through all of these stages; it should |
@rhansen Are you still planning on working on this? If not, I'm happy to take over. |
Updating to resolve a conflict with GH-129850. IMO, the warnings about @rhansen, do you have any work in progress on this? |
PyObject_CallFinalizer
andPyObject_CallFinalizerFromDealloc
.PyObject_Init
does not calltp_init
.PyObject_New
:tp_alloc
,tp_new
, ortp_init
PyObject_Free
PyObject_GC_New
memory must be freed byPyObject_GC_Del
.tp_finalize
andtp_clear
:tp_dealloc
(viaPyObject_CallFinalizerFromDealloc
in the case oftp_finalize
).tp_finalize
:object.__del__
.PyErr_GetRaisedException
andPyErr_SetRaisedException
instead of the deprecatedPyErr_Fetch
andPyErr_Restore
functions.PyErr_GetRaisedException
andPyErr_SetRaisedException
.PyErr_WriteUnraisable
if an exception is raised during finalization.local_finalize
tofoo_finalize
for consistency with thetp_dealloc
documentation and as a hint that the name isn't special.tp_finalize
can be called during shutdown.📚 Documentation preview 📚: https://cpython-previews--125962.org.readthedocs.build/