bpo-31333: Re-implement ABCMeta in C #5273
(I will add a news item later.)
What exact magic does six use? This can actually be a serious issue and cause many breakages.
I'd go with (1). There are zero reasons to use those private caches/registries directly. I had a similar problem in asyncio in 3.6, when I debated whether I wanted to expose private Task and Future attributes or not. It turns out that we'll be hiding some of them in 3.7 because it's impossible to optimize/refactor the code otherwise.
Sounds like "minor downside that dead references will stay in caches" is a backwards-incompatible change. Some ORMs (like the one we've developed at MagicStack) create virtual DB Model classes on the fly. Some of them can inherit from ABCMeta. So I guess with this PR we'd be risking a memory leak in 3.7. BTW, have you considered implementing caches and instance/subclass hooks in C, but implementing the actual ABCMeta class in pure Python? That way ABCMeta methods would have nice C-accelerated versions, but the 'six' problem should go away.
We can also use |
Modules/_abc.c
Outdated
       Stage 1: direct abstract methods.
       (It is safe to assume everything is fine since type.__new__ succeeded.) */
    ns = PyTuple_GET_ITEM(args, 2);
    items = PyMapping_Items(ns); /* TODO: Fast path for exact dicts with PyDict_Next */
This could be null if ns is a subclass of dict that overwrites the items method to raise an error or return a non-iterable
Modules/_abc.c
Outdated
    items = PyMapping_Items(ns); /* TODO: Fast path for exact dicts with PyDict_Next */
    for (pos = 0; pos < PySequence_Size(items); pos++) {
        item = PySequence_GetItem(items, pos);
        key = PyTuple_GetItem(item, 0);
item isn't necessarily a tuple, for example in
    class BadItems(dict):
        def items(self):
            return (87,)
    ABCMeta.__new__(ABCMeta, "name", (), BadItems())
Furthermore, item could be an empty tuple or a 1-tuple, meaning that key and/or value could be NULL, which again needs to be checked for.
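As an illustration of the checks being requested, here is a minimal pure-Python sketch (not the PR's C code). The helper name `collect_abstract_names` is hypothetical; `BadItems` is the class from the comment above. It assumes abstractness is signalled by the `__isabstractmethod__` attribute, as `abc.abstractmethod` does.

```python
def collect_abstract_names(namespace):
    """Return the names in `namespace` whose values are marked abstract.

    Hypothetical helper, for illustration only: it validates every item
    instead of assuming that items() yields (key, value) pairs.
    """
    abstracts = set()
    for item in namespace.items():
        # A dict subclass may return anything from items(), so check the shape
        # before unpacking.
        if not isinstance(item, tuple) or len(item) != 2:
            raise TypeError("items() must yield (key, value) pairs")
        key, value = item
        if getattr(value, "__isabstractmethod__", False):
            abstracts.add(key)
    return abstracts


class BadItems(dict):
    def items(self):
        return (87,)
```

With these definitions, `collect_abstract_names(BadItems())` raises `TypeError` instead of reading a missing key or value, which is the behaviour the C code needs to reproduce with explicit NULL checks.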
Modules/_abc.c
Outdated
        if (!(iter = PyObject_GetIter(base_abstracts))) {
            goto error;
        }
        while ((key = PyIter_Next(iter))) {
Missing check for iterators that raise an error (such as when __abstractmethods__ is overwritten with a custom object).
Modules/_abc.c
Outdated
        }
        Py_DECREF(iter);
    }
    if (_PyObject_SetAttrId((PyObject *)result, &PyId___abstractmethods__, abstracts) < 0) {
Doesn't this cause a change in behavior? __abstractmethods__ used to be a frozenset, but would now be a regular mutable set.
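For reference, a short check of the behaviour the comment wants preserved, using only the public `abc` module. The class name `Base` and its method `run` are made up for the example.

```python
import abc


class Base(abc.ABC):
    @abc.abstractmethod
    def run(self):
        """Subclasses must implement this."""


# The set of abstract method names is exposed as an immutable frozenset.
abstract_names = Base.__abstractmethods__
is_frozen = isinstance(abstract_names, frozenset)
```

Code that relies on `__abstractmethods__` being hashable or immutable would break if the C version stored a plain `set` instead.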
They're implemented in Python, and that's one of the major reasons ABCs are slow.
Modules/_abc.c
Outdated
    abcmeta_new, /* tp_new */
    0, /* tp_free */
    0, /* tp_is_gc */
};
Use designated initializer for new type.
def with_metaclass(meta, *bases):
    # ...
    class metaclass(type):
        def __new__(cls, name, this_bases, d):
            return meta(name, bases, d)
        @classmethod
        def __prepare__(cls, name, this_bases):
            return meta.__prepare__(name, bases)
    return type.__new__(metaclass, 'temporary_class', (), {})
The problem is that it doesn't work with C-level metaclasses.
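To make the "magic" concrete, here is a self-contained sketch of how the quoted helper is used. `Meta`, `Base` and `Widget` are hypothetical names, and `Meta` is a plain Python metaclass, so this shows only the mechanism, not the failure reported against the C implementation.

```python
def with_metaclass(meta, *bases):
    # Same shape as the snippet quoted above: a throwaway metaclass whose
    # only job is to redirect class creation to `meta` with the real bases.
    class metaclass(type):
        def __new__(cls, name, this_bases, d):
            return meta(name, bases, d)

        @classmethod
        def __prepare__(cls, name, this_bases):
            return meta.__prepare__(name, bases)

    return type.__new__(metaclass, "temporary_class", (), {})


class Meta(type):
    """Hypothetical pure-Python metaclass used for the demonstration."""


class Base:
    pass


class Widget(with_metaclass(Meta, Base)):
    pass
```

After the class statement runs, `Widget` is an instance of `Meta` and its only base is `Base`; the temporary class does not appear in the MRO.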
This actually sounds like a reasonable solution.
OK, I can hide them (we then just need to update code in
# pseudo-code, will be in C
_the_registry: Dict[WeakRef[type], Set[WeakRef[type]]] = {}
...
def _abc_register(cls, subcls):
    _registry = _the_registry[ref(cls)]
    _registry.add(ref(subcls))
    return subcls
or is there a better way?
Yes, I have a TODO about limiting cache growth in code, but if we are going to hide private cache attributes, then it is easy, we can just register callbacks since caches are never iterated in this code.
As @methane mentioned they are implemented in Python and also slow and overly general. As I said above, if we are going to hide the attributes, then we don't need this for caches. We only iterate over the registry and for it I can just use callbacks with a "commit queue" and an iteration guard (this is actually the idea behind |
@pppery Thanks for the review! Will fix this.
@serhiy-storchaka I just noticed you self-requested a review on this PR. Do you still want to review this before I merge? Or are you fine taking a look at it later and, if necessary, making a separate PR?
I'm making a quick review right now.
I have reviewed the C code. It mostly LGTM, but I added several comments.
Modules/_abc.c
Outdated
    PyObject *_abc_registry;
    PyObject *_abc_cache; /* Normal set of weak references. */
    PyObject *_abc_negative_cache; /* Normal set of weak references. */
    PyObject *_abc_negative_cache_version;
Why not use just a plain 64-bit integer?
Hm, good point. I was worried about some ORMs that register lots of classes, but I just calculated that it would take millions of years to reach the maximum value even if 1000 classes are registered every second.
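The estimate can be reproduced with a few lines of arithmetic. This sketch assumes an unsigned 64-bit counter and a year of 365.25 days; the rate of 1000 registrations per second is the figure from the comment.

```python
# Largest value an unsigned 64-bit counter can hold.
MAX_COUNTER = 2**64 - 1

REGISTRATIONS_PER_SECOND = 1000
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

# Time needed to exhaust the counter at the assumed rate.
years_to_overflow = MAX_COUNTER / REGISTRATIONS_PER_SECOND / SECONDS_PER_YEAR
```

Under these assumptions `years_to_overflow` comes out at roughly 5.8e8, that is, several hundred million years, so overflow is not a practical concern.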
Modules/_abc.c
Outdated
static int
_in_weak_set(PyObject *set, PyObject *obj)
{
    if (set == NULL || PySet_Size(set) == 0) {
Maybe use PySet_GET_SIZE()?
Yes, these attributes are not accessible from Python code, so we know that they always refer to sets.
    if (wr == NULL) {
        return -1;
    }
    destroy_cb = PyCFunction_NewEx(&_destroy_def, wr, NULL);
Check destroy_cb == NULL?
Modules/_abc.c
Outdated
    self: object
    /

Internal ABC helper for class set-up. Should be never used outside abc module
Missing period at the end.
Modules/_abc.c
Outdated
        int is_abstract = _PyObject_IsAbstract(value);
        Py_DECREF(value);
        if (is_abstract < 0 ||
                (is_abstract && PySet_Add(abstracts, key) < 0)) {
PEP 7 requires { on a new line in the case of a multiline condition.
Modules/_abc.c
Outdated
    /* 6. Check if it's a subclass of a subclass (recursive). */
    subclasses = PyObject_CallMethod(self, "__subclasses__", NULL);
    if(!PyList_Check(subclasses)) {
Missing space after if.
Modules/_abc.c
Outdated
        goto end;
    }
    for (pos = 0; pos < PyList_GET_SIZE(subclasses); pos++) {
        int r = PyObject_IsSubclass(subclass, PyList_GET_ITEM(subclasses, pos));
PyList_GET_ITEM(subclasses, pos) is a borrowed reference, while subclasses is mutable and can have external references. It is safer to temporarily increment the refcount.
Hm, but the list itself holds strong references to all its elements, and will not be destroyed since we hold a strong reference to the list. Anyway, if you think it is important I can add an INCREF.
PyObject_IsSubclass() can execute arbitrary Python code. This code can modify the list.
OK, but note that this can happen only if a subclass overrides __subclasses__, because object.__subclasses__ returns a new list on each call.
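The distinction drawn here can be seen from Python. In this sketch `Base`, `Child` and `Sneaky` are hypothetical classes; `Sneaky` overrides `__subclasses__` to hand out one shared list, which is the case where other code could mutate the list while the C loop is walking it.

```python
class Base:
    pass


class Child(Base):
    pass


# The default implementation builds a fresh list on every call, so mutating
# one result cannot affect another caller's copy.
first = Base.__subclasses__()
second = Base.__subclasses__()
first.clear()


class Sneaky:
    _shared = []

    @classmethod
    def __subclasses__(cls):
        # An override may return the same mutable list every time.
        return cls._shared


same_list = Sneaky.__subclasses__() is Sneaky.__subclasses__()
```

Here `second` still contains `Child` after `first.clear()`, while `same_list` is true for the overriding class.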
Modules/_abc.c
Outdated
        return 0;
    }
    // Weakref callback may remove entry from set.
    // Se we take snapshot of registry first.
"So"?
Modules/_abc.c
Outdated
    }
    // Weakref callback may remove entry from set.
    // Se we take snapshot of registry first.
    PyObject **copy = PyMem_Malloc(sizeof(PyObject*) * registry_size);
It can be simpler to use PySequence_List(impl->_abc_registry).
Yes, but this is very hot code, so we decided to make an optimisation here (anyway, we need the copy only for a very short time).
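The reason for taking a snapshot at all can be shown with a plain Python set. This is an analogy only: `visit_registry` and `drop_after_visit` are hypothetical names, and the removal is done explicitly here rather than by a weakref callback.

```python
def visit_registry(registry, visit):
    """Call `visit` on every entry, tolerating removals during the loop."""
    # Iterating the set directly would raise RuntimeError as soon as `visit`
    # removed an entry, so walk a short-lived copy instead.
    for entry in tuple(registry):
        visit(entry)


registry = {"a", "b", "c"}
seen = []


def drop_after_visit(entry):
    seen.append(entry)
    registry.discard(entry)  # mutates the set while it is being visited


visit_registry(registry, drop_after_visit)
```

Every entry is visited exactly once even though the set is empty by the end; the C code achieves the same effect with a temporary array instead of a list object.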
Modules/_abc.c
Outdated
The token is an opaque object (supporting equality testing) identifying the
current version of the ABC cache for virtual subclasses. The token changes
with every call to ``register()`` on any ABC.
Double backticks don't look necessary.
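The documented behaviour of the token can be checked through the public `abc.get_cache_token()` function. The ABC named `Shape` is made up for the example.

```python
import abc


class Shape(abc.ABC):
    """Hypothetical ABC used only to trigger a registration."""


token_before = abc.get_cache_token()
Shape.register(int)  # registering a virtual subclass invalidates caches
token_after = abc.get_cache_token()

token_changed = token_before != token_after
```

Only equality comparison is relied on here, matching the docstring's description of the token as opaque.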
@serhiy-storchaka Thanks for the review! I will make the changes now.
@serhiy-storchaka I implemented the latest comments. Are you happy with the PR now?
It seems like after this change the |
The key word here is after.
static unsigned long long abc_invalidation_counter = 0;

/* This object stores internal state for ABCs.
   Note that we can use normal sets for caches,
This comment is out of date
   since they are never iterated over. */
typedef struct {
    PyObject_HEAD
    PyObject *_abc_registry;
The registry, cache, and negative cache are all normal sets of weak references; there should be comments stating that for either all or none of them.
I think it is worth mentioning in What's New in the optimizations section.
Modules/_abc.c
Outdated
@@ -508,6 +513,9 @@ _abc__abc_instancecheck_impl(PyObject *module, PyObject *self,
    }

    subclass = _PyObject_GetAttrId(instance, &PyId___class__);
    if (subclass == NULL) {
        return NULL;
Leaked impl.
Modules/_abc.c
Outdated
        Py_DECREF(negative_cache);
        return NULL;
    }
    PyObject *res = PyTuple_Pack(4, registry, cache, negative_cache, cache_version);
This could be written more simply as:
    PyObject *res = Py_BuildValue("NNNK",
                                  PySet_New(impl->_abc_registry),
                                  PySet_New(impl->_abc_cache),
                                  PySet_New(impl->_abc_negative_cache),
                                  impl->_abc_negative_cache_version);
Modules/_abc.c
Outdated
                                 abc_invalidation_counter, Py_LT);
    assert(r >= 0); // Both should be PyLong
    if (r > 0) {
    if (impl->_abc_negative_cache_version < abc_invalidation_counter) {
        /* Invalidate the negative cache. */
        if (impl->_abc_negative_cache != NULL &&
                PySet_Clear(impl->_abc_negative_cache) < 0) {
PEP 7 requires { on a separate line in this case.
@ilevkivskyi: Please replace |
This adds C versions of methods used by ABCMeta that improve performance of various ABC operations.
Thank you @ilevkivskyi and @methane!
Thank you for the good reviews, @1st1!
This implementation is fully functional but there are three remaining issues/questions that I hope we can resolve quickly:
six.with_metaclass (used by pip) does some fragile "magic". With this PR it fails with TypeError: type.__new__(metaclass) is not safe, use ABCMeta.__new__(). I see two options here: (1) Try to somehow special-case ABCMeta to fix this; (2) Do nothing here, but instead contact the six and pip maintainers so that they can fix this.
[...] C._abc_cache.clear(), but not C._abc_cache = "Surprise!"; (3) Expose them, and make them writable. I currently go with option (2), which seems to be a reasonable compromise.
I haven't done any careful benchmarking yet, but this seems to give a decent speed-up for Python start-up time and for several ABC-related tests. For example, on my machine Python startup is 10% faster.
@methane @serhiy-storchaka I would really appreciate your help/advice/review here.
@ned-deily I know beta1 is very close, but I would like this to get in. I already discussed with @gvanrossum and he is OK with this getting into beta1.
https://bugs.python.org/issue31333