GH-100240: Generic freelist, applied to ints #101453
I know this is a draft, but I've taken the liberty of reviewing anyway 🙂
Objects/longobject.c
Outdated
result = (PyLongObject *)_PyInterpreterState_FreelistAlloc(interp, sizeof(PyLongObject));
}
#else
if (size == 0) {
Don't change it for this PR, but this shouldn't be needed, as there should only be one zero. I think this indicates a bug in the caller.
The above assert, assert(size >= 0), should then be assert(size > 0).
Include/internal/pycore_interp.h
Outdated
}

static inline void
_PyInterpreterState_FreelistFree(PyInterpreterState * interp, PyObject *op, int size) {
What is size? The assumption is that it is a size in bytes, but I think you are using a size in machine words.
It also needs to be Py_ssize_t.
It's the result of sizeof(PyLongObject); that should be bytes, no?
Objects/longobject.c
Outdated
static void
int_dealloc(PyLongObject *op)
{
#if WITH_FREELISTS
Should we move the #if WITH_FREELISTS into PyInterpreterState_FreelistFree to keep it localized?
Personally, I think we should just remove WITH_FREELISTS, but that needs a wider discussion.
}
uint32_t i = 0;
for (; i < list->space>>1; i++) {
void* ptr = PyObject_Malloc(list->size);
Note for future PR: we should add the bulk allocate/free capability to the allocator.
Benchmarks are showing this as 2% slower overall. I guess we need to tune it.
It looks like this PR is allocating from the freelist, but freeing to the underlying allocator in the specialized
Include/internal/pycore_interp.h
Outdated
@@ -230,6 +237,34 @@ PyAPI_FUNC(int) _PyInterpreterState_IDInitref(PyInterpreterState *);
PyAPI_FUNC(int) _PyInterpreterState_IDIncref(PyInterpreterState *);
PyAPI_FUNC(void) _PyInterpreterState_IDDecref(PyInterpreterState *);

#define SIZE_TO_FREELIST_INDEX(size) ((size-4)/2)
Why divide by 2 and not 2 * sizeof(void *), which is the quantum of allocation for malloc and ob_malloc?
(Assuming that the size is in bytes.)
Could you use the term "size class" or equivalent, instead of "freelist index"? The mapping here is from sizes to classes (not all size classes get free lists).
Any reason not to make this an inline function?
Alternatively, put an assert in this function, and put a size check in the caller.
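A minimal sketch of that split, assuming a 16-byte quantum and hypothetical names (size_class_of, has_freelist, MAX_FREELIST_SIZE); none of these come from the PR:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values for illustration only; not taken from the PR. */
#define QUANTUM_SHIFT 4          /* 16-byte size classes */
#define MAX_FREELIST_SIZE 64     /* largest size (in bytes) served by a freelist */

/* Maps a size in bytes to its size class. The precondition is only
   asserted here, so the check is compiled out when NDEBUG is defined. */
static inline size_t
size_class_of(size_t size)
{
    assert(size > 0 && size <= MAX_FREELIST_SIZE);
    return (size - 1) >> QUANTUM_SHIFT;
}

/* Caller-side check: decides whether the freelist path may be used at all. */
static inline bool
has_freelist(size_t size)
{
    return size > 0 && size <= MAX_FREELIST_SIZE;
}
```

A caller would test has_freelist(size) first and fall back to the regular allocator when it returns false, so size_class_of is never reached with an out-of-range size.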
I see you've added stats, have you recorded the stats yet?
Not yet. I have a bug I'm trying to figure out. It crashes in tests that use subprocesses.
Include/internal/pycore_interp.h
Outdated
@@ -230,6 +235,27 @@ PyAPI_FUNC(int) _PyInterpreterState_IDInitref(PyInterpreterState *);
PyAPI_FUNC(int) _PyInterpreterState_IDIncref(PyInterpreterState *);
PyAPI_FUNC(void) _PyInterpreterState_IDDecref(PyInterpreterState *);

#define FREELIST_QUANTUM (2*sizeof(void*))
#define SIZE_TO_FREELIST_INDEX(size) ((size-4)/FREELIST_QUANTUM)
Why this formula?
You want to allocate all sizes from (n-1)*FREELIST_QUANTUM + 1 to n*FREELIST_QUANTUM (inclusive) from the same freelist.
I would expect the formula to be (size + FREELIST_QUANTUM -1)/FREELIST_QUANTUM, or (size-1)/FREELIST_QUANTUM.
And because C division rounds towards 0, not -infinity, you want to use >>LOG_BASE_2_OF_FREELIST_QUANTUM, not /FREELIST_QUANTUM.
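To make the grouping concrete, here is a small sketch comparing the macro under review with the (size-1) variant. It assumes a 64-bit build, so the quantum is 16 bytes, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-bit build, so 2*sizeof(void*) == 16. */
#define QUANTUM 16
#define QUANTUM_SHIFT 4   /* log2(QUANTUM) */

/* The formula from the diff under review. */
static inline size_t
index_as_written(size_t size)
{
    assert(size >= 4);    /* smaller sizes would wrap around in unsigned arithmetic */
    return (size - 4) / QUANTUM;
}

/* The suggested mapping: sizes (n-1)*QUANTUM + 1 .. n*QUANTUM share class n-1. */
static inline size_t
index_suggested(size_t size)
{
    assert(size > 0);
    return (size - 1) >> QUANTUM_SHIFT;
}
```

Under these assumptions, index_suggested puts 17 and 32 in the same class (both give 1), while index_as_written splits that range: 19 gives 0 and 20 gives 1.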
I haven't changed this yet. I'm trying to get it to work with just ints.
There already are some macros in pycore_obmalloc.h that seem to do something like this:
/*
* Alignment of addresses returned to the user. 8-bytes alignment works
* on most current architectures (with 32-bit or 64-bit address buses).
* The alignment value is also used for grouping small requests in size
* classes spaced ALIGNMENT bytes apart.
*
* You shouldn't change this unless you know what you are doing.
*/
#if SIZEOF_VOID_P > 4
#define ALIGNMENT 16 /* must be 2^N */
#define ALIGNMENT_SHIFT 4
#else
#define ALIGNMENT 8 /* must be 2^N */
#define ALIGNMENT_SHIFT 3
#endif
/* Return the number of bytes in size class I, as a uint. */
#define INDEX2SIZE(I) (((pymem_uint)(I) + 1) << ALIGNMENT_SHIFT)
test_embed is failing because of a leak (I'll check why). The other tests pass.
The leak occurred because an int was freed and inserted into the freelist after the freelist had been cleared in interpreter_clear. I now set space and capacity to 0 in interpreter_clear to prevent this.
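The idea behind that fix can be sketched with a toy freelist. The struct layout and function names below are assumptions for illustration and only loosely mirror the PR's space and capacity fields; malloc/free stand in for the underlying allocator:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy freelist; hypothetical layout, not the PR's actual struct. */
typedef struct toy_freelist {
    void *head;         /* singly linked list of cached blocks */
    unsigned space;     /* how many more blocks may still be cached */
    unsigned capacity;  /* maximum number of cached blocks */
} toy_freelist;

/* Cache the block if there is room, otherwise hand it back to the allocator. */
static void
toy_freelist_free(toy_freelist *list, void *ptr)
{
    assert(ptr != NULL);
    if (list->space == 0) {
        free(ptr);
        return;
    }
    *(void **)ptr = list->head;   /* store the link inside the freed block */
    list->head = ptr;
    list->space--;
}

/* Release every cached block, then zero space and capacity so that any
   later free bypasses the list instead of repopulating it. */
static void
toy_freelist_clear(toy_freelist *list)
{
    void *ptr = list->head;
    while (ptr != NULL) {
        void *next = *(void **)ptr;
        free(ptr);
        ptr = next;
    }
    list->head = NULL;
    list->space = 0;
    list->capacity = 0;
}
```

With space left at its old value after the clear, a late free would park the block on the list again and nothing would ever release it, which matches the leak described above.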
Dropping this for now.
@markshannon This is basically the freelist from your branch, made a bit more generic, and applied to ints. We should think about which sizes we want to include.
(I'm trying to run benchmarks, but there is some issue with the machine at the moment).