Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-100240: Generic freelist, applied to ints #101453

Closed
wants to merge 19 commits into from

Conversation

iritkatriel
Copy link
Member

@iritkatriel iritkatriel commented Jan 31, 2023

@markshannon This is basically the freelist from your branch, made a bit more generic, and applied to ints. We should think about which sizes we want to include.

(I'm trying to run benchmarks, but there is some issue with the machine at the moment).

@iritkatriel iritkatriel marked this pull request as draft January 31, 2023 10:36
@iritkatriel iritkatriel changed the title freelist for ints generic freelist, applied to ints Jan 31, 2023
Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is a draft, but I've taken the liberty of reviewing anyway 🙂

result = (PyLongObject *)_PyInterpreterState_FreelistAlloc(interp, sizeof(PyLongObject));
}
#else
if (size == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't change it for this PR, but this shouldn't be needed, as there should only be one zero. I think this indicates a bug in the caller.
The above assert, assert(size >= 0) should then be assert(size > 0)

Include/internal/pycore_interp.h Show resolved Hide resolved
}

static inline void
_PyInterpreterState_FreelistFree(PyInterpreterState * interp, PyObject *op, int size) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is size?. The assumption is that it is size in bytes, but I think you are using size in machine words.
Also needs to be Py_ssize_t.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's the results of sizeof(PyLongObject), that should be bytes no?

Include/internal/pycore_pymem.h Outdated Show resolved Hide resolved
static void
int_dealloc(PyLongObject *op)
{
#if WITH_FREELISTS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move the #if WITH_FREELISTS into PyInterpreterState_FreelistFree to keep it localized?

Personally, I think we should just remove WITH_FREELISTS, but that needs a wider discussion.

}
uint32_t i = 0;
for (; i < list->space>>1; i++) {
void* ptr = PyObject_Malloc(list->size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for future PR: we should add the bulk allocate/free capability to the allocator.

Objects/obmalloc.c Outdated Show resolved Hide resolved
Objects/obmalloc.c Outdated Show resolved Hide resolved
Python/pystate.c Outdated Show resolved Hide resolved
Python/pystate.c Outdated Show resolved Hide resolved
@iritkatriel
Copy link
Member Author

benchmarks are showing this as 2% slower overall. I guess we need to tune it.

https://github.com/faster-cpython/benchmarking/tree/main/results/bm-20230131-3.12.0a4%2B-fe65f49

@markshannon
Copy link
Member

It looks like this PR is allocating from the freelist, but freeing to the underlying allocator in the specialized BINARY_OP instructions.
Which could explain the large slowdown in spectralnorm.

@@ -230,6 +237,34 @@ PyAPI_FUNC(int) _PyInterpreterState_IDInitref(PyInterpreterState *);
PyAPI_FUNC(int) _PyInterpreterState_IDIncref(PyInterpreterState *);
PyAPI_FUNC(void) _PyInterpreterState_IDDecref(PyInterpreterState *);

#define SIZE_TO_FREELIST_INDEX(size) ((size-4)/2)
Copy link
Member

@markshannon markshannon Feb 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why divide by 2 and not 2 * sizeof(void *), which is the quantum of allocation for malloc and ob_malloc?
(Assuming that the size is in bytes)
Could you use the term "size class" or equivalent, instead of "freelist index" The mapping here is from sizes to classes (not all size classes get free lists).

Any reason not to make this an inline function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, put an assert in this function, and put a size check in the caller.

@markshannon markshannon changed the title generic freelist, applied to ints GH-100240: Generic freelist, applied to ints Feb 3, 2023
@markshannon
Copy link
Member

I see you've added stats, have you recorded the stats yet?

@iritkatriel
Copy link
Member Author

I see you've added stats, have you recorded the stats yet?

Not yet. I have a bug I'm trying to figure out. It crashes in tests that use subprocesses.

@@ -230,6 +235,27 @@ PyAPI_FUNC(int) _PyInterpreterState_IDInitref(PyInterpreterState *);
PyAPI_FUNC(int) _PyInterpreterState_IDIncref(PyInterpreterState *);
PyAPI_FUNC(void) _PyInterpreterState_IDDecref(PyInterpreterState *);

#define FREELIST_QUANTUM (2*sizeof(void*))
#define SIZE_TO_FREELIST_INDEX(size) ((size-4)/FREELIST_QUANTUM)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this formula?

You want to allocate all sizes from (n-1)*FREELIST_QUANTUM + 1 to n*FREELIST_QUANTUM (inclusive) from the same freelist.
I would expect the formula to be (size + FREELIST_QUANTUM -1)/FREELIST_QUANTUM, or (size-1)/FREELIST_QUANTUM.

And because C division rounds towards 0, not -infinity, you want to use >>LOG_BASE_2_OF_FREELIST_QUANTUM, not /FREELIST_QUANTUM.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't changed this yet. I'm trying to get it to work with just ints.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There already are some macros in pycore_obmalloc.h that seem to do something like this:

/* 
 * Alignment of addresses returned to the user. 8-bytes alignment works
 * on most current architectures (with 32-bit or 64-bit address buses).
 * The alignment value is also used for grouping small requests in size
 * classes spaced ALIGNMENT bytes apart.
 *
 * You shouldn't change this unless you know what you are doing.
 */

#if SIZEOF_VOID_P > 4
#define ALIGNMENT              16               /* must be 2^N */
#define ALIGNMENT_SHIFT         4
#else
#define ALIGNMENT               8               /* must be 2^N */
#define ALIGNMENT_SHIFT         3
#endif
   
/* Return the number of bytes in size class I, as a uint. */
#define INDEX2SIZE(I) (((pymem_uint)(I) + 1) << ALIGNMENT_SHIFT)

@iritkatriel
Copy link
Member Author

test_embed is failing because of a leak (I'll check why). The other tests pass.

@iritkatriel
Copy link
Member Author

test_embed is failing because of a leak (I'll check why). The other tests pass.

The leak was because an int was freed and inserted to the freelist after it has been cleared in interpreter_clear. I set space and capacity to 0 in interpreter_clear to prevent this.

@iritkatriel
Copy link
Member Author

Dropping this for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants