Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

a MemoryAllocator object for easier Cython memory management #18868

Closed
nathanncohen mannequin opened this issue Jul 8, 2015 · 49 comments
Closed

a MemoryAllocator object for easier Cython memory management #18868

nathanncohen mannequin opened this issue Jul 8, 2015 · 49 comments

Comments

@nathanncohen
Copy link
Mannequin

nathanncohen mannequin commented Jul 8, 2015

This is a re-implementation of an example appearing in Cython's documentation (see bottom of [1]).

The idea is simple: an object has a .malloc method, and returns arrays of memory. When that object is garbage-collected, the memory it allocated is automatically freed.

This makes it much easier to deal with C pointers in Cython code, which must be freed before any 'return' and whenever an exception can be raised.

As an illustration of how useful this can be, I removed a lot of graph code.

Nathann

[1] http://docs.cython.org/src/tutorial/memory_allocation.html

Depends on #18864

CC: @dimpase @sagetrac-borassi @dcoudert @vbraun @jdemeyer @simon-king-jena

Component: cython

Author: Nathann Cohen, Jeroen Demeyer

Branch/Commit: 0304d9f

Reviewer: Jeroen Demeyer, Volker Braun

Issue created by migration from https://trac.sagemath.org/ticket/18868

@nathanncohen nathanncohen mannequin added this to the sage-6.8 milestone Jul 8, 2015
@nathanncohen nathanncohen mannequin added c: cython labels Jul 8, 2015
@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jul 8, 2015

New commits:

2723175trac #18868: a MemoryAllocator object for easier Cython memory management

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jul 8, 2015

Commit: 2723175

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jul 8, 2015

Branch: public/18868

@nathanncohen nathanncohen mannequin added the s: needs review label Jul 8, 2015
@jdemeyer
Copy link

jdemeyer commented Jul 8, 2015

comment:2

This

include 'sage/ext/interrupt.pxi'

should be in the .pyx file.

@jdemeyer
Copy link

jdemeyer commented Jul 8, 2015

comment:3

For performance, replace

cdef MemoryAllocator mem = MemoryAllocator()

by

cdef MemoryAllocator mem = <MemoryAllocator>MemoryAllocator.__new__(MemoryAllocator)

@jdemeyer
Copy link

jdemeyer commented Jul 8, 2015

comment:4

Can you please support all memory-allocation functions, like calloc, realloc, (re)allocarray and replace the dangerous(*) calls like

mem.malloc( n * sizeof(unsigned short *))

by

mem.allocarray(n, sizeof(unsigned short *))

(*) the multiplication can overflow.

@jdemeyer
Copy link

jdemeyer commented Jul 8, 2015

comment:5

Also for performance: keep calloc, do not replace it with malloc + memset.

@jdemeyer
Copy link

jdemeyer commented Jul 8, 2015

comment:6

I guess that realloc(array) might not be so easy to support, so you can forget about that. But I would certainly support allocarray and calloc.

And I don't understand why you use

cdef void * val = malloc(size)
if val == NULL:
    raise MemoryError()

instead of

cdef void * val = check_malloc(size)

@dimpase
Copy link
Member

dimpase commented Jul 8, 2015

comment:7

[1] recommends PyMem_Malloc, PyMem_Realloc, PyMem_Free over the system functions. Why do you
stick with malloc etc?

@jdemeyer
Copy link

jdemeyer commented Jul 8, 2015

comment:8

More about performance: note that your use of MemoryAllocator needs at least 2 extra memory allocation calls compared to not using MemoryAllocator: one to allocate the MemoryAllocator object and one to allocate the pointers member. If you allocate several pointers, it's worse because you again need more additional realloc() calls for pointers.

To solve some of these problems, I propose the following: add a small fixed array

cdef void* local_pointers[16]

in the MemoryAllocator class and initialize

self.max_size = 16
self.pointers = self.local_pointers

such that you avoid any allocations for pointers if you need to store at most 16 pointers (which is the usual case I guess).

@jdemeyer
Copy link

jdemeyer commented Jul 8, 2015

comment:9

Replying to @dimpase:

Why do you stick with malloc etc?

In Sage, you should use absolutely use sage_malloc (or check_malloc) because it behaves well w.r.t. interrupts!

@jdemeyer
Copy link

jdemeyer commented Jul 8, 2015

comment:10

More for performance: replace

cdef enlarge_if_needed(self):

by

cdef int enlarge_if_needed(self) except -1:

@jdemeyer
Copy link

jdemeyer commented Jul 8, 2015

comment:11

One more detail: replace

cdef int n
cdef int max_size

by

cdef size_t n
cdef size_t max_size

(and adjust the loop index in __dealloc__) in case I ever need more than 231 pointers :-)

@dimpase
Copy link
Member

dimpase commented Jul 8, 2015

comment:13

Replying to @jdemeyer:

Replying to @dimpase:

Why do you stick with malloc etc?

In Sage, you should use absolutely use sage_malloc (or check_malloc) because it behaves well w.r.t. interrupts!

And why does sage_malloc use malloc, and not PyMem_Malloc?

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 8, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

bf39672trac #18868: back to calloc
9597eectrac #18868: Changes to MemoryAllocator

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 8, 2015

Changed commit from 2723175 to 9597eec

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jul 8, 2015

comment:15

This include 'sage/ext/interrupt.pxi' should be in the .pyx file.

Done

For performance, replace
cdef MemoryAllocator mem = MemoryAllocator()
by
cdef MemoryAllocator mem = <MemoryAllocator>MemoryAllocator.__new__(MemoryAllocator)

Makes no difference in any of the functions I touched, so I did not do it. Less
readable.

Can you please support all memory-allocation functions, like calloc, realloc,
(re)allocarray and replace the dangerous(*) calls like

I added support for calloc. I do not need realloc (you can add a commit if you
need it).

About realloc:

I guess that realloc(array) might not be so easy to support, so you can forget
about that.

+1

But I would certainly support allocarray

Add a commit if you like. My problem with 'allocarray' is that it is Sage's
terminology (as far as I know), and that I prefer to stick to more widely
understood terms like 'malloc/calloc'. I was just saying to David that we may
eventually move some C graph code away from sage and make it an independent C
library.

And I don't understand why you use

cdef void * val = malloc(size)
if val == NULL:
    raise MemoryError()

instead of

cdef void * val = check_malloc(size)

Done

[1] recommends PyMem_Malloc, PyMem_Realloc, PyMem_Free over the system
functions. Why do you stick with malloc etc?

Personally, for a simple reason: I trust C functions. Officially, it is for the
reason Jeroen mentionned. Note that we can satisfy everybody: it is possible to
have sage_malloc (which behaves properly with respect to interrupts) call your
functions instead of C ones.

More about performance: note that your use of MemoryAllocator needs at least 2
extra memory allocation calls compared to not using MemoryAllocator

This is not a problem for the codes I touched, for malloc is never a dominant
cost in any of them. You can add a commit if you like, though please keep the
code simple and understandable if you do.

More for performance: replace
cdef enlarge_if_needed(self):
by
cdef int enlarge_if_needed(self) except -1:

Done.

One more detail: replace int with size_t

Done.

Jeroen, could you send reviews as one big comment rather than adding one every
time you notice something? Everybody in Cc receives an email for each comments,
and it is also a bit harder to address your points:

  • One never knows if you are done reading the code or if we can expect a new
    comment in the next seconds

  • Answering your points like I do now is a bit harder: instead of clicking on
    'reply' (to your comment), I copy/paste all your individual messages into a
    text file, then start answering them.

Thaaaaaaaaanks,

Nathann

@jdemeyer
Copy link

jdemeyer commented Jul 8, 2015

comment:16

Replying to @dimpase:

[1] recommends PyMem_Malloc, PyMem_Realloc, PyMem_Free over the system functions.

What is [1]?

@jdemeyer
Copy link

jdemeyer commented Jul 8, 2015

comment:17

Replying to @nathanncohen:

My problem with 'allocarray' is that it is Sage's terminology (as far as I know)

It actually comes from BSD, where it was added for security reasons (like I said, the multiplication in malloc(n * sizeof(foo)) can overflow, a potential security issue). For Sage, it's not so much the "security" aspect which is important, but the reliability.

@jdemeyer
Copy link

jdemeyer commented Jul 8, 2015

comment:18

Replying to @nathanncohen:

You can add a commit if you like, though please keep the
code simple and understandable if you do.

I thought that speed was an important reason to invent this new class. If you don't care so much about speed, there are several already-existing alternatives, such as Cython arrays.

If speed is not so much a concern, then why are you reinventing the wheel?

@dimpase
Copy link
Member

dimpase commented Jul 8, 2015

comment:29

Replying to @nathanncohen:

Why wouldn't we discuss it with the cython guys? Perhaps they would be willing to provide something we could use?

sure, why not; also, how about asking them about comment 23 ?

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 8, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

f514301Optimize MemoryAllocator and add allocarray()

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 8, 2015

Changed commit from 9597eec to f514301

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 8, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

39f8839Fix exception handling

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 8, 2015

Changed commit from f514301 to 39f8839

@jdemeyer
Copy link

jdemeyer commented Jul 8, 2015

Changed author from Nathann Cohen to Nathann Cohen, Jeroen Demeyer

@jdemeyer
Copy link

jdemeyer commented Jul 8, 2015

Reviewer: Jeroen Demeyer

@jdemeyer
Copy link

jdemeyer commented Jul 8, 2015

comment:32

This is ok for me, but someone needs to review my changes.

@dimpase
Copy link
Member

dimpase commented Jul 9, 2015

comment:33

still, how about PyMem_Malloc, PyMem_Realloc, PyMem_Free, as mentioned in http://docs.cython.org/src/tutorial/memory_allocation.html ?

Apart from performance, there could be advantages in sense of debugging, as one can
./configure Python with --with-pydebug to catch memory errors...

@jdemeyer
Copy link

jdemeyer commented Jul 9, 2015

comment:34

Replying to @dimpase:

still, how about PyMem_Malloc, PyMem_Realloc, PyMem_Free, as mentioned in http://docs.cython.org/src/tutorial/memory_allocation.html ?

In any case, that's independent of this ticket.

Second, sage_malloc() can be called without the GIL. I guess PyMem_Malloc cannot...

@jdemeyer
Copy link

jdemeyer commented Jul 9, 2015

comment:35

Third, Python's memory allocator does not support calloc() or allocarray(), two very useful functions.

And if you're speaking about performance, recall that malloc() + memset() is slower than calloc().

@jdemeyer
Copy link

comment:36

Can somebody review this please?

@jdemeyer
Copy link

Changed reviewer from Jeroen Demeyer to Jeroen Demeyer, Volker Braun

@dcoudert
Copy link
Contributor

comment:39

Your were faster than me.
I can at least confirm that all long tests pass on src/sage/graphs/.
David.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 24, 2015

Changed commit from 39f8839 to 0304d9f

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 24, 2015

Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. Last 10 new commits:

da7bad7trac #18864: Merged with 6.8.beta8
da0260dtrac #18864: prevent using the Takes-Kosters algorithm on digraphs
554c509trac #18864: implement reviewer's comments
f48a33etrac #18864: minor corrections
a3717b7trac #18864: improve behavior with directed graphs
f621777trac #18864: corrections and improved doc
80cb899trac #18864: correct version
ce9d265trac #18864: return Sage integers
a1b36datrac #18864: Merged with 6.8.rc1
0304d9ftrac #18868: Merged with #18864

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jul 24, 2015

Dependencies: #18864

@vbraun
Copy link
Member

vbraun commented Jul 28, 2015

comment:43

Conflict with #18868

@vbraun
Copy link
Member

vbraun commented Jul 28, 2015

Changed branch from public/18868 to 0304d9f

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants