Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add __rust_unsized_deallocate for C libraries to hook into. #31976

Closed
wants to merge 1 commit into from

Conversation

michaelwu
Copy link
Contributor

__rust_deallocate requires a size to be passed in, especially when jemalloc is used. Most C/C++ libraries don't use sized deallocation, so this allows non-rust code to hook into Rust's allocator without changing every free() call.

Haven't tested this beyond building yet, but I'm putting this PR out first to see if the idea sounds ok. I'm adding sized allocation support to Spidermonkey https://bugzilla.mozilla.org/show_bug.cgi?id=1250998 but not everything can easily be converted to sized deallocation. The _rust* functions are being used to hook into rust's allocator.

__rust_deallocate requires a size to be passed in, especially when
jemalloc is used. Most C/C++ libraries don't use sized deallocation,
so this allows non-rust code to hook into Rust's allocator without
changing every free() call.
@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @nikomatsakis (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@sfackler
Copy link
Member

Can regions allocated with the sized API be deallocated with the unsizzed API? I wouldn't expect that they could - isn't the point of sized allocation that the allocator can omit the size header?

@michaelwu
Copy link
Contributor Author

All allocations are sized, and I don't see a way to request that jemalloc not store metadata about a particular allocation. As I understand it, the sized deallocation API is an optimization to avoid looking up the metadata, rather than not storing metadata at all.

AIUI modern allocators do not use size headers either.

@alexcrichton
Copy link
Member

I think that this may be the wrong use of these libraries, the double-underscore in front of these symbols really is intended to mean "you should not call these functions", and it's true that you should not call these functions or rely on their existence.

In terms of functionality, this also seems like it may not be the intended use case of these functions either? Could you elaborate a bit on why external languages are being fused to the Rust allocator? The current intention is that it'd go the other way around with this support existing to fuse Rust to some other allocator.

@michaelwu
Copy link
Contributor Author

Using double underscore functions is a bit weird, but it's required even if you wanted to plug in an allocator into rust rather than the other way around - https://doc.rust-lang.org/book/custom-allocators.html#writing-a-custom-allocator . It seems like we already rely on the existence of these symbols.

As to attaching external language to Rust's allocator - I think it's not so unusual when you look at C/C++ libraries. A large number of libraries - zlib, libpng, libjpeg, and of course, spidermonkey, provide ways to attach them to custom allocators. They don't really have their own allocator, just some default stubs that glue into libc if nothing else is specified. If we don't have an appropriate API here to plug into, then everyone who wants to make sure their rust code is sharing the same allocator as C/C++ code has to provide their own allocator.

@huonw
Copy link
Member

huonw commented Mar 1, 2016

As @sfackler says, allowing "standard" use of non-sized deallocation seems unfortunate: without that possibility, Rust could theoretically use an allocator that basically stores no information about allocated pointers internally (jemalloc has internal side tables, instead of headers), relying on the fact that we always pass back in a valid pointer along with its size.

It seems like this might be possible via a custom allocator that exposes this sort of function itself?

@alexcrichton
Copy link
Member

It seems like we already rely on the existence of these symbols.

I suspect that any stable form of this API would hide this implementation detail. It's not intended to be a public thing that's typed.

If we don't have an appropriate API here to plug into, then everyone who wants to make sure their rust code is sharing the same allocator as C/C++ code has to provide their own allocator.

I agree with @huonw here that this sounds like a case for shims rather than a new symbol that can be hard-wired to.

@michaelwu
Copy link
Contributor Author

So what's the alternative? It would be best to have all code use the same allocator, and there isn't much of a reason for most people to use anything other than what rust comes with. It seems broken to require a custom allocator in order to get code to share allocators.

I can't think of how a reasonable allocator that stores no metadata could exist. Doing better than a size header is possible, and is already done, but at some point you need to know what's allocated.

@nagisa
Copy link
Member

nagisa commented Mar 1, 2016

I can't think of how a reasonable allocator that stores no metadata could exist. Doing better than a size header is possible, and is already done, but at some point you need to know what's allocated.

You can expect most Rust code to reasonably remember the sizes of allocations by itself. E.g. Box<[T]> allocates slice_len * size_of::<T>() and Box<T> allocates size_of::<T>.

I share my surprise these symbols exist in the first place and I share the sentiment about unsized allocation/deallocation functions not being useful for the Rust itself.

@michaelwu
Copy link
Contributor Author

You can expect most Rust code to reasonably remember the sizes of allocations by itself. E.g. Box<[T]> allocates slice_len * size_of::() and Box allocates size_of::.

Sure, but when it comes time to make a new allocation, the allocator need to know what memory is and isn't being used, which requires the allocator to have some idea of how big any particular allocation is.

@michaelwu
Copy link
Contributor Author

I share my surprise these symbols exist in the first place and I share the sentiment about unsized allocation/deallocation functions not being useful for the Rust itself.

Yeah Rust doesn't need this. This is for C/C++ libraries, so I didn't add anything to liballoc.

@nagisa
Copy link
Member

nagisa commented Mar 1, 2016

Yeah Rust doesn't need this. This is for C/C++ libraries, so I didn't add anything to liballoc.

It seems to me this is for liballoc::heap’s exclusive use, actually (hence the __rust prefix).

@alexcrichton
Copy link
Member

@michaelwu the alternative is what @huonw mentioned which is to build that allocation on top of what the underlying API already gives you.

@michaelwu
Copy link
Contributor Author

And store the size for every pointer? That seems like a very poor solution. The alternative is to look up the size of the pointer, but no API exists for that either.

@nagisa
Copy link
Member

nagisa commented Mar 2, 2016

The alternative is to look up the size of the pointer, but no API exists for that either.

Box deallocation code uses pretty much an equivalent of https://gist.github.com/89b0e226909482bd935a to “lookup” the size.

@michaelwu
Copy link
Contributor Author

Yeah but the free API we want to implement takes untyped void* pointers. That particular strategy doesn't work here. In SM, I've already converted frees of typed pointers into sized frees where it works reasonably (which is most of it!). That covers nearly everything, but then there are weird classes that are manually allocated and things that are in the public API which are less trivial to convert to sized allocation.

@alexcrichton
Copy link
Member

@michaelwu basically, yeah. This is the API for all allocators in Rust, and it also seems unfortunate two force support for two modes of allocating?

@michaelwu
Copy link
Contributor Author

There isn't two modes of allocating though. A reasonable allocator needs to store enough data to find the rough size of any given valid pointer or it won't be able to quickly reuse freed space. Sized deallocation is strictly a performance optimization for any reasonable allocator you would use with this interface. (Edit - I take this back, I can think of one reasonable way of doing this, but all current allocators that I know of also know the sizes of allocations) FWIW - finding the rough size of a pointer is another missing thing in this api, so HeapSize needs to make assumptions about the allocator to hook into the right functions https://doc.servo.org/src/heapsize/lib.rs.html#42-63 . OSX, glibc, bionic, musl, jemalloc, tcmalloc, dlmalloc, and windows all provide functions for determining the usable size of a pointer.

@nagisa
Copy link
Member

nagisa commented Mar 4, 2016

OSX, glibc, bionic, musl, jemalloc, tcmalloc, dlmalloc, and windows all provide functions for determining the usable size of a pointer.

And some toy allocator somebody implements at some point in the future won’t. Instead it would rely on Rust always passing in correct sizes. With this change in place allocators like that won’t be possible anymore.

@nox
Copy link
Contributor

nox commented Mar 7, 2016

We (Servo) do need to know the actual allocated size of pointers and boxes and things like this. Cf the heapsize crate and #32075.

You can expect most Rust code to reasonably remember the sizes of allocations by itself. E.g. Box<[T]> allocates slice_len * size_of::<T>() and Box<T> allocates size_of::<T>.

And then you underestimate all memory usage, because that doesn't account for jemalloc rounding up, or the Windows Heap API shenanigans with alignment.

@nagisa
Copy link
Member

nagisa commented Mar 7, 2016

We (Servo) do need to know the actual allocated size of pointers and boxes and things like this.

I could get behind this need/want. We’ve also wanted something similar in the standard library for Vec capacity at some point.

And then you underestimate all memory usage, because that doesn't account for jemalloc rounding up, or the Windows Heap API shenanigans with alignment.

Isn’t that something the allocation library should be abstracting itself? I’d be seriously surprised if jemalloc_sdallocx(jemalloc_mallocx(size, align), size, align); (or the equivalent with any other allocation API) leaked memory in any circumstances.

@nox
Copy link
Contributor

nox commented Mar 7, 2016

Isn’t that something the allocation library should be abstracting itself? I’d be seriously surprised if jemalloc_sdallocx(jemalloc_mallocx(size, align), size, align); (or the equivalent with any other allocation API) leaked memory in any circumstances.

I am not sure what you mean by leaking memory. What I was referring to is that both jemalloc and Windows' Heap API require to overallocate in some cases, size_of::<T> doesn't know about that.

@nagisa
Copy link
Member

nagisa commented Mar 7, 2016

What I was referring to is that both jemalloc and Windows' Heap API require to overallocate in some cases

Yes, they do. However, Rust won’t request more than strictly necessary (i.e. it would call mallocx(size_of::<T>())) and whatever over-allocation the allocation libraries do are strictly the responsibility of allocation libraries themselves¹, right? I either do not see how sdallocx(size_of::<T>(), min_align_of::<T>()) is not sufficient as an implementation of “unsized deallocation” or I do not see the point you’re trying to make.

¹: They might provide capability to discover the exact size of memory allocated, which in my eyes, is perhaps a necessary hole in abstraction.

@alexcrichton
Copy link
Member

The purpose of this API is to basically outline the fundamentals of allocation for all Rust crates, and just because current implementations support a piece of functionality doesn't necessarily mean that's future proof in terms of being easily implementable in all possible allocators.

This sort of interface could certainly be added to the implementations which support it (eg alloc_jemalloc and alloc_system), but I don't think there's any need for this to be part of the fundamental underpinnings for all allocators in Rust.

I'd be fine just adding more Rust APIs to the alloc_jemalloc crate which Servo can use manually to funnel allocations through.

@bluss
Copy link
Member

bluss commented Mar 7, 2016

A function that reports the actual allocated size is much less of a commitment since it can be documented to return zero when the information is not available (of course using the best possible information when possible though, to help servo).

@alexcrichton
Copy link
Member

Perhaps, but adding bindings to alloc_{jemalloc,system} is even less of a commitment as they're just exporting what the implementation is already doing.

@alexcrichton
Copy link
Member

Closing due to inactivity, but feel free to resubmit with a rebase!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants