Prepare global allocators for stabilization #1974

Merged
merged 7 commits into from
Jun 18, 2017

248 changes: 248 additions & 0 deletions text/0000-global-allocators.md
@@ -0,0 +1,248 @@
- Feature Name: `allocator`
- Start Date: 2017-02-04
- RFC PR:
- Rust Issue:

# Summary
[summary]: #summary

Overhaul the global allocator APIs to put them on a path to stabilization, and
switch the default allocator to the system allocator when the feature
stabilizes.

This RFC is a refinement of the previous [RFC 1183][].

[RFC 1183]: https://github.com/rust-lang/rfcs/blob/master/text/1183-swap-out-jemalloc.md

# Motivation
[motivation]: #motivation

## API

The unstable `allocator` feature allows developers to select the global
allocator which will be used in a program. A crate identifies itself as an
allocator with the `#![allocator]` annotation, and declares a number of
Contributor

This is now out of date, as we are just identifying the static?

Member Author

Yeah, will fix

Member Author

Oh, no, this section is a description of the API as it exists today, not as it will exist in the future.

Member

@sfackler I was also confused by this when I first read the section. Would it be possible to relabel the section as "What we have now" or something?

allocation functions with specific `#[no_mangle]` names and a C ABI. To
override the default global allocator, a crate simply pulls an allocator in
via an `extern crate`.

There are a couple of issues with the current approach:

A C-style ABI is error-prone - nothing ensures that the signatures are correct,
and if a function is omitted that error will be caught by the linker rather than
the compiler. The Macros 1.1 API is similar in that certain special functions must
be identified to the compiler, and in that case a special attribute
(`#[proc_macro_derive]`) is used rather than a magic symbol name.

Since an allocator is automatically selected when it is pulled into the crate
graph, it is painful to compose allocators. For example, one may want to create
an allocator which records statistics about active allocations, or adds padding
around allocations to attempt to detect buffer overflows in unsafe code. To do
this currently, the underlying allocator would need to be split into two
crates, one which contains all of the functionality and another which is tagged
as an `#![allocator]`.
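
As a rough illustration of the composition use case, the sketch below wraps one allocator in another that tracks how many bytes are currently live. It is written against the `std::alloc::GlobalAlloc` trait from today's stable Rust rather than the `#![allocator]` mechanism described here, and the `Counting` type and its methods are hypothetical names chosen for this example.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper: forwards every request to `inner` and keeps a
/// running total of the bytes that are currently allocated.
pub struct Counting<A> {
    inner: A,
    live_bytes: AtomicUsize,
}

impl<A> Counting<A> {
    pub const fn new(inner: A) -> Self {
        Counting { inner, live_bytes: AtomicUsize::new(0) }
    }

    /// Bytes handed out through this wrapper and not yet returned.
    pub fn live_bytes(&self) -> usize {
        self.live_bytes.load(Ordering::Relaxed)
    }
}

unsafe impl<A: GlobalAlloc> GlobalAlloc for Counting<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { self.inner.alloc(layout) };
        // Only count allocations that actually succeeded.
        if !ptr.is_null() {
            self.live_bytes.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { self.inner.dealloc(ptr, layout) };
        self.live_bytes.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}
```

Because `Counting<A>` is generic over the inner allocator, it can wrap `System` or any other `GlobalAlloc` implementation, and it can be called directly without being installed as the global allocator.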

## jemalloc

Rust's default allocator has historically been jemalloc. While jemalloc does
provide significant speedups over certain system allocators for some
allocation-heavy workloads, it has been a source of problems. For example, it has
deadlock issues on Windows, does not work with Valgrind, adds ~300KB to
binaries, and has caused crashes on macOS 10.12. See [this comment][] for more
details. As a result, it is already disabled on many targets, including all of
Windows. While there are certainly contexts in which jemalloc is a good choice,
developers should be making that decision, not the compiler. The system
allocator is a more reasonable and unsurprising default choice.

A third-party crate allowing users to opt into jemalloc would also open the door
to provide access to some of the library's other features such as tracing, arena
pinning, and diagnostic output dumps for code that depends on jemalloc directly.

[this comment]: https://github.com/rust-lang/rust/issues/36963#issuecomment-252029017

# Detailed design
[design]: #detailed-design

## Defining an allocator

An allocator crate identifies itself as such by applying the `#![allocator]`
annotation at the crate root. It then defines a specific set of functions which
are tagged with attributes:

```rust
#![allocator]

/// Returns a pointer to `size` bytes of memory aligned to `align`.
///
/// On failure, returns a null pointer.
///
/// Behavior is undefined if the requested size is 0 or the alignment is not a
/// power of 2. The alignment must be no larger than the largest supported page
/// size on the platform.
#[allocator(allocate)]
pub fn allocate(size: usize, align: usize) -> *mut u8 {
...
}

/// Returns a pointer to `size` bytes of memory aligned to `align`, and
/// initialized with zeroes.
///
/// On failure, returns a null pointer.
///
/// Behavior is undefined if the requested size is 0 or the alignment is not a
/// power of 2. The alignment must be no larger than the largest supported page
/// size on the platform.
#[allocator(allocate_zeroed)]
pub fn allocate_zeroed(size: usize, align: usize) -> *mut u8 {
...
}

/// Deallocates the memory referenced by `ptr`.
///
/// The `ptr` parameter must not be null.
///
/// The `old_size` and `align` parameters are the parameters that were used to
@hanna-kruppe hanna-kruppe Apr 17, 2017

@ruuda made a good point in the discussion of the allocator traits: It can be sensible to allocate over-aligned data, but this information is not necessarily carried along until deallocation, so there's a good reason deallocate shouldn't require the same alignment that was used to allocate.

This requirement was supposed to allow optimizations in the allocator, but AFAIK nobody could name a single existing allocator design that can use alignment information for deallocation.

Member

I wrote an allocator for an OS kernel once that would have benefited greatly from alignment info.

That would be very relevant to both this RFC and the allocators design, so could you write up some details?

Member

Hmmm... It seems that I was very mistaken... I have to apologize 🤕

Actually, when I went back and looked at the code, I found the exact opposite. The allocator interface actually does pass the alignment to free, and my implementation of free ignores it for exactly the reasons mentioned above (more later). That said, passing alignment into the alloc function is useful (and required for correctness), so I assume that this discussion is mostly about if free should take align or not.

The code is here. It's a bit old and not very well-written since I was learning rust when I wrote it. Here is a simple description of what it does:

Assumptions

  • The kernel is the only entity using this allocator. (The user-mode allocator lives in user-mode).
  • The kernel is only using this allocator through Box, so the parameters size and align are trusted to be correct, since they are generated by the compiler.

Objective

Use as little metadata as possible.

Blocks

  • All blocks are a multiple of the smallest possible block size, which is based on the size of the free-block metadata (16B on a 32-bit machine).
  • All blocks have a minimum alignment which is the same as minimum block size (16B).
  • The allocator keeps a free-list which is simply a singly linked list of blocks.
  • Free blocks are used to store their own metadata.
  • Active blocks have no header/footer. This means that there is no header/footer overhead at all.

alloc

Allocating memory just grabs the first free block with required size and alignment, removes it from the free list, splits it if needed, and returns a pointer to its beginning. The size of the block allocated is a function of the alignment and size.

free

Freeing memory requires very little effort, it turns out. Since we assume that the parameters size and ptr are valid, we simply create block metadata and add to the linked list. If possible, we can merge with free blocks after the block we are freeing.

In fact, the alignment passed into free is ignored here because the ptr should already be aligned. The takeaway seems to be the opposite from what I said above (again, sorry). When I thought about it some more, it makes sense. A ptr inherently conveys some alignment information, so passing this information in as an argument actually seems somewhat redundant.

I'm actually quite relieved to hear that 😄 Yes, allocation and reallocation should have alignment arguments, it's just deallocation that shouldn't use alignment information. It's not quite true that "ptr inherently conveys alignment information", because the pointer might just happen to have more alignment than was requested, but it's true that it's always aligned as requested at allocation time (since it must be the exact pointer returned by allocation, not a pointer into the allocation).

/// create the allocation referenced by `ptr`.
#[allocator(deallocate)]
pub fn deallocate(ptr: *mut u8, old_size: usize, align: usize) {
...
}

/// Resizes the allocation referenced by `ptr` to `size` bytes.
///
/// On failure, returns a null pointer and leaves the original allocation
/// intact.
///
/// If the allocation was relocated, the memory at the passed-in pointer is
/// undefined after the call.
///
/// Behavior is undefined if the requested size is 0 or the alignment is not a
/// power of 2. The alignment must be no larger than the largest supported page
/// size on the platform.
///
/// The `old_size` and `align` parameters are the parameters that were used to
/// create the allocation referenced by `ptr`.
#[allocator(reallocate)]
pub fn reallocate(ptr: *mut u8, old_size: usize, size: usize, align: usize) -> *mut u8 {
...
}

/// Resizes the allocation referenced by `ptr` to `size` bytes without moving
/// it.
///
/// The new size of the allocation is returned. This must be at least
/// `old_size`. The allocation must always remain valid.
///
/// Behavior is undefined if the requested size is 0 or the alignment is not a
Contributor

Should we go for "Behavior is undefined if the requested size is less than old_size or..."?
It might be worth spelling out explicitly whether old_size and size are the only legitimate return values or if the function can also return something inside that range.

Member Author

Yeah, I just copied these docs out of alloc::heap - they need to be cleaned up.

/// power of 2. The alignment must be no larger than the largest supported page
/// size on the platform.
///
/// The `old_size` and `align` parameters are the parameters that were used to
/// create the allocation referenced by `ptr`.
///
/// This function is optional. The default implementation simply returns
/// `old_size`.
#[allocator(reallocate_inplace)]
pub fn reallocate_inplace(ptr: *mut u8, old_size: usize, size: usize, align: usize) -> usize {
...
}
```

Note that `usable_size` has been removed, as it is not used anywhere in the
standard library.

The allocator functions must be publicly accessible, but can have any name and
be defined in any module. However, it is recommended to use the names above in
the crate root to minimize confusion.

An allocator must provide all functions with the exception of
`reallocate_inplace`. New functions can be added to the API in the future in a
similar way to `reallocate_inplace`.
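
To make the documented contracts concrete, here is a minimal sketch of the five functions forwarding to the system allocator. It assumes `std::alloc::System` and `Layout` from today's stable Rust. The `#[allocator(...)]` attributes are omitted because they are only proposed in this text, `checked_layout` is a hypothetical helper, and `deallocate` and `reallocate` are marked `unsafe` here even though the signatures above are not. Returning null for the inputs the docs call undefined is one permissible choice, not a requirement.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::ptr;

/// Hypothetical helper: rejects the inputs the docs above leave undefined
/// (zero size, alignment that is not a power of two).
fn checked_layout(size: usize, align: usize) -> Option<Layout> {
    if size == 0 {
        return None;
    }
    Layout::from_size_align(size, align).ok()
}

pub fn allocate(size: usize, align: usize) -> *mut u8 {
    match checked_layout(size, align) {
        // SAFETY: the layout has a nonzero size.
        Some(layout) => unsafe { System.alloc(layout) },
        None => ptr::null_mut(),
    }
}

pub fn allocate_zeroed(size: usize, align: usize) -> *mut u8 {
    match checked_layout(size, align) {
        // SAFETY: the layout has a nonzero size.
        Some(layout) => unsafe { System.alloc_zeroed(layout) },
        None => ptr::null_mut(),
    }
}

/// # Safety
/// `ptr` must have been returned by these functions with exactly this
/// `old_size` and `align`, and must not be null.
pub unsafe fn deallocate(ptr: *mut u8, old_size: usize, align: usize) {
    unsafe {
        let layout = Layout::from_size_align_unchecked(old_size, align);
        System.dealloc(ptr, layout);
    }
}

/// # Safety
/// Same requirements on `ptr`, `old_size`, and `align` as `deallocate`.
pub unsafe fn reallocate(ptr: *mut u8, old_size: usize, size: usize, align: usize) -> *mut u8 {
    if checked_layout(size, align).is_none() {
        // The original allocation is left intact on failure.
        return ptr::null_mut();
    }
    unsafe {
        let old_layout = Layout::from_size_align_unchecked(old_size, align);
        System.realloc(ptr, old_layout, size)
    }
}

/// The optional function, written out as the default described above:
/// never grows in place and reports the unchanged size.
pub fn reallocate_inplace(_ptr: *mut u8, old_size: usize, _size: usize, _align: usize) -> usize {
    old_size
}
```

Note how `deallocate` and `reallocate` rebuild the original layout from `old_size` and `align`, which is why callers must pass the same values they used when allocating.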

## Using an allocator

The functions that an allocator crate defines can be called directly, but most
usage will happen through the *global allocator* interface located in
`std::heap`. This module exposes a set of functions identical to those described
above, but that call into the global allocator. To select the global allocator,
a crate declares it via an `extern crate` annotated with `#[allocator]`:

Clarification request: Can all crates do this? As mentioned in another comment, I would conservatively expect this choice to be left to the root crate, as with panic runtimes.

Member Author

As written, any crate can do this, yeah.

I would be fine restricting allocator selection to the root crate if it simplifies the implementation - I can't think of any strong reasons for needing to select an allocator in a non-root crate.


```rust
#[allocator]
extern crate jemalloc;
```

As its name would suggest, the global allocator is a global resource - all
crates in a dependency tree must agree on the selected global allocator. If two
or more distinct allocator crates are selected, compilation will fail. Note that
multiple crates can select a global allocator as long as that allocator is the
same across all of them. In addition, a crate can depend on an allocator crate
without declaring it to be the global allocator by omitting the `#[allocator]`
@hanna-kruppe hanna-kruppe Apr 17, 2017

Would it make sense to restrict this choice to "root crates" (executables, staticlibs, cdylibs) analogously to how the panic strategy is chosen? [1] I can't think of a good reason for a library to require a particular allocator, and it seems like it could cause a ton of pain (and fragmentation) to mix multiple allocators within one application.

[1]: It's true that the codegen option -C panic=... can and must be set for libraries too, but this is mostly to allow separate compilation of crates – the panic runtime to be linked in is determined by the root. There are also restrictions (can't link a panic=abort library into a panic=unwind library). In addition, Cargo exposes only the "root sets panic strategy" usage.

I share this concern. Allowing libraries to require a particular global allocator could create rifts in the crate ecosystem, where different sets of libraries cannot be used together because they require different global allocators.

Allocators share the same interface, and so the optimal allocator will depend on the workload of the binary. It seems like the crate root author will be in the best position to make this choice, since they'll have insight into the workload type, as well as be able to run holistic benchmarks.

Thus it seems like a good idea to restrict global allocator selection to the crate root author.

annotation.
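
For comparison, the sketch below selects a global allocator using the `#[global_allocator]` attribute on a static, which is the form available in today's stable Rust. This is an assumption made so the example compiles; it is not the `#[allocator] extern crate` syntax proposed in this text. The names `GLOBAL` and `boxed_sum` are hypothetical.

```rust
use std::alloc::System;

// Assumption: `#[global_allocator]` from current stable Rust, used here in
// place of the `#[allocator] extern crate ...;` form proposed above.
#[global_allocator]
static GLOBAL: System = System;

/// Hypothetical helper: the `Box` it returns is allocated through `GLOBAL`,
/// like every other heap allocation in the program.
pub fn boxed_sum(values: &[u64]) -> Box<u64> {
    Box::new(values.iter().sum())
}
```

The global nature of the choice shows up here as well: the compiler rejects a program that declares more than one such static.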

## Standard library

The standard library will gain a new stable crate - `alloc_system`. This is the
default allocator crate and corresponds to the "system" allocator (i.e. `malloc`
etc. on Unix and `HeapAlloc` etc. on Windows).

The `alloc::heap` module will be reexported in `std` and stabilized. It will
simply contain functions matching directly to those defined by the allocator
API. The `alloc` crate itself may also be stabilized at a later date, but this
RFC does not propose that.

The existing `alloc_jemalloc` may continue to exist as an implementation detail
of the Rust compiler, but it will never be stabilized. Applications wishing to
use jemalloc can use a third-party crate from crates.io.

# How We Teach This
[how-we-teach-this]: #how-we-teach-this

The term "allocator" is the canonical one for this concept. It is unfortunately
shared with a similar but distinct concept described in [RFC 1398][], which
defined an `Allocator` trait over which collections can be parameterized. This API
is disambiguated by referring specifically to the "global" or "default"
allocator.

Global allocator selection would be a somewhat advanced topic - the system
allocator is sufficient for most use cases. It is a new tool that developers can
use to optimize for their program's specific workload when necessary.

It should be emphasized that in most cases, the "terminal" crate (i.e. the bin,
cdylib or staticlib crate) should be the only thing selecting the global
allocator. Libraries should be agnostic over the global allocator unless they
are specifically designed to augment functionality of a specific allocator.

Defining an allocator is an even more advanced topic that should probably live
in the _Nomicon_.

[RFC 1398]: https://github.com/rust-lang/rfcs/pull/1398

# Drawbacks
[drawbacks]: #drawbacks

Dropping the default of jemalloc will regress performance of some programs until
they manually opt back into that allocator, which may produce confusion in the
community as to why things suddenly became slower.

The allocator APIs are to some extent designed after what jemalloc supports,
which is quite a bit more than the system allocator is able to provide. The Rust
wrappers for those simpler allocators have to jump through hoops to ensure that
all of the requirements are met.
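
One example of such a hoop is honoring a large alignment on top of a primitive that only guarantees pointer alignment. The sketch below shows one common technique, over-allocating and stashing the real block start just before the aligned pointer. It is not taken from any particular wrapper: `std::alloc::alloc` with pointer alignment stands in for a `malloc`-like primitive, and all function names are hypothetical.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::mem::{align_of, size_of};
use std::ptr;

/// Space reserved in front of the user pointer to remember the block start.
const HEADER: usize = size_of::<*mut u8>();

/// Layout of the backing block: payload, worst-case padding, and header.
/// The backing block itself is only pointer-aligned.
fn backing_layout(size: usize, align: usize) -> Option<Layout> {
    let total = size.checked_add(align)?.checked_add(HEADER)?;
    Layout::from_size_align(total, align_of::<*mut u8>()).ok()
}

pub fn allocate_overaligned(size: usize, align: usize) -> *mut u8 {
    if size == 0 || !align.is_power_of_two() {
        return ptr::null_mut();
    }
    let layout = match backing_layout(size, align) {
        Some(layout) => layout,
        None => return ptr::null_mut(),
    };
    unsafe {
        let raw = alloc(layout);
        if raw.is_null() {
            return raw;
        }
        // Round up past the header to the next multiple of `align`.
        let start = raw as usize + HEADER;
        let aligned = (start + align - 1) & !(align - 1);
        let user = raw.add(aligned - raw as usize);
        // Stash the real block start immediately before the user pointer.
        (user.sub(HEADER) as *mut *mut u8).write_unaligned(raw);
        user
    }
}

/// # Safety
/// `user` must come from `allocate_overaligned` with the same `size` and
/// `align`, and must not have been freed already.
pub unsafe fn deallocate_overaligned(user: *mut u8, size: usize, align: usize) {
    unsafe {
        let raw = (user.sub(HEADER) as *const *mut u8).read_unaligned();
        let layout = backing_layout(size, align).expect("size and align must match the allocation");
        dealloc(raw, layout);
    }
}
```

The cost of this approach is `align` plus one pointer of extra memory per allocation, which is part of why a simpler API can be a poor fit for richer requirements.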

# Alternatives
[alternatives]: #alternatives

We could require that at most one crate selects a global allocator in the crate
graph, which may simplify the implementation.

The allocator APIs could be simplified to a more "traditional"
malloc/calloc/free API at the cost of an efficiency loss when using allocators
with more powerful APIs.

# Unresolved questions
[unresolved]: #unresolved-questions

It is currently forbidden to pass a null pointer to `deallocate`, though this is
guaranteed to be a no-op with libc's `free` at least. Some kinds of patterns in C
are cleaner when null pointers can be `free`d - is the same true for Rust?
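
To show what the alternative would look like, here is a small sketch of a null-tolerant deallocation function. It assumes `std::alloc::dealloc` and `Layout` from today's stable Rust, and `deallocate_nullable` is a hypothetical name, not part of the proposed API.

```rust
use std::alloc::{dealloc, Layout};

/// Hypothetical null-tolerant variant of `deallocate`: a null `ptr` is a
/// no-op, mirroring the behavior of libc's `free`.
///
/// # Safety
/// If `ptr` is non-null it must have been allocated with exactly
/// `old_size` and `align`, and must not have been freed already.
pub unsafe fn deallocate_nullable(ptr: *mut u8, old_size: usize, align: usize) {
    if ptr.is_null() {
        return;
    }
    unsafe {
        dealloc(ptr, Layout::from_size_align_unchecked(old_size, align));
    }
}
```

The trade-off is one extra branch on every deallocation in exchange for letting callers skip their own null check.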