RFC: Allocator trait #39

Closed
185 changes: 185 additions & 0 deletions active/0000-allocator-trait.md
@@ -0,0 +1,185 @@
- Start Date: 2014-04-07
- RFC PR #: (leave this empty)
- Rust Issue #: (leave this empty)

# Summary

Rust is in need of a trait to generalize low-level memory allocators. This will enable a pluggable
default allocator, along with the possibility of supporting per-container allocators with the option
of statefulness.

# Motivation

Modern general purpose allocators are quite good, but need to make many design tradeoffs. There is
no "best" global allocator so it should be a configurable feature of the standard library. In order
for this to happen, an allocator interface needs to be defined.

Some applications may also have a use case for cache aligned nodes with concurrent data structures
to avoid contention, or very fast naive allocators (like a bump allocator) shared between data
structures with related lifetimes.

The basic `malloc`, `realloc` and `free` interface is quite lacking, since it's missing alignment
and the ability to obtain an allocation's size. In order to have good support for an alignment
specification on types, it needs to be possible for types like `Vec<T>` to ask the allocator for an
alignment. This can be done inefficiently by building a wrapper around the `malloc` API, but many
allocators have efficient support for this need built-in.

# Detailed design

Trait design:

```rust
pub trait Allocator {
    /// Return a pointer to `size` bytes of memory.
    ///
    /// A null pointer may be returned if the allocation fails.
Member

what's the alternative behavior, calling fail!? Just making sure I understand what the expectation is here.

(Or maybe this is just an English ambiguity; are you saying "A null pointer may be returned. In particular, a null pointer is returned if and only if the allocation fails"? Or are you saying "If the allocation fails, then a null pointer may be returned, or may happen.")

Author

I'll rephrase this to say that returning a null pointer indicates a failed allocation. The alternative would be calling abort or fail!(), preventing this trait from being used where handling out-of-memory is necessary.

    ///
    /// Behavior is undefined if the requested size is 0 or the alignment is not a power of 2. The
    /// alignment must be no larger than the largest supported page size on the platform.
    unsafe fn alloc(&self, size: uint, align: u32) -> *mut u8;

Is alloc allowed to return null?

Author

Yes, it will return a null pointer in out-of-memory conditions. I'll add that to the documentation. There will still be stuff like the exchange_malloc wrapper implementing a specific mechanism to handle out-of-memory and 0-size allocations.


    /// Extend or shrink the allocation referenced by `ptr` to `size` bytes of memory.
    ///
    /// A null pointer may be returned if the allocation fails and the original memory allocation
    /// will not be altered.
    ///
    /// Behavior is undefined if the requested size is 0 or the alignment is not a power of 2. The
    /// alignment must be no larger than the largest supported page size on the platform.
    ///
    /// The `old_size` and `align` parameters are the parameters that were used to create the
    /// allocation referenced by `ptr`. The `old_size` parameter may also be the value returned by
    /// `usable_size` for the requested size.
    unsafe fn realloc(&self, ptr: *mut u8, size: uint, align: u32, old_size: uint) -> *mut u8;

Is realloc allowed to return null? If so, does the block stay allocated like in C?

Author

Yes to both, I'll edit that in.


    /// Deallocate the memory referenced by `ptr`.
    ///
    /// The `ptr` parameter must not be null.
    ///
    /// The `size` and `align` parameters are the parameters that were used to create the
    /// allocation referenced by `ptr`. The `size` parameter may also be the value returned by
    /// `usable_size` for the requested size.
    unsafe fn dealloc(&self, ptr: *mut u8, size: uint, align: u32);

Is it allowed to pass ptr as null?

Author

I don't think we ever do pass null to free, so it's probably best to disallow it just for the sake of removing a pointless branch. We do use null pointers to indicate moved-from at the moment, but we branch on null to check if we should call a destructor.

Member

If you're going to allow size to vary between those two options, then I might have expected the size parameter to be allowed to be any integer in the range [orig_requested_size, usable_size_result] (inclusive).

Otherwise ... if I start using a little bit more of the available capacity, but not all of it, then I need to ensure I pass back usable_size_result always, instead of the amount of the capacity that I ended up taking so far?

(In case it's not clear, what I'm trying to say is that I would expect either a more flexible interface, as described above, or a stricter interface, where size needs to match the last amount that was registered with the allocator, e.g. via realloc, or, in the case where we never called realloc, the original requested size passed to malloc.)

Author

It makes sense to allow any integer in that range. I don't think it would be common to use one of the values in between though. The use cases I see are either using only the size you asked for, or recording the real capacity for future use in a type like a vector or hash table.


    /// Return the usable size of an allocation created with the specified `size` and `align`.
Member

Is the word "usable" here meant to imply that one can freely transmute to a larger size without registering it with the allocator? Or is it something where you still need to call realloc, but you can use usable_size to first ensure that the call to realloc is guaranteed to be very cheap?

@pnkfelix I'm convinced it's the first case. Here, the usable size means the de facto size of similar allocations. Basically, alloc(size, align) is exactly equivalent to alloc(usable_size(size, align), align).

Author

It's the allocator informing you of how much memory it really hands over for that size / align. Modern general purpose allocators use size classes in order to tightly pack the allocations in arenas with a very low metadata / fragmentation upper bound. For very large sizes, the allocations are also being rounded to the page size.

Member

I am aware of the use of size classes in modern allocators. My point was more about what the intended usage pattern was for clients of this API, since two reasonable people could read what is currently here and come to the two distinct protocols that I outlined in my message.

A reason why one might prefer the explicit "you still go through realloc" protocol: there may be value (in terms of debugging tools, I'm thinking of e.g. valgrind) in still requiring a client to call back through realloc even when they "know" via usable_size that they already have the space available, so that the allocator has a chance to record locally what state is actually allowed to be used by a correct client.

    #[inline(always)]
    #[allow(unused_variable)]
    unsafe fn usable_size(&self, size: uint, align: u32) -> uint { size }

Are you sure it's a good idea to have this separate as opposed to either of those options:

  1. Having malloc and realloc return a tuple of (ptr, usable_size)
  2. Keeping this as a function, but forcing the user to call it before malloc and realloc and pass the return value to them instead of the normal size

The problem with this design is that usable_size will usually have to be computed twice, both in malloc and when the allocator user calls usable_size.

To decide which alternative: is it useful to know the usable size given the size without actually allocating memory? Is it useful to be able to write an allocator where the usable size is not constant given the size?

Author

I don't think it's useful to write an allocator where the usable size is not constant given the size. The excess space comes from statically defined size classes or excess space from the page granularity. It seems only a very naive allocator would end up in a situation where it would have dynamic trailing capacity unusable by other allocations.

I can see why the current API is not ideal for very cheap allocators, and the size calculation is likely not a huge cost with jemalloc. I'll need to think about this more.

Author

I edited the above comment a bit based on my opinion changing.

> I don't think it's useful to write an allocator where the usable size is not constant given the size.

I'm not certain about this assumption. For example, suppose you have an allocator tuned for large allocations. If it receives a request for a 4 kB region and has a 4.5 kB free block, it might want to return the whole block because it knows it will never get a request that fits in 0.5 kB.

Additionally, you may be right, but I see opening up the possibility for more allocators as more important than making the API (which probably shouldn't be used directly) prettier.

Author

It would be quite odd to have space after an allocation that's not usable for future small allocations though. If you're packing stuff together via bitmaps, you'll be able to reuse that space for something else. If you're not, you won't end up with unusable gaps anyway because size classes prevent it.

Author

I'm tempted to change how it works for another reason, which is that some allocators can avoid an extra calculation that won't always be constant folded by LLVM.

I admit that my example is highly unlikely, but I think it would arise if the following were true:

  • The allocator is only used for large arrays, meaning that keeping track of small chunks of memory is not very useful.
  • The allocated objects remain allocated for a long time, meaning that coalescing adjacent free regions is not very useful, making small chunks even less useful.
  • The user of the allocator can use any extra memory provided.
  • The sizes of allocations are neither constant nor rounded to nice (e.g. page size) values, meaning that size classes would waste lots of memory.

I can only think of one use case right now that satisfies these requirements (an unrolled linked list with a gradually increasing unrolling amount), but I think it could cause the situation I described. Because of the weirdly sized allocations, there might be a 4.5 kB free block. Because only long arrays are allocated, no request will fit in an 0.5 kB block. Because of the long object lifetime, there is no reason to suspect that an 0.5 kB block can be merged into another block in any reasonable timeframe. Taking all of that into account, returning all 4.5 kB is the logical thing to do.

I completely agree that for the vast majority of allocators, especially general purpose allocators, this sort of behavior is not going to show up. However, it might be useful in very specialized allocators, and I don't see a convincing argument that you would never need the ability to dynamically choose usable sizes in general purpose allocators - the lack of need in current systems and classes of systems just tells me that no one has thought of a use yet.

Moreover, I don't see the downside of returning the usable size. On the implementation side, it only makes things marginally more complicated, as the usable size is presumably already calculated, and so can simply be returned. On the user side, the complication should be contained to the standard library, as no one should be directly calling allocators - they should use predefined new functions.

}
```

The trait takes an `align` parameter alongside every `size` parameter in order to support large
alignments without imposing overhead on all allocations. Examples include aligned AVX/AVX-512
vectors, cache-aligned memory for concurrent data structures, and page-aligned memory for working
with hardware.
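
As a rough illustration of what an alignment-aware request looks like to a caller, the sketch below uses today's `std::alloc` functions and `Layout` type rather than the trait proposed here, and is written in current Rust syntax (`usize` instead of `uint`). The helper name `alloc_is_aligned` is made up for this example.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Allocates `size` bytes at `align`, checks the address, then frees the block.
/// Returns `false` if the request is invalid or the allocation fails.
fn alloc_is_aligned(size: usize, align: usize) -> bool {
    // `Layout` rejects alignments that are not a power of two.
    // Zero-size requests are rejected here because `alloc` does not accept them.
    let layout = match Layout::from_size_align(size, align) {
        Ok(layout) if size != 0 => layout,
        _ => return false,
    };
    unsafe {
        let ptr = alloc(layout);
        if ptr.is_null() {
            return false;
        }
        // A correctly aligned pointer has an address that is a multiple of `align`.
        let aligned = (ptr as usize) % align == 0;
        // The same size and alignment are handed back on deallocation.
        dealloc(ptr, layout);
        aligned
    }
}
```

A cache-line request would be `alloc_is_aligned(100, 64)` and a page-aligned one `alloc_is_aligned(4096, 4096)`; small allocations pay nothing extra because the alignment is simply part of each request.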

The `usable_size` method provides a way of asking for the real size/alignment the allocator will
produce for a specific request. While it is possible for the `alloc` and `realloc` methods to
return this dynamically, it adds complexity and does not seem to have a use case. There is no
reason for it to return an alignment, since the dynamic alignment information is trivially
obtainable from any pointer.
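
To make the idea of a "real size" concrete, here is a minimal sketch of how a size-class allocator might answer `usable_size`, and how a vector-like container could turn the answer into a capacity. The class scheme (powers of two up to one page, whole pages above that), the constants `PAGE` and `MIN_CLASS`, and the helper `capacity_for` are all assumptions made for illustration; they are not jemalloc's actual size classes. The code uses current Rust syntax and ignores arithmetic overflow and zero-sized types.

```rust
/// Hypothetical page size used for rounding large requests.
const PAGE: usize = 4096;
/// Hypothetical smallest size class.
const MIN_CLASS: usize = 16;

/// Returns the number of bytes this imaginary allocator really hands out for `size`.
fn usable_size(size: usize) -> usize {
    if size <= PAGE {
        // Small requests are rounded up to a power-of-two size class.
        size.max(MIN_CLASS).next_power_of_two()
    } else {
        // Large requests are rounded up to a whole number of pages.
        (size + PAGE - 1) / PAGE * PAGE
    }
}

/// Capacity, in elements, that a container asking for `len` elements can record.
/// Assumes `T` is not zero-sized.
fn capacity_for<T>(len: usize) -> usize {
    let elem = std::mem::size_of::<T>();
    usable_size(len * elem) / elem
}
```

Under this scheme a request for 25 `u32` values (100 bytes) lands in the 128-byte class, so the container can record a capacity of 32 elements without a second call into the allocator.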

The `realloc` and `dealloc` methods require passing in the size and alignment of the existing memory
allocation. The alignment should be a known constant, so this will not place a burden on the caller.
The *guarantee* of having a size permits much more optimization than simply *sometimes* being passed
a size. It allows simple allocators to forgo storing a size altogether. For example, this permits an
implementation of a free list for variable size types without metadata overhead. C++ allocators use
this design, with the `deallocate` method always taking a size parameter.
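
The free-list case can be sketched as follows. This is an illustrative toy in current Rust syntax, not part of the proposal: `FreeListAllocator`, `CLASSES`, `ALIGN` and `class_index` are hypothetical names, fresh blocks come from `std::alloc::alloc`, and memory is never returned to the system. The point is that `dealloc` finds the right list purely from the size the caller passes in, so no header is stored in front of a block.

```rust
use std::alloc::{alloc, Layout};
use std::cell::Cell;
use std::ptr;

/// Hypothetical size classes; every block is at least pointer-sized.
const CLASSES: [usize; 4] = [16, 32, 64, 128];
/// Every block uses this alignment, so a pointer can be stored inside a free block.
const ALIGN: usize = 16;

/// Free-list allocator with no per-block metadata.
struct FreeListAllocator {
    /// Head of the intrusive free list for each size class.
    heads: Cell<[*mut u8; 4]>,
}

/// Maps a size to the index of the smallest class that fits it.
fn class_index(size: usize) -> Option<usize> {
    CLASSES.iter().position(|&class| size <= class)
}

impl FreeListAllocator {
    fn new() -> Self {
        FreeListAllocator { heads: Cell::new([ptr::null_mut(); 4]) }
    }

    /// Returns null if `size` exceeds the largest class or the system is out of memory.
    unsafe fn alloc(&self, size: usize) -> *mut u8 {
        let i = match class_index(size) {
            Some(i) => i,
            None => return ptr::null_mut(),
        };
        let mut heads = self.heads.get();
        let head = heads[i];
        if head.is_null() {
            // Nothing to reuse: take a fresh block from the system allocator.
            return unsafe { alloc(Layout::from_size_align(CLASSES[i], ALIGN).unwrap()) };
        }
        // Pop: the first bytes of a free block hold the pointer to the next one.
        heads[i] = unsafe { *(head as *mut *mut u8) };
        self.heads.set(heads);
        head
    }

    /// `size` must be in the same class as the size passed to `alloc`.
    unsafe fn dealloc(&self, block: *mut u8, size: usize) {
        let i = class_index(size).expect("size does not match any class");
        let mut heads = self.heads.get();
        // Push: reuse the block's own storage for the link to the old head.
        unsafe { *(block as *mut *mut u8) = heads[i] };
        heads[i] = block;
        self.heads.set(heads);
    }
}
```

Because only the class matters, any size between the original request and the usable size selects the same list, which matches the range discussed in the review comments on `dealloc`.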

It is left up to the caller to choose how to handle zero-size allocations, such as the current
wrapping done by the `rt::global_heap` allocator. Allocators like jemalloc do not provide a
guarantee here, and some callers may want a null pointer while others will want a non-null
sentinel pointing at a global.
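
One possible caller-side policy is sketched below in current Rust syntax. It is an assumption for illustration only: instead of a sentinel pointing at a global, it uses a dangling but well-aligned address (the alignment value itself), and it relies on today's `std::alloc` functions rather than the proposed trait. The function names are hypothetical.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::ptr::NonNull;

/// Allocates `size` bytes, mapping zero-size requests to a non-null sentinel.
/// Returns `None` for an invalid alignment or an out-of-memory condition.
fn alloc_or_sentinel(size: usize, align: usize) -> Option<NonNull<u8>> {
    let layout = Layout::from_size_align(size, align).ok()?;
    if size == 0 {
        // The sentinel is never dereferenced and never reaches the allocator.
        // Its address equals `align`, so it is non-null and suitably aligned.
        return NonNull::new(align as *mut u8);
    }
    NonNull::new(unsafe { alloc(layout) })
}

/// Frees a pointer obtained from `alloc_or_sentinel` with the same `size` and `align`.
unsafe fn free_unless_sentinel(ptr: NonNull<u8>, size: usize, align: usize) {
    // The size tells us whether this is the sentinel; no comparison of addresses is needed.
    if size != 0 {
        unsafe { dealloc(ptr.as_ptr(), Layout::from_size_align_unchecked(size, align)) };
    }
}
```

A caller that prefers a null pointer for empty allocations would replace the sentinel branch with its own convention; the allocator itself never sees a zero-size request either way.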

The alignment is given a reasonable restriction, by capping it at the largest huge page size on
the system. It should never be dynamic, so this is easily satisfiable. This is to meet the
requirement of allocators like jemalloc for the alignment to be satisfiable without overflow while
still meeting every use case. It is given as 32-bit because LLVM uses a 32-bit integer for
alignment.

Sample default allocator:

```rust
extern crate libc;

use std::intrinsics::cttz32;
use libc::{c_int, c_void, size_t};

#[link(name = "jemalloc")]
extern {
    fn nallocx(size: size_t, flags: c_int) -> size_t;
    fn mallocx(size: size_t, flags: c_int) -> *mut c_void;
    fn rallocx(ptr: *mut c_void, size: size_t, flags: c_int) -> *mut c_void;
    fn dallocx(ptr: *mut c_void, flags: c_int);
}

pub struct DefaultAllocator;

// MALLOCX_ALIGN(a) macro: encodes the alignment as log2(a) in the flags argument
fn mallocx_align(a: u32) -> c_int { unsafe { cttz32(a as i32) as c_int } }

impl Allocator for DefaultAllocator {
    unsafe fn alloc(&self, size: uint, align: u32) -> *mut u8 {
        mallocx(size as size_t, mallocx_align(align)) as *mut u8
    }

    // The old size is ignored: jemalloc tracks it internally
    unsafe fn realloc(&self, ptr: *mut u8, size: uint, align: u32, _: uint) -> *mut u8 {
        rallocx(ptr as *mut c_void, size as size_t, mallocx_align(align)) as *mut u8
    }

    unsafe fn dealloc(&self, ptr: *mut u8, _: uint, align: u32) {
        dallocx(ptr as *mut c_void, mallocx_align(align))
    }

    #[inline(always)]
    unsafe fn usable_size(&self, size: uint, align: u32) -> uint {
        nallocx(size as size_t, mallocx_align(align)) as uint
    }
}

pub static default: DefaultAllocator = DefaultAllocator;
```

# Alternatives

## Zeroed memory

The `Allocator` trait does not provide a way to ask for zeroed memory. Allocators based on
mmap/mremap already pay this cost for large allocations but the tide is likely going to shift to
new underlying APIs like the Linux vrange work. This optimization (`calloc`) is not currently
used by anything in Rust, as it's quite hard to fit it into any generic code.

The `alloc` and `realloc` functions could take a `zero: bool` parameter for leveraging the guarantee
provided by functions like `mmap` and `mremap`. It would be possible to add support for this in an
almost completely backwards compatible way by adding two new default methods with the parameter.
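
The backwards-compatible route might look roughly like the sketch below, written in current Rust syntax. The method name `alloc_with`, the type `SystemBacked` and the helper `all_zero_after_alloc` are hypothetical, and the trait is cut down to the two methods needed here. The default body zeroes memory by hand, so existing implementations keep compiling, while an allocator that knows its memory is already zeroed could override it.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Reduced version of the proposed trait with one added default method.
trait Allocator {
    unsafe fn alloc(&self, size: usize, align: u32) -> *mut u8;
    unsafe fn dealloc(&self, ptr: *mut u8, size: usize, align: u32);

    /// Hypothetical addition: allocate, optionally guaranteeing zeroed memory.
    /// The default falls back to `alloc` followed by an explicit fill.
    unsafe fn alloc_with(&self, size: usize, align: u32, zero: bool) -> *mut u8 {
        let ptr = unsafe { self.alloc(size, align) };
        if zero && !ptr.is_null() {
            unsafe { std::ptr::write_bytes(ptr, 0, size) };
        }
        ptr
    }
}

/// Implementation written before `alloc_with` existed; it needs no changes.
struct SystemBacked;

impl Allocator for SystemBacked {
    unsafe fn alloc(&self, size: usize, align: u32) -> *mut u8 {
        unsafe { alloc(Layout::from_size_align_unchecked(size, align as usize)) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, size: usize, align: u32) {
        unsafe { dealloc(ptr, Layout::from_size_align_unchecked(size, align as usize)) }
    }
}

/// Allocates `n` zeroed bytes, checks every byte, and frees the block.
fn all_zero_after_alloc(n: usize) -> bool {
    assert!(n != 0, "zero-size requests are undefined for this trait");
    let a = SystemBacked;
    unsafe {
        let p = a.alloc_with(n, 8, true);
        if p.is_null() {
            return false;
        }
        let ok = std::slice::from_raw_parts(p, n).iter().all(|&b| b == 0);
        a.dealloc(p, n, 8);
        ok
    }
}
```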

## Sized reallocation/deallocation

The old size passed to `realloc` and `dealloc` is an optional performance enhancement. There is some
debate about whether this is worth having. I have included it because there is no drawback within
the current language and standard libraries, so it's an obvious performance enhancement.

Member

Do you happen to have some hard numbers for this improvement? I'm sure there were benchmarks run to convince people to add the parameters to the C++ allocators, but a cursory search didn't find anything about sized deallocation vs. non-sized deallocation.

Author

> Do you happen to have some hard numbers for this improvement?

I can tell you that it removes the need for metadata headers from simple allocators like those based solely on free lists for various size classes. It will reduce memory usage by up to 50%, and that comes with a large performance benefit.

For general purpose allocators where this data may already be around, it's still faster. In jemalloc and TCMalloc, an optional size parameter will save at least one cache miss. Taking advantage of the guarantee would require a complete redesign.

> When an object is deallocated, we compute its page number and look it up in the central array to find the corresponding span object. The span tells us whether or not the object is small, and its size-class if it is small.


In a future version of Rust, `~[T]` will be an owned slice stored as `(ptr, len)`. Converting from
an owned slice to a `Vec<T>` will be a no-op. However, conversion from `Vec<T>` to `~[T]` cannot be
free due to the Option-like `enum` optimization for non-nullable pointers. There is a choice between
calling `free` on zero-size allocations during the conversion, or branching in every `~[T]`
destructor based on a comparison with the reserved sentinel address. If `dealloc` requires a size, a
`shrink_to_fit()` call will be the minimum requirement.

The niche for `~[T]` is not yet known, so it would be premature to sacrifice performance relative to
C++ allocators to optimize one aspect of it. A vector should be left as `Vec<T>` to avoid losing
track of excess capacity, and except in recursive data structures there is little cost for the
capacity field.

## Alignment parameter

This parameter is required to be a power of 2, so it could take `log2(alignment)` instead. Either
way, the standard library is going to need to expose a convenience function for retrieving this for
a specific type.
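
The two encodings are easy to convert between, as the sketch below suggests. It uses current Rust syntax and the existing `std::mem::align_of` function; the helper names `log2_align_of` and `align_from_log2` and the `CacheLine` type are hypothetical. Counting trailing zeros of a power of two yields its base-2 logarithm, which is the same operation `mallocx_align` performs with `cttz32` in the sample allocator.

```rust
use std::mem::align_of;

/// Example type with a large, explicitly requested alignment.
#[repr(align(64))]
struct CacheLine([u8; 64]);

/// Returns log2 of the alignment of `T`.
fn log2_align_of<T>() -> u32 {
    // Alignments are always powers of two, so the trailing-zero count is exact.
    align_of::<T>().trailing_zeros()
}

/// Recovers the alignment in bytes from its log2 encoding.
fn align_from_log2(log2: u32) -> usize {
    1usize << log2
}
```

With these helpers, `log2_align_of::<CacheLine>()` is 6 and `align_from_log2(6)` is 64, so neither encoding loses information.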

# Unresolved questions

The finer details of the API for allocator support in containers are beyond the scope of this RFC. It
only aims to define the `Allocator` trait for making Rust's default allocator configurable and
building the foundation for containers. A sample design would be `Vec<T, A = DefaultAllocator>` with
extra static methods taking an *instance* of the allocator type in order to support stateful
allocators.
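
To give a feel for that sample design, here is a small sketch in current Rust syntax. It is not the container API this RFC leaves open: `Slot` is a hypothetical single-value container standing in for `Vec<T, A = DefaultAllocator>`, `new_in` is an assumed name for the constructor that takes an allocator instance, `Counting` is a made-up stateful allocator, and the trait is reduced to two methods backed by `std::alloc`.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::cell::Cell;
use std::mem::{align_of, size_of};

/// Reduced version of the proposed trait.
trait Allocator {
    unsafe fn alloc(&self, size: usize, align: u32) -> *mut u8;
    unsafe fn dealloc(&self, ptr: *mut u8, size: usize, align: u32);
}

/// Stateless allocator forwarding to the process-wide allocator.
struct DefaultAllocator;

impl Allocator for DefaultAllocator {
    unsafe fn alloc(&self, size: usize, align: u32) -> *mut u8 {
        unsafe { alloc(Layout::from_size_align_unchecked(size, align as usize)) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, size: usize, align: u32) {
        unsafe { dealloc(ptr, Layout::from_size_align_unchecked(size, align as usize)) }
    }
}

/// Stateful allocator: counts live allocations in state shared with the caller.
struct Counting<'a> {
    live: &'a Cell<usize>,
}

impl Allocator for Counting<'_> {
    unsafe fn alloc(&self, size: usize, align: u32) -> *mut u8 {
        let ptr = unsafe { DefaultAllocator.alloc(size, align) };
        if !ptr.is_null() {
            self.live.set(self.live.get() + 1);
        }
        ptr
    }
    unsafe fn dealloc(&self, ptr: *mut u8, size: usize, align: u32) {
        unsafe { DefaultAllocator.dealloc(ptr, size, align) };
        self.live.set(self.live.get() - 1);
    }
}

/// Single-value container parameterized over its allocator.
struct Slot<T, A: Allocator = DefaultAllocator> {
    ptr: *mut T,
    allocator: A,
}

impl<T> Slot<T> {
    /// Uses the default allocator, so most callers never name `A`.
    fn new(value: T) -> Self {
        Slot::new_in(value, DefaultAllocator)
    }
}

impl<T, A: Allocator> Slot<T, A> {
    /// Takes an allocator instance, which is what makes stateful allocators possible.
    fn new_in(value: T, allocator: A) -> Self {
        assert!(size_of::<T>() != 0, "zero-size types are out of scope here");
        let ptr = unsafe { allocator.alloc(size_of::<T>(), align_of::<T>() as u32) } as *mut T;
        assert!(!ptr.is_null(), "out of memory");
        unsafe { ptr.write(value) };
        Slot { ptr, allocator }
    }

    fn get(&self) -> &T {
        unsafe { &*self.ptr }
    }
}

impl<T, A: Allocator> Drop for Slot<T, A> {
    fn drop(&mut self) {
        unsafe {
            // Run the value's destructor, then hand back the same size and alignment.
            std::ptr::drop_in_place(self.ptr);
            self.allocator.dealloc(self.ptr as *mut u8, size_of::<T>(), align_of::<T>() as u32);
        }
    }
}
```

`Slot::new(7u64)` picks `DefaultAllocator` through the default type parameter, while `Slot::new_in(value, Counting { live: &counter })` routes the same container through a stateful allocator whose counter returns to zero once the slot is dropped.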