diff --git a/text/0000-pin.md b/text/0000-pin.md new file mode 100644 index 00000000000..7facb8ea260 --- /dev/null +++ b/text/0000-pin.md @@ -0,0 +1,403 @@ +- Feature Name: pins_and_needles +- Start Date: 2018-02-19 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +# Summary +[summary]: #summary + +Introduce new APIs to libcore / libstd to serve as safe abstractions for data +which cannot be safely moved around. + +# Motivation +[motivation]: #motivation + +A longstanding problem for Rust has been dealing with types that should not be +moved. A common motivation for this is when a struct contains a pointer into +its own representation - moving that struct would invalidate that pointer. This +use case has become especially important recently with work on generators. +Because generators essentially reify a stackframe into an object that can be +manipulated in code, it is likely for idiomatic usage of a generator to result +in such a self-referential type, if it is allowed. + +This proposal adds an API to std which would allow you to guarantee that a +particular value will never move again, enabling safe APIs that rely on +self-references to exist. + +# Guide-level explanation + +The core goal of this RFC is to **provide a reference type where the referent is guaranteed to never move before being dropped**. We want to do this with a minimum disruption to the type system, and in fact, this RFC shows that we can achieve the goal without *any* type system changes. + +Let's take that goal apart, piece by piece, from the perspective of the futures (i.e. async/await) use case: + +- **Reference type**. The reason we need a reference type is that, when working with things like futures, we generally want to combine smaller futures into larger ones, and only at the top level put an entire resulting future into some immovable location. Thus, we need a reference type for methods like `poll`, so that we can break apart a large future into its smaller components, while retaining the guarantee about immobility. + +- **Never to move before being dropped**. Again looking at the futures case, once we being `poll`ing a future, we want it to be able to store references into itself, which is possible if we can guarantee that the whole future will never move. We don't try to track *whether* such references exist at the type level, since that would involve cumbersome typestate; instead, we simply decree that by the time you initially `poll`, you promise to never move an immobile future again. + +At the same time, we want to support futures (and iterators, etc.) that *can* move. While it's possible to do so by providing two distinct `Future` (or `Iterator`, etc) traits, such designs incur unacceptable ergonomic costs. + +The key insight of this RFC is that we can create a new library type, `Pin<'a, T>`, which encompasses *both* moveable and immobile referents. The type is paired with a new auto trait, `Unpin`, which determines the meaning of `Pin<'a, T>`: + +- If `T: Unpin` (which is the default), then `Pin<'a, T>` is entirely equivalent to `&'a mut T`. +- If `T: !Unpin`, then `Pin<'a, T>` provides a unique reference to a `T` with lifetime `'a`, but only provides `&'a T` access safely. It also guarantees that the referent will *never* be moved. However, getting `&'a mut T` access is unsafe, because operations like `mem::replace` mean that `&mut` access is enough to move data out of the referent; you must promise not to do so. + +To be clear: the *sole* function of `Unpin` is to control the meaning of `Pin`. Making `Unpin` an auto trait means that the vast majority of types are automatically "movable", so `Pin` degenerates to `&mut`. In the case that you need immobility, you *opt out* of `Unpin`, and then `Pin` becomes meaningful for your type. + +Putting this all together, we arrive at the following definition of `Future`: + +```rust +trait Future { + type Item; + type Error; + + fn poll(self: Pin, cx: &mut task::Context) -> Poll; +} +``` + +By default when implementing `Future` for a struct, this definition is equivalent to today's, which takes `&mut self`. But if you want to allow self-referencing in your future, you just opt out of `Unpin`, and `Pin` takes care of the rest. + +This RFC also provides a pinned analogiy to `Box` called `PinBox`. It work alongs the same lines as the `Pin` type discussed here - if the type implements `Unpin`, it functions the same as the unpinned `Box`; if the type has opted out of `Unpin`, it guarantees that they type behind the reference will not be moved again. + +# Reference-level explanation + +## The `Unpin` auto trait + +This new auto trait is added to the `core::marker` and `std::marker` modules: + +```rust +pub unsafe auto trait Unpin { } +``` + +A type that implements `Unpin` can be moved out of one of the pinned reference +types discussed later in this RFC. Otherwise, they do not expose a safe API +which allows you to move a value out of them. Because `Unpin` is an auto trait, +most types in Rust implement `Unpin`. The types which don't are primarily +self-referential types, like certain generators. + +This trait is a lang item, but only to generate negative impls for certain +generators. Unlike previous `?Move` proposals, and unlike some traits like +`Sized` and `Copy`, this trait does not impose any compiler-based semantics +types that do or don't implement it. Instead, the semantics are entirely +enforced through library APIs which use `Unpin` as a marker. + +## `Pin` + +The `Pin` struct is added to both `core::mem` and `std::mem`. It is a new kind +of reference, with stronger requirements than `&mut T` + +```rust +#[fundamental] +pub struct Pin<'a, T: ?Sized + 'a> { + data: &'a mut T, +} +``` + +### Safe APIs + +`Pin` implements `Deref`, but only implements `DerefMut` if the type it +references implements `Unpin`. This way, it is not safe to call `mem::swap` or +`mem::replace` when the type referenced does not implement `Unpin`. + +```rust +impl<'a, T: ?Sized> Deref for Pin<'a, T> { ... } + +impl<'a, T: Unpin + ?Sized> DerefMut for Pin<'a, T> { ... } +``` + +It can only be safely constructed from references to types that implement +`Unpin`: + +```rust +impl<'a, T: Unpin + ?Sized> Pin<'a, T> { + pub fn new(reference: &'a mut T) -> Pin<'a, T> { ... } +} +``` + +It also has a function called `borrow`, which allows it to be transformed to a +pin of a shorter lifetime: + +```rust +impl<'a, T: ?Sized> Pin<'a, T> { + pub fn borrow<'b>(this: &'b mut Pin<'a, T>) -> Pin<'b, T> { ... } +} +``` + +It may also implement additional APIs as is useful for type conversions, such +as `AsRef`, `From`, and so on. `Pin` implements `CoerceUnsized` as necessary to +make coercing them into trait objects possible. + +### Unsafe APIs + +`Pin` can be unsafely constructed from mutable references to types that may not +implement `Unpin`. Users who use this constructor must know that the type +they are passing a reference to will never be moved again after the `Pin` is +constructed, even after the lifetime of the reference has ended. (For example, +it is always unsafe to construct a `Pin` from a reference you did not create, +because you don't know what will happen once the lifetime of that reference +ends.) + +```rust +impl<'a, T: ?Sized> Pin<'a, T> { + pub unsafe fn new_unchecked(reference: &'a mut T) -> Pin<'a, T> { ... } +} +``` + +`Pin` also has an API which allows it to be converted into a mutable reference +for a type that doesn't implement `Unpin`. Users who use this API must +guarantee that they never move out of the mutable reference they receive. + +```rust +impl<'a, T: ?Sized> Pin<'a, T> { + pub unsafe fn get_mut<'b>(this: &'b mut Pin<'a, T>) -> &'b mut T { ... } +} +``` + +Finally, as a convenience, `Pin` implements an unsafe `map` function, which +makes it easier to project through a field. Users calling this function must +guarantee that the value returned will not move as long as the referent of this +pin doesn't move (e.g. it is a private field of the value). They also must not +move out of the mutable reference they receive as the closure argument: + +```rust +impl<'a, T: ?Sized> Pin<'a, T> { + pub unsafe fn map<'b, U, F>(this: &'b mut Pin<'a, T>, f: F) -> Pin<'b, U> + where F: FnOnce(&mut T) -> &mut U + { ... } +} + +// for example: +struct Foo { + bar: Bar, +} + +let foo_pin: Pin; + +let bar_pin: Pin = unsafe { Pin::map(&mut foo_pin, |foo| &mut foo.bar) }; +// Equivalent to: +let bar_pin: Pin = unsafe { + let foo: &mut Foo = Pin::get_mut(&mut foo_pin); + Pin::new_unchecked(&mut foo.bar) +}; +``` + +## `PinBox` + +The `PinBox` type is added to alloc::boxed and std::boxed. It is analogous to +the `Box` type in the same way that `Pin` is analogous to the reference types, +and it has a similar API. + +```rust +#[fundamental] +pub struct PinBox { + inner: Box, +} +``` + +### Safe API + +Unlike `Pin`, it is safe to construct a `PinBox` from a `T` and from a +`Box`, even if the type does not implement `Unpin`: + +```rust +impl PinBox { + pub fn new(data: T) -> PinBox { ... } +} + +impl From> for PinBox { + fn from(boxed: Box) -> PinBox { ... } +} +``` + +It also provides the same Deref impls as `Pin` does: + +```rust +impl Deref for PinBox { ... } +impl DerefMut for PinBox { ... } +``` + +If the data implements `Unpin`, its also safe to convert a `PinBox` into a +`Box`: + +```rust +impl From> for Box { ... } +``` + +Finally, it is safe to get a `Pin` from borrows of `PinBox`: + +```rust +impl PinBox { + fn as_pin<'a>(&'a mut self) -> Pin<'a, T> { ... } +} +``` + +These APIs make `PinBox` a reasonable way of handling data which does not +implement `!Unpin`. Once you heap allocate that data inside of a `PinBox`, you +know that it will never change address again, and you can hand out `Pin` +references to that data. + +### Unsafe API + +`PinBox` can be unsafely converted into `&mut T` and `Box` even if the type +it references does not implement `Unpin`, similar to `Pin`: + +```rust +impl PinBox { + pub unsafe fn get_mut<'a>(this: &'a mut PinBox) -> &'a mut T { ... } + pub unsafe fn into_inner(this: PinBox) -> Box { ... } +} +``` + +## Immovable generators + +Today, the unstable generators feature has an option to create generators which +contain references that live across yield points - these are, in effect, +internal references into the generator's state machine. Because internal +references are invalidated if the type is moved, these kinds of generators +("immovable generators") are currently unsafe to create. + +Once the arbitrary_self_types feature becomes object safe, we will make three +changes to the generator API: + +1. We will change the `resume` method to take self by `self: Pin` + instead of `&mut self`. +2. We will implement `!Unpin` for the anonymous type of an immovable generator. +3. We will make it safe to define an immovable generator. + +This is an example of how the APIs in this RFC allow for self-referential data +types to be created safely. + +# Drawbacks +[drawbacks]: #drawbacks + +This adds additional APIs to std, including an auto trait. Such additions +should not be taken lightly, and only included if they are well-justified by +the abstractions they express. + +# Rationale and alternatives +[alternatives]: #alternatives + +## Comparison to `?Move` + +One previous proposal was to add a built-in `Move` trait, similar to `Sized`. A +type that did not implement `Move` could not be moved after it had been +referenced. + +This solution had some problems. First, the `?Move` bound ended up "infecting" +many different APIs where it wasn't relevant, and introduced a breaking change +in several cases where the API bound changed in a non-backwards compatible way. + +In a certain sense, this proposal is a much more narrowly scoped version of +`?Move`. With `?Move`, *any* reference could act as the "Pin" reference does +here. However, because of this flexibility, the negative consequences of having +a type that can't be moved had a much broader impact. + +Instead, we require APIs to opt into supporting immovability (a niche case) by +operating with the `Pin` type, avoiding "infecting" the basic reference type +with concerns around immovable types. + +## Comparison to using `unsafe` APIs + +Another alternative we've considered was to just have the APIs which require +immovability be `unsafe`. It would be up to the users of these APIs to review +and guarantee that they never moved the self-referential types. For example, +generator would look like this: + +```rust +trait Generator { + type Yield; + type Return; + + unsafe fn resume(&mut self) -> CoResult; +} +``` + +This would require no extensions to the standard library, but would place the +burden on every user who wants to call resume to guarantee (at the risk of +memory insafety) that their types were not moved, or that they were moveable. +This seemed like a worse trade off than adding these APIs. + +## Anchor as a wrapper type and `StableDeref` + +In a previous iteration of this RFC, there was a wrapper type called `Anchor` +that could "anchor" any smart pointer, and there was a hierarchy of traits +relating to the stability of the referent of different pointer types. This has +been replaced with `PinBox`. + +The primary benefit of this approach was that it was partially integrated with +crates like owning-ref and rental, which also use a hierarchy of stability +traits. However, because of differences in the requirements, the traits used by +owning-ref et al ended up being a non-overlapping subset of the traits proposed +by this RFC from the traits used by the Anchor type. Merging these into a +single hierarchy provided relatively little benefit. + +And the only types that implemented all of the necessary traits to be put into +an Anchor before were `Box` and `Vec`. Because you cannot get mutable +access to the smart pointer (unless the referent implements `Unpin`), an +`Anchor>` was not really any different from an `Anchor>` in the +previous iteration of the RFC. For this reason, replacing `Anchor` with +`PinBox` and just supporting `PinBox<[T]>` reduced the API complexity without +losing any expressiveness. + +## Stack pinning API (potential future extension) + +This API supports pinning `!Unpin` types in the heap. However, they can also +be safely held in place in the stack, allowing a safe API for creating a `Pin` +referencing a stack allocated `!Unpin` type. + +This API is small, and does not become a part of anyone's public API. For that +reason, we'll start by allowing it to grow out of tree, in third party crates, +before including it in std. Here a version of the API, for reference purposes: + +```rust +pub fn pinned(data: T) -> PinTemporary<'a, T> { + PinTemporary { data, _marker: PhantomData } +} + +struct PinTemporary<'a, T: 'a> { + data: T, + _marker: PhantomData<&'a &'a mut ()>, +} + +impl<'a, T> PinTemporary<'a, T> { + pub fn into_pin(&'a mut self) -> Pin<'a, T> { + unsafe { Pin::new_unchecked(&mut self.data) } + } +} +``` + +## Making `Pin` a built-in type (potential future extension) + +The `Pin` type could instead be a new kind of first-class reference - `&'a pin +T`. This would have some advantages - it would be trivial to project through +fields, for example, and "stack pinning" would not require an API, it would be +natural. However, it has the downside of adding a new reference type, a very +big language change. + +For now, we're happy to stick with the `Pin` struct in std, and if this type is +ever added, turn the `Pin` type into an alias for the reference type. + +## Having both `Pin` and `PinMut` + +Instead of just having `Pin`, the type called `Pin` could instead be `PinMut`, +and we could have a type called `Pin`, which is like `PinMut`, but only +contains a shared, immutable reference. + +Because we've determined that it should be safe to immutably dereference +`Pin`/`PinMut`, this `Pin` type would not provide significant guarantees that a +normal immutable reference does not. If a user needs to pass around references +to data stored pinned, an `&Pin` (under the definition of `Pin` provided in +this RFC) would suffice. For this reason, the `Pin`/`PinMut` distinction +introduced extra types and complexity without any impactful benefit. + +# Unresolved questions +[unresolved]: #unresolved-questions + +In addition to the future extensions discussed above, the APIs of the three pin +types in std will grow over time as they implement more common conversion +traits and so on. + +We may also choose to require that `Pin` uphold stricter guarantees, requiring +that `Unpin` data inside the `Pin` not leak unless the memory remains valid for +the remainder of the program lifetime. This would make the stack API documented +above unsound, but might also enable other APIs to make use of these guarantees +to ensure that a destructor always runs if the memory becomes invalid.