Merge pull request #2 from RalfJung/uninitialized-uninhabited
Update uninitialized RFC
canndrew authored Aug 7, 2018
2 parents 835f860 + 8ae636b commit acaf534
Showing 1 changed file with 60 additions and 26 deletions.
86 changes: 60 additions & 26 deletions text/0000-uninitialized-uninhabited.md
@@ -6,15 +6,17 @@
# Summary
[summary]: #summary

Deprecate `mem::uninitialized::<T>` and replace it with a `MaybeUninit<T>` type
for safer and more principled handling of uninitialized data.
Deprecate `mem::uninitialized::<T>` and `mem::zeroed::<T>` and replace them with
a `MaybeUninit<T>` type for safer and more principled handling of uninitialized
data.

# Motivation
[motivation]: #motivation

The problems with `uninitialized` centre around its usage with uninhabited
types. The concept of "uninitialized data" is extremely problematic when it
comes into contact with types like `!` or `Void`.
types, and its interaction with Rust's type layout invariants. The concept of
"uninitialized data" is extremely problematic when it comes into contact with
types like `!` or `Void`.

For any given type, there may be valid and invalid bit-representations. For
example, the type `u8` consists of a single byte and all possible bytes can be
@@ -53,6 +55,18 @@ fn mem::uninitialized::<!>() -> !
Yet calling this function does not diverge! It just breaks everything then eats
your laundry instead.

This problem is most prominent with `!` but also applies to other types that
have restrictions on the values they can carry. For example,
`Some(mem::uninitialized::<bool>()).is_none()` could actually return `true`
because uninitialized memory could violate the invariant that a `bool` is always
`[00000000]` or `[00000001]` -- and Rust relies on this invariant when doing
enum layout. So, `mem::uninitialized::<bool>()` is instantaneous undefined
behavior just like `mem::uninitialized::<!>()`. This also affects `mem::zeroed`
when considering types where the all-`0` bit pattern is not valid, like
references: `mem::zeroed::<&'static i32>()` is instantaneous undefined behavior.
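The layout point above can be observed from safe code. The following is a minimal sketch, not part of the RFC text; the helper names are hypothetical. That `Option<&T>` has the same size as `&T` is documented behavior, while `Option<bool>` fitting in one byte is what current compilers do rather than a stated guarantee. Both only work because some bit patterns of the payload type are assumed never to occur.

```rust
use std::mem::size_of;

// Hypothetical helpers, named only for this illustration.

/// Size of `Option<bool>`: on current compilers the `None` case is stored in
/// one of the byte values that a valid `bool` can never take.
fn option_bool_size() -> usize {
    size_of::<Option<bool>>()
}

/// Size of `Option<&i32>`: `None` is represented by the null pointer, which a
/// valid reference can never be.
fn option_ref_size() -> usize {
    size_of::<Option<&'static i32>>()
}

fn main() {
    println!("bool: {} byte(s)", size_of::<bool>());
    println!("Option<bool>: {} byte(s)", option_bool_size());
    println!(
        "&i32: {} bytes, Option<&i32>: {} bytes",
        size_of::<&'static i32>(),
        option_ref_size()
    );
}
```

Because the discriminant is folded into the unused bit patterns, a `bool` holding an arbitrary byte, or a reference holding null, can be misread as a different enum variant.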

## Tracking uninitializedness in the type

An alternative way of representing uninitialized data is through a union type:

```rust
@@ -63,14 +77,16 @@ union MaybeUninit<T> {
```

Instead of creating an "uninitialized value", we can create a `MaybeUninit`
initialized with `uninit = ()`. Then, once we know that the value in the union
initialized with `uninit: ()`. Then, once we know that the value in the union
is valid, we can extract it with `my_uninit.value`. This is a better way of
handling uninitialized data because it doesn't involve lying to the type system
and pretending that we have a value when we don't. It also better represents
what's actually going on: we never *really* have a value of type `T` when we're
using `uninitialized::<T>`, what we have is some memory that contains either a
value (`value: T`) or nothing (`uninit: ()`), with it being the programmer's
responsibility to keep track of which state we're in.
responsibility to keep track of which state we're in. Notice that creating a
`MaybeUninit<T>` is safe for any `T`! Only when accessing `my_uninit.value` do
we have to take care that the value has been properly initialized.
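As a rough sketch of the pattern just described, the snippet below defines a stand-in union and walks through the two states. The `T: Copy` bound and the `make_answer` helper are assumptions of this sketch (the bound keeps the example simple on stable compilers) and are not part of the proposal.

```rust
// Hypothetical stand-in for the union described in the text. The `Copy`
// bound is an assumption of this sketch, not part of the proposal.
union MaybeUninit<T: Copy> {
    uninit: (),
    value: T,
}

fn make_answer() -> u32 {
    // Safe: we only claim to hold `()`, so no `u32` value exists yet.
    let mut slot = MaybeUninit::<u32> { uninit: () };

    // Writing to a `Copy` union field is safe; the slot is now initialized.
    slot.value = 42;

    // Reading is the unsafe step: we assert that `value` has been written.
    unsafe { slot.value }
}

fn main() {
    println!("{}", make_answer());
}
```

The only `unsafe` block sits at the read, which is exactly where the programmer has to know which state the union is in.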

To see how this can replace `uninitialized` and fix bugs in the process,
consider the following code:
@@ -143,72 +159,90 @@ library as a replacement.
Add the aforementioned `MaybeUninit` type to the standard library:

```rust
#[repr(transparent)]
union MaybeUninit<T> {
pub union MaybeUninit<T> {
uninit: (),
value: T,
value: ManuallyDrop<T>,
}
```

The type should have at least the following interface
([Playground link](https://play.rust-lang.org/?gist=81f5ab9a7e7107c9583de21382ef4333&version=nightly&mode=debug&edition=2015)):

```rust
impl<T> MaybeUninit<T> {
/// Create a new `MaybeUninit` in an uninitialized state.
///
/// Note that dropping a `MaybeUninit` will never call `T`'s drop code.
/// It is your responsibility to make sure `T` gets dropped if it got initialized.
pub fn uninitialized() -> MaybeUninit<T> {
MaybeUninit {
uninit: (),
}
}

/// Create a new `MaybeUninit` in an uninitialized state, with the memory being
/// filled with `0` bytes. It depends on `T` whether that already makes for
/// proper initialization. For example, `MaybeUninit<usize>::zeroed()` is initialized,
/// but `MaybeUninit<&'static i32>::zeroed()` is not because references must not
/// be null.
///
/// Note that dropping a `MaybeUninit` will never call `T`'s drop code.
/// It is your responsibility to make sure `T` gets dropped if it got initialized.
pub fn zeroed() -> MaybeUninit<T> {
let mut u = MaybeUninit::<T>::uninitialized();
unsafe { u.as_mut_ptr().write_bytes(0u8, 1); }
u
}

/// Set the value of the `MaybeUninit`. This overwrites any previous value without dropping it.
pub fn set(&mut self, val: T) -> &mut T {
pub fn set(&mut self, val: T) {
unsafe {
self.value = val;
&mut self.value
self.value = ManuallyDrop::new(val);
}
}

/// Take the value of the `MaybeUninit`, putting it into an uninitialized state.
/// Extract the value from the `MaybeUninit` container. This is a great way
/// to ensure that the data will get dropped, because the resulting `T` is
/// subject to the usual drop handling.
///
/// # Unsafety
///
/// It is up to the caller to guarantee that the `MaybeUninit` really is in an initialized
/// state, otherwise undefined behaviour will result.
pub unsafe fn get(&self) -> T {
std::ptr::read(&self.value)
/// state, otherwise this will immediately cause undefined behavior.
pub unsafe fn into_inner(self) -> T {
std::ptr::read(&*self.value)
}

/// Get a reference to the contained value.
///
/// # Unsafety
///
/// It is up to the caller to guarantee that the `MaybeUninit` really is in an initialized
/// state, otherwise undefined behaviour will result.
/// state, otherwise this will immediately cause undefined behavior.
pub unsafe fn get_ref(&self) -> &T {
&self.value
&*self.value
}

/// Get a mutable reference to the contained value.
///
/// # Unsafety
///
/// It is up to the caller to guarantee that the `MaybeUninit` really is in an initialized
/// state, otherwise undefined behaviour will result.
/// state, otherwise this will immediately cause undefined behavior.
pub unsafe fn get_mut(&mut self) -> &mut T {
&mut self.value
&mut *self.value
}

/// Get a pointer to the contained value. This pointer will only be valid if the `MaybeUninit`
/// is in an initialized state.
/// Get a pointer to the contained value. Reading from this pointer will be undefined
/// behavior unless the `MaybeUninit` is initialized.
pub fn as_ptr(&self) -> *const T {
self as *const MaybeUninit<T> as *const T
unsafe { &*self.value as *const T }
}

/// Get a mutable pointer to the contained value. This pointer will only be valid if the
/// `MaybeUninit` is in an initialized state.
/// Get a mutable pointer to the contained value. Reading from this pointer will be undefined
/// behavior unless the `MaybeUninit` is initialized.
pub fn as_mut_ptr(&mut self) -> *mut T {
self as *mut MaybeUninit<T> as *mut T
unsafe { &mut *self.value as *mut T }
}
}
```
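For a sense of how such an interface is used, here is a minimal sketch against the `std::mem::MaybeUninit` type that exists in today's standard library. Note the assumption: the stabilized method names differ from this draft (`uninit` rather than `uninitialized`, `write` rather than `set`, `assume_init` rather than `into_inner`), and the helper functions are hypothetical names chosen for the example.

```rust
use std::mem::MaybeUninit;

/// Initialize through the raw pointer, as an out-parameter style API would.
fn init_via_pointer() -> u64 {
    let mut slot = MaybeUninit::<u64>::uninit();
    // SAFETY: `as_mut_ptr` points at the slot's own storage, which is valid
    // for writes even though it does not hold a value yet.
    unsafe { slot.as_mut_ptr().write(7) };
    // SAFETY: the slot was initialized by the write above.
    unsafe { slot.assume_init() }
}

/// Initialize with the safe `write` method, then move the value out so that
/// it is subject to the usual drop handling.
fn init_via_write() -> String {
    let mut slot = MaybeUninit::<String>::uninit();
    slot.write(String::from("hello"));
    // SAFETY: `write` stored a valid `String` in the slot.
    unsafe { slot.assume_init() }
}

/// All-zero bytes are a valid `usize`, so a zeroed slot is already initialized.
/// The same would not hold for a reference type, which must not be null.
fn zeroed_usize() -> usize {
    // SAFETY: every bit pattern, including all zeros, is a valid `usize`.
    unsafe { MaybeUninit::<usize>::zeroed().assume_init() }
}

fn main() {
    println!("{} {} {}", init_via_pointer(), init_via_write(), zeroed_usize());
}
```

In each function the unsafe step is confined to the point where the caller asserts that initialization has happened, which mirrors the safety contract documented on the methods above.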