Document slice DSTs and size validity
Closes #1263
joshlf committed May 16, 2024
1 parent ab5d05c commit 2e6927b
Showing 2 changed files with 268 additions and 0 deletions.
133 changes: 133 additions & 0 deletions README.md
@@ -71,6 +71,139 @@ Zerocopy provides byte-order aware integer types that support these
conversions; see the `byteorder` module. These types are especially useful
for network parsing.

## Dynamically-sized types

Zerocopy supports slice-based dynamically sized types ("slice DSTs", or just
"DSTs" for short) via the `KnownLayout` trait.

A slice DST is a type whose trailing field is either a slice or another
slice DST, rather than a type with fixed size. For example:

```rust
#[repr(C)]
struct PacketHeader {
    ...
}

#[repr(C)]
struct Packet {
    header: PacketHeader,
    body: [u8],
}
```

It can be useful to think of slice DSTs as a generalization of slices - in
other words, a normal slice is just the special case of a slice DST with
zero leading fields. In particular:
- Like slices, slice DSTs can have different lengths at runtime
- Like slices, slice DSTs cannot be passed by-value, but only by reference
or via other indirection such as `Box`
- Like slices, a reference (or `Box`, or other pointer type) to a slice DST
encodes the number of elements in the trailing slice field
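
As a small illustration of the last point, the sketch below compares the size of a reference to a slice DST with the size of an ordinary slice reference. The fields of `PacketHeader` are hypothetical and chosen only so that the example compiles, and `pointer_sizes` is a helper introduced here, not part of zerocopy. On current Rust compilers, all three pointer types are expected to be two `usize`s wide: one for the address and one for the trailing slice length.

```rust
use std::mem::size_of;

// Hypothetical header fields, chosen only for illustration.
#[allow(dead_code)]
#[repr(C)]
struct PacketHeader {
    src_port: u16,
    dst_port: u16,
}

#[allow(dead_code)]
#[repr(C)]
struct Packet {
    header: PacketHeader,
    body: [u8],
}

/// Returns the sizes in bytes of `&[u8]`, `&Packet`, and `Box<Packet>`.
#[allow(dead_code)]
fn pointer_sizes() -> (usize, usize, usize) {
    (
        size_of::<&[u8]>(),
        size_of::<&Packet>(),
        size_of::<Box<Packet>>(),
    )
}
```

Calling `pointer_sizes()` and comparing the three values is enough to see that a pointer to `Packet` carries the same extra length word that a pointer to `[u8]` does.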

### Slice DST layout

Just like other composite Rust types, the layout of a slice DST is not
well-defined unless it is specified using an explicit `#[repr(...)]`
attribute such as `#[repr(C)]`. [Other representations are
supported][reprs], but in this section, we'll use `#[repr(C)]` as our
example.

A `#[repr(C)]` slice DST is laid out [just like sized `#[repr(C)]`
types][repr-c-structs], but the presence of a variable-length field
introduces the possibility of *dynamic padding*. In particular, it may be
necessary to add trailing padding *after* the trailing slice field in order
to satisfy the outer type's alignment, and the amount of padding required
may be a function of the length of the trailing slice field. This is just a
natural consequence of the normal `#[repr(C)]` rules applied to slice DSTs,
but it can result in surprising behavior. For example, consider the
following type:

```rust
#[repr(C)]
struct Foo {
    a: u32,
    b: u8,
    z: [u16],
}
```

Assuming that `u32` has alignment 4 (this is not true on all platforms),
then `Foo` has alignment 4 as well. Here is the smallest possible value for
`Foo`:

```
byte offset | 01234567
      field | aaaab---
                   ><
```

In this value, `z` has length 0. Abiding by `#[repr(C)]`, the lowest offset
that we can place `z` at is 5, but since `z` has alignment 2, we need to
round up to offset 6. This means that there is one byte of padding between
`b` and `z`, then 0 bytes of `z` itself (denoted `><` in this diagram), and
then two bytes of padding after `z` in order to satisfy the overall
alignment of `Foo`. The size of this instance is 8 bytes.

What if `z` has length 1?

```
byte offset | 01234567
      field | aaaab-zz
```

In this instance, `z` has length 1, and thus takes up 2 bytes. That means
that we no longer need padding after `z` in order to satisfy `Foo`'s
alignment. We've now seen two different values of `Foo` with two different
lengths of `z`, but they both have the same size - 8 bytes.

What if `z` has length 2?

```
byte offset | 012345678901
      field | aaaab-zzzz--
```

Now `z` has length 2, and thus takes up 4 bytes. This brings our un-padded
size to 10, and so we now need 2 bytes of padding after `z` to satisfy
`Foo`'s alignment, for a total size of 12 bytes.

Again, all of this is just a logical consequence of the `#[repr(C)]` rules
applied to slice DSTs, but it can be surprising that the amount of trailing
padding becomes a function of the trailing slice field's length, and thus
can only be computed at runtime.
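
The sizes worked out above can be checked with `std::mem::size_of_val`. The sketch below is an illustration rather than anything zerocopy does internally: it uses a generic stand-in for `Foo` (an assumption made here so that a value can be built in safe code and then unsized), and `foo_size` is a helper name introduced for this example. It also assumes, as above, that `u32` has alignment 4.

```rust
use std::mem::size_of_val;

// Generic stand-in for `Foo`: `Foo<[u16]>` has the same fields as the `Foo`
// above, while `Foo<[u16; N]>` is a sized type that can be built directly.
#[allow(dead_code)]
#[repr(C)]
struct Foo<Z: ?Sized> {
    a: u32,
    b: u8,
    z: Z,
}

/// Size in bytes of a `Foo` whose trailing slice has `N` elements.
#[allow(dead_code)]
fn foo_size<const N: usize>() -> usize {
    let sized = Foo { a: 0, b: 0, z: [0u16; N] };
    // Unsizing coercion: `&Foo<[u16; N]>` becomes `&Foo<[u16]>`, and the
    // length `N` moves into the reference's metadata.
    let dst: &Foo<[u16]> = &sized;
    // The size, including any trailing padding, is computed at runtime.
    size_of_val(dst)
}
```

On a platform where `u32` has alignment 4, `foo_size::<0>()` and `foo_size::<1>()` should both report 8 bytes, and `foo_size::<2>()` should report 12, matching the three diagrams.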

[reprs]: https://doc.rust-lang.org/reference/type-layout.html#representations
[repr-c-structs]: https://doc.rust-lang.org/reference/type-layout.html#reprc-structs

### What is a valid size?

There are two places in zerocopy's API where we refer to "a valid size" of a
type. In normal casts or conversions, where the source is a byte slice, we
need to know whether the source byte slice is a valid size of the
destination type. In prefix or suffix casts, we need to know whether *there
exists* a valid size of the destination type which fits in the source byte
slice and, if so, what the largest such size is.

As outlined above, a slice DST's size is determined by the number of elements
in its trailing slice field. However, there is not necessarily a 1-to-1
mapping between trailing slice field length and overall size. As we saw in
the previous section with the type `Foo`, instances with both 0 and 1
elements in the trailing `z` field result in a `Foo` whose size is 8 bytes.

When we say "x is a valid size of `T`", we mean one of two things:
- If `T: Sized`, then we mean that `x == size_of::<T>()`
- If `T` is a slice DST, then we mean that there exists a `len` such that
  the instance of `T` with `len` trailing slice elements has size `x`
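
For the `Foo` type from the previous section, this definition can be sketched as follows. The constants and the functions `size_for_len` and `is_valid_size` are names introduced for this illustration and are not zerocopy's API; the constants assume that `u32` has alignment 4.

```rust
// Layout constants for `Foo`, assuming `u32` has alignment 4.
const Z_OFFSET: usize = 6; // byte offset of the trailing slice `z`
const ELEM_SIZE: usize = 2; // size of one `u16` element
const ALIGN: usize = 4; // alignment of `Foo`

/// Size of the `Foo` with `len` trailing elements, or `None` on overflow.
#[allow(dead_code)]
fn size_for_len(len: usize) -> Option<usize> {
    let unpadded = len.checked_mul(ELEM_SIZE)?.checked_add(Z_OFFSET)?;
    // Round up to the next multiple of `ALIGN` (the dynamic padding).
    match unpadded % ALIGN {
        0 => Some(unpadded),
        rem => unpadded.checked_add(ALIGN - rem),
    }
}

/// Returns true if there exists a `len` such that `Foo` has size `x`.
#[allow(dead_code)]
fn is_valid_size(x: usize) -> bool {
    if x < Z_OFFSET {
        return false;
    }
    // Size never decreases as `len` grows, so it suffices to check the
    // largest `len` whose un-padded size does not exceed `x`.
    let len = (x - Z_OFFSET) / ELEM_SIZE;
    size_for_len(len) == Some(x)
}
```

Under these assumptions, 8 and 12 are valid sizes of `Foo`, while 6 and 10 are not, because every instance is padded out to a multiple of 4 bytes.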

When we say "largest possible size of `T` that fits in a byte slice", we
mean one of two things:
- If `T: Sized`, then we mean `size_of::<T>()` if the byte slice is at least
`size_of::<T>()` bytes long
- If `T` is a slice DST, then we mean to consider all values, `len`, such
that the instance of `T` with `len` trailing slice elements fits in the
byte slice, and to choose the largest such `len`, if any
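
The slice DST case can be sketched as a small search over `len`. `SliceLayout` and its methods are hypothetical names used only for this illustration, not zerocopy types, and the sketch assumes a non-zero element size and a power-of-two alignment.

```rust
/// Hypothetical description of a slice DST's layout (not a zerocopy type).
#[derive(Clone, Copy)]
struct SliceLayout {
    offset: usize,    // byte offset of the trailing slice field
    elem_size: usize, // size of one trailing element (assumed non-zero)
    align: usize,     // alignment of the whole type
}

#[allow(dead_code)]
impl SliceLayout {
    /// Padded size of the instance with `len` trailing elements.
    fn size_for_len(self, len: usize) -> Option<usize> {
        let unpadded = len.checked_mul(self.elem_size)?.checked_add(self.offset)?;
        match unpadded % self.align {
            0 => Some(unpadded),
            rem => unpadded.checked_add(self.align - rem),
        }
    }

    /// Largest `len`, and the resulting size, that fits in `bytes_len` bytes.
    fn largest_fit(self, bytes_len: usize) -> Option<(usize, usize)> {
        // Start from the largest `len` whose un-padded size fits...
        let mut len = bytes_len.checked_sub(self.offset)? / self.elem_size;
        // ...then back off while trailing padding pushes the size too far.
        loop {
            match self.size_for_len(len) {
                Some(size) if size <= bytes_len => return Some((len, size)),
                _ if len == 0 => return None,
                _ => len -= 1,
            }
        }
    }
}
```

With `Foo`'s layout (`offset: 6`, `elem_size: 2`, `align: 4`), a 10-byte slice can hold only the `len == 1` instance of 8 bytes, because `len == 2` would need 12 bytes once padding is included. A 7-byte slice cannot hold any `Foo` at all.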

## Cargo Features

- **`alloc`**
135 changes: 135 additions & 0 deletions src/lib.rs
@@ -71,6 +71,141 @@
//! conversions; see the [`byteorder`] module. These types are especially useful
//! for network parsing.
//!
//! # Dynamically-sized types
//!
//! Zerocopy supports slice-based dynamically sized types ("slice DSTs", or just
//! "DSTs" for short) via the [`KnownLayout`] trait.
//!
//! A slice DST is a type whose trailing field is either a slice or another
//! slice DST, rather than a type with fixed size. For example:
//!
//! ```
//! #[repr(C)]
//! struct PacketHeader {
//! # /*
//!     ...
//! # */
//! }
//!
//! #[repr(C)]
//! struct Packet {
//!     header: PacketHeader,
//!     body: [u8],
//! }
//! ```
//!
//! It can be useful to think of slice DSTs as a generalization of slices - in
//! other words, a normal slice is just the special case of a slice DST with
//! zero leading fields. In particular:
//! - Like slices, slice DSTs can have different lengths at runtime
//! - Like slices, slice DSTs cannot be passed by-value, but only by reference
//! or via other indirection such as `Box`
//! - Like slices, a reference (or `Box`, or other pointer type) to a slice DST
//! encodes the number of elements in the trailing slice field
//!
//! ## Slice DST layout
//!
//! Just like other composite Rust types, the layout of a slice DST is not
//! well-defined unless it is specified using an explicit `#[repr(...)]`
//! attribute such as `#[repr(C)]`. [Other representations are
//! supported][reprs], but in this section, we'll use `#[repr(C)]` as our
//! example.
//!
//! A `#[repr(C)]` slice DST is laid out [just like sized `#[repr(C)]`
//! types][repr-c-structs], but the presence of a variable-length field
//! introduces the possibility of *dynamic padding*. In particular, it may be
//! necessary to add trailing padding *after* the trailing slice field in order
//! to satisfy the outer type's alignment, and the amount of padding required
//! may be a function of the length of the trailing slice field. This is just a
//! natural consequence of the normal `#[repr(C)]` rules applied to slice DSTs,
//! but it can result in surprising behavior. For example, consider the
//! following type:
//!
//! ```
//! #[repr(C)]
//! struct Foo {
//!     a: u32,
//!     b: u8,
//!     z: [u16],
//! }
//! ```
//!
//! Assuming that `u32` has alignment 4 (this is not true on all platforms),
//! then `Foo` has alignment 4 as well. Here is the smallest possible value for
//! `Foo`:
//!
//! ```text
//! byte offset | 01234567
//!       field | aaaab---
//!                    ><
//! ```
//!
//! In this value, `z` has length 0. Abiding by `#[repr(C)]`, the lowest offset
//! that we can place `z` at is 5, but since `z` has alignment 2, we need to
//! round up to offset 6. This means that there is one byte of padding between
//! `b` and `z`, then 0 bytes of `z` itself (denoted `><` in this diagram), and
//! then two bytes of padding after `z` in order to satisfy the overall
//! alignment of `Foo`. The size of this instance is 8 bytes.
//!
//! What if `z` has length 1?
//!
//! ```text
//! byte offset | 01234567
//!       field | aaaab-zz
//! ```
//!
//! In this instance, `z` has length 1, and thus takes up 2 bytes. That means
//! that we no longer need padding after `z` in order to satisfy `Foo`'s
//! alignment. We've now seen two different values of `Foo` with two different
//! lengths of `z`, but they both have the same size - 8 bytes.
//!
//! What if `z` has length 2?
//!
//! ```text
//! byte offset | 012345678901
//!       field | aaaab-zzzz--
//! ```
//!
//! Now `z` has length 2, and thus takes up 4 bytes. This brings our un-padded
//! size to 10, and so we now need 2 bytes of padding after `z` to satisfy
//! `Foo`'s alignment, for a total size of 12 bytes.
//!
//! Again, all of this is just a logical consequence of the `#[repr(C)]` rules
//! applied to slice DSTs, but it can be surprising that the amount of trailing
//! padding becomes a function of the trailing slice field's length, and thus
//! can only be computed at runtime.
//!
//! [reprs]: https://doc.rust-lang.org/reference/type-layout.html#representations
//! [repr-c-structs]: https://doc.rust-lang.org/reference/type-layout.html#reprc-structs
//!
//! ## What is a valid size?
//!
//! There are two places in zerocopy's API where we refer to "a valid size" of a
//! type. In normal casts or conversions, where the source is a byte slice, we
//! need to know whether the source byte slice is a valid size of the
//! destination type. In prefix or suffix casts, we need to know whether *there
//! exists* a valid size of the destination type which fits in the source byte
//! slice and, if so, what the largest such size is.
//!
//! As outlined above, a slice DST's size is determined by the number of elements
//! in its trailing slice field. However, there is not necessarily a 1-to-1
//! mapping between trailing slice field length and overall size. As we saw in
//! the previous section with the type `Foo`, instances with both 0 and 1
//! elements in the trailing `z` field result in a `Foo` whose size is 8 bytes.
//!
//! When we say "x is a valid size of `T`", we mean one of two things:
//! - If `T: Sized`, then we mean that `x == size_of::<T>()`
//! - If `T` is a slice DST, then we mean that there exists a `len` such that
//!   the instance of `T` with `len` trailing slice elements has size `x`
//!
//! When we say "largest possible size of `T` that fits in a byte slice", we
//! mean one of two things:
//! - If `T: Sized`, then we mean `size_of::<T>()` if the byte slice is at least
//! `size_of::<T>()` bytes long
//! - If `T` is a slice DST, then we mean to consider all values, `len`, such
//! that the instance of `T` with `len` trailing slice elements fits in the
//! byte slice, and to choose the largest such `len`, if any
//!
//! # Cargo Features
//!
//! - **`alloc`**
