-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Guarantee slice representation #3775
base: master
Are you sure you want to change the base?
Conversation
fb67199
to
73ad005
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably clearly distinguish between "slices" and "pointers/references to slices", as they are different types (and the layout of slices themselves is already defined (edit: fixed link)).
text/0000-guaranteed-slice-repr.md
Outdated
The validity requirements for the in-memory slice representation are the same | ||
as [those documented on `std::slice::from_raw_parts`](https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html). | ||
Namely: | ||
|
||
* `data` must be non-null, valid for reads for `len * mem::size_of::<T>()` many bytes, | ||
and it must be properly aligned. This means in particular: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The validity requirements for the in-memory slice representation are the same | |
as [those documented on `std::slice::from_raw_parts`](https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html). | |
Namely: | |
* `data` must be non-null, valid for reads for `len * mem::size_of::<T>()` many bytes, | |
and it must be properly aligned. This means in particular: | |
The validity requirements for the in-memory representation of raw slice pointers are simply the validity requirements of the fields (i.e. initialized). | |
The validity requirements for the in-memory representation of slice references are the same | |
as [those documented on `std::slice::from_raw_parts`](https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html) for shared slice references, and [those documented on `std::slice::from_raw_parts_mut`](https://doc.rust-lang.org/std/slice/fn.from_raw_parts_mut.html) for unique slice references. | |
Namely: | |
* `data` must be non-null, valid for reads (shared reference) or writes (mutable reference) for `len * mem::size_of::<T>()` many bytes, | |
and it must be properly aligned. This means in particular: |
Raw pointers to slices don't need to necessarily point to anything, and shared and unique references have different (but analogous) reqirements.
Also: We might not want to say "validity requirements" here, as the "validity requirements" of references, and the concept of "validity requirements" in general, are not necessarily settled: rust-lang/unsafe-code-guidelines#412 rust-lang/unsafe-code-guidelines#539
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might not want to say "validity requirements" here
Yes, "validity requirements" is an overloaded term. I'd be happy to change it here if you have suggestions for a better one!
text/0000-guaranteed-slice-repr.md
Outdated
} | ||
``` | ||
|
||
The precise ABI of slices is not guaranteed, so `&[T]` may not be passed by-value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wny not go ahead and define this ABI as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our ABI for this type is not exactly the ABI of the equivalent C struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there another way to encode a C type that has an equivalent ABI to whatever Rust does right now? Or otherwise define the ABI in a way that other languages could somehow comply.
And then promise Rust will not ever change that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In particular, its return ABI falls outside what can be obtained via C code on some platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood, thanks for explaining. That's.. really unfortunate.
One could still rigorously define the ABI to be what rustc does right now, and let callers in other languages use per-architecture inline assembly if they want to (this should be possible right?). But that's a much harder sell, and it's reasonable to leave this out of this RFC
#[repr(C)] | ||
struct Slice<T> { | ||
data: *const T, | ||
len: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this specific order? Why not length before pointer? Are there any plausible reasons to prefer one over the other, e.g., based on target architecture?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that our current layout algorithm is best able to exploit any niche in len
if len
is first in this ordering. And we currently cannot use a "double niche" on two different fields very effectively. I can work on improving this, but I cannot promise a specific result yet.
Due to various quirks of our existing APIs, the len
niche would not apply to
- zero-sized types
- the "raw" version of this:
*mut [T]
and*const [T]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@workingjubile Why would len
have a niche? I would expect data
to be the field with a niche, as it is non-null for &[T]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The total size
len * mem::size_of::<T>()
of the slice must be no larger than isize::MAX, and adding that size to data must not “wrap around” the address space. See the safety documentation of pointer::offset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, clever! Thanks for describing that-- I'll mention it in the RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cramertj It's really important that the length have a niche in &[T]
, as has been FCPed by opsem in rust-lang/unsafe-code-guidelines#510, because that allows us to add range metadata to the lengths in every function that takes a &[T]
, and correspondingly lets LLVM know that, for example, (i + j)/2
on in-bounds indexes can't overflow (well, for non-ZST T
s, which TBH are the only ones that matter for loop-over-slice optimizations).
Sadly *const [T]
doesn't have the niche, though, because https://doc.rust-lang.org/std/ptr/fn.slice_from_raw_parts.html was stabilized before this was realized and because casting *const [()]
to *const [i32]
works and doesn't change the metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, also, we want to ensure to leave space for size-and-alignment based niches in references too.
I guess for a slice there's always the "zero size" case, so that simplifies to just alignment niches, but that still means (-align, align)
are all impossible. (Which simplifies to just "it's not null" for something with align-1.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as has been FCPed by opsem in rust-lang/unsafe-code-guidelines#510
Theoretically, we could have been even stricter, additionally requiring that ptr + length not overflow the address space.
5ec195d
to
09a64ea
Compare
standard library's ability to change and take advantage of new optimization | ||
opportunities. | ||
|
||
# Rationale and alternatives |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
crABI provides a way to pass slices to/from C without locking in our representation #3470, that is probably worth mentioning somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a related way, maybe we could do something like guarantee the layout only in repr(C)
types or in extern "C"
function signatures? Which retains the flexibility to change ordering in most cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure that February 2025 is the right time to stabilize this layout, but I do think it is the right time to start discussing how to do so, why we should do so, and removing any blockers to it. Even if we didn't stabilize this layout, we could provide more ways to statically resolve the layout details so that code could interact with it. It's somewhat embarrassing that ptr_metadata
isn't stabilized, after all.
Nonetheless, I have a question: what code specifically do we want to enable with this choice? Is it Rust code or external over-FFI code? In the latter case, refusing to stabilize any parameter-passing or return-passing details would remain somewhat frustrating and makes this less of an improvement over the status quo.
```rust | ||
#[repr(C)] | ||
struct Slice<T> { | ||
data: *const T, | ||
len: usize, | ||
} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the RFC does describe that the non-null requirements are upheld, because *const T where T: Sized
lacks a "niche", the way this is written could be misread to preclude our "niches", which obviously cannot be the case because we promise niche-based transformations in certain cases. There are two major niches we may wish to be able to exploit:
- The well-known "null pointer" niche for
data
. This one is a promised transformation. - A less well-known case is that if
T
is non-zero-sized, the size of the slice reference is bounded byisize::MAX
, which means that we have a very large niche on thelen
as the top bit can never be set.
This RFC should probably at least mention the guaranteed niche transformations exist and their implications, just to make clear it is not contradicting or overruling them. A weaker version was described in RFC 3391 but we strengthened that decision to generalize to similarly-shaped enums, not just Result or Option, post-hoc. I documented them in this test, I don't know where to look it up in the reference: https://github.com/rust-lang/rust/blob/ed49386d3aa3a445a9889707fd405df01723eced/tests/ui/rfcs/rfc-3391-result-ffi-guarantees.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another niche that isn't currently exploited but may be in the future is alignment, e.g. the pointer in &[u32]
is not only non-null but addresses 1, 2, and 3 are also invalid (assuming align_of::<u32>() == 4
). This is a smaller niche than the length being at most isize::MAX / size_of::<T>()
, but it also applies for ZSTs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, perhaps more generally we should be clear that "the representation is stable" still allows for transformations to happen "around it", particularly when it comes to ADT tag layout. This means that unsafe
code that interacts with a blob of bytes that happens to be EnumType<&[T]>
should be cautious. This was always the case but will only "get worse" over time.
The precise ABI of slice references is not guaranteed, so `&[T]` may not be | ||
passed by-value or returned by-value from an `extern "C" fn`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I don't think is well-described here, or at least I feel that based on some other comments it might be at risk of being glossed over when some people engage with this RFC, is that calling convention (AKA parameter and return passing) and in-memory layout are both very different things.
Often, "ABI" is used to describe both, because they are both technically part of the literal "application binary interface". Due to arguments and returns sometimes passing through the stack, and almost always when the number of them is large enough, in-memory layout becomes very relevant for almost every calling convention. But they're not the same.
In particular, every calling convention is a beautiful and unique snowflake. Thus, we could define the calling convention for &[T]
specifically when passed over extern "C"
and extern "C-unwind"
without defining it in other cases, and especially without defining it for extern "Rust"
(i.e. the "default" ABI).
BUT I would advise caution in promising compatibility in a very specific way like "just for extern "C"
" like that, as people will misread, misinterpret, and fail to consider edge cases. We cannot solve such entirely, but we should be very careful about tentative almost-promises. It might only be appropriate to mention as a future possibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to amend the RFC to clarify this point! The sentence you highlighted seems like it covers this to me, but perhaps I need to make it bolder/brighter/a headline.
Do you have other suggestions for how I could make this more straightforward?
#[repr(C)] | ||
struct Slice<T> { | ||
data: *const T, | ||
len: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that our current layout algorithm is best able to exploit any niche in len
if len
is first in this ordering. And we currently cannot use a "double niche" on two different fields very effectively. I can work on improving this, but I cannot promise a specific result yet.
Due to various quirks of our existing APIs, the len
niche would not apply to
- zero-sized types
- the "raw" version of this:
*mut [T]
and*const [T]
Slice references are represented with a pointer and length pair. Their in-memory | ||
layout is the same as a `#[repr(C)]` struct like the following: | ||
|
||
```rust | ||
#[repr(C)] | ||
struct Slice<T> { | ||
data: *const T, | ||
len: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we wished to extend Rust to allow references to unsized types that encompass unsized types... for example, &[[T]]
, or more interestingly, &[dyn Trait]
... then one of the likely choices we would make is to have "triple-wide" (or larger!) references, instead of this "double-wide" one. Thus we would only apply the corresponding layout this RFC describes (or any other such specified layout) in the case of where T: Sized
.
I feel this "multiple metadata" possibility should be considered and either explicitly reserved as a future possibility or explicitly dismissed. I feel accepting this RFC as-is could be interpreted as foreclosing it, but there might forever be grumbling about how "it doesn't say...!" Thus if we don't want to do that, we should be clear, and if we're still open to that possibility, we should also be clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered this, but IMO it seems difficult to imagine how &[[T]]
would work-- even with triple-wide references, the items in the collection could be heterogeneous.
I suppose we could make it work if all of the elements were the same type, e.g. coercing &[String]
into &[dyn Debug]
.
Personally, I'd be happy to restrict this RFC to specify that this only applies to &[T] where T: Sized
. Would that resolve your concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could make the RFC compatible with &[[U]]
by simply relaxing the bound to T: ?Sized
:
&[[U]]
->
{ data: *const [U], len: usize }
->
{ data: { data: *const U, len: usize }, len: usize }
so I don't see how this definition of Slice<T>
forecloses the multi-meta possibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered this, but IMO it seems difficult to imagine how
&[[T]]
would work-- even with triple-wide references, the items in the collection could be heterogeneous.
You could either require homogeneity like you suggest, or you could have the metadata be its own pointer to a separate region of memory, sharing the lifetime of the data pointer. (I wrote a very experimental crate that does something like the latter: https://github.com/Jules-Bertholet/unsized-vec)
But since current Rust doesn’t support this, and nothing in this RFC would interfere with someday adopting either option, it’s not a concern IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could make the RFC compatible with
&[[U]]
by simply relaxing the bound toT: ?Sized
another valid extension is to just insert more fields for metadata between the pointer and length or change the length to be a struct-- these match how the pointer metadata APIs work more closely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another valid extension is to just insert more fields for metadata between the pointer and length or change the length to be a struct-- these match how the pointer metadata APIs work more closely.
changing the metadata (length) from usize
to a struct would be rust-lang/libs-team#246
You could either require homogeneity like you suggest, or you could have the metadata be its own pointer to a separate region of memory, sharing the lifetime of the data pointer.
if we still want to have zero-cost unsizing the type &[[U]]
must only be able to represent rectangular matrices:
// unsizing from &[i32; 5] to &[i32] is free
let y: &[i32] = &[1, 2, 3, 4, 5];
// unsizing a `&[[f64; 3]; 2]` to `&[[f64]]` should also not involve any allocation.
let x: &[[f64]] = &[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]];
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changing the metadata (length) from
usize
to a struct would be rust-lang/libs-team#246
I don't really think so, that's just wrapping the metadata type in a struct whenever you're not using it as part of a pointer type. This is explicitly similar to mem::Discriminant
which wraps an enum's discriminant in a struct, even though the discriminant might be a well-known type, e.g. u8
for a repr(u8)
fieldless enum.
for [T]
where T: Sized
, the metadata type would still be usize
; the metadata could be a struct or some other type when using custom extern
types -- that kind of custom extern
type doesn't exist in Rust yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both homogeneous &[dyn Trait]
and rectangular &[[T]]
sound useful, especially because the retangular case benefits from more optimizations.
We could consider unsizing options for more general types too like struct Foo<T>(T,T)
, so then a homogeneous Foo<dyn Trait>
.
## Compatibility with C++ `std::span` | ||
|
||
The largest drawback of this layout and set of validity requirements is that it | ||
may preclude `&[T]` from being representationally equivalent to C++'s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I keep getting tripped up reading this section by thinking it says we're committing to something that diverges from C++. I would phrase it as something like
Because C++ does not guarantee a layout for
std::span
, stabilizing one for Rust now means C++ could choose to stabilize a different layout and prevent the two from ever being compatible. However, this is unlikely to ever happen because MSVC uses the same representation proposed in this RFC and has stringent stability requirements.
* Unlike Rust, `std::span` allows the `data` pointer to be `nullptr`. One | ||
possibile workaround for this would be to guarantee that `Option<&[T]>` uses | ||
`data: std::ptr::null(), len: 0` to represent the `None` case, making | ||
`std::span<T>` equivalent to `Option<&[T]>` for non-zero-sized types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be worth a perf run to see if there's any measurable impact of doing this.
Types like `*const ()` / `&()` are widely used to pass around pointers today. | ||
We cannot make them zero-sized, and it would be surprising to make a | ||
different choice for `&[()]`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should have warned against this and made everyone use *const u8
instead. Seems impossible to make a change like this now though.
Either option may offer modest performance benefits for highly generic code | ||
which happens to create empty slices of uninhabited types, but this is unlikely | ||
to be worth the cost of maintaining a special case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, we could opt not to guarantee this since no one seems to have much of a use case for this right now anyway.
My use-case is FFI code (specifically, autogenerated C++ interop). Rust code can usually rely on calls to |
size and alignment of slices were not guaranteed. However, the Rust compiler | ||
accepts the `repr(C)` declaration above without warning. | ||
|
||
# Guide-level explanation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You appear to have deleted the reference-level explanation section from the template.
I would encourage you to split this back into two parts: the guide level informal explanation that talks about how it allows you to read things from C, but then also have a reference-level explanation of exactly what it's guaranteeing, without ever using the word "layout", because "layout" means too many different things to different people. (Some people think it means just std::alloc::Layout
, some include field offsets, some include validity invariants, etc.)
I'm absolutely in favour of doing this RFC, see this old Zulip thread, so long as it's clearly scoped to the parts that really are uncontroversial.
Notably, I don't think that any description that include *const T
is correct in a reference-level section if it applies to &[_]
references, because it at least needs to be NonNull<T>
-- but it also needs to be aligned, and a bunch of other things.
So I want to see a precise, positively-specified list of exactly what we're RFCing.
A quick stab at it:
- we're only talking about
{&,&mut,*const,*mut} [T]
withT: Sized
- we're only talking about platforms where
sizeof(*const impl Thin)
==sizeof(usize)
. - the size is
2 * sizeof(usize)
. - we're only talking about platforms where
alignof(*const impl Thin)
==alignof(usize)
. - the align is
alignof(usize)
. - the pointer component is at offset
0
, which points to the first element - the length component is at offset
sizeof(usize)
, and contains the length in units of elements (not of bytes)
I don't know if we stop there, but if that's all we're committing to I think it's uncontroversial -- and I bet it's what a whole bunch of unsafe code in the ecosystem already assumes anyway, de facto, since it's been true since at least 1.1.0.
I don't know what, if anything, we want to promise or require about the validity invariant, especially since that's already defined to be different between &[i32]
and *const [i32]
. Perhaps this RFC should just not say anything about it -- reading a rust slice from C doesn't need to know anything about it.
This RFC guarantees the representation of
&str
and&[T]
to be equivalent to arepr(C)
struct containing a pointer and a length.Rendered