Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate size and stride for types #1397

Open
Amanieu opened this issue Dec 6, 2015 · 25 comments
Open

Separate size and stride for types #1397

Amanieu opened this issue Dec 6, 2015 · 25 comments
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.

Comments

@Amanieu
Copy link
Member

Amanieu commented Dec 6, 2015

The Swift ABI defines an interesting struct layout which defines the size of a type separately from its stride:

Note that this differs from C or LLVM's normal layout rules in that size and stride are distinct; whereas C layout requires that an embedded struct's size be padded out to its alignment and that nothing be laid out there, Swift layout allows an outer struct to lay out fields in the inner struct's tail padding, alignment permitting.

While rust doesn't define a stable ABI yet, I think using such a layout would be a breaking change due to assumptions that unsafe code make about mem::size_of.

In particular, unsafe code might assume that, when given a pointer let x: *mut T, all bytes from x to x + mem::size_of::<T>() can safely be overwritten. This would be incorrect when the padding bytes at the end of T contain fields of an outer struct.

Is such an optimization worth considering? From a quick look at the rust source, it seems that all uses of mem::size_of will continue to work fine if we define it as returning the stride (aligned size). I haven't looked at any external crates yet but I think they would continue to work fine as well. In any case this would be a breaking change and should be considered carefully.

@petrochenkov
Copy link
Contributor

There's some discussion about this in rust-lang/rust#28951

@eddyb
Copy link
Member

eddyb commented Dec 6, 2015

I like how Swift has implemented various features we've only discussed.
This is valuable, as it can attest to the practicality of having such features, without time-consuming investigation on our part.
Thanks @Amanieu for bringing it up!

@mahkoh
Copy link
Contributor

mahkoh commented Dec 6, 2015

Neither of the following must become UB due to changes made to object layout.

fn dummy(p: &T) {
    unsafe {
        // getelementptr inbounds
        p.offset(1);
    }
}

unsafe fn get_second(p: *const T) -> T {
    ptr::read(p.offset(1))
}

@eddyb
Copy link
Member

eddyb commented Dec 6, 2015

@mahkoh .offset already does C-like array indexing, i.e. p.offset(1) == (p as *const T).offset(aligned_size_of::<T>()) as *const T, whether or not size_of remains aligned_size_of (which it probably will, from the discussion I've seen so far).

@Amanieu
Copy link
Member Author

Amanieu commented Dec 6, 2015

I think this is easily solved by requiring memory allocation to use the aligned size of an object instead of the packed size. That way p.offset(1), which uses the aligned size, will still be only 1 byte past the end of a valid object, which avoid undefined behavior.

@mahkoh
Copy link
Contributor

mahkoh commented Dec 6, 2015

@eddyb: Then the object cannot be smaller than the aligned_size_of because otherwise the first function is UB.

@Amanieu: The first function can take a reference to T that lives on the stack, the heap, is a field in a struct, etc. Every instance of T that can be passed to the first function must have the aligned_size_of. Otherwise the behavior of offset is undefined.

@mahkoh
Copy link
Contributor

mahkoh commented Dec 6, 2015

@eddyb: Ah, I see what you meant. I'll update the code above.

@mahkoh
Copy link
Contributor

mahkoh commented Dec 6, 2015

And to bring size_of back into the picture:

fn dummy2(p: &T) {
    unsafe {
        // getelementptr inbounds
        (p as *const _ as *const u8).offset(mem::size_of::<T>());
    }
}

must not be UB either.

@Amanieu
Copy link
Member Author

Amanieu commented Dec 6, 2015

@mahkoh: This isn't UB because, from my understanding, getelementptr only requires the target to be within the same "allocated object". This means that it is OK for the resulting pointer to end up in the middle of another object, as long as that other object is part of the same allocation as the original object. This means that it should be fine as long as rust invokes the LLVM alloca instruction with the aligned size instead of the packed size when allocating objects on the stack.

@mahkoh
Copy link
Contributor

mahkoh commented Dec 20, 2015

Still, this optimization seems to be impossible due to the issue mentioned in the OP:

In particular, unsafe code might assume that, when given a pointer let x: *mut T, all bytes
from x to x + mem::size_of::() can safely be overwritten.

The user is actually guaranteed this property for Copy types by the documentation:

Types that can be copied by simply copying bits (i.e. memcpy).

While mem::size_of is not explicitly mentioned, this seems to be unambiguous. If the memcpy comment didn't exist, one might argue otherwise, but memcpy always requires an explicit size argument in bytes.

This was referenced Apr 7, 2016
@pnkfelix
Copy link
Member

Ah, I had overlooked this issue, but this was one of the things I was thinking about while writing the Allocator RFC (#1398). I believe the Layout API presented there is forward-compatible with separate notions of stride/size, but it would be best to double-check that claim carefully.

(Update: in fact I had already commented (rust-lang/rust#28951 (comment)) about having thoughts related to this topic, though that comment is quite wishy-washy about what direction we should go in...)

@pnkfelix pnkfelix added T-lang Relevant to the language team, which will review and decide on the RFC. I-nominated labels Apr 12, 2016
@pnkfelix
Copy link
Member

(nominating because I think the lang team should be aware of the question here.)

@petrochenkov
Copy link
Contributor

It looks like even arrays [T; N]/[T] can use the layout with omitted trailing padding (they are not included in the Swift document).
It doesn't create new problems in addition to the memcpy(sizeof)/memset(sizeof) problem already existing for non-array types.

Making arrays to follow the same rule has benefits of making [T; 3] binary compatible with (T, T, T), and [T; 1] binary compatible with T.

@Amanieu
Copy link
Member Author

Amanieu commented Apr 14, 2016

The first step towards supporting this would be to add mem::inner_size_of and mem::inner_size_of_val (names subject to the usual bikeshedding). These would initially return the same value as mem::size_of, but it would allow code to start preparing for the introduction of separate size and stride.

@Amanieu
Copy link
Member Author

Amanieu commented Apr 16, 2016

Funnily enough, it seems this would actually break the code I have written for atomic-rs.

This function might overwrite an adjacent value if it is embedded in the padding of the atomic object (which is just an UnsafeCell<T>).

The fix would be to use inner_size_of instead of size_of, however this shows that this feature can still be a breaking change for some code.

@canndrew
Copy link
Contributor

As pointed out in #1582, this would allow us to change the layout of tuples so that a tuple can be destructured into a single element and the rest of the tuple while still retaining a fairly space-efficient layout. That would, in turn, make it easier to support variadic generics.

@huonw
Copy link
Member

huonw commented Apr 22, 2016

My feeling is that inhibiting writing a 3-byte struct like (i16, bool) as a single 4 byte write everywhere (i.e. whether or not it is actually nested like ((i16, bool), bool)) is a larger downside than compressing any structs that actually look like that.

However, I don't have a great feeling for the number of actual odd-size structs, nor the number of nested ones, nor how often they hit memory. I guess one possibility for collecting data would be instrumenting the compiler to report/record this sort of info during compilation and run it on some programs (e.g. the bootstrap, servo, some things from the rest of the ecosystem) to get an idea (getting an idea for number of writes would require runtime instrumentation, I suppose). Theoretically it might good to also know about this in the precise of layout optimisations that reorder fields (e.g. (bool, i16) gets represented in memory as (i16, bool)) since those will likely maximise opportunities for stride != size to be beneficial (similarly, enum stuff like Option<i16> == (i16, bool)).

@Amanieu
Copy link
Member Author

Amanieu commented May 25, 2016

I just noticed a slight flaw with this scheme regarding DSTs. If the last element in a struct is even potentially unsized then it must remain the last element in the struct.

For example consider struct Mutex<T: ?Sized>(StaticMutex, UnsafeCell<T>). The UnsafeCell must be the last element in the struct, even if the struct is later monomorphized, since &Mutex<i32> can be coerced into &Mutex<Debug> for example.

This means that making a type support unsized type parameters with ?Sized can negatively impact its layout, even if in practice the vast majority of uses will be with sized types.

@pnkfelix
Copy link
Member

@Amanieu can't the layout vary based on the choice of types that the struct is instantiated with, and thus the constraints imposed on instantiations with an unsized type won't affect instantiations with all sized types?

@Amanieu
Copy link
Member Author

Amanieu commented May 25, 2016

@pnkfelix No. Look at the example I gave in my comment:

For example consider struct Mutex<T: ?Sized>(StaticMutex, UnsafeCell<T>). The UnsafeCell must be the last element in the struct, even if the struct is later monomorphized, since &Mutex<i32> can be coerced into &Mutex<Debug> for example.

Because &Mutex<i32> can be coerced into &Mutex<Debug>, both of these types must have the same layout, which in this case means that the last field in the struct must always be the potentially unsized field.

@pnkfelix
Copy link
Member

@Amanieu oh, sorry for not reading carefully enough

@retep998
Copy link
Member

This is basically rust-lang/rust#17027

@ssbr
Copy link
Contributor

ssbr commented Sep 15, 2022

This would also help C++ interop.

In C++20, C++ added the [[no_unique_address]] attribute. This attribute specifies that a field's tail padding can be reused by an adjacent object, similar to the #[no_trailing_padding] attribute proposed in rust-lang/rust#17027, but specified on the field, not the type. This is bad news for Rust/C++ interop, as Rust currently assumes, when given a &mut reference, that it can write to the whole object, including its tail padding. (It does not always do this, but it permits it -- see e.g. ptr::write_unaligned.) If [[no_unique_address]] is used pervasively in a C++ codebase (for instance, inside the STL), this means that in theory every C++ type with usable tail padding can only safely be shared with Rust behind a Pin<&mut T> or equivalent, and cannot be shared via &mut T even if it is otherwise compatible with Rust semantics (e.g. Clang's __is_trivially_relocatable returns true). (And if you do use Pin for this, then it can't be used by value, due to the existence of Pin::set.)

(In practice, this is not actually every type. Some types have no tail padding – for instance, int, std::string, or std::string_view. And in some ABIs, some types have tail padding, but that tail padding can't be reused by via [[no_unique_address]]. (In the Itanium ABI, this is all "POD" types, for a particular definition of "POD" -- this is done in part as a compromise to address the issue brought up in #1397 (comment) above.) But most classes that users define would only be safe to use through a &mut by coincidence, not by design.)

If we separated out size and stride, and specified that writes (via assignment, mem::swap, ptr::write, ptr::write_unaligned, etc.) only wrote out size number of bytes (which can be smaller than stride), then we could still use these types from Rust, as if they were any old Rust type -- without the use of Pin etc. Also, by doing this, we would make it possible to sensibly add some equivalent to [[no_unique_address]] to Rust, as proposed in rust-lang/rust#17027 -- allowing not just for more optimizations, but also allowing for matching the layout of a C++ type which uses [[no_unique_address]].)

Because this seems likely to be important for C++ interop, I'd be interested in taking this on / pushing this forward, but I don't really know what the next steps are. There is this issue in the RFCs repo, but there is not an RFC markdown file. Is the next step to write up a concrete proposal + rationale as a markdown file, following the RFC process?

@thomcc
Copy link
Member

thomcc commented Sep 15, 2022

Note that this would have to be done in a way that is backwards-compatible, and it should not be a pessimization. Both of make this very difficult.

Concretely we do need to keep that size_of::<T>() is the stride, since far too much code has been written that will break if it's less. So, we could add a new value occupied_bytes_of::<T>() (e.g. what you're calling the size -- the number of bytes that we write to on assignments), but by default this needs to be the size, otherwise it seems very likely to be a pessimization. I don't think this is a deal-breaker for C++ interop, but I think it would end up looking a lot like the #[no_trailing_padding] attribute.

In terms of your question, yes, the next steps would be either:

  1. File an RFC against this repo.
  2. Write a Pre-RFC to gather feedback, and post it on https://internals.rust-lang.org/. I would encourage this route since I think there's a lot of subtle constraints and open design possibilities here.

@ssbr
Copy link
Contributor

ssbr commented Dec 7, 2022

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

No branches or pull requests