-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Padding Bytes Guarantees? #174
Comments
I also use a similar unsafe marker to make safe zeroing https://docs.rs/lokacore/0.1.0/lokacore/trait.Zeroable.html, so I suppose this applies to me as well. |
Indeed, this should be properly guaranteed somewhere, at some point. My current understanding is that, for structs, your condition is sufficient: in our current implementation, padding bytes can and do change as the struct gets copied, so rustc cannot rely on padding having a particular value. |
Part of why I'm cautious is that I can totally imagine optimizations that are enabled by assuming a particular value. For instance, if you assume all padding of a type Making this guarantee will rule out performing that optimization. (That said, I think the benefits of that optimization pale in comparison to the sort of optimizations enabled by |
@jswrenn absolutely agreed. That's why I said current implementation. (I added emphasis on that now.) |
Assuming anything about padding bytes would make it impossible to use |
See oconnor663/shared_child.rs#17: it would also make it impossible to initialize a struct with So indeed I find it hard to imagine that we'd ever rely on padding having a particular value, or being preserved, or anything like that. If you want to be really sure, I'd expect this to be a rather uncontroversial RFC; the main issue is wording this precisely enough. |
I drafted an RFC around the time I filed this issue, but put it on hold for typelayout. The field-by-field initialization case is especially compelling. I'll make a PR in the next day or so. :) |
And the key issue with oconnor663/shared_child.rs#17 is that I think a lot of structs in |
@oconnor663 I've added a section about this to my RFC's motivation section and filed an issue on libc documenting the problem: rust-lang/libc#1451 |
Assuming anything different from C about padding bytes for let mut x: repr_c_type = c_ffi_fn();
c_ffi_fn2(&mut x); When passing a AFAICT, this sub-section of the nomicon (https://doc.rust-lang.org/nomicon/ffi.html#interoperability-with-foreign-code) guarantees that repr(C) types, including their padding, are C compatible. We could tune what values padding type supports as long as doing so does not break the C FFI use case. |
@gnzlbg I'm convinced that the C FFI guarantees imply that Rust must never assume that padding bytes for However, there is still no such guarantee for |
@gnzlbg raised the point that This still leaves the question of whether it is okay to transmute an all-0 integer slice -- or any integer slice -- to a struct involving padding (assuming the fields themselves are okay with whatever the integer values at their offsets are). Everyone seems to strongly agree that the answer is "yes, when creating a value of struct type you can have any data in the padding". I think the only open question is whether this needs an RFC, and where this should be documented. Is that an accurate summary of the current status? As pointed out before, for
However, even if we end up wanting to use padding as niches, I strongly feel we should make sure that "0" is not considered a niche, to keep |
I don't remember exactly what the issues where with using the padding bytes of a |
Consider |
Worse, it's not just the compiler, as users can and do write unsafe code that copies all |
Yeah I think I agree. So probably the only open question here is whether it is worth having an RFC to set this in stone, or if we can just submit PRs to appropriate sections of Reference and Nomicon. |
So given: let mut x: Option<(u8, u16)> = Some(y);
let mut ref: &mut (u8, u16) = x.as_mut().unwrap();
*ref = (42, 42); The write through
We can't do anything about untyped memcpys, and I agree that users can and do write this code. I don't know whether users that are writing this code today are relying on unspecified or undefined behavior or not. Even if they are, I don't know how much code this would break, potentially a lot of code. Do we have an example of something users are already doing here that we just cannot break ? I imagine that something like abomonation might be using untyped memcpys for fast serialization, but abomonation is quite clear that it might not be a sound abstraction. Does serde do something like this as well? |
"Untyped" memcpys are foundational, and for good reasons. Even just in std, just off the top of my head, |
Why does How would the untyped |
It's less that e.g. At least with the stride/size distinction, you can still use
PS:
Trailing padding aside, for large types it uses an algorithm that swaps in fixed-size chunks. This caps the size of the temporary buffer needed. Taking care to preserve internal padding would greatly complicate that code and slow it down, and even if you did that, it can't be phrased as a typed copy without changing the space usage for the worse. |
This is a good point. It is not completely clear to me that this is a net negative. For example, we currently emit a memcpy for 6 bytes for
This is an answer to myself, but @rkruppe is right in that these are broken. For example, consider |
Oh my, yes. You just convinced me even more that we should not do this.^^ |
Not sure I understood the reasoning here. If Rust guarantees that padding bytes are zero. Then we should be able to As mentioned here, Rust already uses non-zero discriminant for None because I guess we still have to somehow deal with initializing a struct through |
To go off of @manuthambi, we could introduce a type like this: #[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
enum ZeroPadding {
Zero = 0,
} The only valid value is zero, so it is still safe to call |
But how would we even guarantee that? LLVM doesn't give us a way to do this, as far as I know. But also, I fail to see the connection to the original question here. That question was "is it allowed to set padding bytes to 0". The answer is "in current implementations, yes" and "most likely this will also be part of an eventual spec". |
I am not familiar with rustc-llvm interface, but along the lines of what @KrishnaSannasi said, can't we just not tell LLVM that it is padding? Just mark it as a valid byte with a 0 value. I am worried that if we declare padding bytes can be anything, then unsafe code will be written assuming that, and we will close off these optimizations forever. |
Regarding This could be done by either picking one field to "own" the padding bytes, so when that field is written, all of the padding is also set to zero, or each field could "own" the padding which follows it, so when a field is written the compiler must also write zeros to the following padding, This works, but only if |
The language in the unsafe code guidelines here allowing copying |
At least for For |
Okay I think I understood what you were going for in these comments. I think you were suggesting we change the way Rust types are laid out, and introduce a "different kind of padding". As I said that's unrelated to UCG, which is tasked with describing the current behavior and suggesting rules (that could become normative through an RFC) for how that behavior is permitted to change in the future. I opened a rustc issue for the suggestion: rust-lang/rust#70230 |
Exactly! Thanks for opening the new issue :) |
FYI, |
Yeah but that doesn't say anything about whether more "lossy" ways of copying would also work. |
The context of this issue is determining when it is safe for a struct to implement
FromZeros
(à lazerocopy::FromBytes
), a marker trait implemented for a typeT
iff any sequence of initialized zeroed bytes of lengthsize_of::<T>()
is a valid instance ofT
. (For such types,mem::zeroed
is safe!)Per the reference:
Therefore: for a struct
T
to beFromZeros
, the fields ofT
must also beFromZeros
.I'm trying to determine whether this requirement alone is sufficient.
My understanding is that the padding bytes between fields is expressly undefined (i.e., it can be anything, including uninitialized). Will
Rust
ever rely on padding having a particular value?If so, for a struct
T
to beFromZeros
, it must have no padding, as a0u8
might not correspond to a valid padding byte forT
. This poses a very severe limitation on what structs could beFromZeros
(and, by extension,FromBytes
).The text was updated successfully, but these errors were encountered: