What *exactly* is the behavior of reading from padding data? #395

Closed
Manishearth opened this issue Mar 5, 2023 · 10 comments · Fixed by #396

Comments

@Manishearth
Member

I've been working under the assumption that "padding bytes are uninit": while you can write to them just fine, reading from them always yields uninit.

I'm getting this in part from the glossary:

Copying Pad ignores the source byte, and writes any value to the target byte. Or, equivalently (in terms of Abstract Machine behavior), copying Pad marks the target byte as uninitialized.
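A minimal sketch of the difference this glossary entry describes, assuming a #[repr(C)] struct Foo(u8, u32) so that byte 1 is padding: an untyped byte copy such as ptr::copy_nonoverlapping carries a value stored in padding along with it, while a typed copy at Foo would not. The function name is made up for illustration, and only the defined half is executed; the typed-copy case is described in a comment because running it would be UB.

```rust
use std::{mem, ptr};

// Assumed layout: #[repr(C)] puts the u8 at byte 0, padding at bytes 1..4,
// and the u32 at bytes 4..8.
#[repr(C)]
struct Foo(u8, u32);

/// Stores `marker` in padding byte 1 of a Foo, copies the whole Foo with an
/// untyped (memcpy-style) byte copy, and reads byte 1 of the destination.
fn untyped_copy_keeps_padding(marker: u8) -> u8 {
    let mut src = Foo(7, 99);
    let mut dst = Foo(0, 0);
    unsafe {
        // A plain u8 store into what Foo calls padding.
        (&mut src as *mut Foo as *mut u8).add(1).write(marker);
        // copy_nonoverlapping is documented as an untyped copy: it moves the
        // raw bytes (initialized or not) without interpreting them as Foo.
        ptr::copy_nonoverlapping(
            &src as *const Foo as *const u8,
            &mut dst as *mut Foo as *mut u8,
            mem::size_of::<Foo>(),
        );
        // Only byte 1 is read; bytes 2 and 3 are still uninitialized.
        // Writing `dst = src;` instead would be a typed copy at Foo, leaving
        // dst's padding uninitialized and making this read UB.
        (&dst as *const Foo as *const u8).add(1).read()
    }
}
```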

@djkoloski has another plausible-sounding read of the situation:

The way I read it was that Pad is only used in the struct definition, and so only performs its special copy behavior when the struct is moved (i.e. as part of the struct). Basically the struct type is valid to overlay on top of the bytes, moving the value out won't modify those original padding bytes, and you can cast back to a [u8] and read them. If you copied/moved the value, then the new value will have uninit padding.

His contention is that if you know what bytes have been written to the padding area, it's safe to read them, for example in this code.
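A sketch of the situation djkoloski describes, under the same assumed #[repr(C)] layout: an initialized, suitably aligned byte buffer is viewed as a &Foo in place. Creating that reference copies nothing, so the byte that Foo considers padding keeps its value in the buffer. AlignedBytes and overlay_then_read are hypothetical names.

```rust
use std::mem;

// Assumed layout: #[repr(C)], so byte 1 of a Foo is padding.
#[repr(C)]
struct Foo(u8, u32);

// Hypothetical byte buffer with Foo's size (8) and alignment (4).
#[repr(C, align(4))]
struct AlignedBytes([u8; 8]);

/// Views an initialized byte buffer as a &Foo in place, reads a field, and
/// then reads the buffer's byte 1 (padding, as far as Foo is concerned).
fn overlay_then_read(pad_value: u8) -> (u8, u8) {
    assert_eq!(mem::size_of::<Foo>(), mem::size_of::<AlignedBytes>());
    let mut buf = AlignedBytes([0; 8]);
    buf.0[0] = 5;
    buf.0[1] = pad_value;
    // Every byte is initialized, so the bytes form a valid Foo. Creating a
    // reference copies nothing, so the padding byte is left alone.
    let foo: &Foo = unsafe { &*(buf.0.as_ptr() as *const Foo) };
    let field = foo.0; // typed read at u8 of byte 0 only
    // Copying the value out at type Foo (e.g. with ptr::read) would give the
    // copy uninitialized padding; buf itself would still be unchanged.
    (field, buf.0[1])
}
```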

To teach uninitialized values in Rust, we need to know the precise semantics.


I suspect the best way to tease this apart might be to pose a bunch of scenarios, and fish for a Yes/No/Maybe/Depends answer ("Maybe" = "UCG hasn't ruled on specifics here, see <issue>", "Depends" = "not enough information in the example to give a single answer").

Let's say that I have a type Foo, for which mem::zeroed() produces a valid representation (i.e. all of its non-padding bytes have 0 as a valid representation). It also has padding. It could be something like struct Foo(u8, u32), I don't care too much.
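One caveat on struct Foo(u8, u32): with the default representation the compiler may reorder fields, so which bytes are padding is not guaranteed. A sketch assuming #[repr(C)], which puts the padding at bytes 1..4 (the "byte 1" the scenarios below rely on); padding_between_fields is a hypothetical helper.

```rust
use std::mem::{self, MaybeUninit};
use std::ops::Range;
use std::ptr;

// #[repr(C)] is assumed: with the default representation the compiler may
// reorder the fields, so "byte 1 is padding" would not be guaranteed.
#[repr(C)]
struct Foo(u8, u32);

/// Returns the byte range of the padding between Foo's two fields.
fn padding_between_fields() -> Range<usize> {
    let slot = MaybeUninit::<Foo>::uninit();
    let base = slot.as_ptr();
    // addr_of! computes field addresses without reading the uninitialized memory.
    let (f0, f1) = unsafe { (ptr::addr_of!((*base).0), ptr::addr_of!((*base).1)) };
    let end_of_0 = f0 as usize - base as usize + mem::size_of::<u8>();
    let start_of_1 = f1 as usize - base as usize;
    end_of_0..start_of_1
}
```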

Also, let's have the functions:

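use std::{mem, slice};

unsafe fn type_to_bytes<T>(x: &T) -> &[u8] {
    slice::from_raw_parts(x as *const T as *const u8, mem::size_of::<T>())
}
unsafe fn type_to_bytes_mut<T>(x: &mut T) -> &mut [u8] {
    slice::from_raw_parts_mut(x as *mut T as *mut u8, mem::size_of::<T>())
}
// Both of these assume x is aligned for T and at least size_of::<T>() bytes long.
unsafe fn bytes_to_type<T>(x: &[u8]) -> &T { &*(x.as_ptr() as *const T) }
unsafe fn bytes_to_type_mut<T>(x: &mut [u8]) -> &mut T { &mut *(x.as_mut_ptr() as *mut T) }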

Scenario 1

let bytes = [0u8; mem::size_of::<Foo>()];
let foo: Foo = mem::transmute(bytes);
let view: &[u8] = type_to_bytes(&foo);
println!("{view:?}");

Is this print statement UB due to reading from uninitialized memory?

Scenario 2

let foo: Foo = Foo::new();
let view: &[u8] = type_to_bytes(&foo);
println!("{view:?}");

Is this print statement UB due to reading from uninitialized memory?

Scenario 3

let mut foo: Foo = Foo::new();
let view: &mut [u8] = type_to_bytes_mut(&mut foo);
view[1] = 42; // assume 1 is a padding byte
println!("{}", view[1]);

The print will just print 42, yes?

Scenario 4

let mut foo: Foo = Foo::new();
{
    let view: &mut [u8] = type_to_bytes_mut(&mut foo);
    view[1] = 42; // assume 1 is a padding byte
}
let view2: &[u8] = type_to_bytes(&foo);
println!("{}", view2[1]);

Is this print UB? Does it print 42?

Scenario 5

let mut bytes = [0u8; mem::size_of::<Foo>()];
{
    let foo_ref: &mut Foo = bytes_to_type_mut(&mut bytes);
    *foo_ref = Foo::new();
}
println!("{:?}", bytes);

Is this print statement UB due to reading from uninitialized memory?

(Probably not?)

Scenario 6

let mut bytes = [0u8; mem::size_of::<Foo>()];
{
    let foo_ref: &mut Foo = bytes_to_type_mut(&mut bytes);
    *foo_ref = Foo::new();
    {
        let bytes: &[u8] = type_to_bytes(&*foo_ref);
        println!("{:?}", bytes);
    }
}

Is this print statement UB due to reading from uninitialized memory? (ooh, tricky)

@CAD97

CAD97 commented Mar 5, 2023

I believe that the current position is that a typed copy of Foo sets the padding bytes to an uninitialized state, but that otherwise the byte value of memory is always preserved.

  • Scenario 1: UB. Padding bytes were uninitialized on line 2.
  • Scenario 2: UB. Padding bytes were never initialized.
  • Scenario 3: Prints 42 (see footnote 1). Memory is untyped, and &mut [u8] cares not who else might care about the memory later.
  • Scenario 4: Prints 42. The same as the previous; looking at a value with a more permissive type doesn't change the bytes in memory.
  • Scenario 5: UB. Writing Foo::new() to memory writes uninitialized padding bytes, and the print tries to interpret that as u8.
  • Scenario 6: UB. The same as the previous example: the same bytes are printed at the same type (u8), which does not permit uninitialized bytes.

Footnotes

  1. Assuming that reference validity is always shallow, and that writing to a Copy type doesn't assert the previous value is valid.
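Under the model described above, Scenario 5 becomes defined if Foo's fields are stored one at a time instead of assigning a whole Foo, since only the whole-struct store de-initializes the padding. A sketch assuming a #[repr(C)] Foo(u8, u32) and a hypothetical 4-aligned buffer type; fieldwise_store is likewise a made-up name.

```rust
use std::mem;

// Assumed layout: #[repr(C)], so bytes 1..4 are padding.
#[repr(C)]
struct Foo(u8, u32);

// Hypothetical 4-aligned buffer exactly as large as Foo.
#[repr(C, align(4))]
struct AlignedBytes([u8; 8]);

/// Scenario 5 reworked: store Foo's fields one at a time instead of
/// assigning a whole Foo, so the padding bytes keep their zeroes.
fn fieldwise_store() -> [u8; 8] {
    assert_eq!(mem::size_of::<Foo>(), mem::size_of::<AlignedBytes>());
    let mut buf = AlignedBytes([0; 8]);
    {
        // All-zero bytes are a valid Foo, and the buffer is suitably aligned.
        let foo_ref: &mut Foo = unsafe { &mut *(buf.0.as_mut_ptr() as *mut Foo) };
        // Field assignments are typed stores at u8 and u32 that leave bytes
        // 1..4 alone; `*foo_ref = Foo(1, 2);` would de-initialize them.
        foo_ref.0 = 1;
        foo_ref.1 = 2;
    }
    // Every byte is initialized, so reading or printing them is fine.
    buf.0
}
```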

@saethlin
Member

saethlin commented Mar 5, 2023

Because you're working on learning materials: I think it is important to lay out and be very clear that Rust has (or the current position is that Rust has) typed reads/writes, and does not have typed memory. I think your initial confusion/uncertainty on these questions is a good demonstration of this. In my experience, newcomers to this subject really want to reason about the rules here as if Rust has typed memory, and it can be challenging to get out of that line of thinking.

@RalfJung
Member

RalfJung commented Mar 5, 2023

I believe that the current position is that a typed copy of Foo sets the padding bytes to an initialized state, but that otherwise the byte value of memory is always preserved.

Should be uninitialized state, but otherwise I agree.

I think it is important to lay out and be very clear that Rust has (or the current position is that Rust has) typed reads/writes, and does not have typed memory.

👍
In particular, since padding is a type-driven concept, that means there is no such thing as "this byte of memory is a padding byte". Padding bytes only arise during particular operations, e.g. when doing a load/store at type Foo. In memory, there is nothing that would distinguish padding from non-padding.
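To illustrate that nothing in memory marks a byte as padding, here is a sketch (assuming the same #[repr(C)] struct Foo(u8, u32)) that stores a Foo, writes bytes 1..4 with plain u8 stores, and then does a typed u32 load over bytes 0..4. At that point those bytes are ordinary initialized memory, so under the model described here the load is defined. load_u32_over_padding is a hypothetical name.

```rust
use std::mem;

// Assumed layout: #[repr(C)], so bytes 1..4 are padding at type Foo.
#[repr(C)]
struct Foo(u8, u32);

/// Stores a Foo, fills what Foo calls padding (bytes 1..4) with u8 stores,
/// then performs a typed u32 load over bytes 0..4.
fn load_u32_over_padding(fill: u8) -> u32 {
    assert_eq!(mem::size_of::<Foo>(), 8);
    let mut foo = Foo(1, 2); // whole-struct store: bytes 1..4 uninitialized
    let p = &mut foo as *mut Foo as *mut u8;
    unsafe {
        for i in 1..4 {
            p.add(i).write(fill); // plain u8 stores; memory is untyped
        }
        // Offset 0 of a repr(C) Foo is 4-aligned, so an aligned u32 read is ok.
        (p as *const u32).read()
    }
}
```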

@thomcc
Member

thomcc commented Mar 5, 2023

There are cases in Windows APIs where we need to perform reads of padding bytes. In particular, these usually involve a tail [WCHAR; 1] or something used to emulate a flexible array member which isn't quite located at the end of the struct. This is almost always behind a heap-allocated pointer that the kernel (or a system library in userspace, who can say) writes to that you read from.

So certainly typed memory preventing padding bytes from being accessed would be bad here.

@CAD97

CAD97 commented Mar 5, 2023

... what kind of typo had me write "initialized" instead of "uninitialized" 🙃

@Manishearth
Member Author

I think it is important to lay out and be very clear that Rust has (or the current position is that Rust has) typed reads/writes, and does not have typed memory

Yeah this is my understanding, but I've heard previously that padding is weird. I suspect it's just weird because it's one of the situations where this really gets tricky.

... what kind of typo had me write "initialized" instead of "uninitialized" 🙃

Sorry, the UCG team has made its ruling, everything is initialized now.

@Manishearth
Member Author

It sounds like as a UCG issue this is resolved, but I'll leave this issue open so I can improve the glossary.

@Manishearth
Member Author

Also, I assume let foo: Foo = mem::zeroed() will still have uninitialized padding, since the "copy" from the temporary into foo is typed.
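If the goal is a Foo whose padding stays zero, one option is to zero the storage in place and only ever borrow it, never moving the value out at type Foo. A sketch assuming #[repr(C)] struct Foo(u8, u32); it zeroes with ptr::write_bytes after the MaybeUninit is already in its final place, so the value is not moved after zeroing. zeroed_in_place is a hypothetical name.

```rust
use std::mem::{self, MaybeUninit};
use std::{ptr, slice};

// Assumed layout: #[repr(C)], so bytes 1..4 are padding.
#[repr(C)]
struct Foo(u8, u32);

/// Zeroes a Foo in place and only borrows it, so no typed copy at Foo ever
/// happens and the padding bytes stay zero.
fn zeroed_in_place() -> Vec<u8> {
    let mut slot = MaybeUninit::<Foo>::uninit();
    // Zero all bytes of the slot where it lives (count is in units of Foo).
    unsafe { ptr::write_bytes(slot.as_mut_ptr(), 0, 1) };
    // All-zero bytes are a valid Foo; borrowing it does not copy it.
    let foo: &Foo = unsafe { slot.assume_init_ref() };
    assert_eq!((foo.0, foo.1), (0, 0));
    // Every byte, padding included, is an initialized zero.
    let bytes = unsafe {
        slice::from_raw_parts(slot.as_ptr() as *const u8, mem::size_of::<Foo>())
    };
    bytes.to_vec()
}
```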

@Manishearth
Member Author

#396

@RalfJung
Member

RalfJung commented Mar 7, 2023

There are cases in Windows APIs where we need to perform reads of padding bytes. In particular, these usually involve a tail [WCHAR; 1] or something used to emulate a flexible array member which isn't quite located at the end of the struct. This is almost always behind a heap-allocated pointer that the kernel (or a system library in userspace, who can say) writes to that you read from.

So certainly typed memory preventing padding bytes from being accessed would be bad here.

For those situations you're not going to do copies at that struct type though? That would obviously fail to copy the flexible array part. So I don't see how there's a problem with padding here.

FWIW, padding is similarly reset during struct assignments in C.
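A sketch of the access pattern described in this reply, with a hypothetical Header { len: u32, data: [u16; 1] } standing in for the Windows structs (it is not a real Windows type). Its trailing padding (bytes 6..8) overlaps data[1], yet the tail is read through raw pointers and no Header is ever copied at type Header, so those bytes are never de-initialized. fill is a made-up stand-in for the kernel or system library writing the buffer.

```rust
use std::ptr::addr_of;
use std::slice;

// Hypothetical header: `data` is a one-element tail array, and the struct's
// trailing padding (bytes 6..8) overlaps where data[1] lives.
#[repr(C)]
struct Header {
    len: u32,
    data: [u16; 1],
}

/// Stand-in for the OS filling a buffer: a u32 length, then `vals`.
fn fill(vals: &[u16]) -> Vec<u32> {
    let mut buf = vec![0u32; 2 + vals.len()]; // u32 storage gives 4-alignment
    let p = buf.as_mut_ptr() as *mut u8;
    unsafe {
        (p as *mut u32).write(vals.len() as u32);
        let d = p.add(4) as *mut u16; // offset of `data` under repr(C)
        for (i, v) in vals.iter().enumerate() {
            d.add(i).write(*v);
        }
    }
    buf
}

/// Reads the tail through raw pointers; only the `len` field is read at a
/// typed level, never a whole Header, so elements in its padding survive.
fn read_tail(buf: &[u32]) -> Vec<u16> {
    let hdr = buf.as_ptr() as *const Header;
    unsafe {
        let len = (*hdr).len as usize; // typed read of one u32 field only
        let data = addr_of!((*hdr).data) as *const u16;
        slice::from_raw_parts(data, len).to_vec()
    }
}
```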
