What exactly is the behavior of reading from padding data? #395

Manishearth · 2023-03-05T02:31:12Z

I've been working under the assumption that "padding bytes are uninit": you can write to them just fine, but reading from them always yields uninit.

I'm getting this in part from the glossary:

Copying Pad ignores the source byte, and writes any value to the target byte. Or, equivalently (in terms of Abstract Machine behavior), copying Pad marks the target byte as uninitialized.

@djkoloski has another plausible-sounding read of the situation:

The way I read it was that Pad is only used in the struct definition, and so only performs its special copy behavior when the struct is moved (i.e. as part of the struct). Basically the struct type is valid to overlay on top of the bytes, moving the value out won't modify those original padding bytes, and you can cast back to a [u8] and read them. If you copied/moved the value, then the new value will have uninit padding.

His contention is that if you know what bytes have been written to the padding area, it's safe to read them, for example in this code.

To figure out how to teach uninitialized values in Rust, we need to know the semantics more precisely.

I suspect the best way to tease this apart might be to pose a bunch of scenarios, and fish for a Yes/No/Maybe/Depends answer ("Maybe" = "UCG hasn't ruled on specifics here, see <issue>", "Depends" = "not enough information in the example to give a single answer").

Let's say that I have a type Foo, for which mem::zeroed() produces a valid representation (i.e. all of its non-padding bytes have 0 as a valid representation). It also has padding. It could be something like struct Foo(u8, u32), I don't care too much.
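To make the padding in such a type concrete, here is a minimal sketch. It assumes a `#[repr(C)]` version of `Foo` (the issue does not fix a representation) and a target where `u32` is 4-byte aligned; `padding_len` is a hypothetical helper introduced only for illustration.

```rust
use std::mem;

// Hypothetical concrete `Foo`. `repr(C)` is an assumption made here so that
// the field order and offsets are predictable.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Foo(pub u8, pub u32);

/// Number of bytes inside `Foo` that are not covered by any field.
pub fn padding_len() -> usize {
    mem::size_of::<Foo>() - (mem::size_of::<u8>() + mem::size_of::<u32>())
}
```

On such a target the `u8` sits at offset 0, the `u32` at offset 4, and offsets 1 through 3 are padding, so `padding_len()` comes out as 3.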

Also, let's have the functions:

use std::{mem, slice};

// Reinterprets the memory behind `x` as a byte slice of the same size.
unsafe fn type_to_bytes<T>(x: &T) -> &[u8] {
    slice::from_raw_parts(x as *const T as *const u8, mem::size_of::<T>())
}
unsafe fn type_to_bytes_mut<T>(x: &mut T) -> &mut [u8] { ... }
unsafe fn bytes_to_type<T>(x: &[u8]) -> &T { ... } // assuming alignment etc
unsafe fn bytes_to_type_mut<T>(x: &mut [u8]) -> &mut T { ... } // assuming alignment etc
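For comparison, a byte view that does not claim the bytes are initialized can be sketched with `MaybeUninit<u8>`. This is not one of the helpers from the issue; `type_to_maybe_uninit_bytes` is a hypothetical name, and the sketch assumes `T` contains no interior mutability.

```rust
use std::mem::{self, MaybeUninit};
use std::slice;

/// Views a value as bytes that are allowed to be uninitialized.
///
/// # Safety
/// Assumption of this sketch: `T` must not contain `UnsafeCell`, since the
/// returned shared slice must not be mutated behind the caller's back.
pub unsafe fn type_to_maybe_uninit_bytes<T>(x: &T) -> &[MaybeUninit<u8>] {
    // `MaybeUninit<u8>` has size 1, alignment 1, and no validity requirement,
    // so padding bytes may appear in the slice without being "read as u8".
    unsafe {
        slice::from_raw_parts(x as *const T as *const MaybeUninit<u8>, mem::size_of::<T>())
    }
}
```

Individual elements can then be turned into `u8` with `assume_init()` only at offsets the caller knows to be initialized.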

Scenario 1

let bytes = [0u8; mem::size_of::<Foo>()];
let foo: Foo = mem::transmute(bytes);
let view: &[u8] = type_to_bytes(&foo);
println!("{view:?}");

Is this print statement UB due to reading from uninitialized memory?

Scenario 2

let foo: Foo = Foo::new();
let view: &[u8] = type_to_bytes(&foo);
println!("{view:?}");

Is this print statement UB due to reading from uninitialized memory?

Scenario 3

let mut foo: Foo = Foo::new();
let view: &mut [u8] = type_to_bytes_mut(&mut foo);
view[1] = 42; // assume 1 is a padding byte
println!("{}", view[1]);

The print will just print 42, yes?

Scenario 4

let mut foo: Foo = Foo::new();
{
    let view: &mut [u8] = type_to_bytes_mut(&mut foo);
    view[1] = 42; // assume 1 is a padding byte
}
let view2: &[u8] = type_to_bytes(&foo);
println!("{}", view2[1]);

Is this print UB? Does it print 42?

Scenario 5

let mut bytes = [0u8; mem::size_of::<Foo>()];
{
    let foo_ref: &mut Foo = bytes_to_type_mut(&mut bytes);
    *foo_ref = Foo::new();
}
println!("{bytes:?}");

Is this print statement UB due to reading from uninitialized memory?

(Probably not?)

Scenario 6

let mut bytes = [0u8; mem::size_of::<Foo>()];
{
    let foo_ref: &mut Foo = bytes_to_type_mut(&mut bytes);
    *foo_ref = Foo::new();
    {
        let bytes: &[u8] = type_to_bytes(foo_ref);
        println!("{bytes:?}");
    }
}

Is this print statement UB due to reading from uninitialized memory? (ooh, tricky)


CAD97 · 2023-03-05T10:49:06Z

I believe that the current position is that a typed copy of Foo sets the padding bytes to an uninitialized state, but that otherwise the byte value of memory is always preserved.

Scenario 1: UB. Padding bytes were made uninitialized on line 2, where the transmute performs a typed copy into Foo.
Scenario 2: UB. Padding bytes were never initialized.
Scenario 3: Prints 42¹. Memory is untyped, and &mut [u8] cares not who else might care about the memory later.
Scenario 4: Prints 42. The same as the previous; looking at a value with a more permissive type doesn't change the bytes in memory.
Scenario 5: UB. Writing Foo::new() to memory writes uninitialized padding bytes, and the print tries to interpret that as u8.
Scenario 6: UB. The same as the previous example, printing the same bytes as the same nouninit type.

¹ Assuming that reference validity is always shallow, and that writing to a Copy type doesn't assert the previous value is valid.
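The Scenario 3/4 answer can be sketched as follows. This is an illustration under the position described in this comment, not a ruling; it assumes a `#[repr(C)]` `Foo` so that offset 1 is known to be padding, and it uses raw pointers instead of `&mut [u8]` to stay clear of the reference-validity caveat in the footnote. `write_then_read_padding` is a hypothetical name.

```rust
// Hypothetical `Foo`; `repr(C)` is assumed so that offset 1 is padding.
#[repr(C)]
pub struct Foo(pub u8, pub u32);

/// Writes `value` into the first padding byte of a `Foo`, then reads it back
/// with no typed copy of the `Foo` in between.
pub fn write_then_read_padding(value: u8) -> u8 {
    let mut foo = Foo(1, 2);
    let base = &mut foo as *mut Foo as *mut u8;
    // SAFETY: offset 1 lies inside `foo`, and both accesses happen at type
    // `u8`, so the store initializes the byte and the load observes it.
    unsafe {
        base.add(1).write(value);
        base.add(1).read()
    }
}
```

If `foo` were moved or copied at type `Foo` between the write and the read, the padding byte in the destination would be uninitialized again, which is the situation in Scenarios 5 and 6.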

saethlin · 2023-03-05T17:41:53Z

Because you're working on learning materials: I think it is important to lay out and be very clear that Rust has (or the current position is that Rust has) typed reads/writes, and does not have typed memory. I think your initial confusion/uncertainty on these questions is a good demonstration of this. In my experience, newcomers to this subject really want to reason about the rules here as if Rust has typed memory, and it can be challenging to get out of that line of thinking.

RalfJung · 2023-03-05T17:48:32Z

I believe that the current position is that a typed copy of Foo sets the padding bytes to an initialized state, but that otherwise the byte value of memory is always preserved.

Should be uninitialized state, but otherwise I agree.

I think it is important to lay out and be very clear that Rust has (or the current position is that Rust has) typed reads/writes, and does not have typed memory.

👍
In particular, since padding is a type-driven concept, that means there is no such thing as "this byte of memory is a padding byte". Padding bytes only arise during particular operations, e.g. when doing a load/store at type Foo. In memory, there is nothing that would distinguish padding from non-padding.
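A small sketch of "padding only arises during operations at type Foo": storing to individual fields through a raw pointer is a store at type `u8` or `u32`, so the bytes that would be padding for `Foo` keep whatever was there. The `#[repr(C)]` layout, the aligned `Buf` wrapper, and the function name are assumptions made for this illustration.

```rust
// Hypothetical `Foo`; `repr(C)` is assumed so the field offsets are 0 and 4.
#[repr(C)]
pub struct Foo(pub u8, pub u32);

// Byte buffer with enough alignment to hold a `Foo`.
#[repr(C, align(4))]
pub struct Buf(pub [u8; 8]);

/// Fills a buffer with 0xAA, then stores to each field of a `Foo` overlaid on
/// it. No store at type `Foo` ever happens.
pub fn field_writes_keep_other_bytes() -> [u8; 8] {
    let mut buf = Buf([0xAA; 8]);
    let foo = buf.0.as_mut_ptr() as *mut Foo;
    // SAFETY: `buf` is 4-aligned and 8 bytes long, which matches `Foo` on
    // targets where `u32` is 4-aligned; each store covers only its own field.
    unsafe {
        (*foo).0 = 1;
        (*foo).1 = 7;
    }
    buf.0
}
```

After the two field stores, offsets 1 through 3 still hold 0xAA. Replacing them with a single `*foo = Foo(1, 7)` would be a store at type `Foo`, and those three bytes would then be uninitialized.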

thomcc · 2023-03-05T17:57:05Z

There are cases in windows APIs where we need to perform reads of padding bytes. In particular, these usually involve a tail [WCHAR; 1] or something used to emulate a flexible array member which isn't quite located at the end of the struct. This is almost always behind a heap-allocated pointer that the kernel (or a system library in userspace, who can say) writes to that you read from.

So certainly typed memory preventing padding bytes from being accessed would be bad here.
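The pattern described here can be sketched without any Windows API. `Record` below is a hypothetical stand-in for such a struct, with a one-element tail array followed by trailing padding; the buffer is filled by this code rather than by the kernel. The tail is accessed only through raw pointers at type `u16`, never by copying a `Record`.

```rust
use std::{ptr, slice};

// Hypothetical header with a one-element "flexible" tail. With `repr(C)` the
// tail starts at offset 4, and offsets 6..8 are trailing padding of `Record`.
#[repr(C)]
pub struct Record {
    pub count: u32,
    pub first: [u16; 1],
}

/// Builds a record with three tail elements in a heap buffer and reads them back.
pub fn read_tail() -> Vec<u16> {
    // 16 bytes of 4-aligned storage, larger than `size_of::<Record>()`.
    let mut storage = vec![0u32; 4];
    let rec = storage.as_mut_ptr() as *mut Record;
    // SAFETY: every access stays inside the 16-byte allocation and is done at
    // type `u32` or `u16`; the second element lands where `Record` has padding.
    unsafe {
        ptr::addr_of_mut!((*rec).count).write(3);
        let first = ptr::addr_of_mut!((*rec).first) as *mut u16;
        for i in 0..3 {
            first.add(i).write(10 + i as u16);
        }
        let n = ptr::addr_of!((*rec).count).read() as usize;
        slice::from_raw_parts(first as *const u16, n).to_vec()
    }
}
```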

CAD97 · 2023-03-05T19:28:15Z

... what kind of typo had me write "initialized" instead of "uninitialized" 🙃

Manishearth · 2023-03-05T20:03:05Z

I think it is important to lay out and be very clear that Rust has (or the current position is that Rust has) typed reads/writes, and does not have typed memory

Yeah this is my understanding, but I've heard previously that padding is weird. I suspect it's just weird because it's one of the situations where this really gets tricky.

... what kind of typo had me write "initialized" instead of "uninitialized" 🙃

Sorry, the UCG team has made its ruling, everything is initialized now.

Manishearth · 2023-03-05T20:07:38Z

It sounds like as a UCG issue this is resolved, but I'll leave this issue open so I can improve the glossary.

Manishearth · 2023-03-05T20:16:22Z

Also I assume let foo = mem::zeroed() will still have uninitialized padding, since the "copy" from the temporary to the foo is typed.
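If zeroed padding is actually wanted, one option is to zero the storage in place so that no typed copy of `Foo` happens afterwards. The following is a sketch under that reading, again assuming a hypothetical `#[repr(C)]` `Foo`; `zeroed_in_place_bytes` is an illustrative name.

```rust
use std::mem::{self, MaybeUninit};
use std::ptr;

// Hypothetical `Foo` with three padding bytes on common targets.
#[repr(C)]
pub struct Foo(pub u8, pub u32);

/// Zeroes the storage of a `Foo` in place and returns all of its bytes.
pub fn zeroed_in_place_bytes() -> [u8; mem::size_of::<Foo>()] {
    let mut slot = MaybeUninit::<Foo>::uninit();
    // SAFETY: `write_bytes` is an untyped fill of `size_of::<Foo>()` bytes, so
    // every byte of `slot`, padding included, is initialized to 0. The read is
    // at type `[u8; N]`, which has no padding and alignment 1.
    unsafe {
        ptr::write_bytes(slot.as_mut_ptr(), 0, 1);
        ptr::read(slot.as_ptr() as *const [u8; mem::size_of::<Foo>()])
    }
}
```

The zeroed padding only lasts as long as the value stays in `slot`: moving it out at type `Foo`, for example with `assume_init()`, is a typed copy and leaves the destination's padding uninitialized.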

Manishearth · 2023-03-05T20:19:57Z

#396

RalfJung · 2023-03-07T11:01:23Z

There are cases in windows APIs where we need to perform reads of padding bytes. In particular, these usually involve a tail [WCHAR; 1] or something used to emulate a flexible array member which isn't quite located at the end of the struct. This is almost always behind a heap-allocated pointer that the kernel (or a system library in userspace, who can say) writes to that you read from.

So certainly typed memory preventing padding bytes from being accessed would be bad here.

For those situations you're not going to do copies at that struct type though? That would obviously fail to copy the flexible array part. So I don't see how there's a problem with padding here.

FWIW, padding is similarly reset during struct assignments in C.

This was referenced Mar 5, 2023

Section on uninitialized memory google/learn_unsafe_rust#5

Merged

We should have a consistent way of talking about rules to follow vs Actual Unsoundness google/learn_unsafe_rust#10

Open

Manishearth mentioned this issue Mar 5, 2023

Clarify padding #396

Merged

RalfJung closed this as completed in #396 Mar 13, 2023
