pointer addition semantics are unclear #130211

jwong101 · 2024-09-10T23:26:12Z

Location

core::ptr::add

Lines 810 to 854 in 33855f8

    
           /// Adds an offset to a pointer (convenience for `.offset(count as isize)`). 
        
           /// 
        
           /// `count` is in units of T; e.g., a `count` of 3 represents a pointer 
        
           /// offset of `3 * size_of::<T>()` bytes. 
        
           /// 
        
           /// # Safety 
        
           /// 
        
           /// If any of the following conditions are violated, the result is Undefined Behavior: 
        
           /// 
        
           /// * The computed offset, `count * size_of::<T>()` bytes, must not overflow `isize`. 
        
           /// 
        
           /// * If the computed offset is non-zero, then `self` must be derived from a pointer to some 
        
           ///   [allocated object], and the entire memory range between `self` and the result must be in 
        
           ///   bounds of that allocated object. In particular, this range must not "wrap around" the edge 
        
           ///   of the address space. 
        
           /// 
        
           /// Allocated objects can never be larger than `isize::MAX` bytes, so if the computed offset 
        
           /// stays in bounds of the allocated object, it is guaranteed to satisfy the first requirement. 
        
           /// This implies, for instance, that `vec.as_ptr().add(vec.len())` (for `vec: Vec<T>`) is always 
        
           /// safe. 
        
           /// 
        
           /// Consider using [`wrapping_add`] instead if these constraints are 
        
           /// difficult to satisfy. The only advantage of this method is that it 
        
           /// enables more aggressive compiler optimizations. 
        
           /// 
        
           /// [`wrapping_add`]: #method.wrapping_add 
        
           /// [allocated object]: crate::ptr#allocated-object 
        
           /// 
        
           /// # Examples 
        
           /// 
        
           /// ``` 
        
           /// let s: &str = "123"; 
        
           /// let ptr: *const u8 = s.as_ptr(); 
        
           /// 
        
           /// unsafe { 
        
           ///     assert_eq!(*ptr.add(1), b'2'); 
        
           ///     assert_eq!(*ptr.add(2), b'3'); 
        
           /// } 
        
           /// ``` 
        
           #[stable(feature = "pointer_methods", since = "1.26.0")] 
        
           #[must_use = "returns a new pointer rather than modifying its argument"] 
        
           #[rustc_const_stable(feature = "const_ptr_offset", since = "1.61.0")] 
        
           #[inline(always)] 
        
           #[cfg_attr(miri, track_caller)] // even without panics, this helps for Miri backtraces 
        
           pub const unsafe fn add(self, count: usize) -> Self

core::ptr::offset

rust/library/core/src/ptr/const_ptr.rs

Lines 349 to 393 in 33855f8

    
           /// Adds an offset to a pointer. 
        
           /// 
        
           /// `count` is in units of T; e.g., a `count` of 3 represents a pointer 
        
           /// offset of `3 * size_of::<T>()` bytes. 
        
           /// 
        
           /// # Safety 
        
           /// 
        
           /// If any of the following conditions are violated, the result is Undefined Behavior: 
        
           /// 
        
           /// * The computed offset, `count * size_of::<T>()` bytes, must not overflow `isize`. 
        
           /// 
        
           /// * If the computed offset is non-zero, then `self` must be derived from a pointer to some 
        
           ///   [allocated object], and the entire memory range between `self` and the result must be in 
        
           ///   bounds of that allocated object. In particular, this range must not "wrap around" the edge 
        
           ///   of the address space. 
        
           /// 
        
           /// Allocated objects can never be larger than `isize::MAX` bytes, so if the computed offset 
        
           /// stays in bounds of the allocated object, it is guaranteed to satisfy the first requirement. 
        
           /// This implies, for instance, that `vec.as_ptr().add(vec.len())` (for `vec: Vec<T>`) is always 
        
           /// safe. 
        
           /// 
        
           /// Consider using [`wrapping_offset`] instead if these constraints are 
        
           /// difficult to satisfy. The only advantage of this method is that it 
        
           /// enables more aggressive compiler optimizations. 
        
           /// 
        
           /// [`wrapping_offset`]: #method.wrapping_offset 
        
           /// [allocated object]: crate::ptr#allocated-object 
        
           /// 
        
           /// # Examples 
        
           /// 
        
           /// ``` 
        
           /// let s: &str = "123"; 
        
           /// let ptr: *const u8 = s.as_ptr(); 
        
           /// 
        
           /// unsafe { 
        
           ///     assert_eq!(*ptr.offset(1) as char, '2'); 
        
           ///     assert_eq!(*ptr.offset(2) as char, '3'); 
        
           /// } 
        
           /// ``` 
        
           #[stable(feature = "rust1", since = "1.0.0")] 
        
           #[must_use = "returns a new pointer rather than modifying its argument"] 
        
           #[rustc_const_stable(feature = "const_ptr_offset", since = "1.61.0")] 
        
           #[inline(always)] 
        
           #[cfg_attr(miri, track_caller)] // even without panics, this helps for Miri backtraces 
        
           pub const unsafe fn offset(self, count: isize) -> *const T

Summary

The documentation for core::ptr::add currently states that it:

rust/library/core/src/ptr/const_ptr.rs

Line 810 in 33855f8

/// Adds an offset to a pointer (convenience for `.offset(count as isize)`).

Except it's not a convenience for just doing .offset(count as isize), both according to Miri and the constant interpreter:

#[no_mangle]
pub const fn mul() -> i32 {
    const {
        let x = &[0i32; 2];
        let x = x.as_ptr().wrapping_add(1);
        unsafe {
            // has UB according to MIRI and fails compilation on nightly if you comment this out
            // mul nuw i64 -1, 4 is poison
            // x.add(!0).read();
            
            // DB and passes compilation
            // mul nsw i64 -1, 4 is defined
            x.offset(-1).read();
        }
        x[0]
    }
}

So one might infer that the following requirement:

rust/library/core/src/ptr/const_ptr.rs

Line 819 in 33855f8

    
           /// * The computed offset, `count * size_of::<T>()` bytes, must not overflow `isize`.

means that the count * size_of::<T>() must not overflow in the unsigned sense for add and signed sense for offset.

However, it's unclear if overflowing isize means the unsigned multiplication result has to be positive, or if the unsigned multiplication also has to not wrap in the signed sense. Both Miri and the constant interpreter allow the following:

#[no_mangle]
pub const fn mulnsw() -> i32 {
    const {
        let x = &[0i32; 2];
        let x = x.as_ptr().wrapping_add(1);
        let offset = !0usize >> 2; // will be -4 when multiplied
        let mut ret = 0;
        unsafe {
            // has no UB according to MIRI and does not fail compilation on nightly
            // does not wrap in the unsigned sense
            // however wraps in the signed sense
            // has UB in our codegen
            ret = x.add(offset).read();
            
            // UB in codegen due to signed overflow
            // both MIRI and constant interpreter fail
            // ret = x.offset(offset as isize).read();
        }
        ret
    }
}

However, x.add(offset) will also get lowered to getelementptr inbounds i32, %x, i64 %offset, which has UB according to the LLVM langref.

The multiplication of an index by the type size does not wrap the pointer index type in a signed sense (mul nsw).

If the core::ptr::add does have defined behavior in the above case, then we can't directly use getelementptr inbounds for types larger than a byte, without doing a mul nuw beforehand, and then doing getelementptr inbounds i8. However, if it's UB, then the docs should make it clear.

So core::ptr::add still carries the responsibility of preventing signed overflow in the multiplication(at least according to our codegen semantics). However, it's unclear if "must not overflow isize", means that the unsigned result has to fit in [0, isize::MAX], as the following passes:

#[no_mangle]
pub const fn addaddr() -> i32 {
    const {
        let x = &[0i32; 2];
        let x = x.as_ptr().wrapping_add(1).cast::<u8>();
        unsafe {
            // -1 * 1 doesn't overflow in both the signed and unsigned sense
            // however the result doesn't fit in [0, isize::MAX]
            // passes MIRI and the constant interpreter
            x.byte_add(!0).read();
        }
        x[0]
    }
}

To summarize:

Does the multiplication of count * size_of::<T>() need to not overflow in both the signed and unsigned sense for core::ptr::add?

In core::ptr::add, is the resulting offset treated as a signed integer or unsigned integer for the following requirement:

rust/library/core/src/ptr/const_ptr.rs

Lines 821 to 824 in 33855f8

    
           /// * If the computed offset is non-zero, then `self` must be derived from a pointer to some 
        
           ///   [allocated object], and the entire memory range between `self` and the result must be in 
        
           ///   bounds of that allocated object. In particular, this range must not "wrap around" the edge 
        
           ///   of the address space.

Rationale for inquiry:

LLVM recently added a nuw attribute to the getelementptr instruction in order to inform the optimizer that the offset is non-negative. I was checking the core::ptr::add documentation to see if we could use it for that, however, I found some contradictory information w.r.t the docs and the current behavior of the constant interpreter and Miri.

The text was updated successfully, but these errors were encountered:

lukas-code · 2024-09-11T01:33:17Z

This is a bug in const eval and miri. ptr.add(off) is always supposed to behave exactly like ptr.offset(off as isize).

regression in #128482 cc @RalfJung

We are just computing off * size_of<T>() here without casting to isize first, so I suspect that to be the issue:

rust/compiler/rustc_const_eval/src/interpret/operator.rs

Lines 306 to 316 in 6f7229c

    
           // We cannot overflow i64 as a type's size must be <= isize::MAX. 
        
           let pointee_size = i64::try_from(pointee_layout.size.bytes()).unwrap(); 
        
           let pointee_size = ImmTy::from_int(pointee_size, right.layout); 
        
           // Multiply element size and element count. 
        
           let (val, overflowed) = self 
        
               .binary_op(mir::BinOp::MulWithOverflow, right, &pointee_size)? 
        
               .to_scalar_pair(); 
        
           // This must not overflow. 
        
           if overflowed.to_bool()? { 
        
               throw_ub!(PointerArithOverflow) 
        
           }

@rustbot label -needs-triage -A-docs +C-bug +T-compiler +A-const-eval +regression-from-stable-to-beta

nikic · 2024-09-11T07:06:16Z

This is a bug in const eval and miri. ptr.add(off) is always supposed to behave exactly like ptr.offset(off as isize).

Surely that is not the case? We want to emit getelementptr inbounds nuw for ptr.add(), making use of the fact that the index is non-negative. We added this attribute in LLVM pretty much specifically so ptr.add() can use it. We could obviously add yet another method to encode this, but given that we already have a variant that exists specifically to add an unsigned offset, I would hope that nobody wants to go there.

To me the current wording "The computed offset, count * size_of::() bytes, must not overflow isize" implies no overflow in both the unsigned and signed sense. Unsigned because this is an unsigned multiplication, and signed due to the additional restriction that the result must fit in isize as well.

cc @scottmcm @RalfJung

RalfJung · 2024-09-11T07:51:21Z

My interpretation of "computation must not overflow an isize" is "do the multiplication on mathematical values; the result must fit in the value range of an isize". But you are right that this contradicts the part about "convenience for offset(x as isize)". Regarding the Miri behavior, it seems the intrinsic at some point got changed to directly accept a usize <#110837> but opsem/miri teams did not get informed so the logic in Miri does not make sense for this way of using the intrinsic.

RalfJung · 2024-09-11T15:05:01Z

#130229 should fix this. Thanks for catching this, @jwong101 !

byte_add(!0) multiplies usize::MAX * 1 (not -1 * 1!), the result is usize::MAX, and that does not fit in an isize, so this is always UB because it "overflows an isize".

In core::ptr::add, is the resulting offset treated as a signed integer or unsigned integer for the following requirement:

Since the resulting offset must fit in an isize, this should not make a difference. (I would say it is treated as a mathematical integer.)

RalfJung · 2024-09-17T16:41:08Z

I have a clarification question about this example:

#[no_mangle]
pub const fn mul() -> i32 {
    const {
        let x = &[0i32; 2];
        let x = x.as_ptr().wrapping_add(1);
        unsafe {
            // has UB according to MIRI and fails compilation on nightly if you comment this out
            // mul nuw i64 -1, 4 is poison
            // x.add(!0).read();
            
            // DB and passes compilation
            // mul nsw i64 -1, 4 is defined
            x.offset(-1).read();
        }
        x[0]
    }
}

Where does the mul nuw come from? As far as I can tell by looking at our codegen backend, our inbounds_gep does not care whether the RHS is signed or unsigned. So under current codegen, add(x) truly is equivalent to offset(x as isize), right? Just Miri and const-eval do not treat it equivalently? And we want codegen to treat it not equivalently, but that has not happened yet?

lukas-code · 2024-09-17T17:33:30Z

So under current codegen, add(x) truly is equivalent to offset(x as isize), right? Just Miri and const-eval do not treat it equivalently?

That's also what I figured by looking at the source code and the reason why I initially thought that Miri/CTFE treating them differently was just a bug.

RalfJung · 2024-09-17T17:37:33Z

It was definitely an accident -- Miri did multiplication and overflow check at "whatever the type of the count is", so when usize counts were allowed, those used usize multiplication (and then the result got implicitly cast into isize). That code was only ever tested for isize counts, but it happened to do "something" for usize that doesn't explode immediately, but ended up with somewhat odd UB checks.

RalfJung · 2024-09-17T17:38:52Z

With #130239, the unsigned intrinsic is now properly supported by Miri (the only change required was to make the final cast to isize be UB if it does not preserve the mathematical value); the docs update is happening in #130229.

jwong101 · 2024-09-17T19:12:03Z

Where does the mul nuw come from? As far as I can tell by looking at our codegen backend, our inbounds_gep does not care whether the RHS is signed or unsigned

Sorry for the confusion. I was just using mul nuw to showcase how Miri + the constant interpreter were reporting unsigned multiplication overflow as UB.

So under current codegen, add(x) truly is equivalent to offset(x as isize), right? Just Miri and const-eval do not treat it equivalently?

Yeah, we just use inbounds atm, which only requires mul nsw and that the addition of the signed offset doesn't wrap the address space.

And we want codegen to treat it not equivalently, but that has not happened yet?

Yeah, I was curious on if it was legal for us to use the new nuw attribute, which was why I checked the docs + what Miri did.

jwong101 · 2024-09-17T19:18:25Z

That code was only ever tested for isize counts, but it happened to do "something" for usize that doesn't explode immediately, but ended up with somewhat odd UB checks.

Yeah, I was initially confused when I saw that add was allowed to go backwards, because I could've sworn that I saw Miri reporting that as UB awhile ago.

Though now that I think about it, I was probably just using ptr::add at the start of an object, which is UB out of the "inbounds of an allocated object" requirement.

Rollup merge of rust-lang#130229 - RalfJung:ptr-offset-unsigned, r=scottmcm ptr::add/sub: do not claim equivalence with `offset(c as isize)` In rust-lang#110837, the `offset` intrinsic got changed to also allow a `usize` offset parameter. The intention is that this will do an unsigned multiplication with the size, and we have UB if that overflows -- and we also have UB if the result is larger than `usize::MAX`, i.e., if a subsequent cast to `isize` would wrap. ~~The LLVM backend sets some attributes accordingly.~~ This updates the docs for `add`/`sub` to match that intent, in preparation for adjusting codegen to exploit this UB. We use this opportunity to clarify what the exact requirements are: we compute the offset using mathematical multiplication (so it's no problem to have an `isize * usize` multiplication, we just multiply integers), and the result must fit in an `isize`. Cc `@rust-lang/opsem` `@nikic` rust-lang#130239 updates Miri to detect this UB. `sub` still has some cases of UB not reflected in the underlying intrinsic semantics (and Miri does not catch): when we subtract `usize::MAX`, then after casting to `isize` that's just `-1` so we end up adding one unit without noticing any UB, but actually the offset we gave does not fit in an `isize`. Miri will currently still not complain for such cases: ```rust fn main() { let x = &[0i32; 2]; let x = x.as_ptr(); // This should be UB, we are subtracting way too much. unsafe { x.sub(usize::MAX).read() }; } ``` However, the LLVM IR we generate here also is UB-free. This is "just" library UB but not language UB. Cc `@saethlin;` might be worth adding precondition checks against overflow on `offset`/`add`/`sub`? Fixes rust-lang#130211

jwong101 added the A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools label Sep 10, 2024

rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Sep 10, 2024

RalfJung mentioned this issue Sep 11, 2024

ptr::add/sub: do not claim equivalence with offset(c as isize) #130229

Merged

bors closed this as completed in 97cdc8e Oct 1, 2024

apiraino removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Oct 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pointer addition semantics are unclear #130211

pointer addition semantics are unclear #130211

jwong101 commented Sep 10, 2024 •

edited

Loading

lukas-code commented Sep 11, 2024

nikic commented Sep 11, 2024 •

edited

Loading

RalfJung commented Sep 11, 2024 via email •

edited

Loading

RalfJung commented Sep 11, 2024 •

edited

Loading

RalfJung commented Sep 17, 2024 •

edited

Loading

lukas-code commented Sep 17, 2024

RalfJung commented Sep 17, 2024

RalfJung commented Sep 17, 2024

jwong101 commented Sep 17, 2024

jwong101 commented Sep 17, 2024

pointer addition semantics are unclear #130211

pointer addition semantics are unclear #130211

Comments

jwong101 commented Sep 10, 2024 • edited Loading

Location

Summary

lukas-code commented Sep 11, 2024

nikic commented Sep 11, 2024 • edited Loading

RalfJung commented Sep 11, 2024 via email • edited Loading

RalfJung commented Sep 11, 2024 • edited Loading

RalfJung commented Sep 17, 2024 • edited Loading

lukas-code commented Sep 17, 2024

RalfJung commented Sep 17, 2024

RalfJung commented Sep 17, 2024

jwong101 commented Sep 17, 2024

jwong101 commented Sep 17, 2024

jwong101 commented Sep 10, 2024 •

edited

Loading

nikic commented Sep 11, 2024 •

edited

Loading

RalfJung commented Sep 11, 2024 via email •

edited

Loading

RalfJung commented Sep 11, 2024 •

edited

Loading

RalfJung commented Sep 17, 2024 •

edited

Loading