Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The compiler looses vector capacity information #82801

Open
MSxDOS opened this issue Mar 5, 2021 · 11 comments
Open

The compiler looses vector capacity information #82801

MSxDOS opened this issue Mar 5, 2021 · 11 comments
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@MSxDOS
Copy link

MSxDOS commented Mar 5, 2021

In this example:

pub fn test() -> Vec<u32> {
    let mut vec = Vec::with_capacity(3);
    let mut i = 0;
    while i < 3 {
        vec.push(32);
        i += 1;
    }
    vec
}

the compiler produces sub-optimal assembly.
https://rust.godbolt.org/z/1T3Knodn8

@nikic
Copy link
Contributor

nikic commented Mar 5, 2021

Don't know what role the MIR opts play here, but on the LLVM side this looks like a phase ordering issue. The final IR has:

  %10 = bitcast i64* %4 to <2 x i64>*
  store <2 x i64> <i64 3, i64 2>, <2 x i64>* %10, align 8
  %_6.idx.val.i.2 = load i64, i64* %4, align 8
  %_3.i.2 = icmp eq i64 %_6.idx.val.i.2, 2

The load can be load store forwarded to 3 and everything else folds from there. A sequence of -gvn -simplifycfg -sroa does it.

@nikic nikic self-assigned this Mar 5, 2021
@nikic
Copy link
Contributor

nikic commented Mar 6, 2021

Took a while to root cause this, but should be fixed by llvm/llvm-project@3fedaf2. The issue was that GVN did not fully fold the function because available loads were materialized suboptimally, leading to unnecessary poisoning of MDA cache entries.

https://reviews.llvm.org/D98114 would also fix fix this case, though it's arguably more of a workaround.

@nikic
Copy link
Contributor

nikic commented Aug 22, 2021

I'm not completely sure what I'm supposed to be looking for here, but I think this has not been fixed by the LLVM 13 upgrade. It looks like the IR substantially changed in 1.53 and the goal now is presumably to eliminate the branch that contains the do_reserve_and_handle call.

@MSxDOS
Copy link
Author

MSxDOS commented Aug 22, 2021

It wasn't fixed by LLVM 13, tested locally as the compiler on godbolt is outdated.

It fails to see that self.len == self.buf.capacity() inside push can't be true in the first place.
Adding

if vec.len() == vec.capacity() {
    unsafe { std::hint::unreachable_unchecked() }
}

in the beginning of the loop removes the reserve branch entirely.

@MSxDOS MSxDOS changed the title The compiler looses vector capacity information when MIR optimizations are enabled The compiler looses vector capacity information Aug 22, 2021
@nikic
Copy link
Contributor

nikic commented Aug 22, 2021

Principally, a combination of loop load PRE and SCCP should be able to handle this. However, loop load PRE will currently fail for two reasons: First, the clobbering instruction (do_reserve_and_handle) is an invoke terminator, which would need edge splitting support in loop load PRE. Second, we don't have a guarantee that the call won't free the vec. The second issue is the more problematic one.

@MSxDOS
Copy link
Author

MSxDOS commented Dec 3, 2021

The example now produces good assembly if the loop gets unrolled (possibly improved by #91352), no change otherwise.

@rustbot rustbot added A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 3, 2021
@nikic nikic removed their assignment Apr 3, 2023
@SUPERCILEX
Copy link
Contributor

Probably going to be fixed by #114370

@kornelski
Copy link
Contributor

A very similar case generates 3 separate calls to reserve_for_push:

pub fn test(vec: &mut Vec<u32>) {
    vec.reserve(3);
    let mut i = 0;
    while i < 3 {
        vec.push(32);
        i += 1;
    }
}

https://rust.godbolt.org/z/cr7zf6vqq

@krtab
Copy link
Contributor

krtab commented Oct 17, 2023

Unfortunately, I don't think that neither this issue nor #105156 will be fixed by either:

The issue is actually pretty hard to fix. One needs the compiler to deduce that given an assume that after the reserve the, capacity - len >= 3 (easy), there is at each point capacity > len.

For now, I haven't even got LLVM to deduce that capacity > len for the first push (which should come from the fact that 3 > 0 and no overflow occurs).

The diff I have tried variations of:

diff --git a/library/alloc/src/vec/mod.rs b/library/alloc/src/vec/mod.rs
index 35015238e6e..acf18bb6dc9 100644
--- a/library/alloc/src/vec/mod.rs
+++ b/library/alloc/src/vec/mod.rs
@@ -907,6 +907,10 @@ pub fn capacity(&self) -> usize {
     #[stable(feature = "rust1", since = "1.0.0")]
     pub fn reserve(&mut self, additional: usize) {
         self.buf.reserve(self.len, additional);
+        unsafe {
+            core::intrinsics::assume(self.buf.capacity() >= self.len.unchecked_add(additional));
+            // core::intrinsics::assume(additional == 0 || self.buf.capacity() > self.len);
+        }
     }

     /// Reserves the minimum capacity for at least `additional` more elements to
@@ -1878,6 +1882,7 @@ fn drop(&mut self) {
     pub fn push(&mut self, value: T) {
         // This will panic or abort if we would allocate > isize::MAX bytes
         // or if the length increment would overflow for zero-sized types.
+        let old_delta = unsafe {self.buf.capacity().unchecked_sub(self.len)};
         if self.len == self.buf.capacity() {
             self.buf.reserve_for_push(self.len);
         }
@@ -1885,6 +1890,8 @@ pub fn push(&mut self, value: T) {
             let end = self.as_mut_ptr().add(self.len);
             ptr::write(end, value);
             self.len += 1;
+            let new_delta = self.buf.capacity().unchecked_sub(self.len);
+            core::intrinsics::assume(old_delta == 0 || old_delta == new_delta.unchecked_add(1));
         }
     }

Commenting the commented assert allow to remove the first push but not subsequent ones, so I'm not sure it is really worth a PR...

bors added a commit to rust-lang-ci/rust that referenced this issue Oct 21, 2023
Hint optimizer about reserved capacity

The fact that `Vec` had capacity increased is not known to the optimizer, because functions like `do_reserve_and_handle` are not inlined. (rust-lang#82801) Because of that, code such as:

```rust
vec.try_reserve(123)?;
vec.extend_from_slice(&s[..123]);
```

Tries to reserve the capacity **twice**. This is unnecessary code bloat.

https://rust.godbolt.org/z/YWY16Ezej

Adding a hint after reserve optimizes out the next check of `self.needs_to_grow(len, additional)`.
bors added a commit to rust-lang-ci/rust that referenced this issue Nov 6, 2023
Hint optimizer about reserved capacity

The fact that `Vec` had capacity increased is not known to the optimizer, because functions like `do_reserve_and_handle` are not inlined. (rust-lang#82801) Because of that, code such as:

```rust
vec.try_reserve(123)?;
vec.extend_from_slice(&s[..123]);
```

Tries to reserve the capacity **twice**. This is unnecessary code bloat.

https://rust.godbolt.org/z/YWY16Ezej

Adding a hint after reserve optimizes out the next check of `self.needs_to_grow(len, additional)`.
@krtab
Copy link
Contributor

krtab commented Dec 15, 2023

llvm/llvm-project#75524 has been merged today.

It makes it so that LLVM understands that "x + y <= z" implies "x <= z" when y is non-negative. In our case it means that "len + additional <= capacity" implies "len <= capacity".

Once rustc's LLVM is updated to take it into account this is likely to change what rustc can optimize or not regarding this issue.

@c410-f3r
Copy link
Contributor

Any news?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants