Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unlikely intrinsic does not work #88767

Closed
stepancheg opened this issue Sep 8, 2021 · 13 comments
Closed

unlikely intrinsic does not work #88767

stepancheg opened this issue Sep 8, 2021 · 13 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@stepancheg
Copy link
Contributor

stepancheg commented Sep 8, 2021

Code:

#![feature(core_intrinsics)]

use std::intrinsics::unlikely;

extern "C" {
    fn aaa() -> u32;
    fn bbb() -> u32;
}

pub unsafe fn foo(a: bool) -> u32 {
    if unlikely(a) { aaa() } else { bbb() }
}

Compiler explorer

rustc 1.57.0-nightly (fdf6505 2021-09-07)

Generates assembly:

example::foo:
        test    edi, edi
        je      .LBB0_1
        jmp     qword ptr [rip + aaa@GOTPCREL]
.LBB0_1:
        jmp     qword ptr [rip + bbb@GOTPCREL]

If unlikely worked, call to aaa would be after jump (it works if aaa is marked #[cold] for example).

Unlikely is present in Rust MIR:

fn foo(_1: bool) -> u32 {
    debug a => _1;                       // in scope 0 at /app/example.rs:10:19: 10:20
    let mut _0: u32;                     // return place in scope 0 at /app/example.rs:10:31: 10:34
    let mut _2: bool;                    // in scope 0 at /app/example.rs:11:8: 11:19
    let mut _3: bool;                    // in scope 0 at /app/example.rs:11:17: 11:18

    bb0: {
        _3 = _1;                         // scope 0 at /app/example.rs:11:17: 11:18
        _2 = unlikely(move _3) -> bb1;   // scope 0 at /app/example.rs:11:8: 11:19
                                         // mir::Constant
                                         // + span: /app/example.rs:11:8: 11:16
                                         // + literal: Const { ty: extern "rust-intrinsic" fn(bool) -> bool {std::intrinsics::unlikely}, val: Value(Scalar(<ZST>)) }
    }

    bb1: {
        switchInt(move _2) -> [false: bb3, otherwise: bb2]; // scope 0 at /app/example.rs:11:8: 11:19
    }

    bb2: {
        _0 = aaa() -> bb4;               // scope 0 at /app/example.rs:11:22: 11:27
                                         // mir::Constant
                                         // + span: /app/example.rs:11:22: 11:25
                                         // + literal: Const { ty: unsafe extern "C" fn() -> u32 {aaa}, val: Value(Scalar(<ZST>)) }
    }

    bb3: {
        _0 = bbb() -> bb4;               // scope 0 at /app/example.rs:11:37: 11:42
                                         // mir::Constant
                                         // + span: /app/example.rs:11:37: 11:40
                                         // + literal: Const { ty: unsafe extern "C" fn() -> u32 {bbb}, val: Value(Scalar(<ZST>)) }
    }

    bb4: {
        return;                          // scope 0 at /app/example.rs:12:2: 12:2
    }
}

but it is lost in LLVM IR:

define i32 @_ZN7example3foo17hdda9a45d5ad75400E(i1 zeroext %a) unnamed_addr #0 !dbg !6 {
start:
  br i1 %a, label %bb2, label %bb3, !dbg !10

bb3:                                              ; preds = %start
  %0 = tail call i32 @bbb(), !dbg !11
  br label %bb4, !dbg !11

bb2:                                              ; preds = %start
  %1 = tail call i32 @aaa(), !dbg !12
  br label %bb4, !dbg !12

bb4:                                              ; preds = %bb3, %bb2
  %.0 = phi i32 [ %1, %bb2 ], [ %0, %bb3 ], !dbg !13
  ret i32 %.0, !dbg !14
}
@nagisa
Copy link
Member

nagisa commented Sep 9, 2021

Rust generates the call to expect so what's happening here is that some LLVM optimization pass simply considers it to be profitable to remove the call.

@nagisa nagisa added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Sep 9, 2021
@nikic
Copy link
Contributor

nikic commented Sep 9, 2021

This probably got broken by https://reviews.llvm.org/D100213. Expect intrinsics are now lowered before SROA.

@nikic
Copy link
Contributor

nikic commented Apr 21, 2022

The probably most reliable way to fix this would be to not use llvm.expect at all, and directly emit !prof metadata in rustc.

@nikic nikic added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Apr 21, 2022
@Kobzol
Copy link
Contributor

Kobzol commented Apr 21, 2022

Just to be clear, you mean this? So instead of using llvm.expect on the if condition value, we would generate the branch weight metadata directly on the br instruction, do I understand it correctly?

@nikic
Copy link
Contributor

nikic commented Apr 21, 2022

Yes, that's right. This is something we'd definitely need for the more elaborate attribute-based approached discussed in #26179, so it would probably make sense to switch to this approach for the intrinsic as well.

@RalfJung
Copy link
Member

This doesn't really work for the intrinsic, since the intrinsic can be used anywhere, not just in an if.

It would be somewhat disconcerning if the intrinsic somehow "pattern-matched on the surrounding code" to do its thing.

@Kobzol
Copy link
Contributor

Kobzol commented Apr 21, 2022

I think that the plan in #26179 is to migrate to an attribute (e.g. #[likely]), which would mostly be used on conditions and stuff. But I also suppose that we can keep both the likely/unlikely functions and also generate the metadata on branches if likely/unlikely is used directly inside an if (for example).

@nikic
Copy link
Contributor

nikic commented Apr 21, 2022

@RalfJung The intrinsic already only works when used directly in an if (well, I should say "used to work", currently it just doesn't work at all). llvm.expect is lowered and discarded during very early optimization, so it doesn't support any non-trivial patterns (like using it inside a function that gets inlined etc).

@nikic
Copy link
Contributor

nikic commented Apr 21, 2022

Basically, this intrinsic is already "pattern-matched on the surrounding code". But this pattern matching currently happens in LLVM, and is geared towards the IR emitted by Clang rather than rustc. Moving that pattern matching into rustc would make it more robust/predictable.

@RalfJung
Copy link
Member

Fair. In the end it's semantically a no-op so I don't care quite as much, but attributes seem like the much better design for this sort of thing so I am glad to hear that that's what we are moving towards. :)

@SUPERCILEX
Copy link
Contributor

llvm.expect is lowered and discarded during very early optimization, so it doesn't support any non-trivial patterns (like using it inside a function that gets inlined etc).

Is that an llvm bug? I guess I'm trying to understand how this broke and what the next steps are to fix the intrinsics. This bug and the new attributes should be considered separately.

@Noratrieb Noratrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023
@MSxDOS
Copy link

MSxDOS commented Aug 28, 2023

Any progress ?

bors added a commit to rust-lang-ci/rust that referenced this issue Jan 22, 2024
#[cold] on match arms

### Edit

This should be in T-lang. I'm not sure how I can change it.

There is discussion: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Allow.20.23.5Bcold.5D.20on.20match.20and.20if.20arms

### Summary

Adds the possibility to use `#[cold]` attribute on match arms to hint the optimizer that the arm is unlikely to be taken.

### Motivation

These hints are sometimes thought to help branch prediction, but the effect is probably marginal. Modern CPUs don't support hints on conditional branch instructions. They either have the current branch in the BTB (branch prediction buffer), or not, in which case the branch is predicted not to be taken.

These hints are, however, helpful in letting the compiler know what is the fast path, so it can be optimized at the expense of the slow path.

`grep`-ing the LLVM code for BlockFrequencyInfo and BranchProbabilityInfo shows that these hints are used at many places in the optimizer. Such as:
- block placement - improve locality by making the fast path compact and move everything else out of the way
- inlining, loop unrolling - these optimizations can be less aggressive on the cold path therefore reducing code size
- register allocation - preferably keep in registers the data needed on the fast path

### History

RFC 1131 ( rust-lang#26179 ) added `likely` and `unlikely` intrinsics, which get converted to `llvm.expect.i8`. However this LLVM instruction is fragile and may get removed by some optimization passes. The problems with the intrinsics have been reported several times: rust-lang#96276 , rust-lang#96275 , rust-lang#88767

### Other languages

Clang and GCC C++ compilers provide `__builtin_expect`. Since C++20, it is also possible to use `[[likely]]` and `[[unlikely]]` attributes.

Use:
```
if (__builtin_expect(condition, false)) { ... this branch is UNlikely ... }

if (condition) [[likely]] { ... this branch is likely... }
```

Note that while clang provides `__builtin_expect`, it does not convert it to `llvm.expect.i8`. Instead, it looks at the surrounding code and if there is a condition, emits branch weight metadata for conditional branches.

### Design

Implementing `likely`/`unlikely` type of functions properly to emit branch weights would add significant complexity to the compiler. Additionally, these functions are not easy to use with `match` arms.

Replacing the functions with attributes is easier to implement and will also work with `match`.

A question remains whether these attributes should be named `likely`/`unlikely` as in C++, or if we could reuse the already existing `#[cold]` attribute. `#[cold]` has the same meaning as `unlikely`, i.e., marking the slow path, but it can currently only be used on entire functions.

I personally prefer `#[cold]` because it already exists in Rust and is a short word that looks better in code. It has one disadvantage though.
This code:
```
if cond #[likely] { ... }
```
becomes:
```
if cond { ... } #[cold] { ... empty cold branch ... }
```

In this PR, I implemented the possibility to add `#[cold]` attribute on match arms. Use is as follows:
```
match x {
    #[cold] true => { ... } // the true arm is UNlikely
    _ => { ... } // the false arm is likely
}
```

### Limitations
The implementation only works on bool, or integers with single value arm and an otherwise arm. Extending it to other types and to `if` statements should not be too difficult.
bors added a commit to rust-lang-ci/rust that referenced this issue Nov 17, 2024
Likely unlikely fix

RFC 1131 ( rust-lang#26179 ) added likely/unlikely intrinsics, but they have been broken for a while: rust-lang#96276 , rust-lang#96275 , rust-lang#88767 . This PR tries to fix them.

Changes:
- added a new `cold_path()` intrinsic
- `likely()` and `unlikely()` changed to regular functions implemented using `cold_path()`
github-actions bot pushed a commit to rust-lang/miri that referenced this issue Nov 18, 2024
Likely unlikely fix

RFC 1131 ( rust-lang/rust#26179 ) added likely/unlikely intrinsics, but they have been broken for a while: rust-lang/rust#96276 , rust-lang/rust#96275 , rust-lang/rust#88767 . This PR tries to fix them.

Changes:
- added a new `cold_path()` intrinsic
- `likely()` and `unlikely()` changed to regular functions implemented using `cold_path()`
@Amanieu
Copy link
Member

Amanieu commented Nov 23, 2024

On nightly this now works correctly. You can verify this by checking the example in the top comment on godbolt: switching between likely and unlikely with invert the order of the calls.

@Amanieu Amanieu closed this as completed Nov 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

9 participants