major performance regression between Rust 1.50 and beta when using target-cpu=native #83027
Oh, also, I did try to find a smaller reproduction. Since the regression is ultimately rooted in the SIMD implementation found in the `memchr` crate, I tried this program:

```rust
use memchr::memchr;

fn main() {
    let haystack = "abcdefghijklmnopqrstuvwxyz".repeat(15);
    for _ in 0..100_000_000 {
        assert_eq!(None, memchr(b'@', haystack.as_bytes()));
    }
}
```

But both versions of the program inlined all the routines I would expect.
Can you check if beta (1.51) reproduces this regression? My immediate guess is that it's caused by the LLVM 12 upgrade, which landed in #81451. cc @rust-lang/wg-llvm
Yes, I am able to reproduce on beta too:
Needs MCVE because OP said #83027 (comment) did not reproduce the bug.
Could you describe full reproduction steps, including any custom options and features used when building ripgrep? Do you use […]? I couldn't reproduce the issue.
Also, are you compiling […]?
Ah!!! Thank you so much for mentioning […]! Some preliminaries for checking my environment:
Compile four different binaries: stable, stable + `target-cpu=native`, beta, and beta + `target-cpu=native`. Only beta+native has the performance regression.
And to show that only beta+native has the issue (the curl command for getting the subtitles is in my OP):
So given the new focus on […]:

Now compile two binaries: one with beta and one with beta and […].
And now run them:
I've run […].
What is your […]? Possible cause: #80749. Does this go away with […]?
@nagisa Broadwell:
What is the best way to fix it?
I've never tried using […].
Perhaps using a […].
Ah thanks for the link. I ran this:
But the regression remains:
I guess it helps in the strictest sense that it doesn't have the performance regression:
But I think what I meant was, "how do we not get a performance regression when using […]?"
Hmmm... Okay. cc @Amanieu Is there a more succinct/higher-level description of why #80749 is possibly the cause here? I guess what I mean to say is, what changed that stopped the inlining from happening here?
I'm happy to try and explain it here; I don't recall there being a good description of this elsewhere.

It is not valid for a function to be inlined into another if the feature sets differ between them. On x86_64 in particular this is exemplified by potentially differing ABIs and registers when a feature is available and when it isn't. As the features are tracked at a per-function level, LLVM is forced to disable inlining of such differing functions so that their features don't get lost.

The linked PR specifies an exact list of features that shall be applied to all functions that don't specify anything otherwise, so I suspect conflicts in memchr code occur quite naturally when there's interaction between SIMD and regular code. With that in mind I would've expected […].
@BurntSushi I can reproduce on skylake, including with the following:

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;
use std::intrinsics::transmute;

fn main() {
    #[target_feature(enable = "avx2")]
    unsafe fn test() {
        let a = _mm256_set_epi32(1, 1, 1, 1, 1, 1, 1, 1);
        let b = _mm256_set_epi32(2, 2, 2, 2, 2, 2, 2, 2);
        let e = _mm256_set_epi32(3, 3, 3, 3, 3, 3, 3, 3);
        let r = _mm256_add_epi32(a, b);
        assert_eq_m256i(e, r);
    }
    if is_x86_feature_detected!("avx2") {
        unsafe { test() }
    } else {
        panic!("avx2 feature not detected");
    }
}

#[target_feature(enable = "avx")]
pub unsafe fn assert_eq_m256i(a: __m256i, b: __m256i) {
    assert_eq!(transmute::<_, [u64; 4]>(a), transmute::<_, [u64; 4]>(b))
}
```

Building without […]:

```
$ objdump -d regress | grep as_i32x8
$ objdump -d regress-skylake | grep as_i32x8
$ objdump -d regress-native | grep as_i32x8
0000000000006960 <_ZN4core9core_arch3x868m256iExt8as_i32x817h6f0a02a3bdc3d3e7E>:
    6ade: e8 7d fe ff ff  callq 6960 <_ZN4core9core_arch3x868m256iExt8as_i32x817h6f0a02a3bdc3d3e7E>
    6b0a: e8 51 fe ff ff  callq 6960 <_ZN4core9core_arch3x868m256iExt8as_i32x817h6f0a02a3bdc3d3e7E>
```
@lqd Thanks! Hopefully that helps dig into this a bit more.
So just to be super precise, did you mean "differ" literally? As in, if I have a function compiled with just the […]? I'm assuming that you mean, "if the caller's feature set is not a superset of the function's, then the function cannot be inlined." If that assumption is wrong, then I think my mental model is broken.
Hmmm okay. So let me try to play this back to you in my own words to make sure I grok this. So let's pick a function that isn't getting inlined, say, […].

I think the key here is that functions like […].

So I guess what I don't quite grok is what it is about […].

I think your point above about these sorts of functions being tagged with […].
LLVM should inline based on subsets, not exact matches. If it's not then that's a bug. I can't reproduce with […]:

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

extern "C" {
    fn black_box(a: *const u8);
}

pub fn foo() {
    #[target_feature(enable = "avx2")]
    unsafe fn test() {
        let a = _mm256_set_epi32(1, 1, 1, 1, 1, 1, 1, 1);
        let b = _mm256_set_epi32(2, 2, 2, 2, 2, 2, 2, 2);
        let e = _mm256_set_epi32(3, 3, 3, 3, 3, 3, 3, 3);
        let r = _mm256_add_epi32(a, b);
        assert_eq_m256i(e, r);
    }
    if is_x86_feature_detected!("avx2") {
        unsafe { test() }
    } else {
        loop {}
    }
}

#[target_feature(enable = "avx")]
pub unsafe fn assert_eq_m256i(a: __m256i, b: __m256i) {
    black_box(&a as *const _ as *const _);
    black_box(&b as *const _ as *const _);
}
```
Note: we're also talking about this in https://rust-lang.zulipchat.com/#narrow/stream/247081-t-compiler.2Fperformance/topic/major.20performance.20regression.20between.20Rust.201.2E50.20and.20.2383027 and we have a repro that is easier to work with: https://godbolt.org/z/3vTv3s
I'm sorry for my confusing wording. It's not exactly superset, but when features are compatible. A subset/superset relationship does not always imply compatibility, though it usually does, and for x86_64, as far as I can tell, if the callee has a subset of features, it is compatible for inlining.
After some thinking I think what may be happening here is somewhat different. I'll output some LLVM IR in the further explanation as well as some Rust code. Everything (MCVE) together is in this godbolt. So… when a function such as this is compiled:

```rust
#[target_feature(enable = "avx2")]
pub unsafe fn _mm256_add_epi32(a: __m256i, b: __m256i) -> __m256i { ... }
```

It will translate to a function that looks a lot like this:

```llvm
define void @_mm256_add_epi32(%__m256i* %0, %__m256i* %1, %__m256i* %2) unnamed_addr #0 { ... }
attributes #0 = { ... "target-cpu"="skylake-avx512" "target-features"="+avx2" }
```

Similarly, when a function as such is compiled:

```rust
pub(crate) trait m256iExt: Sized {
    // ...
    // #[target_feature(default)]
    fn as_i32x8(self) -> i32x8 {
        unsafe { transmute(self.as_m256i()) }
    }
}
```

It will become a:

```llvm
define internal fastcc void @as_i32x8(<8 x i32>* %0, %__m256i* %1) unnamed_addr #0 { ... } ; uses global default target-features!
attributes #0 = { ... "target-cpu"="skylake-avx512" }
```

Now, AFAICT LLVM will not "combine" the per-function target features with the list of global features, but rather overwrite it. And so what ought to happen here is that we have a […].

Now, some further exploration with the godbolt example has shown some pretty weird behaviours, so I'm not exactly sure if what I'm saying is entirely correct. So I think my theory may be plausible to some extent, but also probably incorrect given the two weird behaviours above...
In short, there's at least one bug with […].
…ochenkov Adjust `-Ctarget-cpu=native` handling in cg_llvm

When cg_llvm encounters `-Ctarget-cpu=native` it computes an explicit set of features that applies to the target in order to correctly compile code for the host CPU (because e.g. `skylake` alone is not sufficient to tell if some of the instructions are available or not). However there were a couple of issues with how we did this.

Firstly, the order in which features were overridden wasn't quite right – conceptually you'd expect the `-Ctarget-cpu=native` option to override the features that are implicitly set by the target definition. However due to how other `-Ctarget-cpu` values are handled we must adopt the following order of priority:

* Features from `-Ctarget-cpu=*`; are overridden by
* Features implied by `--target`; are overridden by
* Features from `-Ctarget-feature`; are overridden by
* function-specific features.

Another problem was that the function-level `target-features` attribute would overwrite the entire set of the globally enabled features, rather than just the features that `#[target_feature(enable/disable)]` specified. With something like `-Ctarget-cpu=native` we'd end up in a situation wherein a function without a `#[target_feature(enable)]` annotation would have a broader set of features compared to a function with one such attribute. This turned out to be a cause of heavy run-time regressions in some code using these function-level attributes in conjunction with `-Ctarget-cpu=native`, for example.

With this PR rustc is more careful about specifying the entire set of features for functions that use `#[target_feature(enable/disable)]` or `#[instruction_set]` attributes. Sadly, testing the original reproducer for this behaviour is quite impossible – we cannot rely on `-Ctarget-cpu=native` to be anything in particular on developer or CI machines.

cc rust-lang#83027 `@BurntSushi`
Assigning […]. @rustbot label -I-prioritize +P-high
This does appear fixed by #83084!
Thanks again @nagisa and everyone who helped diagnose this problem. :-)
I'll just start with some reproduction steps that I'm hoping someone else will be able to reproduce. This assumes you've compiled ripgrep with Rust 1.50 to a binary named `rg-stable_1.50` and also compiled ripgrep with Rust nightly 2021-03-09 to a binary named `rg-nightly_2021-03-09` (alternatively, compile with the beta release, as I've reproduced the problem there in a subsequent comment):

Here is the relevant part of the profile I extracted by running the ripgrep compiled with nightly under `perf`:

The key difference between Rust nightly and stable is the fact that it looks like `i8x32::new` isn't being inlined. But it's not the only one. There are other functions showing up in the profile, like `core::core_arch::x86::m256iExt::as_i32x8`, that aren't being inlined either. These are trivial cast functions, and them not being inlined is likely a bug. (So an alternative title for this issue might be, "some trivial functions aren't getting inlined in hot code paths." But I figured I'd start with the actual problem I'm seeing in case my analysis is wrong.)

Initially I assumed that maybe something had changed in stdarch recently related to these code paths, but I don't see anything. So I'm a bit worried that perhaps something else changed that impacted inlining decisions, and this is an indirect effect. Alas, I'm stuck at this point and would love some help getting to the bottom of it.

It's possible, perhaps even likely, that this is related to #60637. I note that it is used to justify some `inline(always)` annotations, but `fn new` is left at just `#[inline]`.

Perhaps there is a quick fix where we need to go over some of the lower-level SIMD routines and make sure they're tagged with `inline(always)`. But really, it seems to me like these functions should be inlined automatically. I note that this doesn't look like a cross-crate problem that might typically be a reason for preventing inlining. In particular, `_mm256_setr_epi8` is being inlined (as one would expect), but the call to `i8x32::new` in its implementation is the thing not being inlined. So this seems pretty suspicious to me.

Apologies for not narrowing this down more. A good next step might be to find the specific version of nightly that introduced this problem.