
try inline(usually) more #130685

Conversation

workingjubilee
Member

see #130679

figured I'd see what happens if you sed it into the library.
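
The substitution pattern looks roughly like this (a sketch; the exact sed invocation isn't recorded in this PR). The stage0 bootstrap compiler doesn't know the new attribute, so each `#[inline(always)]` becomes a bootstrap-gated pair; the sample function is borrowed from the tidy diff below, and `inline(usually)` only exists on the compiler built from #130679:

```rust
// Sketch of what the substitution produces, not the literal command that was run.
// `bootstrap` is the cfg set when compiling with the stage0 compiler, which does
// not understand `inline(usually)` yet, hence the gated pair of attributes.
#[cfg_attr(bootstrap, inline(always))]
#[cfg_attr(not(bootstrap), inline(usually))]
fn log2_fast(x: usize) -> usize {
    (usize::BITS - x.leading_zeros() - 1) as usize
}
```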

@rustbot
Collaborator

rustbot commented Sep 22, 2024

r? @thomcc

rustbot has assigned @thomcc.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added the O-SGX (Target: SGX), O-unix (Operating system: Unix-like), O-windows (Operating system: Windows), S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.), T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.), and T-libs (Relevant to the library team, which will review and decide on the PR/issue.) labels Sep 22, 2024
@rustbot
Collaborator

rustbot commented Sep 22, 2024

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@workingjubilee workingjubilee marked this pull request as draft September 22, 2024 04:08
@workingjubilee
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf (Status: Waiting on a perf run to be completed.) label Sep 22, 2024
@workingjubilee workingjubilee added the S-experimental (Status: Ongoing experiment that does not require reviewing and won't be merged in its current state.) label and removed the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) label Sep 22, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 22, 2024
…what-she-sed, r=<try>

try `inline(usually)` more

see rust-lang#130679

figured I'd see what happens if you sed it into the library.
@bors
Contributor

bors commented Sep 22, 2024

⌛ Trying commit e2999d9 with merge 5e7a352...

@rust-log-analyzer
Collaborator

The job mingw-check-tidy failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

COPY host-x86_64/mingw-check/validate-toolstate.sh /scripts/
COPY host-x86_64/mingw-check/validate-error-codes.sh /scripts/

# NOTE: intentionally uses python2 for x.py so we can test it still works.
# validate-toolstate only runs in our CI, so it's ok for it to only support python3.
ENV SCRIPT TIDY_PRINT_DIFF=1 python2.7 ../x.py test \
           --stage 0 src/tools/tidy tidyselftest --extra-checks=py:lint,cpp:fmt
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
#    pip-compile --allow-unsafe --generate-hashes reuse-requirements.in
---
#13 2.801 Building wheels for collected packages: reuse
#13 2.802   Building wheel for reuse (pyproject.toml): started
#13 3.049   Building wheel for reuse (pyproject.toml): finished with status 'done'
#13 3.050   Created wheel for reuse: filename=reuse-4.0.3-cp310-cp310-manylinux_2_35_x86_64.whl size=132715 sha256=dfa09868353292d98f811d3efdb0d54d07389e808efc71d68e3b93c514bf8bec
#13 3.051   Stored in directory: /tmp/pip-ephem-wheel-cache-f5ch6zzz/wheels/3d/8d/0a/e0fc6aba4494b28a967ab5eaf951c121d9c677958714e34532
#13 3.053 Installing collected packages: boolean-py, binaryornot, tomlkit, reuse, python-debian, markupsafe, license-expression, jinja2, chardet, attrs
#13 3.443 Successfully installed attrs-23.2.0 binaryornot-0.4.4 boolean-py-4.0 chardet-5.2.0 jinja2-3.1.4 license-expression-30.3.0 markupsafe-2.1.5 python-debian-0.1.49 reuse-4.0.3 tomlkit-0.13.0
#13 3.443 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
#13 3.968 Collecting virtualenv
#13 4.021   Downloading virtualenv-20.26.5-py3-none-any.whl (6.0 MB)
#13 4.303      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.0/6.0 MB 21.4 MB/s eta 0:00:00
#13 4.344 Collecting distlib<1,>=0.3.7
#13 4.353   Downloading distlib-0.3.8-py2.py3-none-any.whl (468 kB)
#13 4.365      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 468.9/468.9 KB 43.4 MB/s eta 0:00:00
#13 4.397 Collecting platformdirs<5,>=3.9.1
#13 4.406   Downloading platformdirs-4.3.6-py3-none-any.whl (18 kB)
#13 4.442 Collecting filelock<4,>=3.12.2
#13 4.452   Downloading filelock-3.16.1-py3-none-any.whl (16 kB)
#13 4.532 Installing collected packages: distlib, platformdirs, filelock, virtualenv
#13 4.721 Successfully installed distlib-0.3.8 filelock-3.16.1 platformdirs-4.3.6 virtualenv-20.26.5
#13 DONE 4.8s

#14 [7/8] COPY host-x86_64/mingw-check/validate-toolstate.sh /scripts/
#14 DONE 0.0s
---
DirectMap4k:      204736 kB
DirectMap2M:     7135232 kB
DirectMap1G:    11534336 kB
##[endgroup]
Executing TIDY_PRINT_DIFF=1 python2.7 ../x.py test            --stage 0 src/tools/tidy tidyselftest --extra-checks=py:lint,cpp:fmt
+ TIDY_PRINT_DIFF=1 python2.7 ../x.py test --stage 0 src/tools/tidy tidyselftest --extra-checks=py:lint,cpp:fmt
    Finished `dev` profile [unoptimized] target(s) in 0.04s
##[endgroup]
downloading https://ci-artifacts.rust-lang.org/rustc-builds-alt/55043f067dcf7067e7c6ebccf3639af94ff57bda/rust-dev-nightly-x86_64-unknown-linux-gnu.tar.xz
extracting /checkout/obj/build/cache/llvm-55043f067dcf7067e7c6ebccf3639af94ff57bda-true/rust-dev-nightly-x86_64-unknown-linux-gnu.tar.xz to /checkout/obj/build/x86_64-unknown-linux-gnu/ci-llvm
---
   Compiling tidy v0.1.0 (/checkout/src/tools/tidy)
    Finished `release` profile [optimized] target(s) in 28.66s
##[endgroup]
fmt check
Diff in /checkout/library/alloc/src/task.rs:133:
 // trait dispatch - instead both impls call this function directly and
 // explicitly.
 #[cfg(target_has_atomic = "ptr")]
-#[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))]
+#[cfg_attr(bootstrap, inline(always))]
+#[cfg_attr(not(bootstrap), inline(usually))]
 fn raw_waker<W: Wake + Send + Sync + 'static>(waker: Arc<W>) -> RawWaker {
     // Increment the reference count of the arc to clone it.
Diff in /checkout/library/alloc/src/task.rs:144:
     // This allows optimizing Waker::will_wake to a single pointer comparison of
     // the vtable pointers, rather than comparing all four function pointers
     // within the vtables.
-    #[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))]
+    #[cfg_attr(bootstrap, inline(always))]
+    #[cfg_attr(not(bootstrap), inline(usually))]
     unsafe fn clone_waker<W: Wake + Send + Sync + 'static>(waker: *const ()) -> RawWaker {
         unsafe { Arc::increment_strong_count(waker as *const W) };
         RawWaker::new(
Diff in /checkout/library/alloc/src/task.rs:310:
 // the safety of `From<Rc<W>> for Waker` does not depend on the correct
 // trait dispatch - instead both impls call this function directly and
 // explicitly.
-#[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))]
+#[cfg_attr(bootstrap, inline(always))]
+#[cfg_attr(not(bootstrap), inline(usually))]
 fn local_raw_waker<W: LocalWake + 'static>(waker: Rc<W>) -> RawWaker {
     // Increment the reference count of the Rc to clone it.
Diff in /checkout/library/alloc/src/task.rs:317:
     // Refer to the comment on raw_waker's clone_waker regarding why this is
     // always inline.
-    #[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))]
+    #[cfg_attr(bootstrap, inline(always))]
+    #[cfg_attr(not(bootstrap), inline(usually))]
     unsafe fn clone_waker<W: LocalWake + 'static>(waker: *const ()) -> RawWaker {
         unsafe { Rc::increment_strong_count(waker as *const W) };
         RawWaker::new(
Diff in /checkout/library/alloc/src/ffi/c_str.rs:273:
 
         // Specialization for avoiding reallocation
-        #[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))] // Without that it is not inlined into specializations
+        #[cfg_attr(bootstrap, inline(always))]
+        #[cfg_attr(not(bootstrap), inline(usually))] // Without that it is not inlined into specializations
         fn spec_new_impl_bytes(bytes: &[u8]) -> Result<CString, NulError> {
             // We cannot have such large slice that we would overflow here
             // but using `checked_add` allows LLVM to assume that capacity never overflows
Diff in /checkout/library/alloc/src/collections/linked_list.rs:1497:
     /// Provides a reference to the cursor's parent list.
     #[must_use]
-    #[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))]
+    #[cfg_attr(bootstrap, inline(always))]
+    #[cfg_attr(not(bootstrap), inline(usually))]
     #[unstable(feature = "linked_list_cursors", issue = "58533")]
     pub fn as_list(&self) -> &'a LinkedList<T, A> {
         self.list
Diff in /checkout/library/alloc/src/collections/linked_list.rs:1619:
     /// `CursorMut`, which means it cannot outlive the `CursorMut` and that the
     /// `CursorMut` is frozen for the lifetime of the reference.
     #[must_use]
-    #[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))]
+    #[cfg_attr(bootstrap, inline(always))]
+    #[cfg_attr(not(bootstrap), inline(usually))]
     #[unstable(feature = "linked_list_cursors", issue = "58533")]
     pub fn as_list(&self) -> &LinkedList<T, A> {
         self.list
Diff in /checkout/library/alloc/src/collections/binary_heap/mod.rs:801:
         let tail_len = self.len() - start;
 
 
-        #[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))]
+        #[cfg_attr(bootstrap, inline(always))]
+        #[cfg_attr(not(bootstrap), inline(usually))]
         fn log2_fast(x: usize) -> usize {
             (usize::BITS - x.leading_zeros() - 1) as usize
Diff in /checkout/library/alloc/src/rc.rs:354:
 }
 
 
 impl<T: ?Sized, A: Allocator> Rc<T, A> {
-    #[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))]
+    #[cfg_attr(bootstrap, inline(always))]
+    #[cfg_attr(not(bootstrap), inline(usually))]
     fn inner(&self) -> &RcBox<T> {
         // This unsafety is ok because while this Rc is alive we're guaranteed
Diff in /checkout/library/alloc/src/rc.rs:2207:
 impl<T: ?Sized, A: Allocator> Deref for Rc<T, A> {
 
 
-    #[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))]
+    #[cfg_attr(bootstrap, inline(always))]
+    #[cfg_attr(not(bootstrap), inline(usually))]
     fn deref(&self) -> &T {
         &self.inner().value
Diff in /checkout/library/alloc/src/rc.rs:2453:
     ///
     /// assert_eq!(Some(Ordering::Less), five.partial_cmp(&Rc::new(6)));
     /// ```
-    #[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))]
+    #[cfg_attr(bootstrap, inline(always))]
+    #[cfg_attr(not(bootstrap), inline(usually))]
     fn partial_cmp(&self, other: &Rc<T, A>) -> Option<Ordering> {
         (**self).partial_cmp(&**other)
Diff in /checkout/library/alloc/src/rc.rs:2471:
     ///
     /// assert!(five < Rc::new(6));
     /// ```
-    #[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))]
+    #[cfg_attr(bootstrap, inline(always))]
+    #[cfg_attr(not(bootstrap), inline(usually))]
     fn lt(&self, other: &Rc<T, A>) -> bool {
         **self < **other
Diff in /checkout/library/alloc/src/rc.rs:2489:
     ///
     /// assert!(five <= Rc::new(5));
     /// ```
-    #[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))]
+    #[cfg_attr(bootstrap, inline(always))]
+    #[cfg_attr(not(bootstrap), inline(usually))]
     fn le(&self, other: &Rc<T, A>) -> bool {
         **self <= **other
Diff in /checkout/library/alloc/src/rc.rs:2507:
     ///
     /// assert!(five > Rc::new(4));
     /// ```
-    #[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))]
+    #[cfg_attr(bootstrap, inline(always))]
+    #[cfg_attr(not(bootstrap), inline(usually))]
     fn gt(&self, other: &Rc<T, A>) -> bool {
         **self > **other
Diff in /checkout/library/alloc/src/rc.rs:2525:
     ///
     /// assert!(five >= Rc::new(5));
     /// ```
-    #[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))]
+    #[cfg_attr(bootstrap, inline(always))]
+    #[cfg_attr(not(bootstrap), inline(usually))]
     fn ge(&self, other: &Rc<T, A>) -> bool {
         **self >= **other
Diff in /checkout/library/alloc/src/rc.rs:3527:
 }
 
 
 impl<T: ?Sized> RcInnerPtr for RcBox<T> {
-    #[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))]
+    #[cfg_attr(bootstrap, inline(always))]
+    #[cfg_attr(not(bootstrap), inline(usually))]
     fn weak_ref(&self) -> &Cell<usize> {
         &self.weak
Diff in /checkout/library/alloc/src/rc.rs:3534:
 
 
-    #[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))]
+    #[cfg_attr(bootstrap, inline(always))]
+    #[cfg_attr(not(bootstrap), inline(usually))]
     fn strong_ref(&self) -> &Cell<usize> {
     }
Diff in /checkout/library/alloc/src/rc.rs:3539:
 }
 
 
 impl<'a> RcInnerPtr for WeakInner<'a> {
-    #[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))]
+    #[cfg_attr(bootstrap, inline(always))]
+    #[cfg_attr(not(bootstrap), inline(usually))]
     fn weak_ref(&self) -> &Cell<usize> {
         self.weak
Diff in /checkout/library/alloc/src/rc.rs:3546:
 
 
-    #[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))]
+    #[cfg_attr(bootstrap, inline(always))]
+    #[cfg_attr(not(bootstrap), inline(usually))]
     fn strong_ref(&self) -> &Cell<usize> {
     }
Diff in /checkout/library/alloc/src/boxed.rs:246:
     /// let five = Box::new(5);
     /// ```
     #[cfg(not(no_global_oom_handling))]
-    #[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))]
+    #[cfg_attr(bootstrap, inline(always))]
+    #[cfg_attr(not(bootstrap), inline(usually))]
     #[stable(feature = "rust1", since = "1.0.0")]
     #[must_use]
     #[rustc_diagnostic_item = "box_new"]
Diff in /checkout/library/alloc/src/boxed.rs:316:
     #[cfg(not(no_global_oom_handling))]
     #[stable(feature = "pin", since = "1.33.0")]
     #[must_use]
-    #[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))]
+    #[cfg_attr(bootstrap, inline(always))]
+    #[cfg_attr(not(bootstrap), inline(usually))]
     pub fn pin(x: T) -> Pin<Box<T>> {
         Box::new(x).into()
Diff in /checkout/library/alloc/src/boxed.rs:609:
     #[cfg(not(no_global_oom_handling))]
     #[unstable(feature = "allocator_api", issue = "32838")]
     #[must_use]
-    #[cfg_attr(bootstrap, inline(always))]#[cfg_attr(not(bootstrap), inline(usually))]
+    #[cfg_attr(bootstrap, inline(always))]
+    #[cfg_attr(not(bootstrap), inline(usually))]
     pub fn pin_in(x: T, alloc: A) -> Pin<Self>
     where
         A: 'static + Allocator,
fmt error: Running `"/checkout/obj/build/x86_64-unknown-linux-gnu/rustfmt/bin/rustfmt" "--config-path" "/checkout" "--edition" "2021" "--unstable-features" "--skip-children" "--check" "/checkout/src/librustdoc/config.rs" "/checkout/src/librustdoc/lib.rs" "/checkout/src/librustdoc/core.rs" "/checkout/src/librustdoc/externalfiles.rs" "/checkout/src/librustdoc/doctest/tests.rs" "/checkout/src/librustdoc/doctest/markdown.rs" "/checkout/src/librustdoc/doctest/runner.rs" "/checkout/src/librustdoc/doctest/rust.rs" "/checkout/src/librustdoc/doctest/make.rs" "/checkout/src/librustdoc/json/import_finder.rs" "/checkout/src/librustdoc/json/conversions.rs" "/checkout/src/librustdoc/json/mod.rs" "/checkout/src/librustdoc/formats/cache.rs" "/checkout/src/librustdoc/formats/item_type.rs" "/checkout/src/librustdoc/formats/renderer.rs" "/checkout/src/librustdoc/formats/mod.rs" "/checkout/src/librustdoc/fold.rs" "/checkout/src/librustdoc/visit_ast.rs" "/checkout/src/librustdoc/scrape_examples.rs" "/checkout/src/librustdoc/lint.rs" "/checkout/src/rustdoc-json-types/tests.rs" "/checkout/src/rustdoc-json-types/lib.rs" "/checkout/compiler/rustc_codegen_gcc/src/back/write.rs" "/checkout/compiler/rustc_codegen_gcc/src/back/mod.rs" "/checkout/compiler/rustc_codegen_gcc/src/back/lto.rs" "/checkout/compiler/rustc_codegen_gcc/src/intrinsic/llvm.rs" "/checkout/compiler/rustc_codegen_gcc/src/intrinsic/simd.rs" "/checkout/compiler/rustc_codegen_gcc/src/intrinsic/mod.rs" "/checkout/compiler/rustc_codegen_gcc/src/callee.rs" "/checkout/compiler/rustc_codegen_gcc/src/abi.rs" "/checkout/compiler/rustc_codegen_gcc/src/builder.rs" "/checkout/compiler/rustc_codegen_gcc/src/debuginfo.rs" "/checkout/compiler/rustc_codegen_gcc/src/archive.rs" "/checkout/compiler/rustc_codegen_gcc/src/gcc_util.rs" "/checkout/compiler/rustc_codegen_gcc/src/consts.rs" "/checkout/compiler/rustc_codegen_gcc/src/allocator.rs" "/checkout/compiler/rustc_codegen_gcc/src/attributes.rs" "/checkout/compiler/rustc_codegen_gcc/src/context.rs" "/checkout/compiler/rustc_codegen_gcc/src/errors.rs" "/checkout/compiler/rustc_codegen_gcc/src/declare.rs" "/checkout/compiler/rustc_codegen_gcc/src/base.rs" "/checkout/compiler/rustc_codegen_gcc/src/lib.rs" "/checkout/compiler/rustc_codegen_gcc/src/type_of.rs" "/checkout/compiler/rustc_codegen_gcc/src/asm.rs" "/checkout/compiler/rustc_codegen_gcc/src/coverageinfo.rs" "/checkout/compiler/rustc_codegen_gcc/src/mono_item.rs" "/checkout/compiler/rustc_codegen_gcc/src/common.rs" "/checkout/compiler/rustc_codegen_gcc/src/int.rs" "/checkout/compiler/rustc_codegen_gcc/src/type_.rs" "/checkout/library/alloc/src/sync/tests.rs" "/checkout/library/alloc/src/alloc.rs" "/checkout/library/alloc/src/alloc/tests.rs" "/checkout/library/alloc/src/tests.rs" "/checkout/library/alloc/src/rc/tests.rs" "/checkout/library/alloc/src/testing/crash_test.rs" "/checkout/library/alloc/src/testing/rng.rs" "/checkout/library/alloc/src/testing/ord_chaos.rs" "/checkout/library/alloc/src/testing/mod.rs" "/checkout/library/alloc/src/slice/tests.rs" "/checkout/library/alloc/src/macros.rs" "/checkout/library/alloc/src/rc.rs" "/checkout/library/alloc/src/boxed.rs" "/checkout/library/alloc/src/slice.rs" "/checkout/src/librustdoc/docfs.rs"` failed.
If you're running `tidy`, try again with `--bless`. Or, if you just want to format code, run `./x.py fmt` instead.
  local time: Sun Sep 22 04:13:47 UTC 2024
  network time: Sun, 22 Sep 2024 04:13:47 GMT
##[error]Process completed with exit code 1.
Post job cleanup.

@bors
Contributor

bors commented Sep 22, 2024

☀️ Try build successful - checks-actions
Build commit: 5e7a352 (5e7a352c787b87cb3472e19667546db82e03d3c8)

@rust-timer

This comment has been minimized.

@rust-timer
Collaborator

Finished benchmarking commit (5e7a352): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

Regressions ❌ (primary): mean 0.3%, range [0.2%, 0.3%], count 6
Regressions ❌ (secondary): mean 0.7%, range [0.6%, 0.8%], count 3
Improvements ✅ (primary): mean -1.0%, range [-2.0%, -0.2%], count 13
Improvements ✅ (secondary): mean -0.8%, range [-1.1%, -0.3%], count 10
All ❌✅ (primary): mean -0.6%, range [-2.0%, 0.3%], count 19

Max RSS (memory usage)

Results (primary -1.0%, secondary 0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Regressions ❌ (primary): mean 0.5%, range [0.4%, 0.6%], count 2
Regressions ❌ (secondary): mean 3.0%, range [2.2%, 4.4%], count 3
Improvements ✅ (primary): mean -2.4%, range [-2.9%, -2.0%], count 2
Improvements ✅ (secondary): mean -2.7%, range [-3.2%, -2.1%], count 3
All ❌✅ (primary): mean -1.0%, range [-2.9%, 0.6%], count 4

Cycles

Results (primary 0.3%, secondary 6.8%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Regressions ❌ (primary): mean 2.6%, range [1.6%, 3.7%], count 4
Regressions ❌ (secondary): mean 6.8%, range [2.3%, 10.3%], count 7
Improvements ✅ (primary): mean -1.5%, range [-2.0%, -1.0%], count 5
Improvements ✅ (secondary): none (count 0)
All ❌✅ (primary): mean 0.3%, range [-2.0%, 3.7%], count 9

Binary size

Results (primary 0.4%, secondary 0.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Regressions ❌ (primary): mean 0.4%, range [0.0%, 0.9%], count 63
Regressions ❌ (secondary): mean 0.8%, range [0.0%, 2.5%], count 18
Improvements ✅ (primary): mean -0.3%, range [-0.4%, -0.2%], count 5
Improvements ✅ (secondary): mean -0.0%, range [-0.0%, -0.0%], count 10
All ❌✅ (primary): mean 0.4%, range [-0.4%, 0.9%], count 68

Bootstrap: 769.437s -> 768.143s (-0.17%)
Artifact size: 341.48 MiB -> 341.57 MiB (0.02%)

@rustbot rustbot added the perf-regression (Performance regression.) label and removed the S-waiting-on-perf (Status: Waiting on a perf run to be completed.) label Sep 22, 2024
@workingjubilee workingjubilee deleted the inline-usually-thats-what-she-sed branch September 22, 2024 18:30
@saethlin saethlin mentioned this pull request Sep 24, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 24, 2024
Add inline(usually)

I'm looking into what kind of things could recover the perf improvement detected in rust-lang#121417 (comment). I think it's worth spending quite a bit of effort to figure out how to capture a 45% incr-patched improvement.

As far as I can tell, the root cause of the problem is that we have taken very deliberate steps in the compiler to ensure that `#[inline(always)]` causes inlining where possible, even when all optimizations are disabled. Some of the reasons this was done are now outdated or were misguided, but some users still have a legitimate use for the current behavior; `@bjorn3` says:

> Unlike other targets the mere presence of a simd instruction is not allowed if the wasm runtime doesn't support simd. Other targets merely require it to never be executed at runtime.

I'm quite sure that the majority of users applied this attribute believing it does not cause inlining in unoptimized builds, or didn't appreciate the build time regressions that implies and would prefer it didn't if they knew. (if that's you, put a heart on this or say something elsewhere, don't reply on this PR)

I am going to _try_ to use the existing benchmark suite to evaluate a number of different approaches and take notes here, and hopefully I can collect enough data to shape any conversation about what we can do to help users.

The core of this PR is `InlineAttr::Usually` (name doesn't matter), which ensures that, when optimizations are enabled, the function is inlined (usual exceptions like recursion apply). I think most users believe this is what `#[inline(always)]` does.

rust-lang#130685 (comment) Replaced `#[inline(always)]` with `#[inline(usually)]` in the standard library, and did not recover the same 45% incr-patched improvement in regex. It's a tidy net positive though, and I suspect that perf improvement would normally be big enough to motivate merging a change. I think that means the standard library's use of `#[inline(always)]` is imposing marginal compile time overhead on the ecosystem, but the bigger opportunities are probably in third-party crates.

rust-lang#130679 (comment) Treats `#[inline(always)]` as `#[inline(usually)]` literally everywhere; this gets the desired incr-patched improvement but suffers quite a few check and doc regressions. I think that means that `alwaysinline` is more powerful than `function-inline-cost=0` in LLVM.

rust-lang#130679 (comment) Treats `#[inline(always)]` as `#[inline(usually)]` when `-Copt-level=0`, which looks basically the same as rust-lang#121417 (comment) (omit `alwaysinline` when doing `-Copt-level=0` codegen).

TODO: Try function-inline-cost = -1000; perhaps penalties can add up and snuff out inline(usually)?
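
A minimal, heavily hedged sketch of the semantics described above (illustrative names only, not rustc's actual types or codegen; see rust-lang#130679 for the real implementation): `inline(usually)` still pushes LLVM to inline under optimization, but stops forcing inlining at `-Copt-level=0`.

```rust
// Hypothetical illustration only; the enum and lowering below are not rustc's
// real code, they just mirror the behavior described in the message above.
#[derive(Clone, Copy)]
enum InlineAttr {
    None,
    Hint,    // #[inline]
    Always,  // #[inline(always)]
    Usually, // the proposed #[inline(usually)]
}

// Which LLVM function attribute to request for a given inline attribute.
fn llvm_inline_attribute(attr: InlineAttr, optimizing: bool) -> Option<&'static str> {
    match attr {
        // `always`: force inlining even when all optimizations are disabled.
        InlineAttr::Always => Some("alwaysinline"),
        // `usually`: only push for inlining in optimized builds, so that
        // -Copt-level=0 codegen can skip the inlining work entirely.
        InlineAttr::Usually if optimizing => Some("alwaysinline"),
        InlineAttr::Usually | InlineAttr::None => None,
        InlineAttr::Hint => Some("inlinehint"),
    }
}
```

Under this sketch, `llvm_inline_attribute(InlineAttr::Usually, false)` returns `None`, matching the goal of not forcing inlining in unoptimized builds.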
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 26, 2024
Add inline(usually)

I'm looking into what kind of things could recover the perf improvement detected in rust-lang#121417 (comment). I think it's worth spending quite a bit of effort to figure out how to capture a 45% incr-patched improvement.

As far as I can tell, the root cause of the problem is that we have taken very deliberate steps in the compiler to ensure that `#[inline(always)]` causes inlining where possible, even when all optimizations are disabled. Some of the reasons this was done are now outdated or were misguided. I think the only remaining use case is where the inlined body, even without optimizations, is cheaper to codegen or call; for example, SIMD intrinsics may require a lot of code to put their arguments on the stack, which is slow to compile and run.

I'm quite sure that the majority of users applied this attribute believing it does not cause inlining in unoptimized builds, or didn't appreciate the build time regressions that implies and would prefer it didn't if they knew. (if that's you, put a heart on this or say something elsewhere, don't reply on this PR)

I am going to _try_ to use the existing benchmark suite to evaluate a number of different approaches and take notes here, and hopefully I can collect enough data to shape any conversation about what we can do to help users.

The core of this PR is `InlineAttr::Usually` (name doesn't matter), which ensures that, when optimizations are enabled, the function is inlined (usual exceptions like recursion apply). I think most users believe this is what `#[inline(always)]` does.

rust-lang#130685 (comment) Replaced `#[inline(always)]` with `#[inline(usually)]` in the standard library, and did not recover the same 45% incr-patched improvement in regex. It's a tidy net positive though, and I suspect that perf improvement would normally be big enough to motivate merging a change. I think that means the standard library's use of `#[inline(always)]` is imposing marginal compile time overhead on the ecosystem, but the bigger opportunities are probably in third-party crates.

rust-lang#130679 (comment) Treats `#[inline(always)]` as `#[inline(usually)]` literally everywhere; this gets the desired incr-patched improvement but suffers quite a few check and doc regressions. I think that means that `alwaysinline` is more powerful than `function-inline-cost=0` in LLVM.

rust-lang#130679 (comment) Treats `#[inline(always)]` as `#[inline(usually)]` when `-Copt-level=0`, which looks basically the same as rust-lang#121417 (comment) (omit `alwaysinline` when doing `-Copt-level=0` codegen).

rust-lang#130679 (comment) replaces `alwaysinline` with a very negative inline cost, and it still has check and doc regressions. More investigation required on what the different inlining decision is.

rust-lang#130679 (comment) is a likely explanation of this, with some interesting implications; adding `inline(always)` to a function that was going to be inlined anyway can change optimizations (usually it seems to improve things?).

TODO: stm32f4 and to a lesser extent bitmaps seem to compile slower and to larger binaries when we treat `inline(always)` as `inline(usually)`. Is that because of this? https://github.com/rust-lang/rust/blob/9e394f551c050ff03c6fc57f190e0761cf0be6e8/compiler/rustc_middle/src/mir/mono.rs#L141 If it's not, what happens if we infer `alwaysinline` for extremely small functions like most of those in stm32f4?
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 27, 2024
Add inline(usually)

I'm looking into what kind of things could recover the perf improvement detected in rust-lang#121417 (comment). I think it's worth spending quite a bit of effort to figure out how to capture a 45% incr-patched improvement.

As far as I can tell, the root cause of the problem is that we have taken very deliberate steps in the compiler to ensure that `#[inline(always)]` causes inlining where possible, even when all optimizations are disabled. Some of the reasons this was done are now outdated or were misguided. I think the only remaining use case is where the inlined body, even without optimizations, is cheaper to codegen or call; for example, SIMD intrinsics may require a lot of code to put their arguments on the stack, which is slow to compile and run.

I'm quite sure that the majority of users applied this attribute believing it does not cause inlining in unoptimized builds, or didn't appreciate the build time regressions that implies and would prefer it didn't if they knew. (if that's you, put a heart on this or say something elsewhere, don't reply on this PR)

I am going to _try_ to use the existing benchmark suite to evaluate a number of different approaches and take notes here, and hopefully I can collect enough data to shape any conversation about what we can do to help users.

The core of this PR is `InlineAttr::Usually` (name doesn't matter), which ensures that, when optimizations are enabled, the function is inlined (usual exceptions like recursion apply). I think most users believe this is what `#[inline(always)]` does.

rust-lang#130685 (comment) Replaced `#[inline(always)]` with `#[inline(usually)]` in the standard library, and did not recover the same 45% incr-patched improvement in regex. It's a tidy net positive though, and I suspect that perf improvement would normally be big enough to motivate merging a change. I think that means the standard library's use of `#[inline(always)]` is imposing marginal compile time overhead on the ecosystem, but the bigger opportunities are probably in third-party crates.

rust-lang#130679 (comment) Treats `#[inline(always)]` as `#[inline(usually)]` literally everywhere; this gets the desired incr-patched improvement but suffers quite a few check and doc regressions. I think that means that `alwaysinline` is more powerful than `function-inline-cost=0` in LLVM.

rust-lang#130679 (comment) Treats `#[inline(always)]` as `#[inline(usually)]` when `-Copt-level=0`, which looks basically the same as rust-lang#121417 (comment) (omit `alwaysinline` when doing `-Copt-level=0` codegen).

rust-lang#130679 (comment) replaces `alwaysinline` with a very negative inline cost, and it still has check and doc regressions. More investigation required on what the different inlining decision is.

rust-lang#130679 (comment) is a likely explanation of this, with some interesting implications; adding `inline(always)` to a function that was going to be inlined anyway can change optimizations (usually it seems to improve things?).

rust-lang#130679 (comment) makes `#[inline(usually)]` also defy instantiation mode selection and always be LocalCopy the way `#[inline(always)]` does, but still has regressions in stm32f4. I think that proves that `alwaysinline` can actually improve debug build times.

TODO: What happens if we infer `alwaysinline` for extremely small functions like most of those in stm32f4?
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 28, 2024
Add inline(usually)

I'm looking into what kind of things could recover the perf improvement detected in rust-lang#121417 (comment). I think it's worth spending quite a bit of effort to figure out how to capture a 45% incr-patched improvement.

As far as I can tell, the root cause of the problem is that we have taken very deliberate steps in the compiler to ensure that `#[inline(always)]` causes inlining where possible, even when all optimizations are disabled. Some of the reasons this was done are now outdated or were misguided. I think the only remaining use case is where the inlined body, even without optimizations, is cheaper to codegen or call; for example, SIMD intrinsics may require a lot of code to put their arguments on the stack, which is slow to compile and run.

I'm quite sure that the majority of users applied this attribute believing it does not cause inlining in unoptimized builds, or didn't appreciate the build time regressions that implies and would prefer it didn't if they knew. (if that's you, put a heart on this or say something elsewhere, don't reply on this PR)

I am going to _try_ to use the existing benchmark suite to evaluate a number of different approaches and take notes here, and hopefully I can collect enough data to shape any conversation about what we can do to help users.

The core of this PR is `InlineAttr::Usually` (name doesn't matter), which ensures that, when optimizations are enabled, the function is inlined (usual exceptions like recursion apply). I think most users believe this is what `#[inline(always)]` does.

rust-lang#130685 (comment) Replaced `#[inline(always)]` with `#[inline(usually)]` in the standard library, and did not recover the same 45% incr-patched improvement in regex. It's a tidy net positive though, and I suspect that perf improvement would normally be big enough to motivate merging a change. I think that means the standard library's use of `#[inline(always)]` is imposing marginal compile time overhead on the ecosystem, but the bigger opportunities are probably in third-party crates.

rust-lang#130679 (comment) Treats `#[inline(always)]` as `#[inline(usually)]` literally everywhere; this gets the desired incr-patched improvement but suffers quite a few check and doc regressions. I think that means that `alwaysinline` is more powerful than `function-inline-cost=0` in LLVM.

rust-lang#130679 (comment) Treats `#[inline(always)]` as `#[inline(usually)]` when `-Copt-level=0`, which looks basically the same as rust-lang#121417 (comment) (omit `alwaysinline` when doing `-Copt-level=0` codegen).

rust-lang#130679 (comment) replaces `alwaysinline` with a very negative inline cost, and it still has check and doc regressions. More investigation required on what the different inlining decision is.

rust-lang#130679 (comment) is a likely explanation of this, with some interesting implications; adding `inline(always)` to a function that was going to be inlined anyway can change optimizations (usually it seems to improve things?).

rust-lang#130679 (comment) makes `#[inline(usually)]` also defy instantiation mode selection and always be LocalCopy the way `#[inline(always)]` does, but still has regressions in stm32f4. I think that proves that `alwaysinline` can actually improve debug build times.

TODO: What happens if we infer `alwaysinline` for extremely small functions like most of those in stm32f4?
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 29, 2024
Add inline(usually)

I'm looking into what kind of things could recover the perf improvement detected in rust-lang#121417 (comment). I think it's worth spending quite a bit of effort to figure out how to capture a 45% incr-patched improvement.

As far as I can tell, the root cause of the problem is that we have taken very deliberate steps in the compiler to ensure that `#[inline(always)]` causes inlining where possible, even when all optimizations are disabled. Some of the reasons this was done are now outdated or were misguided. I think the only remaining use case is where the inlined body, even without optimizations, is cheaper to codegen or call; for example, SIMD intrinsics may require a lot of code to put their arguments on the stack, which is slow to compile and run.

I'm quite sure that the majority of users applied this attribute believing it does not cause inlining in unoptimized builds, or didn't appreciate the build time regressions that implies and would prefer it didn't if they knew. (if that's you, put a heart on this or say something elsewhere, don't reply on this PR)

I am going to _try_ to use the existing benchmark suite to evaluate a number of different approaches and take notes here, and hopefully I can collect enough data to shape any conversation about what we can do to help users.

The core of this PR is `InlineAttr::Usually` (name doesn't matter), which ensures that, when optimizations are enabled, the function is inlined (usual exceptions like recursion apply). I think most users believe this is what `#[inline(always)]` does.

rust-lang#130685 (comment) Replaced `#[inline(always)]` with `#[inline(usually)]` in the standard library, and did not recover the same 45% incr-patched improvement in regex. It's a tidy net positive though, and I suspect that perf improvement would normally be big enough to motivate merging a change. I think that means the standard library's use of `#[inline(always)]` is imposing marginal compile time overhead on the ecosystem, but the bigger opportunities are probably in third-party crates.

rust-lang#130679 (comment) Treats `#[inline(always)]` as `#[inline(usually)]` literally everywhere; this gets the desired incr-patched improvement but suffers quite a few check and doc regressions. I think that means that `alwaysinline` is more powerful than `function-inline-cost=0` in LLVM.

rust-lang#130679 (comment) Treats `#[inline(always)]` as `#[inline(usually)]` when `-Copt-level=0`, which looks basically the same as rust-lang#121417 (comment) (omit `alwaysinline` when doing `-Copt-level=0` codegen).

rust-lang#130679 (comment) replaces `alwaysinline` with a very negative inline cost, and it still has check and doc regressions. More investigation required on what the different inlining decision is.

rust-lang#130679 (comment) is a likely explanation of this, with some interesting implications; adding `inline(always)` to a function that was going to be inlined anyway can change optimizations (usually it seems to improve things?).

rust-lang#130679 (comment) makes `#[inline(usually)]` also defy instantiation mode selection and always be LocalCopy the way `#[inline(always)]` does, but still has regressions in stm32f4. I think that proves that `alwaysinline` can actually improve debug build times.

rust-lang#130679 (comment) infers `alwaysinline` for extremely trivial functions, but still has regressions for stm32f4. But of course it does; I left `inline(always)` treated as `inline(usually)`, which slows down the compiler 🤦 so the perf run is inconclusive.

TODO: What happens if we infer `alwaysinline` for extremely small functions like most of those in stm32f4?
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 29, 2024
Add inline(usually)

I'm looking into what kind of things could recover the perf improvement detected in rust-lang#121417 (comment). I think it's worth spending quite a bit of effort to figure out how to capture a 45% incr-patched improvement.

As far as I can tell, the root cause of the problem is that we have taken very deliberate steps in the compiler to ensure that `#[inline(always)]` causes inlining where possible, even when all optimizations are disabled. Some of the reasons this was done are now outdated or were misguided. I think the only remaining use case is where the inlined body, even without optimizations, is cheaper to codegen or call; for example, SIMD intrinsics may require a lot of code to put their arguments on the stack, which is slow to compile and run.

I'm quite sure that the majority of users applied this attribute believing it does not cause inlining in unoptimized builds, or didn't appreciate the build time regressions that implies and would prefer it didn't if they knew. (if that's you, put a heart on this or say something elsewhere, don't reply on this PR)

I am going to _try_ to use the existing benchmark suite to evaluate a number of different approaches and take notes here, and hopefully I can collect enough data to shape any conversation about what we can do to help users.

The core of this PR is `InlineAttr::Usually` (name doesn't matter), which ensures that, when optimizations are enabled, the function is inlined (usual exceptions like recursion apply). I think most users believe this is what `#[inline(always)]` does.

rust-lang#130685 (comment) Replaced `#[inline(always)]` with `#[inline(usually)]` in the standard library, and did not recover the same 45% incr-patched improvement in regex. It's a tidy net positive though, and I suspect that perf improvement would normally be big enough to motivate merging a change. I think that means the standard library's use of `#[inline(always)]` is imposing marginal compile time overhead on the ecosystem, but the bigger opportunities are probably in third-party crates.

rust-lang#130679 (comment) Treats `#[inline(always)]` as `#[inline(usually)]` literally everywhere; this gets the desired incr-patched improvement but suffers quite a few check and doc regressions. I think that means that `alwaysinline` is more powerful than `function-inline-cost=0` in LLVM.

rust-lang#130679 (comment) Treats `#[inline(always)]` as `#[inline(usually)]` when `-Copt-level=0`, which looks basically the same as rust-lang#121417 (comment) (omit `alwaysinline` when doing `-Copt-level=0` codegen).

rust-lang#130679 (comment) replaces `alwaysinline` with a very negative inline cost, and it still has check and doc regressions. More investigation required on what the different inlining decision is.

rust-lang#130679 (comment) is a likely explanation of this, with some interesting implications; adding `inline(always)` to a function that was going to be inlined anyway can change optimizations (usually it seems to improve things?).

rust-lang#130679 (comment) makes `#[inline(usually)]` also defy instantiation mode selection and always be LocalCopy the way `#[inline(always)]` does, but still has regressions in stm32f4. I think that proves that `alwaysinline` can actually improve debug build times.

rust-lang#130679 (comment) infers `alwaysinline` for extremely trivial functions, but still has regressions for stm32f4. But of course it does; I left `inline(always)` treated as `inline(usually)`, which slows down the compiler 🤦 so the perf run is inconclusive.

rust-lang#130679 (comment) doesn't have any stm32f4 regressions 🥳 I think this means that there is some threshold where `alwaysinline` produces faster debug builds.

So still two questions:
1. Why does `alwaysinline` sometimes make debug builds faster?
2. Is there any obvious threshold at which adding `alwaysinline` causes more work for debug builds?