-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve codegen of fmt_num to delete unreachable panic #122770
improve codegen of fmt_num to delete unreachable panic #122770
Conversation
it seems LLVM doesn't realize that `curr` is always decremented at least once in either loop formatting characters of the input string by their appropriate radix, and so the later `&buf[curr..]` generates a check for out-of-bounds access and panic. this is unreachable in reality as even for `x == T::zero()` we'll produce at least the character `Self::digit(T::zero())` for at least one character output, and `curr` will always be at least one below `buf.len()`. adjust `fmt_int` to make this fact more obvious to the compiler, which fortunately (or unfortunately) results in a measurable performance improvement for workloads heavy on formatting integers.
rustbot has assigned @workingjubilee. Use |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
…ion, r=<try> improve codegen of fmt_num to delete unreachable panic it seems LLVM doesn't realize that `curr` is always decremented at least once in either loop formatting characters of the input string by their appropriate radix, and so the later `&buf[curr..]` generates a check for out-of-bounds access and panic. this is unreachable in reality as even for `x == T::zero()` we'll produce at least the character `Self::digit(T::zero())`, yielding at least one character output, and `curr` will always be at least one below `buf.len()`. adjust `fmt_int` to make this fact more obvious to the compiler, which fortunately (or unfortunately) results in a measurable performance improvement for workloads heavy on formatting integers. in the program i'd noticed this in, you can see the `cmp $0x80,%rdi; ja 7c` here, which branches to a slice index fail helper: <img width="660" alt="before" src="https://github.com/rust-lang/rust/assets/4615790/ac482d54-21f8-494b-9c83-4beadc3ca0ef"> where after this change the function is broadly similar, but smaller, with one fewer registers updated in each pass through the loop in addition the never-taken `cmp/ja` being gone: <img width="646" alt="after" src="https://github.com/rust-lang/rust/assets/4615790/1bee1d76-b674-43ec-9b21-4587364563aa"> this represents a ~2-3% difference in runtime in my [admittedly comically i32-formatting-bound](https://github.com/athre0z/disas-bench/blob/master/bench/yaxpeax/src/main.rs#L58-L67) use case (printing x86 instructions, including i32 displacements and immediates) as measured on a ryzen 9 3950x. the impact on `<impl LowerHex for i8>::fmt` is both more dramatic and less impactful: it continues to have a loop that is evaluated at most twice, though the compiler doesn't know that to unroll it. the generated code there is identical to the impl for `i32`. there, the smaller loop body has less effect on runtime, and removing the never-taken slice bounds check is offset by whatever address recalculation is happening with the `lea/add/neg` at the end of the loop. it behaves about the same before and after. --- i went through a few tries at getting llvm to understand the bounds check isn't necessary, but i should mention the _best_ i'd seen here was actually from the existing `fmt_int` with a diff like ```diff if x == zero { // No more digits left to accumulate. break; }; } } + + if curr >= buf.len() { + unsafe { core::hint::unreachable_unchecked(); } + } let buf = &buf[curr..]; ``` posting a random PR to `rust-lang/rust` to do that without a really really compelling reason seemed a bit absurd, so i tried to work that into something that seems more palatable at a glance. but if you're interested, that certainly produced better (x86_64) code through LLVM. in that case with `buf.iter_mut().rev()` as the iterator, `<impl LowerHex for i8>::fmt` actually unrolls into something like ``` put_char(x & 0xf); let mut len = 1; if x > 0xf { put_char((x >> 4) & 0xf); len = 2; } pad_integral(buf[buf.len() - len..]); ``` it's pretty cool! `<impl LowerHex for i32>::fmt` also was slightly better. that all resulted in closer to an 6% difference in my use case. --- i have not looked at formatters other than LowerHex/UpperHex with this change, though i'd be a bit shocked if any were _worse_. (i have absolutely _no_ idea how you'd regression test this, but that might be just my not knowing what the right tool for that would be in rust-lang/rust. i'm of half a mind that this is small and fiddly enough to not be worth landing lest it quietly regress in the future anyway. but i didn't want to discard the idea without at least offering it upstream here)
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
@iximeow this can in fact be regression-tested! we'd use a codegen (llvm level) or assembly (x86-64 level) test. they use FileCheck to verify that a certain pattern is fulfilled, and can be parameterized with a bit of fussing with their DSL and regex so that e.g. it doesn't matter exactly which |
Finished benchmarking commit (baf578c): comparison URL. Overall result: ❌ regressions - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 669.372s -> 670.355s (0.15%) |
are there codegen tests for how rustc compiles |
🤨 or given that i expected this at worst to have no effect, maybe i should double-check how |
@iximeow That performance suite, and the remark, is about compiler tests, primarily. There's a runtime perf suite, but it showed no change. It's not surprising to me that rustc's happy path instruction counts aren't greatly affected by this change (this branch, after all, should not be hit), and the two tests it is affected on are fairly "noisy" tests and also reflect synthetic stress tests.
It will be under An example that is relevant is this, with its |
@iximeow the perf bot's default remarks are also based on instructions executed. note that the wall time is reduced: this, combined with the (slight) binary size reduction, in generated artifacts (admittedly, the only non-synthetic mark here is helloworld): suggests to me that this may in fact be resulting in smaller code sizes with better cache layout. also, I'd be happy to accept the |
i added for tests like this is a problem for the new
where the assertion i want to write is really that but in this .ll all we get is the function declaration,
i was hoping that maybe by making the crate type sufficiently statically linked that i could get a .ll with this function filled in, but i don't see how. at least then writing a correct test would be reduced to "write a i couldn't see similar prior art in other forms of (i also added benches for formatting |
@iximeow Oh, no, you're right, this is tricky to test. Hrm. |
@saethlin do we have a sufficient combination of commands such that with enough
we can make this codegen-testable anyway? |
That is true and I really do not like that approach. Playing games with the inliner rules tends to have undesired effects; for example it's possible if In addition, codegen tests are built against whatever sysroot happens to be specified by the settings in I recently added a test based on using The usual LTO at home flags are:
Full LLVM optimizations, the LTO at home flag, enable MIR inlining in all possible cases then crank up the thresholds to an absurd level. (be aware that just applying such a high MIR inlining threshold to arbitrary crates will cause absurd memory usage, even my beefy desktop cannot compile But this doesn't help in part because of what I said above: The sysroot has already been compiled with a different set of flags that make LTO at home fail to apply. (as an optimization to compile time grumble grumble) #115306 we only encode MIR which is deduced to be useful to other crates when compiling the current crate. The compilation of |
An assembly test with |
this empty commit reproduces a github comment that describes the work on commits from this point back to, roughly, 1.2.2. since many commits between these two points are interesting in the context of performance optimization (especially uarch-relevant tweaks), many WIP commits are preserved. as a result there is no clear squash merge, and this commit will be the next best thing. on Rust 1.68.0 and a Xeon E3-1230 V2, relative changes are measured roughly as: starting at ed4f238: - non-fmt ns/decode: 15ns - non-fmt instructions/decode: 94.6 - non-fmt IPC: 1.71 - fmt ns/decode+display: 91ns - fmt instructions/decode+display: 683.8 - fmt IPC: 2.035 ending at 6a5ea10 - non-fmt ns/decode: 15ns - non-fmt instructions/decode: 94.6 - non-fmt IPC: 1.71 - fmt ns/decode+display: 47ns - fmt instructions/decode+display: 329.6 - fmt IPC: 1.898 for an overall ~50% reduction in runtimes to display instructions. writing into InstructionTextBuffer reduces overhead another ~10%. -- original message follows -- this is where much of iximeow/yaxpeax-arch#7 originated. `std::fmt` as a primary writing mechanism has.. some limitations: * rust-lang/rust#92993 (comment) * llvm/llvm-project#87440 * rust-lang/rust#122770 and some more interesting more fundamental limitations - writing to a `T: fmt::Write` means implementations don't know if it's possible to write bytes in reverse order (useful for printing digits) or if it's OK to write too many bytes and then only advance `len` by the correct amount (useful for copying variable-length-but-short strings like register names). these are both perfectly fine to a `String` or `Vec`, less fine to do to a file descriptor like stdout. at the same time, `Colorize` and traits depending on it are very broken, for reasons described in yaxpeax-arch. so, this adapts `yaxpeax-x86` to use the new `DisplaySink` type for writing, with optimizations where appropriate and output spans for certain kinds of tokens - registers, integers, opcodes, etc. it's not a perfect replacement for Colorize-to-ANSI-supporting-outputs but it's more flexible and i think can be made right. along the way this completes the move of `safer_unchecked` out to yaxpeax-arch (ty @5225225 it's still so useful), cleans up some docs, and comes with a few new test cases. because of the major version bump of yaxpeax-arch, and because this removes most functionality of the Colorize impl - it prints the correct words, just without coloring - this is itself a major version bump to 2.0.0. yay! this in turn is a good point to change the `Opcode` enums from being tuple-like to struct-like, and i've done so in 1b8019d. full notes in CHANGELOG ofc. this is notes for myself when i'm trying to remember any of this in two years :)
We discussed this in a T-libs meeting. Given the difficulty of writing a codegen test that properly inlines everything we're ok with merging this without a test. AIUI the test currently isn't working as intended? If so it should be removed from the PR and someone wants to add a working codegen test that can be done in a followup PR. |
this test was included for demonstrative and discussion purposes but does not test what its name alleges - in fact it does not test much of value at all! as mentioned in this comment, rust-lang#122770 (comment) , writing a correct test for this codegen outcome is difficult (partially because the public interface to std::fmt intentionally includes an #[inline(never)] function!), so to test this correctly will require more than i can offer in 122770.
correct, i'd included it in the hopes it was close to being useful (alas!) - i've removed it but kept the benches for u8 formatting as those are informative. |
Ah, thank you! Sorry about the everything, @bors r+ rollup=never |
☀️ Test successful - checks-actions |
Finished benchmarking commit (22bcb81): comparison URL. Overall result: ❌ regressions - no action needed@rustbot label: -perf-regression Instruction countThis is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage)Results (primary -2.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary -3.3%, secondary 1.8%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary 0.0%, secondary 0.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 786.257s -> 787.179s (0.12%) |
78fc550584d Auto merge of rust-lang#133247 - GuillaumeGomez:reduce-integer-display-impl, r=workingjubilee db5c2c6a18f Rollup merge of rust-lang#132982 - suaviloquence:2-doc-changed-alloc-methods, r=Mark-Simulacrum 117ad4f17b0 Rollup merge of rust-lang#132533 - SUPERCILEX:patch-4, r=Mark-Simulacrum e2aa7c108dc fix `Allocator` method names in `alloc` free function docs 6b141ee1007 Rollup merge of rust-lang#133298 - n0toose:remove-dir-all-but-not-paths, r=Noratrieb e3691db3d70 Rollup merge of rust-lang#133260 - compiler-errors:deref, r=fee1-dead 895f290b058 Rollup merge of rust-lang#132730 - joboet:after_main_sync, r=Noratrieb 6ffa4556f46 Rollup merge of rust-lang#133389 - eduardosm:stabilize-const_float_methods, r=RalfJung f4139353bf6 Rollup merge of rust-lang#133301 - GuillaumeGomez:add-example-wrapping-neg, r=workingjubilee 6112cfd51ba Auto merge of rust-lang#132611 - compiler-errors:async-prelude, r=ibraheemdev 23a5a0e1f54 Auto merge of rust-lang#132597 - lukas-code:btree-plug-leak, r=jhpratt f0b0942f402 Constify Deref and DerefMut d05e8e8e9fb Auto merge of rust-lang#133379 - jieyouxu:rollup-00jxo71, r=jieyouxu 641c1ae3978 Stabilize `const_float_methods` 256c54db000 Auto merge of rust-lang#133377 - jieyouxu:rollup-n536hzq, r=jieyouxu dff533fc1d3 Improve code by using `unsigned_abs` a850f7c8760 Rollup merge of rust-lang#133237 - fee1-dead-contrib:constadd, r=compiler-errors 99741dd6dcb Rollup merge of rust-lang#133332 - bjoernager:const-array-as-mut-slice, r=jhpratt 9a152e2565f Rollup merge of rust-lang#131505 - madsmtm:darwin_user_temp_dir, r=dtolnay a12c8386f5b Auto merge of rust-lang#132994 - clubby789:cc-bisect, r=Kobzol 6548ad8c627 Auto merge of rust-lang#133360 - compiler-errors:rollup-a2o38tq, r=compiler-errors a4f797ed7de Rollup merge of rust-lang#133264 - lolbinarycat:os-string-truncate, r=joboet a9398015dc4 Auto merge of rust-lang#132329 - compiler-errors:fn-and-destruct, r=lcnr 30aa6db150e Add code example for `wrapping_neg` method for signed integers bc77567dc6b Deduplicate checking drop terminator 6f3ec5caf86 Gate const drop behind const_destruct feature, and fix const_precise_live_drops post-drop-elaboration check fb6f0c2e982 Auto merge of rust-lang#133339 - jieyouxu:rollup-gav0nvr, r=jieyouxu c792ef3e01a Rollup merge of rust-lang#133337 - ColinFinck:thread-scoped-fix-typo, r=joboet cfed1c696dd Rollup merge of rust-lang#133330 - RalfJung:close, r=the8472 e26edf01cab Rollup merge of rust-lang#133313 - thesummer:fix-arc4random, r=cuviper 90a85ef0042 Rollup merge of rust-lang#133288 - bjoernager:const-array-each-ref, r=jhpratt 4e6f154dba6 Rollup merge of rust-lang#133238 - heiher:loong-stdarch-rexport, r=Amanieu 23a1b31cfa4 Auto merge of rust-lang#130867 - michirakara:steps_between, r=dtolnay 9693572a6e3 Fix typo in `std::thread::Scope::spawn` documentation. b4a50679016 Mark '<[T; N]>::as_mut_slice' as 'const'; b6b40efcac5 library: update comment around close() 6ce7e79948e Don't try to use confstr in Miri 40d6e2ca941 Auto merge of rust-lang#129238 - umgefahren:stabilize-ipv6-unique-local, r=dtolnay 276c0fc3fd1 distinguish overflow and unimplemented in Step::steps_between 8be952bbbdf Use arc4random of libc for RTEMS target 4583ddef0cd Mention that std::fs::remove_dir_all fails on files 4f6ca37539a Mark and implement 'each_ref' and 'each_mut' in '[T; N]' as const; ec220b6b368 constify `Add` 3c558bf870d Rollup merge of rust-lang#131736 - hoodmane:emscripten-wasm-bigint, r=workingjubilee 38d4c11c3ff implement OsString::truncate 4fd2c8db81e Rollup merge of rust-lang#133226 - compiler-errors:opt-in-pointer-like, r=lcnr 3f03a0f3fcd Rollup merge of rust-lang#130800 - bjoernager:const-mut-cursor, r=joshtriplett eea7e230c2b Rollup merge of rust-lang#129838 - Ayush1325:uefi-process-args, r=joboet 8b4995aad87 Make PointerLike opt-in as a trait f74b38aa2b1 Reduce integer `Display` implementation size 2f179d1a7c9 Stabilize const_pin_2 b2dc297e863 re-export `is_loongarch_feature_detected` e26c298e75c Rollup merge of rust-lang#132732 - gavincrawford:as_ptr_attribute, r=Urgau d6ee9dbe8b0 Rollup merge of rust-lang#133183 - n0toose:improve-remove-dir-docs, r=joboet 40735d31322 Rollup merge of rust-lang#125405 - m-ou-se:thread-add-spawn-hook, r=WaffleLapkin 6c20348f356 Rollup merge of rust-lang#123947 - zopsicle:vec_deque-Iter-as_slices, r=Amanieu 2089cb3188c Update doc comments for spawn hook. c02090d6a3a Address review comments. 79bffa90cfe Fix tracking issue. 3eff64c78b0 Add tracking issue. 15bac4f115e Use Send + Sync for spawn hooks. a42af062212 Add thread Builder::no_hooks(). 49ac15b192b Update thread spawn hooks. 2cc4b2ef6d9 Use add_spawn_hook for libtest's output capturing. 24a0765dd03 Add std::thread::add_spawn_hook. 50ac72542be Correct comments concerning updated dangling pointer lint cdf54869ec6 Auto merge of rust-lang#133205 - matthiaskrgr:rollup-xhhhp5u, r=matthiaskrgr 543667a49e3 Rollup merge of rust-lang#133200 - RalfJung:miri-rwlock-test, r=tgross35 7430eb407e2 ignore an occasionally-failing test in Miri 607b493008b Rollup merge of rust-lang#133182 - RalfJung:const-panic-inline, r=tgross35 e6cd122d981 Rollup merge of rust-lang#132758 - nnethercote:improve-get_key_value-docs, r=cuviper a3c95972a3b Mention std::fs::remove_dir_all in std::fs::remove_dir bd5c1426568 Bump `stdarch` to the latest master e84f865601d const_panic: inline in bootstrap builds to avoid f16/f128 crashes 05fecb95df2 std: allow after-main use of synchronization primitives c1beb25002a Auto merge of rust-lang#133160 - jhpratt:rollup-wzj9q15, r=jhpratt ce80c9f20de Rollup merge of rust-lang#133145 - kornelski:static-mutex, r=traviscross f385ac21fee Auto merge of rust-lang#128219 - connortsui20:rwlock-downgrade, r=tgross35 86151abcc67 rename rustc_const_stable_intrinsic -> rustc_intrinsic_const_stable_indirect a33f889e959 Improve `{BTreeMap,HashMap}::get_key_value` docs. 15e6fc0a5db Document alternatives to `static mut` 1cd1dd7c56f Auto merge of rust-lang#120370 - x17jiri:likely_unlikely_fix, r=saethlin e475f40115f Likely unlikely fix ddcabfe85ff Rollup merge of rust-lang#133126 - ohno418:fix-String-doc, r=jhpratt e4eff6a5a78 Rollup merge of rust-lang#133116 - RalfJung:const-null-ptr, r=dtolnay 16e6d209ccb alloc: fix `String`'s doc e4fb96229e0 clean up const stability around UB checks ee78601c3ce stabilize const_ptr_is_null 1e4a9ee22cd Rollup merge of rust-lang#132449 - RalfJung:is_val_statically_known, r=compiler-errors 1dfe94c308a Rollup merge of rust-lang#131717 - tgross35:stabilize-const_atomic_from_ptr, r=RalfJung 70326e8f708 reduce threads in downgrade test d58e4f2d08c fix `DOWNGRADED` bit unpreserved 5d683160ada fix memory ordering bug + bad test 0604b8ffc1d add safety comments for queue implementation 00255e627b7 add `downgrade` to `queue` implementation 40256c63003 modify queue implementation documentation f8041644f07 add `downgrade` to `futex` implementation 572adedb6a0 add simple `downgrade` implementations 48bcf09d8d7 add `downgrade` method onto `RwLockWriteGuard` 5416aef6a86 add `RwLock` `downgrade` tests 40109807569 Rollup merge of rust-lang#133050 - tgross35:inline-f16-f128, r=saethlin 2ee4159af2a Rollup merge of rust-lang#133048 - cyrgani:ptr-doc-update, r=Amanieu e1448dee6d0 Rollup merge of rust-lang#133019 - sorairolake:add-missing-period-and-colon, r=tgross35 b1d31d297f2 Rollup merge of rust-lang#132984 - sunshowers:pipe2, r=tgross35 8cef1ef708e Rollup merge of rust-lang#132977 - cberner:fix_solaris, r=tgross35 daa9c433370 Rollup merge of rust-lang#132790 - aDotInTheVoid:ioslice-asslice-rides-again, r=cuviper cdb5ff5f6fe Pass `f16` and `f128` by value in `const_assert!` 60ef4797595 use `&raw` in `{read, write}_unaligned` documentation d2983fffb15 Auto merge of rust-lang#132709 - programmerjake:optimize-charto_digit, r=joshtriplett 918cc8d59c8 Rollup merge of rust-lang#133027 - no1wudi:master, r=jhpratt 25f55123bf2 Auto merge of rust-lang#133026 - workingjubilee:rollup-q8ig6ah, r=workingjubilee d8de2ccfc55 Fix a copy-paste issue in the NuttX raw type definition c06bb349936 Rollup merge of rust-lang#133008 - onur-ozkan:update-outdated-comment, r=jieyouxu 8eaea39049a Rollup merge of rust-lang#133004 - cuviper:unrecover-btree, r=ibraheemdev 81a191a826e Rollup merge of rust-lang#133003 - zachs18:clonetouninit-dyn-compat-u8, r=dtolnay e3e5e358c71 Rollup merge of rust-lang#132907 - BLANKatGITHUB:intrinsic, r=saethlin f57853bc421 Rollup merge of rust-lang#131304 - RalfJung:float-core, r=tgross35 7bc04367d32 Auto merge of rust-lang#122770 - iximeow:ixi/int-formatting-optimization, r=workingjubilee ce2e318c5d1 docs: Fix missing colon in methods for primitive types 1870e9269bf docs: Fix missing period in methods for integer types 64397743828 Auto merge of rust-lang#133006 - matthiaskrgr:rollup-dz6oiq5, r=matthiaskrgr 98dad0baf9a update outdated comment about test-float-parse 520d4fdff72 Rollup merge of rust-lang#126046 - davidzeng0:mixed_integer_ops_unsigned_sub, r=Amanieu e3c425b74bb Auto merge of rust-lang#132662 - RalfJung:const-panic-inlining, r=tgross35 c4b77cf11ef Update core CloneToUninit tests d4e21f55ec3 btree: simplify the backdoor between set and map 5d61cf9a7ea Bump `cc` 44f376ba0c7 Fix compilation error on Solaris due to flock usage 75609d6d329 Auto merge of rust-lang#132556 - clubby789:cargo-update, r=Mark-Simulacrum 5ba28a4d89a Run `cargo update` and update licenses 08200043034 const_panic: don't wrap it in a separate function d30e2c0963b [illumos] use pipe2 to create anonymous pipes 7e12686e222 Auto merge of rust-lang#132883 - LaihoE:vectorized_is_sorted, r=thomcc 02e32d70d61 Auto merge of rust-lang#132972 - matthiaskrgr:rollup-456osr7, r=matthiaskrgr 157eb1c94be Rollup merge of rust-lang#132970 - tyilo:nonzero-u-div-ceil-issue, r=tgross35 03e52a50ac8 Rollup merge of rust-lang#132966 - RalfJung:const_option_ext, r=jhpratt 2f615a1c410 Rollup merge of rust-lang#132948 - RalfJung:const_unicode_case_lookup, r=Noratrieb f00e0913f63 Rollup merge of rust-lang#132851 - chansuke:update-comment, r=thomcc 656009854a3 Auto merge of rust-lang#132870 - Noratrieb:inline-int-parsing, r=tgross35 a0c0c40eafd Add tracking issue number to unsigned_nonzero_div_ceil feature c2296662624 Make `CloneToUninit` dyn-compatible 6ab50dd1c97 stabilize const_option_ext 27fe6c7ca46 Rollup merge of rust-lang#132541 - RalfJung:const-stable-extern-crate, r=compiler-errors 7fafe990a8b stabilize const_unicode_case_lookup c5ed62506ae Stabilize `Ipv6Addr::is_unique_local` and `Ipv6Addr::is_unicast_link_local` e0452c9af5e adds new declaration to codegen 33fa8701b27 Auto merge of rust-lang#132943 - matthiaskrgr:rollup-164l3ej, r=matthiaskrgr 7f12f02c6ec Rollup merge of rust-lang#132914 - rcorre:cell-grammar, r=tgross35 300a2664117 Rollup merge of rust-lang#132895 - scottmcm:generalize-nonnull-from-raw-parts, r=ibraheemdev a461cf9310e remove no-longer-needed abs_private 170e993a781 allow rustc_private feature in force-unstable-if-unmarked crates 4a20245f4ad Rollup merge of rust-lang#132929 - cuviper:check-alloc_zeroed, r=tgross35 992bbf7c46c Rollup merge of rust-lang#132869 - lolbinarycat:library-fix-too_long_first_doc_paragraph, r=tgross35 e3925fa3b42 Rollup merge of rust-lang#132847 - RalfJung:addr-dont-expose, r=Mark-Simulacrum 327a0d7814e Auto merge of rust-lang#132919 - matthiaskrgr:rollup-ogghyvp, r=matthiaskrgr 67c3c9f8bb7 Check for null in the `alloc_zeroed` example 068537aeb72 new intrinsic declaration b689951272e new intrinsic declaration 16fa12ebd98 Rollup merge of rust-lang#132144 - adetaylor:receiver-trait-itself, r=wesleywiser 54f699d3ee9 Rollup merge of rust-lang#120077 - SUPERCILEX:set-entry, r=Amanieu e541a4f8204 Update dangling pointer tests 7707584415a Tag relevant functions with #[rustc_as_ptr] attribute b541c5aebc8 Auto merge of rust-lang#132902 - matthiaskrgr:rollup-43qgg3t, r=matthiaskrgr 2d676d49a6e Update grammar in std::cell docs. 7325f33701c Emscripten: link with -sWASM_BIGINT 1c482c93dbd Rollup merge of rust-lang#130999 - cberner:flock_pr, r=joboet 4dd22707d54 Auto merge of rust-lang#127589 - notriddle:notriddle/search-sem-3, r=GuillaumeGomez 0af64b63ced Generalize `NonNull::from_raw_parts` per ACP362 2fd9ac4a7ba vectorize slice::is_sorted 737521c21f5 `#[inline]` integer parsing functions b9be1ddefcb split up the first paragraph of doc comments for better summaries f9063ff15cf Update the doc comment of `ASCII_CASE_MASK` 57c7b80414a elem_offset / subslice_range: use addr() instead of 'as usize' d19aa692d4e Rollup merge of rust-lang#132136 - RalfJung:target-feature-abi-compat, r=Mark-Simulacrum 6b0bd5a6630 honor rustc_const_stable_indirect in non-staged_api crate with -Zforce-unstable-if-unmarked 070baf4fab5 Add as_slice/into_slice for IoSlice/IoSliceMut. 978a5535d35 Rollup merge of rust-lang#132778 - lolbinarycat:io-Error-into_inner-docs, r=cuviper 6d54bfe5925 update io::Error::into_inner to acknowlage io::Error::other 7c0a90c3ead Address review comments ac66068ca8b Update library/std/src/sys/pal/windows/fs.rs d90f8668e45 Auto merge of rust-lang#132717 - RalfJung:rustc_safe_intrinsic, r=compiler-errors f2bf9e65116 remove support for rustc_safe_intrinsic attribute; use rustc_intrinsic functions instead 2391b4b2a0d Rollup merge of rust-lang#132738 - cuviper:channel-heap-init, r=ibraheemdev 086cfefa97e mark is_val_statically_known intrinsic as stably const-callable dffc5e73119 Rollup merge of rust-lang#132696 - fortanix:raoul/rte-235-fix_fmodl_missing_symbol_issue, r=tgross35 f14fc562a82 Rollup merge of rust-lang#132639 - RalfJung:intrinsics, r=workingjubilee,Amanieu 6d63012a62e Initialize channel `Block`s directly on the heap 7ff251b9793 core: move intrinsics.rs into intrinsics folder 6244f4816fe Auto merge of rust-lang#132714 - mati865:update-memchr, r=tgross35 a2eaef75de6 Rollup merge of rust-lang#132715 - tabokie:fix-lazy-lock-doc, r=Noratrieb 6a77b2121f1 Rollup merge of rust-lang#132665 - tyilo:nonzero-u-div-ceil, r=joboet 79d2063e0b6 Separate f128 `%` operation to deal with missing `fmodl` symbol 8022523cc73 Auto merge of rust-lang#132705 - kornelski:inline-repeat, r=tgross35 df9f5db19d7 fix lazylock comment 7a82eb5b610 Auto merge of rust-lang#131888 - ChrisDenton:deopt, r=ibraheemdev 75b9ce3b0be unpin and update memchr 4d1c7d9526a optimize char::to_digit and assert radix is at least 2 95bff3e6b5c Inline str::repeat 52c2a459546 Rollup merge of rust-lang#132617 - uellenberg:fix-rendered-doc, r=cuviper 28f7e7bbbd9 Auto merge of rust-lang#131721 - okaneco:const_eq_ignore_ascii_case, r=m-ou-se 41b7e5f7705 Auto merge of rust-lang#132500 - RalfJung:char-is-whitespace-const, r=jhpratt 4ed08bd2262 Add new unstable feature `const_eq_ignore_ascii_case` f4e9fe4ec00 Auto merge of rust-lang#132664 - matthiaskrgr:rollup-i27nr7i, r=matthiaskrgr afc66fe29e2 Change some code blocks to quotes in rendered std doc 2e63cbdb695 Rollup merge of rust-lang#131261 - clarfonthey:unsafe-cell-from-mut, r=m-ou-se ab6f663ed6a Auto merge of rust-lang#132661 - matthiaskrgr:rollup-npytbl6, r=matthiaskrgr 8b165db1e5c Implement div_ceil for NonZero<unsigned> 6bc1b1b449e Rollup merge of rust-lang#132571 - RalfJung:const_eval_select_macro, r=oli-obk c12f4d12055 Rollup merge of rust-lang#132473 - ZhekaS:core_fmt_radix_no_panic, r=joboet bbb927540c8 Rollup merge of rust-lang#132153 - bjoernager:const-char-encode-utf16, r=dtolnay 919de701b02 add const_eval_select macro to reduce redundancy 538f5b4cd2b Rollup merge of rust-lang#132609 - NotWearingPants:patch-1, r=Amanieu 86c6f276b21 Rollup merge of rust-lang#132606 - eduardosm:char-slice-str-pattern-doc, r=tgross35 4660d7ebef9 most const intrinsics don't need an explicit rustc_const_unstable any more 8eb30fe4b09 add new rustc_const_stable_intrinsic attribute for const-stable intrinsics 792d1646c7c convert all const-callable intrinsics into the new form (without extern block) fad7d68d1c2 docs: fix grammar in doc comment at unix/process.rs 92bb77993bd Improve example of `impl Pattern for &[char]` 553bb181dad Add AsyncFn* to to the prelude in all editions 2ae24bf3e15 Fixed typo, rebased 47f60d7ad0d Updated SAFETY comment to address underflow 581aa8d587f Replace checked slice indexing by unchecked to support panic-free code c5a0f6c60aa Rollup merge of rust-lang#132579 - RalfJung:rustc-std-workspace-crates, r=Amanieu 9cdbf39ba56 btree: don't leak value if destructor of key panics 4caff135a74 Stabilise 'const_char_encode_utf16'; 84fae7ebfee Auto merge of rust-lang#132586 - workingjubilee:rollup-qrmn49a, r=workingjubilee 95b4127a6c8 update rustc-std-workspace crates 082b98d885c Rollup merge of rust-lang#132423 - RalfJung:const-eval-align-offset, r=dtolnay 3b40634d007 Auto merge of rust-lang#132434 - tgross35:f128-tests, r=workingjubilee 5dea8b2e41f Enable `f128` tests on all non-buggy platforms 🎉 2bb8ea389a4 Auto merge of rust-lang#132581 - workingjubilee:rollup-4wj318p, r=workingjubilee 83bd286345b Update `compiler_builtins` to 0.1.138 and pin it 699702f2a6c Rollup merge of rust-lang#132563 - frectonz:master, r=Amanieu 4390c35bcce Auto merge of rust-lang#123723 - madsmtm:apple-std-os, r=dtolnay 1e8ed9064ee Auto merge of rust-lang#132479 - compiler-errors:fx-feat-yeet, r=fee1-dead 9a3b7c0f20b Rename the FIXMEs, remove a few that dont matter anymore ed4f110b101 Auto merge of rust-lang#132542 - RalfJung:const_panic, r=tgross35 d8bca01fc2b remove const-support for align_offset 76b866c7156 Modify `NonZero` documentation to reference the underlying integer type 9e579641927 Rollup merge of rust-lang#132511 - RalfJung:const_arguments_as_str, r=dtolnay bfeeb7450fd Rollup merge of rust-lang#132503 - RalfJung:const-hash-map, r=Amanieu a42fc213104 Rollup merge of rust-lang#132499 - RalfJung:unicode_data.rs, r=tgross35 0278cab1fcc Rollup merge of rust-lang#132393 - zedddie16:issue-131865-fix, r=tgross35 714115abfb6 Rollup merge of rust-lang#131377 - rick-de-water:nonzero-exp, r=dtolnay 9789c548b8c Rollup merge of rust-lang#129329 - eduardosm:rc-from-mut-slice, r=dtolnay ff9178b1a6a add const_panic macro to make it easier to fall back to non-formatting panic in const 9ef483bbbb0 stabilize const_arguments_as_str 4c6593fe8f8 Auto merge of rust-lang#132458 - RalfJung:rustc-const-unstable, r=Amanieu 81b20e0a3dd Rustdoc: added brief colon explanation 73d9f4da9a9 Add Set entry API e883a6074c2 Add BorrowedBuf::into_filled{,_mut} methods to allow returning buffer with original lifetime 261c5b998ad remove const_hash feature leftovers d515da6678b const_with_hasher test: actually construct a usable HashMap 11dc6c388d9 make char::is_whitespace unstably const 1a481fd9e01 unicode_data.rs: show command for generating file 3a5b026c04a get rid of a whole bunch of unnecessary rustc_const_unstable attributes 2e24b7fba4b remove no-longer-needed attribute ffbcba08e7f add missing safety comments 768d0cd7fe4 adjust test gating for f16/f128 6335056119f float types: move copysign, abs, signum to libcore c353337b8d0 rustdoc-search: simplify rules for generics and type params 9d10ab71105 Implement `From<&mut {slice}>` for `Box/Rc/Arc<{slice}>` 34329c06529 Stabilize `const_atomic_from_ptr` a2e1edf18f6 Arbitrary self types v2: (unused) Receiver trait 2d26681057c ABI compatibility: remove section on target features f1c99041c25 Support lock() and lock_shared() on async IO Files 7f6af4dd882 Revert using `HEAP` static in Windows alloc 541bda102a0 Implement file_lock feature d7a7b0a361b uefi: process: Add args support 14aef3d76cf Use with_capacity(0) because we're reading the capacity later on 5b16abea8d5 Prefer `target_vendor = "apple"` on confstr bc6398107f8 use `confstr(_CS_DARWIN_USER_TEMP_DIR, ...)` as a `TMPDIR` fallback on darwin f8dc879fa96 Add LowerExp and UpperExp implementations 50afc521a17 Stabilize UnsafeCell::from_mut aa74e934264 Mark 'get_mut' and 'set_position' in 'std::io::Cursor' as const; c370665e88b Make `std::os::darwin` public 797c2498600 Implement `mixed_integer_ops_unsigned_sub` ff1212e6a22 Add vec_deque::Iter::as_slices and friends e938deaf0f1 try adding a test that LowerHex and friends don't panic, but it doesn't work c6d2bb7445c improve codegen of fmt_num to delete unreachable panic git-subtree-dir: library git-subtree-split: 78fc550584dcdf3f917cc399f8ae1f1001887f09
it seems LLVM doesn't realize that
curr
is always decremented at least once in either loop formatting characters of the input string by their appropriate radix, and so the later&buf[curr..]
generates a check for out-of-bounds access and panic. this is unreachable in reality as even forx == T::zero()
we'll produce at least the characterSelf::digit(T::zero())
, yielding at least one character output, andcurr
will always be at least one belowbuf.len()
.adjust
fmt_int
to make this fact more obvious to the compiler, which fortunately (or unfortunately) results in a measurable performance improvement for workloads heavy on formatting integers.in the program i'd noticed this in, you can see the
cmp $0x80,%rdi; ja 7c
here, which branches to a slice index fail helper:where after this change the function is broadly similar, but smaller, with one fewer registers updated in each pass through the loop in addition the never-taken
cmp/ja
being gone:this represents a ~2-3% difference in runtime in my admittedly comically i32-formatting-bound use case (printing x86 instructions, including i32 displacements and immediates) as measured on a ryzen 9 3950x.
the impact on
<impl LowerHex for i8>::fmt
is both more dramatic and less impactful: it continues to have a loop that is evaluated at most twice, though the compiler doesn't know that to unroll it. the generated code there is identical to the impl fori32
. there, the smaller loop body has less effect on runtime, and removing the never-taken slice bounds check is offset by whatever address recalculation is happening with thelea/add/neg
at the end of the loop. it behaves about the same before and after.i initially measured slightly better outcomes using
unreachable_unchecked()
here instead, but that was hacking on std and rebuilding with-Z build-std
on an older rustc (nightly 5b377cece, 2023-06-30). it does not yield better outcomes now, so i see no reason to proceed with that approach at all.initial notes about that, seemingly irrelevant on modern rustc
i went through a few tries at getting llvm to understand the bounds check isn't necessary, but i should mention the _best_ i'd seen here was actually from the existing `fmt_int` with a diff like ```diff if x == zero { // No more digits left to accumulate. break; }; } } + + if curr >= buf.len() { + unsafe { core::hint::unreachable_unchecked(); } + } let buf = &buf[curr..]; ```posting a random PR to
rust-lang/rust
to do that without a really really compelling reason seemed a bit absurd, so i tried to work that into something that seems more palatable at a glance. but if you're interested, that certainly produced better (x86_64) code through LLVM. in that case withbuf.iter_mut().rev()
as the iterator,<impl LowerHex for i8>::fmt
actually unrolls into something likeit's pretty cool!
<impl LowerHex for i32>::fmt
also was slightly better. that all resulted in closer to an 6% difference in my use case.i have not looked at formatters other than LowerHex/UpperHex with this change, though i'd be a bit shocked if any were worse.
(i have absolutely no idea how you'd regression test this, but that might be just my not knowing what the right tool for that would be in rust-lang/rust. i'm of half a mind that this is small and fiddly enough to not be worth landing lest it quietly regress in the future anyway. but i didn't want to discard the idea without at least offering it upstream here)