Put labels code back in #1
@@ -776,6 +777,8 @@ impl<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>> FunctionCx<'a, 'tcx, Bx> {
debug!("codegen_block({:?}={:?})", bb, data);
let llbb = bx.llbb();
Shall we move that closer to where it is used?
We can't. I put it there because I'm worried it will be overwritten by one of the statements. That said, it seems that way back we never passed the block in (and instead used `llbb()` in `add_yk_block_label`, which just gets the current block the builder is pointing to). Since we never had any problems with that, it might mean that the statements don't overwrite it. But it still makes me nervous that we can't be sure.
Good point. Shall we add a comment?
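The worry in this thread is an ordering issue: if lowering a statement leaves the builder pointing at a different basic block, then reading `bx.llbb()` afterwards would return the wrong block, so it is read up front. A toy model of that ordering, assuming (for illustration only) a statement lowering that repositions the builder; `MockBuilder` and its methods are hypothetical and are not rustc's `BuilderMethods` API.

```rust
/// Hypothetical stand-in for a codegen builder: it only tracks which basic
/// block new instructions are currently appended to.
struct MockBuilder {
    current_block: usize,
    next_block: usize,
}

impl MockBuilder {
    fn new() -> Self {
        MockBuilder { current_block: 0, next_block: 1 }
    }

    /// Analogous in spirit to `bx.llbb()`: the block the builder points at now.
    fn llbb(&self) -> usize {
        self.current_block
    }

    /// Hypothetical statement lowering that creates a fresh block and moves
    /// the builder into it (the scenario the review comment is nervous about).
    fn codegen_statement_that_splits(&mut self) {
        self.current_block = self.next_block;
        self.next_block += 1;
    }
}

fn main() {
    let mut bx = MockBuilder::new();
    let llbb = bx.llbb(); // captured before any statement is lowered
    bx.codegen_statement_that_splits();
    let late = bx.llbb(); // what a later call would observe
    println!("early = {}, late = {}", llbb, late); // prints "early = 0, late = 1"
}
```

If no statement ever repositions the builder, the early and late reads agree, which would explain why the older `llbb()`-at-label-time code never misbehaved; capturing early is simply the conservative choice.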
// bx.add_yk_block_label(lbl_name);
//}
if bx.tcx().sess.opts.cg.tracer.sir_labels() &&
!bx.tcx().def_path_str(self.instance.def_id()).contains("drop_in_place") &&
I think these should be exact equality checks, so that we don't skip user functions whose paths merely contain "drop_in_place". IIRC, the two offenders are `core::drop_in_place` and `std::drop_in_place`, but I might be wrong.
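A minimal sketch of the difference between the substring check in the diff and an exact match. The helper names are hypothetical, and the two offender paths are the reviewer's unverified recollection rather than confirmed rustc paths.

```rust
/// Hypothetical: mirrors the `.contains("drop_in_place")` check in the diff.
fn skip_by_substring(def_path: &str) -> bool {
    def_path.contains("drop_in_place")
}

/// Hypothetical: exact comparison against the paths recalled in review
/// (unverified; the real paths may differ).
fn skip_by_exact_match(def_path: &str) -> bool {
    const OFFENDERS: [&str; 2] = ["core::drop_in_place", "std::drop_in_place"];
    OFFENDERS.iter().any(|&o| o == def_path)
}

fn main() {
    // A user function that merely mentions drop_in_place in its name.
    let user_fn = "my_crate::my_drop_in_place_wrapper";
    // The substring check wrongly skips it; the exact match does not.
    println!("{} {}", skip_by_substring(user_fn), skip_by_exact_match(user_fn)); // prints "true false"
}
```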
!bx.tcx().crate_name(LOCAL_CRATE).as_str().starts_with("rustc") {
use ykpack::BLOCK_LABEL_PREFIX;
let lbl_name = CString::new(format!(
"NEW_{}:{}:{}",
Oops, I should have killed the `NEW_` prefix. Let's kill it now.
Didn't we have a constant defined someplace for this in ykpack?
Ah yes, we use it! Sorry. So just kill `NEW_`.
Just a few comments!
Addressed your comments. Although there are other
LGTM. You can squash if you like, but we will probably rebase the target branch later anyway. Up to you.
Squashed.
Actually, if you want to wait a minute, I can add another commit that adds the function return labels too.
ok
Is this ready for re-review?
I've just added the labels for function call returns. I've also deduplicated some code and moved it into
Yes, this is ready for re-review.
bx.cx().tcx().symbol_name(fx.instance).name.as_str() != "main" &&
!bx.tcx().crate_name(LOCAL_CRATE).as_str().starts_with("rustc") {
let llbb = bx.llbb();
use ykpack::BLOCK_LABEL_PREFIX;
Do we need another prefix for this kind of label?
I guess we could but it's not strictly necessary.
If a block contains a call to a function with a return value, won't we insert two labels with the same name at different places? I may be wrong; I don't 100% remember the details.
My gut feeling says no, but I'm not sure. Let's rename it just in case.
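The collision question can be illustrated with a small sketch. It assumes, purely for illustration, that a block label and a call-return label in the same block would be built from the same three identifying fields (the diff's `{}:{}:{}` format); which fields the real code uses is not shown here. The prefixes and the `label`/`all_unique` helpers are hypothetical, and the actual value of `ykpack::BLOCK_LABEL_PREFIX` is not assumed.

```rust
use std::collections::HashSet;

// Hypothetical prefixes, for illustration only.
const BLOCK_PREFIX: &str = "__YK_BLK_";
const RET_PREFIX: &str = "__YK_RET_";

/// Hypothetical label name: a prefix followed by three identifying fields.
fn label(prefix: &str, krate: &str, func: &str, bb: usize) -> String {
    format!("{}{}:{}:{}", prefix, krate, func, bb)
}

/// Returns true if every label name in `labels` is unique.
fn all_unique(labels: &[String]) -> bool {
    let mut seen = HashSet::new();
    labels.iter().all(|l| seen.insert(l.as_str()))
}

fn main() {
    // Both labels for block 3 built with the block prefix: they collide.
    let same = vec![
        label(BLOCK_PREFIX, "mycrate", "f", 3),
        label(BLOCK_PREFIX, "mycrate", "f", 3),
    ];
    // A separate prefix for the call-return label keeps the names distinct.
    let distinct = vec![
        label(BLOCK_PREFIX, "mycrate", "f", 3),
        label(RET_PREFIX, "mycrate", "f", 3),
    ];
    println!("{} {}", all_unique(&same), all_unique(&distinct)); // prints "false true"
}
```

Under that assumption, a second prefix is a cheap way to rule the collision out without having to prove it cannot happen.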
Ah, damn. Because I moved the label generation into `BuilderCx`, I now need to add another argument to the `add_yk_block_label` function.
You meant the
Sorry, yes, I meant the
Moved the label generation code to after
bors r+
Let's merge this manually, as bors will test this when my branch is merged.
* Fix `const-display.rs` XPATH queries
* Add `issue_76501.rs` test file
* Rename `issue_76501.rs` to `issue-76501.rs`
Optimise align_offset for stride=1 further

The `stride == 1` case can be computed more efficiently as `-p (mod a)`. That, in turn, translates to a nice, short sequence of LLVM instructions:

    %address = ptrtoint i8* %p to i64
    %negptr = sub i64 0, %address
    %offset = and i64 %negptr, %a_minus_one

This produces pretty much ideal codegen when the function is used in isolation. Typical use of this function will, however, involve using the result to offset a pointer, i.e.

    %aligned = getelementptr inbounds i8, i8* %p, i64 %offset

This still looks very good, but LLVM does not really translate it into what would be considered ideal machine code (on any target). For example, this is the codegen we obtain for an unknown alignment:

    ; x86_64
    dec rsi
    mov rax, rdi
    neg rax
    and rax, rsi
    add rax, rdi

In particular, negating a pointer is not something CISC architectures like x86_64 are designed to optimise for; they are much better at offsetting pointers. So we would like to take advantage of that ability and produce code more like this:

    ; x86_64
    lea rax, [rsi + rdi - 1]
    neg rsi
    and rax, rsi

To achieve this, we need to give LLVM an opportunity to apply the various peephole optimisations it performs during DAG selection. In particular, the `and` instruction appears to be a major inhibitor here. We cannot, sadly, get rid of this load-bearing operation, but we can reorder operations so that LLVM has more to work with around this instruction. One such ordering is proposed in #75579 and results in LLVM IR that looks broadly like this:

    ; using add enables `lea` and similar CISCisms
    %offset_ptr = add i64 %address, %a_minus_one
    %mask = sub i64 0, %a
    %masked = and i64 %offset_ptr, %mask
    ; can be folded with `gepi` that may follow
    %offset = sub i64 %masked, %address

…and generates the intended x86_64 machine code. One might also wonder how the increased amount of code would impact a RISC target. It turns out, not much:

    ; aarch64 previous     ; aarch64 new
    sub x8, x1, #1         add x8, x1, x0
    neg x9, x0             sub x8, x8, #1
    and x8, x9, x8         neg x9, x1
    add x0, x0, x8         and x0, x8, x9

(and similarly for ppc, sparc, mips, riscv, etc.). The only target that seems to do worse is… wasm32.

On to actual measurements: the best way to evaluate snippets like these is to use llvm-mca. As the AArch64 assembly would lead one to suspect, there isn't any performance difference to be found on AArch64; both snippets execute in the same number of cycles on the CPUs I tried. On x86_64, we get a throughput improvement of >50%!

Fixes #75579
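The two IR orderings above compute the same offset. A minimal sketch in Rust, assuming `a` is a power of two and working on plain `usize` addresses rather than pointers; `offset_neg_mask` and `offset_round_up` are hypothetical names, not the actual `core` implementation of `align_offset`.

```rust
/// Offset that aligns address `p` up to a multiple of `a`, computed as
/// `-p (mod a)`. Assumes `a` is a non-zero power of two.
fn offset_neg_mask(p: usize, a: usize) -> usize {
    debug_assert!(a.is_power_of_two());
    // Negate the address, then keep the low bits (mask = a - 1).
    p.wrapping_neg() & (a - 1)
}

/// The same offset with the operations reordered as in the second IR
/// snippet: round `p` up to a multiple of `a`, then subtract `p`.
fn offset_round_up(p: usize, a: usize) -> usize {
    debug_assert!(a.is_power_of_two());
    let offset_ptr = p.wrapping_add(a - 1); // %offset_ptr = add %address, %a_minus_one
    let mask = a.wrapping_neg(); // %mask = sub 0, %a
    let masked = offset_ptr & mask; // %masked: the aligned address
    masked.wrapping_sub(p) // %offset = sub %masked, %address
}

fn main() {
    let (p, a) = (0x1003usize, 8usize);
    // 0x1003 is 3 past a multiple of 8, so both report an offset of 5.
    println!("{} {}", offset_neg_mask(p, a), offset_round_up(p, a)); // prints "5 5"
}
```

Both forms agree even when `p + a - 1` wraps, because `a` divides 2^N; the reordering changes only which instructions LLVM sees, which is the whole point of the change.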
Don't print thread ids and names in `tracing` logs

Before:

```
2:rustc INFO rustc_interface::passes Pre-codegen
2:rustcTy interner             total       ty    lt    ct   all
2:rustc    Adt              :   1078 81.3%,  0.0%  0.0%  0.0%  0.0%
2:rustc    Array            :      1  0.1%,  0.0%  0.0%  0.0%  0.0%
2:rustc    Slice            :      1  0.1%,  0.0%  0.0%  0.0%  0.0%
2:rustc    RawPtr           :      2  0.2%,  0.0%  0.0%  0.0%  0.0%
2:rustc    Ref              :      4  0.3%,  0.1%  0.1%  0.0%  0.0%
2:rustc    FnDef            :      0  0.0%,  0.0%  0.0%  0.0%  0.0%
2:rustc    FnPtr            :     76  5.7%,  0.0%  0.0%  0.0%  0.0%
2:rustc    Placeholder      :      0  0.0%,  0.0%  0.0%  0.0%  0.0%
2:rustc    Generator        :      0  0.0%,  0.0%  0.0%  0.0%  0.0%
2:rustc    GeneratorWitness :      0  0.0%,  0.0%  0.0%  0.0%  0.0%
2:rustc    Dynamic          :      3  0.2%,  0.0%  0.0%  0.0%  0.0%
2:rustc    Closure          :      0  0.0%,  0.0%  0.0%  0.0%  0.0%
2:rustc    Tuple            :     13  1.0%,  0.0%  0.0%  0.0%  0.0%
2:rustc    Bound            :      0  0.0%,  0.0%  0.0%  0.0%  0.0%
2:rustc    Param            :    146 11.0%,  0.0%  0.0%  0.0%  0.0%
2:rustc    Infer            :      2  0.2%,  0.1%  0.0%  0.0%  0.0%
2:rustc    Projection       :      0  0.0%,  0.0%  0.0%  0.0%  0.0%
2:rustc    Opaque           :      0  0.0%,  0.0%  0.0%  0.0%  0.0%
2:rustc    Foreign          :      0  0.0%,  0.0%  0.0%  0.0%  0.0%
2:rustc                total   1326         0.2%  0.1%  0.0%  0.0%
2:rustcInternalSubsts interner: #437
2:rustcRegion interner: #355
2:rustcStability interner: #1
2:rustcConst Stability interner: #0
2:rustcAllocation interner: #0
2:rustcLayout interner: #0
```

After:

```
INFO rustc_interface::passes Post-codegen
Ty interner             total       ty    lt    ct   all
   Adt              :   1078 81.3%,  0.0%  0.0%  0.0%  0.0%
   Array            :      1  0.1%,  0.0%  0.0%  0.0%  0.0%
   Slice            :      1  0.1%,  0.0%  0.0%  0.0%  0.0%
   RawPtr           :      2  0.2%,  0.0%  0.0%  0.0%  0.0%
   Ref              :      4  0.3%,  0.1%  0.1%  0.0%  0.0%
   FnDef            :      0  0.0%,  0.0%  0.0%  0.0%  0.0%
   FnPtr            :     76  5.7%,  0.0%  0.0%  0.0%  0.0%
   Placeholder      :      0  0.0%,  0.0%  0.0%  0.0%  0.0%
   Generator        :      0  0.0%,  0.0%  0.0%  0.0%  0.0%
   GeneratorWitness :      0  0.0%,  0.0%  0.0%  0.0%  0.0%
   Dynamic          :      3  0.2%,  0.0%  0.0%  0.0%  0.0%
   Closure          :      0  0.0%,  0.0%  0.0%  0.0%  0.0%
   Tuple            :     13  1.0%,  0.0%  0.0%  0.0%  0.0%
   Bound            :      0  0.0%,  0.0%  0.0%  0.0%  0.0%
   Param            :    146 11.0%,  0.0%  0.0%  0.0%  0.0%
   Infer            :      2  0.2%,  0.1%  0.0%  0.0%  0.0%
   Projection       :      0  0.0%,  0.0%  0.0%  0.0%  0.0%
   Opaque           :      0  0.0%,  0.0%  0.0%  0.0%  0.0%
   Foreign          :      0  0.0%,  0.0%  0.0%  0.0%  0.0%
               total   1326         0.2%  0.1%  0.0%  0.0%
InternalSubsts interner: #437
Region interner: #355
Stability interner: #1
Const Stability interner: #0
Allocation interner: #0
Layout interner: #0
```

Closes rust-lang/rust#78931
r? `@oli-obk`
```
Benchmark #1: ./raytracer_cg_clif_pre
  Time (mean ± σ):      9.553 s ±  0.129 s    [User: 9.543 s, System: 0.008 s]
  Range (min … max):    9.438 s …  9.837 s    10 runs

Benchmark #2: ./raytracer_cg_clif_post
  Time (mean ± σ):      9.463 s ±  0.055 s    [User: 9.452 s, System: 0.008 s]
  Range (min … max):    9.387 s …  9.518 s    10 runs

Summary
  './raytracer_cg_clif_post' ran
    1.01 ± 0.01 times faster than './raytracer_cg_clif_pre'
```
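The summary line can be sanity-checked from the two means and standard deviations. A minimal sketch, assuming the benchmarking tool propagates uncertainty by adding the two relative standard deviations in quadrature (an assumption about its internals, not something the output states); `speedup` is a hypothetical helper.

```rust
/// Hypothetical helper: ratio of two mean run times and its propagated
/// uncertainty, assuming relative errors combine in quadrature.
fn speedup(slow_mean: f64, slow_sd: f64, fast_mean: f64, fast_sd: f64) -> (f64, f64) {
    let ratio = slow_mean / fast_mean;
    let rel = ((slow_sd / slow_mean).powi(2) + (fast_sd / fast_mean).powi(2)).sqrt();
    (ratio, ratio * rel)
}

fn main() {
    // Means and standard deviations (seconds) taken from the benchmark output.
    let (ratio, err) = speedup(9.553, 0.129, 9.463, 0.055);
    println!("{:.2} ± {:.2}", ratio, err); // prints "1.01 ± 0.01"
}
```

The roughly 1% speedup is about the same size as its uncertainty, which is why the summary reads "1.01 ± 0.01" rather than claiming a clear win.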
Don't run `resolve_vars_if_possible` in `normalize_erasing_regions`

Neither `@eddyb` nor I could figure out what this was for. I changed it to `assert_eq!(normalized_value, infcx.resolve_vars_if_possible(&normalized_value));` and it passed the UI test suite.

<details><summary>Outdated; I figured out the issue: `needs_infer()` needs to come _after_ erasing the lifetimes</summary>

Strangely, if I change it to `assert!(!normalized_value.needs_infer())`, it panics almost immediately:

```
query stack during panic:
#0 [normalize_generic_arg_after_erasing_regions] normalizing `<str::IsWhitespace as str::pattern::Pattern>::Searcher`
#1 [needs_drop_raw] computing whether `str::iter::Split<str::IsWhitespace>` needs drop
#2 [mir_built] building MIR for `str::<impl str>::split_whitespace`
#3 [unsafety_check_result] unsafety-checking `str::<impl str>::split_whitespace`
#4 [mir_const] processing MIR for `str::<impl str>::split_whitespace`
#5 [mir_promoted] processing `str::<impl str>::split_whitespace`
#6 [mir_borrowck] borrow-checking `str::<impl str>::split_whitespace`
#7 [analysis] running analysis passes on this crate
end of query stack
```

I'm not entirely sure what's going on; maybe the two disagree?

</details>

For context, this came up while reviewing rust-lang/rust#77467 (cc `@lcnr`). Possibly this needs a crater run?

r? `@nikomatsakis`
cc `@matthewjasper`
There we go. Let's get this merged into your fork and then we can raise a proper PR against softdevteam from there.