Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Put labels code back in #1

Merged
merged 1 commit into from
Mar 9, 2020
Merged

Conversation

ptersilie
Copy link
Collaborator

There we go. Let's get this merged into your fork and then we can raise a proper PR against softdevteam from there.

@@ -776,6 +777,8 @@ impl<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>> FunctionCx<'a, 'tcx, Bx> {

debug!("codegen_block({:?}={:?})", bb, data);

let llbb = bx.llbb();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we move that closer to where it is used?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't. I put this there because I'm worried it will be overwritten by any of the statements. That being said, it seems way back we also never passed the block in (and instead used llbb() in add_yk_block_label, which just gets the current block the builder is pointing to). Since we never had any problems with this, it might mean that the statements don't overwrite this. But it still makes me nervous that we can't be sure.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Shall we add a comment?

// bx.add_yk_block_label(lbl_name);
//}
if bx.tcx().sess.opts.cg.tracer.sir_labels() &&
!bx.tcx().def_path_str(self.instance.def_id()).contains("drop_in_place") &&
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these should be exact equality checks so as to not skip user functions that may contain "drop_in_place".

IIRC, the two offenders are core::drop_in_place and std::drop_in_place, but might be wrong.

!bx.tcx().crate_name(LOCAL_CRATE).as_str().starts_with("rustc") {
use ykpack::BLOCK_LABEL_PREFIX;
let lbl_name = CString::new(format!(
"NEW_{}:{}:{}",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I should have killed the NEW_ prefix. Let's kill it now.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we have a constant defined someplace for this in ykpack?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes. We use it! Sorry.

So just kill NEW_

@vext01
Copy link
Owner

vext01 commented Mar 5, 2020

Just a few comments!

@vext01 vext01 self-assigned this Mar 6, 2020
@ptersilie
Copy link
Collaborator Author

Addressed your comments. Although there are other drop_in_place functions, it seems we get away with only skipping those three.

@vext01
Copy link
Owner

vext01 commented Mar 8, 2020

LGTM.

You can squash if you like, but we will probably rebase the target branch later anyway.

Up to you.

@ptersilie
Copy link
Collaborator Author

Squashed.

@ptersilie
Copy link
Collaborator Author

Actually, if you want to wait a minute I can add another commit that adds the function return labels too.

@vext01
Copy link
Owner

vext01 commented Mar 9, 2020

ok

@vext01
Copy link
Owner

vext01 commented Mar 9, 2020

Is this ready for re-review?

@ptersilie
Copy link
Collaborator Author

I've just added the labels for function call returns. I've also deduplicated some code and moved it into BuilderCx. I would have liked to just use interface as a reference and forward that, but the compiler wouldn't let me, so we are now fowarding the symbol and the function names.

@ptersilie
Copy link
Collaborator Author

Yes, this is ready for re-review.

bx.cx().tcx().symbol_name(fx.instance).name.as_str() != "main" &&
!bx.tcx().crate_name(LOCAL_CRATE).as_str().starts_with("rustc") {
let llbb = bx.llbb();
use ykpack::BLOCK_LABEL_PREFIX;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need another prefix for this kind of label?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could but it's not strictly necessary.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a block contains a call to a function with a return value, won't we insert two labels of the same name at different places?

I may be wrong, I don't 100% remember the details.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My gut feeling says no, but I'm not sure. Let's rename it just in case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah damn. Because I moved the label generation into BuilderCx I now need to add another argument to the add_yk_block_label function.

@vext01
Copy link
Owner

vext01 commented Mar 9, 2020

I would have liked to just use interface as a reference and forward that, but the compiler wouldn't let me

You meant the instance, right? I'm OK with passing the symbol name and index.

@ptersilie
Copy link
Collaborator Author

Sorry, yes, I meant the instance.

@ptersilie
Copy link
Collaborator Author

Moved the label generation code to after codegen_terminators which means we are now getting labels for all mir blocks (as far as I can tell). This means we can removed the label generation code for function returns. Ready to merge into your branch.

@vext01
Copy link
Owner

vext01 commented Mar 9, 2020

bors r+

@vext01
Copy link
Owner

vext01 commented Mar 9, 2020

Let's merge this manually, as bors will test this when my branch is merged.

@vext01 vext01 merged commit 215035b into vext01:yk-cgssa-sir Mar 9, 2020
vext01 pushed a commit that referenced this pull request Sep 28, 2020
* Fix `const-display.rs` XPATH queries

* Add `issue_76501.rs` test file

* Rename issue_76501.rs to issue-76501.rs
vext01 pushed a commit that referenced this pull request Nov 12, 2020
Optimise align_offset for stride=1 further

`stride == 1` case can be computed more efficiently through `-p (mod
a)`. That, then translates to a nice and short sequence of LLVM
instructions:

    %address = ptrtoint i8* %p to i64
    %negptr = sub i64 0, %address
    %offset = and i64 %negptr, %a_minus_one

And produces pretty much ideal code-gen when this function is used in
isolation.

Typical use of this function will, however, involve use of
the result to offset a pointer, i.e.

    %aligned = getelementptr inbounds i8, i8* %p, i64 %offset

This still looks very good, but LLVM does not really translate that to
what would be considered ideal machine code (on any target). For example
that's the codegen we obtain for an unknown alignment:

    ; x86_64
    dec     rsi
    mov     rax, rdi
    neg     rax
    and     rax, rsi
    add     rax, rdi

In particular negating a pointer is not something that’s going to be
optimised for in the design of CISC architectures like x86_64. They
are much better at offsetting pointers. And so we’d love to utilize this
ability and produce code that's more like this:

    ; x86_64
    lea     rax, [rsi + rdi - 1]
    neg     rsi
    and     rax, rsi

To achieve this we need to give LLVM an opportunity to apply its
various peep-hole optimisations that it does during DAG selection. In
particular, the `and` instruction appears to be a major inhibitor here.
We cannot, sadly, get rid of this load-bearing operation, but we can
reorder operations such that LLVM has more to work with around this
instruction.

One such ordering is proposed in #75579 and results in LLVM IR that
looks broadly like this:

    ; using add enables `lea` and similar CISCisms
    %offset_ptr = add i64 %address, %a_minus_one
    %mask = sub i64 0, %a
    %masked = and i64 %offset_ptr, %mask
    ; can be folded with `gepi` that may follow
    %offset = sub i64 %masked, %address

…and generates the intended x86_64 machine code.
One might also wonder how the increased amount of code would impact a
RISC target. Turns out not much:

    ; aarch64 previous                 ; aarch64 new
    sub     x8, x1, #1                 add     x8, x1, x0
    neg     x9, x0                     sub     x8, x8, #1
    and     x8, x9, x8                 neg     x9, x1
    add     x0, x0, x8                 and     x0, x8, x9

    (and similarly for ppc, sparc, mips, riscv, etc)

The only target that seems to do worse is… wasm32.

Onto actual measurements – the best way to evaluate snipets like these
is to use llvm-mca. Much like Aarch64 assembly would allow to suspect,
there isn’t any performance difference to be found. Both snippets
execute in same number of cycles for the CPUs I tried. On x86_64,
we get throughput improvement of >50%!

Fixes #75579
vext01 pushed a commit that referenced this pull request Nov 23, 2020
Before:

```
2:rustc INFO rustc_interface::passes Pre-codegen
2:rustcTy interner             total           ty lt ct all
2:rustc    Adt               :   1078 81.3%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Array             :      1  0.1%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Slice             :      1  0.1%,  0.0%   0.0%  0.0%  0.0%
2:rustc    RawPtr            :      2  0.2%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Ref               :      4  0.3%,  0.1%   0.1%  0.0%  0.0%
2:rustc    FnDef             :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    FnPtr             :     76  5.7%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Placeholder       :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Generator         :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    GeneratorWitness  :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Dynamic           :      3  0.2%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Closure           :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Tuple             :     13  1.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Bound             :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Param             :    146 11.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Infer             :      2  0.2%,  0.1%   0.0%  0.0%  0.0%
2:rustc    Projection        :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Opaque            :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Foreign           :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc                  total   1326         0.2%   0.1%  0.0%  0.0%
2:rustcInternalSubsts interner: #437
2:rustcRegion interner: #355
2:rustcStability interner: #1
2:rustcConst Stability interner: #0
2:rustcAllocation interner: #0
2:rustcLayout interner: #0
```

After:

```
 INFO rustc_interface::passes Post-codegen
Ty interner             total           ty lt ct all
    Adt               :   1078 81.3%,  0.0%   0.0%  0.0%  0.0%
    Array             :      1  0.1%,  0.0%   0.0%  0.0%  0.0%
    Slice             :      1  0.1%,  0.0%   0.0%  0.0%  0.0%
    RawPtr            :      2  0.2%,  0.0%   0.0%  0.0%  0.0%
    Ref               :      4  0.3%,  0.1%   0.1%  0.0%  0.0%
    FnDef             :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    FnPtr             :     76  5.7%,  0.0%   0.0%  0.0%  0.0%
    Placeholder       :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    Generator         :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    GeneratorWitness  :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    Dynamic           :      3  0.2%,  0.0%   0.0%  0.0%  0.0%
    Closure           :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    Tuple             :     13  1.0%,  0.0%   0.0%  0.0%  0.0%
    Bound             :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    Param             :    146 11.0%,  0.0%   0.0%  0.0%  0.0%
    Infer             :      2  0.2%,  0.1%   0.0%  0.0%  0.0%
    Projection        :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    Opaque            :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    Foreign           :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
                  total   1326         0.2%   0.1%  0.0%  0.0%
InternalSubsts interner: #437
Region interner: #355
Stability interner: #1
Const Stability interner: #0
Allocation interner: #0
Layout interner: #0
```
vext01 pushed a commit that referenced this pull request Nov 23, 2020
Don't print thread ids and names in `tracing` logs

Before:

```
2:rustc INFO rustc_interface::passes Pre-codegen
2:rustcTy interner             total           ty lt ct all
2:rustc    Adt               :   1078 81.3%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Array             :      1  0.1%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Slice             :      1  0.1%,  0.0%   0.0%  0.0%  0.0%
2:rustc    RawPtr            :      2  0.2%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Ref               :      4  0.3%,  0.1%   0.1%  0.0%  0.0%
2:rustc    FnDef             :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    FnPtr             :     76  5.7%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Placeholder       :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Generator         :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    GeneratorWitness  :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Dynamic           :      3  0.2%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Closure           :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Tuple             :     13  1.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Bound             :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Param             :    146 11.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Infer             :      2  0.2%,  0.1%   0.0%  0.0%  0.0%
2:rustc    Projection        :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Opaque            :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Foreign           :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc                  total   1326         0.2%   0.1%  0.0%  0.0%
2:rustcInternalSubsts interner: #437
2:rustcRegion interner: #355
2:rustcStability interner: #1
2:rustcConst Stability interner: #0
2:rustcAllocation interner: #0
2:rustcLayout interner: #0
```

After:

```
 INFO rustc_interface::passes Post-codegen
Ty interner             total           ty lt ct all
    Adt               :   1078 81.3%,  0.0%   0.0%  0.0%  0.0%
    Array             :      1  0.1%,  0.0%   0.0%  0.0%  0.0%
    Slice             :      1  0.1%,  0.0%   0.0%  0.0%  0.0%
    RawPtr            :      2  0.2%,  0.0%   0.0%  0.0%  0.0%
    Ref               :      4  0.3%,  0.1%   0.1%  0.0%  0.0%
    FnDef             :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    FnPtr             :     76  5.7%,  0.0%   0.0%  0.0%  0.0%
    Placeholder       :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    Generator         :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    GeneratorWitness  :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    Dynamic           :      3  0.2%,  0.0%   0.0%  0.0%  0.0%
    Closure           :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    Tuple             :     13  1.0%,  0.0%   0.0%  0.0%  0.0%
    Bound             :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    Param             :    146 11.0%,  0.0%   0.0%  0.0%  0.0%
    Infer             :      2  0.2%,  0.1%   0.0%  0.0%  0.0%
    Projection        :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    Opaque            :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    Foreign           :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
                  total   1326         0.2%   0.1%  0.0%  0.0%
InternalSubsts interner: #437
Region interner: #355
Stability interner: #1
Const Stability interner: #0
Allocation interner: #0
Layout interner: #0
```

Closes rust-lang/rust#78931
r? ``@oli-obk``
vext01 pushed a commit that referenced this pull request Dec 2, 2020
```
Benchmark #1: ./raytracer_cg_clif_pre
  Time (mean ± σ):      9.553 s ±  0.129 s    [User: 9.543 s, System: 0.008 s]
  Range (min … max):    9.438 s …  9.837 s    10 runs

Benchmark softdevteam#2: ./raytracer_cg_clif_post
  Time (mean ± σ):      9.463 s ±  0.055 s    [User: 9.452 s, System: 0.008 s]
  Range (min … max):    9.387 s …  9.518 s    10 runs

Summary
  './raytracer_cg_clif_post' ran
    1.01 ± 0.01 times faster than './raytracer_cg_clif_pre'
```
vext01 pushed a commit that referenced this pull request Dec 2, 2020
Don't run `resolve_vars_if_possible` in `normalize_erasing_regions`

Neither `@eddyb` nor I could figure out what this was for. I changed it to `assert_eq!(normalized_value, infcx.resolve_vars_if_possible(&normalized_value));` and it passed the UI test suite.

<details><summary>

Outdated, I figured out the issue - `needs_infer()` needs to come _after_ erasing the lifetimes

</summary>

Strangely, if I change it to `assert!(!normalized_value.needs_infer())` it panics almost immediately:

```
query stack during panic:
#0 [normalize_generic_arg_after_erasing_regions] normalizing `<str::IsWhitespace as str::pattern::Pattern>::Searcher`
#1 [needs_drop_raw] computing whether `str::iter::Split<str::IsWhitespace>` needs drop
softdevteam#2 [mir_built] building MIR for `str::<impl str>::split_whitespace`
softdevteam#3 [unsafety_check_result] unsafety-checking `str::<impl str>::split_whitespace`
softdevteam#4 [mir_const] processing MIR for `str::<impl str>::split_whitespace`
softdevteam#5 [mir_promoted] processing `str::<impl str>::split_whitespace`
softdevteam#6 [mir_borrowck] borrow-checking `str::<impl str>::split_whitespace`
softdevteam#7 [analysis] running analysis passes on this crate
end of query stack
```

I'm not entirely sure what's going on - maybe the two disagree?

</details>

For context, this came up while reviewing rust-lang/rust#77467 (cc `@lcnr).`

Possibly this needs a crater run?

r? `@nikomatsakis`
cc `@matthewjasper`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants