A pretty big performance hit after a recent change (PR #51361) #52849

Closed
ljedrz opened this issue Jul 30, 2018 · 43 comments
Assignees
Labels
I-compiletime Issue: Problems and improvements with respect to compile times. P-high High priority regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-performance Working group: Compiler Performance

Comments

@ljedrz
Contributor

ljedrz commented Jul 30, 2018

Browsing the perf page I noticed a pretty big regression:

coercions-check | avg: 58.3% | min: 31.0% | max: 106.3%
coercions-opt | avg: 31.8% | min: 22.9% | max: 49.1%
coercions-debug | avg: 26.9% | min: 19.8% | max: 45.9%
tuple-stress-check | avg: 31.7% | min: 9.0% | max: 43.4%
tuple-stress-opt | avg: 15.9% | min: 13.1% | max: 18.0%
tuple-stress-debug | avg: 15.7% | min: 12.9% | max: 18.0%

I have tracked it down to one of these changes. This might be a necessary evil, but I'd rather make sure.

cc @RalfJung, @oli-obk

@ljedrz ljedrz changed the title A pretty big performance hit in a recent change A pretty big performance hit after a recent change Jul 30, 2018
@kennytm kennytm added I-compiletime Issue: Problems and improvements with respect to compile times. WG-compiler-performance Working group: Compiler Performance labels Jul 30, 2018
@RalfJung
Member

Are these CTFE-heavy crates? Still, it's somewhat strange, as only the final result of CTFE is supposed to be checked, not every intermediate value. (That's something I wanted to add, actually, but maybe that should be a miri-only feature.)

I opened #52291 because I wasn't sure we were stressing CTFE enough in perf, but maybe we are?

@kennytm
Member

kennytm commented Jul 30, 2018

tuple-stress is a 2.4 MB source file consisting of a static array of length 65535.

static BIG_ARRAY_OF_TUPLES: [(i32, (f64, f64, f64)); 0xffff] = [
    (641161, (2.988, 10.544, 29.598)),
    ...
    (209150, (5.216, 15.229, 23.389)),
];

fn main() {
    assert!(BIG_ARRAY_OF_TUPLES.len() == 0xffff);
}

coercions is a similar 1.8 MB file, again with a 65535-element array.

pub static MAPPING: [&'static str; 0xffff] = [
    "\u{0}",
    ...
    "",
];

fn main() { }

@oli-obk
Contributor

oli-obk commented Jul 30, 2018

Yeah, a regression there is pretty much expected. But we can do better: right now we keep reopening the allocation at least 2^20 times in the tuple-stress case.

We should move a bunch of methods from Memory to Allocation and just keep working on the Allocation directly.

Candidates are the methods reading integers, pointers, or PrimVals in general.
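The refactor proposed here -- look the allocation up once, then do all subsequent reads against it directly -- can be sketched as follows. This is a hypothetical, heavily simplified model: the `Allocation`/`Memory` types below are stand-ins for the real ones in `librustc_mir/interpret`, and `read_u32` is an invented stand-in for the scalar-reading methods.

```rust
use std::collections::HashMap;

/// Simplified stand-in for miri's `Allocation`: raw bytes plus bounds checks.
struct Allocation {
    bytes: Vec<u8>,
}

impl Allocation {
    /// Bounds-check and read a little-endian u32 directly from the bytes,
    /// without going back through `Memory::get` for every access.
    fn read_u32(&self, offset: usize) -> Result<u32, String> {
        let end = offset
            .checked_add(4)
            .ok_or_else(|| String::from("offset overflow"))?;
        if end > self.bytes.len() {
            return Err(format!(
                "read at {}..{} out of bounds (allocation size {})",
                offset, end, self.bytes.len()
            ));
        }
        let b = &self.bytes[offset..end];
        Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
}

/// Simplified stand-in for `Memory`: a map from allocation ids to allocations.
struct Memory {
    allocs: HashMap<u64, Allocation>,
}

impl Memory {
    /// The costly pattern is calling this (and its transitive callers) once
    /// per read; a caller reading many values should call it once instead.
    fn get(&self, id: u64) -> Option<&Allocation> {
        self.allocs.get(&id)
    }
}

fn main() {
    let mut allocs = HashMap::new();
    allocs.insert(0, Allocation { bytes: 42u32.to_le_bytes().to_vec() });
    let mem = Memory { allocs };

    // Look the allocation up once, then read directly from it.
    let alloc = mem.get(0).expect("allocation exists");
    assert_eq!(alloc.read_u32(0).unwrap(), 42);
    assert!(alloc.read_u32(1).is_err()); // out of bounds
    println!("ok");
}
```

The point of the move is that bounds checking and byte decoding live on `Allocation`, so a caller doing thousands of reads pays the map lookup once instead of once per read.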

@RalfJung
Member

Right now we'd keep reopening the allocation at least 2^20 times in the tuple-stress case.

That's just get_mut though, right?

@oli-obk
Contributor

oli-obk commented Jul 30, 2018

get, and only transitively. But in order to not keep hitting it, we'd need to move all the other methods that ever end up calling get to Allocation, and a bunch of other methods too, e.g. check_bounds or check_align, which are called almost every time get is called.

@RalfJung
Member

a 2.4 MB source file consisting of a static array of length 65535

Ah, so it's just a big constant.
Still, 64k is not such a big number for a machine; I am surprised it has such a big impact.

@RalfJung
Member

RalfJung commented Jul 30, 2018

This looks like the same regression: The following code regressed from the previous nightly (2018-07-24) to today's (2018-07-30):

#![feature(const_fn, const_let)]

// Try to make CTFE actually do a lot of computation, without producing a big result.
// And without support for loops.

macro_rules! const_repeat {
    // Recursive case: Take a 16
    ([16 $($n: tt)*] $e: expr, $T: ty) => {{
        const fn e() -> $T { const_repeat!([$($n)*] $e, $T) }
        e(); e(); e(); e();
        e(); e(); e(); e();
        e(); e(); e(); e();
        e(); e(); e(); e()
    }};
    // Recursive case: Take a 2
    ([2 $($n: tt)*] $e: expr, $T: ty) => {{
        const fn e() -> $T { const_repeat!([$($n)*] $e, $T) }
        e(); e()
    }};
    // Base case
    ([] $e: expr, $T: ty) => {$e};

    // 16^4 * 2^2 = 2^(4*4+2) = 2^18 = 256k repetitions
    ($e: expr, $T: ty) => (const_repeat!([16 16 16 16 2 2] $e, $T));
}
macro_rules! expensive_static {
    ($name: ident : $T: ty = $e : expr) =>
        (pub static $name : $T = const_repeat!($e, $T);)
}

pub trait Trait: Sync {}
impl Trait for u32 {}
const fn nop<T>(t: T) -> T { t }
const fn inc(i: i32) -> i32 { i + 1 }

expensive_static!(RELOCATIONS : &'static str = "hello");
expensive_static!(FIELDS: &'static i32 = &("bar", 42, "foo", 3.14).1);
expensive_static!(CHECKED_INDEX: u8 = b"foomp"[3]);
expensive_static!(UNSIZING: &'static [u8] = b"foo");
expensive_static!(UNSIZE_TRAIT: &'static Trait = &42u32);
expensive_static!(CHAIN: usize = 42i32 as u8 as u64 as i8 as isize as usize);
expensive_static!(OPS: i32 = ((((10 >> 1) + 3) * 7) / 2 - 12) << 4);
expensive_static!(FORCE_ALLOC: i32 = *****(&&&&&5));
expensive_static!(CONST_FN_BASELINE: i32 = nop(42));
expensive_static!(CONST_FN_SIMPLE: i32 = inc(42));

Old: 13s
New: 29s

With 2^20 repetitions, it doesn't even seem to terminate at all now (5 min and still going, likely in FIELDS), whereas it was 50s previously. This is strange because behavior otherwise seemed linear.

That's definitely fishy, because the final constant (that is supposed to be sanity checked only) is pretty small.

@RalfJung
Member

Is it possible that it is now doing something for every allocation, even those that are not reachable from the final constant? That would explain why FIELDS specifically got slower.

@RalfJung
Member

Interestingly, the "it diverges" part of the regression goes away when I change the macro to just repeat $e 16 times for the innermost part -- instead of wrapping it in a const fn and calling that 16 times.

Function calls seem to have regressed a lot.

Also, according to perf, all that time it diverges is spent hashing. Might be related to #51702? However, the "good" nightly 2018-07-24 already contains that PR as well.
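The "time spent hashing" observation points at the interpreter's loop-detection snapshots: the machine state is hashed so a recurring state can be recognized as an infinite loop, and hashing all of memory on every single step is what gets expensive. A hypothetical, much-simplified sketch of that mechanism (names and the toy step function are invented for illustration; the real machinery is the `EvalSnapshot` code in `librustc_mir`):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Only snapshot every Nth step, so the (expensive) hash is amortized.
const DETECTOR_PERIOD: u64 = 256;

fn state_hash<T: Hash>(state: &T) -> u64 {
    let mut h = DefaultHasher::new();
    state.hash(&mut h);
    h.finish()
}

/// Run a toy "interpreter": `state` is the whole machine state, `step_fn`
/// is one evaluation step, and state 0 means the program terminated.
fn run(
    mut state: u64,
    step_fn: impl Fn(u64) -> u64,
    max_steps: u64,
) -> Result<u64, &'static str> {
    let mut seen = HashSet::new();
    for step in 0..max_steps {
        if state == 0 {
            return Ok(step); // terminated normally
        }
        // If a previously seen state recurs, evaluation can never terminate.
        if step % DETECTOR_PERIOD == 0 && !seen.insert(state_hash(&state)) {
            return Err("infinite loop detected");
        }
        state = step_fn(state);
    }
    Err("step limit reached")
}

fn main() {
    // A terminating computation: counts down to 0 in 10 steps.
    assert_eq!(run(10, |s| s - 1, 1_000), Ok(10));
    // A stuck computation: the state never changes, so it is caught.
    assert_eq!(run(5, |s| s, 1_000), Err("infinite loop detected"));
    println!("ok");
}
```

Even with a period, the cost per snapshot is proportional to how much state gets hashed, which is why hashing all interpreter memory (rather than only what's reachable) shows up so prominently in perf.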

@nnethercote
Contributor

Presumably the "Sanity-check all constants" commit is the one responsible, given that coercions and tuple-stress both contain enormous constants.

@kennytm kennytm added the regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. label Jul 30, 2018
@kennytm kennytm changed the title A pretty big performance hit after a recent change A pretty big performance hit after a recent change (PR #51361) Jul 30, 2018
@nnethercote
Contributor

html5ever also took a smaller (but still substantial) hit:

html5ever-check
        avg: 1.8%       min: -2.0%      max: 5.1%
html5ever-debug
        avg: 2.0%       min: 0.4%       max: 3.6%

@nnethercote
Contributor

Here's the top part of a Cachegrind diff, showing where the increase came from. The numbers are instruction counts. Lots of stuff in librustc_mir.

--------------------------------------------------------------------------------
           Ir
--------------------------------------------------------------------------------
4,121,321,316  PROGRAM TOTALS

--------------------------------------------------------------------------------
         Ir  file:function
--------------------------------------------------------------------------------
179,820,043  /home/njn/moz/rustN/src/librustc/ty/query/plumbing.rs:rustc::ty::query::plumbing::<impl rustc::ty::context::TyCtxt<'a, 'gcx, 'tcx>>::get_query
100,660,499  /home/njn/moz/rustN/src/librustc/mir/interpret/mod.rs:rustc::mir::interpret::UndefMask::set_range
 95,548,806  /home/njn/moz/rustN/src/librustc/mir/interpret/mod.rs:rustc::mir::interpret::Pointer::offset
 94,250,157  /home/njn/moz/rustN/src/libcore/cell.rs:rustc::ty::query::plumbing::<impl rustc::ty::context::TyCtxt<'a, 'gcx, 'tcx>>::get_query
 82,179,434  /home/njn/moz/rustN/src/librustc_mir/interpret/memory.rs:<rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::get
 80,869,155  /home/njn/moz/rustN/src/librustc_mir/interpret/memory.rs:<rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::check_align
 75,363,916  /home/njn/moz/rustN/src/librustc/ty/layout.rs:<rustc::ty::layout::LayoutCx<'tcx, rustc::ty::query::TyCtxtAt<'a, 'tcx, 'tcx>> as rustc_target::abi::LayoutOf>::layout_of
 73,870,463  /home/njn/moz/rustN/src/libstd/collections/hash/table.rs:rustc::ty::query::plumbing::<impl rustc::ty::context::TyCtxt<'a, 'gcx, 'tcx>>::get_query
 71,369,937  /home/njn/moz/rustN/src/librustc_mir/interpret/eval_context.rs:rustc_mir::interpret::step::<impl rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M>>::step
 67,775,068  /home/njn/moz/rustN/src/libstd/collections/hash/map.rs:rustc::ty::query::plumbing::<impl rustc::ty::context::TyCtxt<'a, 'gcx, 'tcx>>::get_query
 61,209,022  /home/njn/moz/rustN/src/librustc/mir/interpret/mod.rs:rustc::mir::interpret::UndefMask::grow
 60,553,924  /home/njn/moz/rustN/src/libstd/collections/hash/table.rs:<rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::get
 60,487,853  /home/njn/moz/rustN/src/librustc_mir/interpret/place.rs:rustc_mir::interpret::place::<impl rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M>>::place_field
 59,530,793  /build/glibc-OTsEL5/glibc-2.27/malloc/malloc.c:free
 57,801,198  /home/njn/moz/rustN/src/librustc_mir/interpret/memory.rs:<rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::relocations
 56,127,203  ???:???
 51,661,186  /home/njn/moz/rustN/src/libcore/ptr.rs:rustc::ty::query::plumbing::<impl rustc::ty::context::TyCtxt<'a, 'gcx, 'tcx>>::get_query
 49,020,227  /home/njn/moz/rustN/src/librustc_mir/interpret/eval_context.rs:<rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M>>::validate_ptr_target
 48,627,289  /home/njn/moz/rustN/src/librustc/mir/interpret/mod.rs:<rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::copy_repeatedly
 48,103,009  /home/njn/moz/rustN/src/libcore/iter/range.rs:<rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::copy_repeatedly
 42,359,299  /build/glibc-OTsEL5/glibc-2.27/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms
 40,044,283  /home/njn/moz/rustN/src/librustc/mir/tcx.rs:rustc::mir::tcx::<impl rustc::mir::Place<'tcx>>::ty
 39,121,766  /build/glibc-OTsEL5/glibc-2.27/malloc/malloc.c:_int_malloc
 39,091,742  /build/glibc-OTsEL5/glibc-2.27/malloc/malloc.c:realloc
 38,927,504  /home/njn/moz/rustN/src/libstd/collections/hash/map.rs:<rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::get
 38,609,260  /home/njn/moz/rustN/src/libcore/fmt/mod.rs:core::fmt::write
 37,747,369  /home/njn/moz/rustN/src/librustc_mir/interpret/eval_context.rs:<rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M>>::write_value
 36,567,583  /home/njn/moz/rustN/src/librustc_mir/interpret/eval_context.rs:<rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M>>::write_value_to_ptr
 35,716,347  /home/njn/moz/rustN/src/librustc_mir/interpret/place.rs:rustc_mir::interpret::place::<impl rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M>>::eval_place
 35,391,276  /home/njn/moz/rustN/src/librustc_mir/borrow_check/nll/type_check/relate_tys.rs:<rustc_mir::borrow_check::nll::type_check::relate_tys::TypeRelating<'cx, 'bccx, 'gcx, 'tcx> as rustc::ty:
:relate::TypeRelation<'cx, 'gcx, 'tcx>>::tys
 34,601,512  /home/njn/moz/rustN/src/librustc_mir/interpret/memory.rs:<rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::get_bytes_mut
-33,360,936  /home/njn/moz/rustN/src/librustc_mir/interpret/eval_context.rs:<rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M>>::eval_rvalue_into_place
 33,098,069  /home/njn/moz/rustN/src/librustc/ty/relate.rs:<rustc_mir::borrow_check::nll::type_check::relate_tys::TypeRelating<'cx, 'bccx, 'gcx, 'tcx> as rustc::ty::relate::TypeRelation<'cx, 'gcx,
'tcx>>::tys
 32,710,110  /home/njn/moz/rustN/src/librustc/ty/mod.rs:rustc::ty::ParamEnv::and
 32,372,938  /home/njn/moz/rustN/src/librustc_mir/interpret/memory.rs:<rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::write_scalar
 31,134,882  /home/njn/moz/rustN/src/librustc/ty/layout.rs:<rustc::ty::layout::LayoutCx<'tcx, rustc::ty::context::TyCtxt<'a, 'tcx, 'tcx>>>::record_layout_for_printing
 31,128,574  /home/njn/moz/rustN/src/librustc_mir/interpret/eval_context.rs:<&'a rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M> as rustc_target::abi::LayoutOf>::layout_of
 29,168,137  /home/njn/moz/rustN/src/libcore/num/mod.rs:rustc::ty::query::plumbing::<impl rustc::ty::context::TyCtxt<'a, 'gcx, 'tcx>>::get_query
 28,114,083  /home/njn/moz/rustN/src/librustc_mir/interpret/memory.rs:<rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::read_scalar
 27,851,882  /home/njn/moz/rustN/src/librustc/ty/layout.rs:<&'a rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M> as rustc_target::abi::LayoutOf>::layout_of
 27,249,214  /build/glibc-OTsEL5/glibc-2.27/malloc/malloc.c:malloc
 27,065,779  /home/njn/moz/rustN/src/librustc_mir/interpret/step.rs:rustc_mir::interpret::step::<impl rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M>>::step
 26,803,051  /home/njn/moz/rustN/src/librustc_mir/interpret/eval_context.rs:<rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M>>::force_allocation

@RalfJung
Member

RalfJung commented Jul 31, 2018

I verified (using cargo-bisect-rustc) that the regression I reported above is also caused by the same PR:

regression in 70cac59

@RalfJung
Member

At least for my benchmark, the regression is entirely caused by https://github.com/rust-lang/rust/pull/51361/files#diff-1651ba5b47f9ffa648d59265d3f94becR1671: This leads to statics being CTFE'd at all in check, and (seemingly) to them being CTFE'd twice in build.

@RalfJung
Member

UnusedBrokenConstVisitor does not actually make sure to only visit unused items. Also I am quite surprised to see that this lint is the only place that runs the check?

For example, with generic statics (which promoteds can create) or generic consts (which are proposed, AFAIK), this check couldn't do anything. There, we'd have to check on instantiation, right?

@pietroalbini pietroalbini added regression-from-stable-to-beta Performance or correctness regression from stable to beta. and removed regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Jul 31, 2018
@pietroalbini pietroalbini added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jul 31, 2018
bors added a commit that referenced this issue Aug 1, 2018
check_const: use the same ParamEnv as codegen for statics

Fixes at least part of #52849 (my CTFE-stress benchmark). Note that I do not know what I am doing here, this is just based on hints from @oli-obk.

r? @oli-obk
@pnkfelix
Member

pnkfelix commented Aug 2, 2018

visited for triage. P-high. Haven't decided whom, if anyone, to assign this to; hopefully that will be addressed soonish.

@brunocodutra
Contributor

I'm mostly finished, by the way; the only missing piece is a convenience macro to avoid code duplication, but I've been lacking the time to work on it. Hopefully by the end of the week I'll have it ready for review.

@nikomatsakis
Contributor

According to @oli-obk, @RalfJung has turned this from a perf regression into a perf win. Way to go!

@oli-obk
Contributor

oli-obk commented Aug 23, 2018

For anyone interested, the relevant PR is #53424

@RalfJung
Member

Uh, I think it is still slower than without the sanity check... but that is expected.

However, there is also a known actual issue still open: #52849 (comment). I am going to reopen this issue until we get that solved. I think we should stop hashing memory; also see #52626 (comment).

@RalfJung RalfJung reopened this Aug 23, 2018
@pietroalbini
Member

PR for fixing this in beta is #53425.

@RalfJung
Member

That's not correct. That PR fixes a functional bug, it does not help with perf.

@pietroalbini
Member

Uh, ok. So, since #53424 is not going to be backported, will the perf regression be fixed in another PR or should I mark this as "won't fix"?

@RalfJung
Member

The full perf regression is not fixed in master yet, but I plan to do so -- I am essentially waiting for #52626 because that touches the same code (and actually that one might already do it). That will still be slower than it used to be, but that's expected: it is doing more work now.

As for beta -- @oli-obk, any plans? We could (and should, I think) backport #52925; that would help. Beyond that, the only measures I can think of are to reduce what validation does -- make it stop at pointers, or disable it entirely.

@pnkfelix
Member

visiting for triage. Assigning to @oli-obk to take the lead on the backport.

bors added a commit that referenced this issue Aug 31, 2018
[beta] const eval perf regression fix

backports #52925 (for #52849)

and additionally skips hashing on every evaluation step
@oli-obk oli-obk assigned RalfJung and unassigned oli-obk Sep 6, 2018
@oli-obk
Contributor

oli-obk commented Sep 6, 2018

visited for triage: I think @RalfJung has one more fix in this direction planned?

@RalfJung
Member

RalfJung commented Sep 6, 2018

Do I? Not that I know of.^^ I consider this done with #52626.

@oli-obk
Contributor

oli-obk commented Sep 6, 2018

Awesome! I'll close this when #52626 is merged

@oli-obk oli-obk added regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. and removed regression-from-stable-to-beta Performance or correctness regression from stable to beta. labels Sep 6, 2018
@oli-obk oli-obk assigned oli-obk and unassigned RalfJung Sep 6, 2018
bors added a commit that referenced this issue Sep 6, 2018
Fix issue #52475: Make loop detector only consider reachable memory

As suggested by @oli-obk (#51702 (comment)), `alloc_id`s should be ignored; instead, traverse all `Allocation`s in interpreter memory at a given moment in time, beginning with the `ByRef` locals in the stack.

- [x] Generalize the implementation of `Hash` for `EvalSnapshot` to traverse `Allocation`s
- [x] Generalize the implementation of `PartialEq` for `EvalSnapshot` to traverse `Allocation`s
- [x] Commit regression tests

Fixes #52626
Fixes #52849
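The "only consider reachable memory" fix described in the commit above amounts to a graph traversal: start from the root allocations (the stack's `ByRef` locals) and follow pointer relocations, so dead allocations never get hashed or compared. A hypothetical, heavily simplified sketch (the `AllocId` type and `relocations` field are invented stand-ins for the real interpreter structures):

```rust
use std::collections::{HashMap, HashSet};

type AllocId = u32;

struct Allocation {
    /// Which other allocations this one points into. miri tracks these
    /// as "relocations" alongside the raw bytes.
    relocations: Vec<AllocId>,
}

/// Worklist traversal: everything reachable from `roots` via relocations.
fn reachable(
    roots: &[AllocId],
    memory: &HashMap<AllocId, Allocation>,
) -> HashSet<AllocId> {
    let mut seen = HashSet::new();
    let mut todo: Vec<AllocId> = roots.to_vec();
    while let Some(id) = todo.pop() {
        if seen.insert(id) {
            if let Some(alloc) = memory.get(&id) {
                todo.extend(&alloc.relocations);
            }
        }
    }
    seen
}

fn main() {
    let mut memory = HashMap::new();
    memory.insert(0, Allocation { relocations: vec![1] });
    memory.insert(1, Allocation { relocations: vec![] });
    memory.insert(2, Allocation { relocations: vec![] }); // dead allocation

    // Only allocations 0 and 1 are reachable from the root; the snapshot
    // hash/comparison can skip allocation 2 entirely.
    let live = reachable(&[0], &memory);
    assert!(live.contains(&0) && live.contains(&1));
    assert!(!live.contains(&2));
    println!("ok");
}
```

Hashing only this reachable set (rather than every allocation ever created) is what keeps the loop detector's cost proportional to the live state.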
9 participants