A pretty big performance hit after a recent change (PR #51361) #52849

ljedrz · 2018-07-30T07:30:56Z

Browsing the perf page I noticed a pretty big regression:

coercions-check | avg: 58.3%? | min: 31.0%? | max: 106.3%?
coercions-opt | avg: 31.8%? | min: 22.9%? | max: 49.1%?
coercions-debug | avg: 26.9%? | min: 19.8%? | max: 45.9%?
tuple-stress-check | avg: 31.7% | min: 9.0% | max: 43.4%
tuple-stress-opt | avg: 15.9% | min: 13.1% | max: 18.0%
tuple-stress-debug | avg: 15.7% | min: 12.9% | max: 18.0%

I have tracked it down to one of these changes. This might be a necessary evil, but I'd rather make sure.

cc @RalfJung, @oli-obk

The text was updated successfully, but these errors were encountered:

RalfJung · 2018-07-30T09:03:46Z

Are these CTFE-heavy crates? Still it's somewhat strange as only the final result of CTFE is supposed to be checked, not every intermediate value. (That's something I wanted to add, actually, but maybe that should be a miri-only feature.)

I opened #52291 because I wasn't sure we are stressing CTFE enough in perf, but maybe we are?

kennytm · 2018-07-30T09:23:40Z

tuple-stress is a 2.4 MB source file consisting of a static array of length 65535.

static BIG_ARRAY_OF_TUPLES: [(i32, (f64, f64, f64)); 0xffff] = [
    (641161, (2.988, 10.544, 29.598)),
    ...
    (209150, (5.216, 15.229, 23.389)),
];

fn main() {
    assert!(BIG_ARRAY_OF_TUPLES.len() == 0xffff);
}

coercions is a similar 1.8 MB file again with a 65535-length array.

pub static MAPPING: [&'static str; 0xffff] = [
    "\u{0}",
    ...
    "",
];

fn main() { }

oli-obk · 2018-07-30T09:43:36Z

Yea. A regression there is pretty much expected. But we can do better. Right now we'd keep reopening the allocation at least 2^20 times in the tuple-stress case.

We should move a bunch of methods from Memory to Allocation and just keep working the Allocation.

Candidates are the methods reading integers, pointers or primvals in general

RalfJung · 2018-07-30T09:57:48Z

Right now we'd keep reopening the allocation at least 2^20 times in the tuple-stress case.

That's just get_mut though, right?

oli-obk · 2018-07-30T10:29:26Z

get, and only transitively. But in order to not keep hitting it we'd need to move all the other methods that ever end up calling get to Allocation and a bunch of other methods, too, e.g. check_bounds or check_align, which are called almost every time get is

RalfJung · 2018-07-30T13:16:20Z

a 2.4 MB source file consisting of a static array of length 65535

Ah, so it's just a big constant.
Still, 64k is not such a big number for a machine, I am surprised it makes such a big impact.

RalfJung · 2018-07-30T16:30:49Z

This looks like the same regression: The following code regressed from the previous nightly (2018-07-24) to today's (2018-07-30):

#![feature(const_fn, const_let)]

// Try to make CTFE actually do a lot of computation, without producing a big result.
// And without support for loops.

macro_rules! const_repeat {
    // Recursive case: Take a 16
    ([16 $($n: tt)*] $e: expr, $T: ty) => {{
        const fn e() -> $T { const_repeat!([$($n)*] $e, $T) }
        e(); e(); e(); e();
        e(); e(); e(); e();
        e(); e(); e(); e();
        e(); e(); e(); e()
    }};
    // Recursive case: Take a 2
    ([2 $($n: tt)*] $e: expr, $T: ty) => {{
        const fn e() -> $T { const_repeat!([$($n)*] $e, $T) }
        e(); e()
    }};
    // Base case
    ([] $e: expr, $T: ty) => {$e};

    // 16^4 * 2^2 = 2^(4*4+2) = 2^18 = 256k repetitions
    ($e: expr, $T: ty) => (const_repeat!([16 16 16 16 2 2] $e, $T));
}
macro_rules! expensive_static {
    ($name: ident : $T: ty = $e : expr) =>
        (pub static $name : $T = const_repeat!($e, $T);)
}

pub trait Trait: Sync {}
impl Trait for u32 {}
const fn nop<T>(t: T) -> T { t }
const fn inc(i: i32) -> i32 { i + 1 }

expensive_static!(RELOCATIONS : &'static str = "hello");
expensive_static!(FIELDS: &'static i32 = &("bar", 42, "foo", 3.14).1);
expensive_static!(CHECKED_INDEX: u8 = b"foomp"[3]);
expensive_static!(UNSIZING: &'static [u8] = b"foo");
expensive_static!(UNSIZE_TRAIT: &'static Trait = &42u32);
expensive_static!(CHAIN: usize = 42i32 as u8 as u64 as i8 as isize as usize);
expensive_static!(OPS: i32 = ((((10 >> 1) + 3) * 7) / 2 - 12) << 4);
expensive_static!(FORCE_ALLOC: i32 = *****(&&&&&5));
expensive_static!(CONST_FN_BASELINE: i32 = nop(42));
expensive_static!(CONST_FN_SIMPLE: i32 = inc(42));

Old:13s
New: 29s

With 2^20 runs, it doesn't even seem to terminate at all now (5min and still going, likely in FIELDS), whereas it is 50s previously. This is strange because behavior otherwise seemed linear.

That's definitely fishy, because the final constant (that is supposed to be sanity checked only) is pretty small.

RalfJung · 2018-07-30T16:39:00Z

Is it possible that it is now doing something for every allocation, even those that are not reachable from the final constant? That would explain why FIELDS specifically got slower.

RalfJung · 2018-07-30T17:50:11Z

Interestingly, the "it diverges" part of the regression goes away when I change the macro to just repeat $e 16 times for the innermost part -- instead of wrapping it in a const fn and calling that 16 times.

Function calls seem to have regressed a lot.

Also, according to perf, all that time it diverges is spent hashing. Might be related to #51702? However, the "good" nightly 2018-07-24 already contains that PR as well.

nnethercote · 2018-07-30T22:36:11Z

Presumably the "Sanity-check all constants" commit is the one responsible, given that coercions and tuple-stress both contains enormous constants.

nnethercote · 2018-07-30T23:27:44Z

html5ever also took a smaller (but still substantial) hit:

html5ever-check
        avg: 1.8%       min: -2.0%      max: 5.1%
html5ever-debug
        avg: 2.0%       min: 0.4%       max: 3.6%

nnethercote · 2018-07-30T23:36:22Z

Here's the top part of a Cachegrind diff, showing where the increase came from. The numbers are instruction counts. Lots of stuff in librustc_mir.

--------------------------------------------------------------------------------
           Ir
--------------------------------------------------------------------------------
4,121,321,316  PROGRAM TOTALS

--------------------------------------------------------------------------------
         Ir  file:function
--------------------------------------------------------------------------------
179,820,043  /home/njn/moz/rustN/src/librustc/ty/query/plumbing.rs:rustc::ty::query::plumbing::<impl rustc::ty::context::TyCtxt<'a, 'gcx, 'tcx>>::get_query
100,660,499  /home/njn/moz/rustN/src/librustc/mir/interpret/mod.rs:rustc::mir::interpret::UndefMask::set_range
 95,548,806  /home/njn/moz/rustN/src/librustc/mir/interpret/mod.rs:rustc::mir::interpret::Pointer::offset
 94,250,157  /home/njn/moz/rustN/src/libcore/cell.rs:rustc::ty::query::plumbing::<impl rustc::ty::context::TyCtxt<'a, 'gcx, 'tcx>>::get_query
 82,179,434  /home/njn/moz/rustN/src/librustc_mir/interpret/memory.rs:<rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::get
 80,869,155  /home/njn/moz/rustN/src/librustc_mir/interpret/memory.rs:<rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::check_align
 75,363,916  /home/njn/moz/rustN/src/librustc/ty/layout.rs:<rustc::ty::layout::LayoutCx<'tcx, rustc::ty::query::TyCtxtAt<'a, 'tcx, 'tcx>> as rustc_target::abi::LayoutOf>::layout_of
 73,870,463  /home/njn/moz/rustN/src/libstd/collections/hash/table.rs:rustc::ty::query::plumbing::<impl rustc::ty::context::TyCtxt<'a, 'gcx, 'tcx>>::get_query
 71,369,937  /home/njn/moz/rustN/src/librustc_mir/interpret/eval_context.rs:rustc_mir::interpret::step::<impl rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M>>::step
 67,775,068  /home/njn/moz/rustN/src/libstd/collections/hash/map.rs:rustc::ty::query::plumbing::<impl rustc::ty::context::TyCtxt<'a, 'gcx, 'tcx>>::get_query
 61,209,022  /home/njn/moz/rustN/src/librustc/mir/interpret/mod.rs:rustc::mir::interpret::UndefMask::grow
 60,553,924  /home/njn/moz/rustN/src/libstd/collections/hash/table.rs:<rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::get
 60,487,853  /home/njn/moz/rustN/src/librustc_mir/interpret/place.rs:rustc_mir::interpret::place::<impl rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M>>::place_field
 59,530,793  /build/glibc-OTsEL5/glibc-2.27/malloc/malloc.c:free
 57,801,198  /home/njn/moz/rustN/src/librustc_mir/interpret/memory.rs:<rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::relocations
 56,127,203  ???:???
 51,661,186  /home/njn/moz/rustN/src/libcore/ptr.rs:rustc::ty::query::plumbing::<impl rustc::ty::context::TyCtxt<'a, 'gcx, 'tcx>>::get_query
 49,020,227  /home/njn/moz/rustN/src/librustc_mir/interpret/eval_context.rs:<rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M>>::validate_ptr_target
 48,627,289  /home/njn/moz/rustN/src/librustc/mir/interpret/mod.rs:<rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::copy_repeatedly
 48,103,009  /home/njn/moz/rustN/src/libcore/iter/range.rs:<rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::copy_repeatedly
 42,359,299  /build/glibc-OTsEL5/glibc-2.27/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms
 40,044,283  /home/njn/moz/rustN/src/librustc/mir/tcx.rs:rustc::mir::tcx::<impl rustc::mir::Place<'tcx>>::ty
 39,121,766  /build/glibc-OTsEL5/glibc-2.27/malloc/malloc.c:_int_malloc
 39,091,742  /build/glibc-OTsEL5/glibc-2.27/malloc/malloc.c:realloc
 38,927,504  /home/njn/moz/rustN/src/libstd/collections/hash/map.rs:<rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::get
 38,609,260  /home/njn/moz/rustN/src/libcore/fmt/mod.rs:core::fmt::write
 37,747,369  /home/njn/moz/rustN/src/librustc_mir/interpret/eval_context.rs:<rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M>>::write_value
 36,567,583  /home/njn/moz/rustN/src/librustc_mir/interpret/eval_context.rs:<rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M>>::write_value_to_ptr
 35,716,347  /home/njn/moz/rustN/src/librustc_mir/interpret/place.rs:rustc_mir::interpret::place::<impl rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M>>::eval_place
 35,391,276  /home/njn/moz/rustN/src/librustc_mir/borrow_check/nll/type_check/relate_tys.rs:<rustc_mir::borrow_check::nll::type_check::relate_tys::TypeRelating<'cx, 'bccx, 'gcx, 'tcx> as rustc::ty:
:relate::TypeRelation<'cx, 'gcx, 'tcx>>::tys
 34,601,512  /home/njn/moz/rustN/src/librustc_mir/interpret/memory.rs:<rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::get_bytes_mut
-33,360,936  /home/njn/moz/rustN/src/librustc_mir/interpret/eval_context.rs:<rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M>>::eval_rvalue_into_place
 33,098,069  /home/njn/moz/rustN/src/librustc/ty/relate.rs:<rustc_mir::borrow_check::nll::type_check::relate_tys::TypeRelating<'cx, 'bccx, 'gcx, 'tcx> as rustc::ty::relate::TypeRelation<'cx, 'gcx,
'tcx>>::tys
 32,710,110  /home/njn/moz/rustN/src/librustc/ty/mod.rs:rustc::ty::ParamEnv::and
 32,372,938  /home/njn/moz/rustN/src/librustc_mir/interpret/memory.rs:<rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::write_scalar
 31,134,882  /home/njn/moz/rustN/src/librustc/ty/layout.rs:<rustc::ty::layout::LayoutCx<'tcx, rustc::ty::context::TyCtxt<'a, 'tcx, 'tcx>>>::record_layout_for_printing
 31,128,574  /home/njn/moz/rustN/src/librustc_mir/interpret/eval_context.rs:<&'a rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M> as rustc_target::abi::LayoutOf>::layout_of
 29,168,137  /home/njn/moz/rustN/src/libcore/num/mod.rs:rustc::ty::query::plumbing::<impl rustc::ty::context::TyCtxt<'a, 'gcx, 'tcx>>::get_query
 28,114,083  /home/njn/moz/rustN/src/librustc_mir/interpret/memory.rs:<rustc_mir::interpret::memory::Memory<'a, 'mir, 'tcx, M>>::read_scalar
 27,851,882  /home/njn/moz/rustN/src/librustc/ty/layout.rs:<&'a rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M> as rustc_target::abi::LayoutOf>::layout_of
 27,249,214  /build/glibc-OTsEL5/glibc-2.27/malloc/malloc.c:malloc
 27,065,779  /home/njn/moz/rustN/src/librustc_mir/interpret/step.rs:rustc_mir::interpret::step::<impl rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M>>::step
 26,803,051  /home/njn/moz/rustN/src/librustc_mir/interpret/eval_context.rs:<rustc_mir::interpret::eval_context::EvalContext<'a, 'mir, 'tcx, M>>::force_allocation

RalfJung · 2018-07-31T06:56:54Z

I verified (using cargo-bisect-rustc) that the regression I reported above is also caused by the same PR:

regression in 70cac59

RalfJung · 2018-07-31T11:55:50Z

At least for my benchmark, the regression is entirely caused by https://github.com/rust-lang/rust/pull/51361/files#diff-1651ba5b47f9ffa648d59265d3f94becR1671: This leads to statics being CTFE'd at all in check, and (seemingly) to them being CTFE'd twice in build.

RalfJung · 2018-07-31T11:59:05Z

UnusedBrokenConstVisitor does not actually make sure to only visit unused items. Also I am quite surprised to see that this lint is the only place that runs the check?

For example, with generic statics (which promoteds can create) or generic consts (which are proposed, AFAIK), this check couldn't do anything. There, we'd have to check on instantiation, right?

@oli-obk

check_const: use the same ParamEnv as codegen for statics Fixes at least part of #52849 (my CTFE-stress benchmark). Note that I do not know what I am doing here, this is just based on hints from @oli-obk. r? @oli-obk

pnkfelix · 2018-08-02T14:40:14Z

visited for triage. P-high. Haven't decided whom if anyone to assign this to; hopefully that will be addressed soonish.

brunocodutra · 2018-08-21T07:27:54Z

I'm mostly finished by the way, the only missing piece is to write a convenience macro to avoid code duplication, but I've been lacking the time to work on it. Hopefully by the end of the week I'll have it ready for review.

nikomatsakis · 2018-08-23T14:16:09Z

According to @oli-obk, @RalfJung has turned this from a perf regression into a perf win. Way to go!

oli-obk · 2018-08-23T14:18:54Z

For anyone interested, the relevant PR is #53424

RalfJung · 2018-08-23T15:14:34Z

Uh, I think it is still slower than without the sanity check... but that is expected.

However, there is also a known actual issue still open: #52849 (comment). I am going to reopen this issue until we got that solved. I think we should stop hashing memory, also see #52626 (comment).

pietroalbini · 2018-08-25T12:41:09Z

PR for fixing this in beta is #53425.

RalfJung · 2018-08-25T12:45:32Z

That's not correct. That PR fixes a functional bug, it does not help with perf.

pietroalbini · 2018-08-25T12:49:11Z

Uh, ok. So, since #53424 is not going to be backported, will the perf regression be fixed in another PR or should I mark this as "won't fix"?

RalfJung · 2018-08-25T12:52:58Z

The full perf regression is not fixed in master yet but I plan to do so -- I am essentially waiting for #52626 because that touches the same code (and actually that one might already do it). That will still be slower than it used to but that's expected, it is doing more work now.

As for beta -- @oli-obk any plans? We could (should, I think) backport #52925, that would help. Beyond that, the only measures I can think of is to reduce what validation does -- make it stop at pointers, or disable it entirely.

pnkfelix · 2018-08-30T14:29:26Z

visiting for triage. Assigning to @oli-obk to take the lead on the backport.

[beta] const eval perf regression fix backports #52925 (for #52849) and additionally skips hashing on every evaluation step

oli-obk · 2018-09-06T14:29:08Z

visited for triage: I think @RalfJung has one more fix in this direction planned?

RalfJung · 2018-09-06T14:46:32Z

Do I? Not that I know of.^^ I consider this done with #52626.

oli-obk · 2018-09-06T14:51:46Z

Awesome! I'll close this when #52626 is merged

@oli-obk

Fix issue #52475: Make loop detector only consider reachable memory As [suggested](#51702 (comment)) by @oli-obk `alloc_id`s should be ignored by traversing all `Allocation`s in interpreter memory at a given moment in time, beginning by `ByRef` locals in the stack. - [x] Generalize the implementation of `Hash` for `EvalSnapshot` to traverse `Allocation`s - [x] Generalize the implementation of `PartialEq` for `EvalSnapshot` to traverse `Allocation`s - [x] Commit regression tests Fixes #52626 Fixes #52849

ljedrz changed the title ~~A pretty big performance hit in a recent change~~ A pretty big performance hit after a recent change Jul 30, 2018

kennytm added I-compiletime Issue: Problems and improvements with respect to compile times. WG-compiler-performance Working group: Compiler Performance labels Jul 30, 2018

kennytm mentioned this issue Jul 30, 2018

Do a basic sanity check for all constant values #51361

Merged

RalfJung mentioned this issue Jul 30, 2018

Add keccak crate rust-lang/rustc-perf#265

Merged

kennytm added the regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. label Jul 30, 2018

kennytm changed the title ~~A pretty big performance hit after a recent change~~ A pretty big performance hit after a recent change (PR #51361) Jul 30, 2018

pietroalbini added regression-from-stable-to-beta Performance or correctness regression from stable to beta. and removed regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Jul 31, 2018

pietroalbini added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jul 31, 2018

RalfJung mentioned this issue Jul 31, 2018

check_const: use the same ParamEnv as codegen for statics #52925

Merged

RalfJung mentioned this issue Aug 1, 2018

CTFE error shown thrice #52922

Closed

This was referenced Aug 21, 2018

Fix issue #52475: Make loop detector only consider reachable memory #52626

Merged

CTFE engine refactor #53424

Merged

nikomatsakis closed this as completed Aug 23, 2018

RalfJung reopened this Aug 23, 2018

pnkfelix assigned oli-obk Aug 30, 2018

oli-obk mentioned this issue Aug 30, 2018

[beta] const eval perf regression fix #53825

Merged

bors added a commit that referenced this issue Aug 31, 2018

Auto merge of #53825 - oli-obk:beta_backport, r=RalfJung

1daa912

[beta] const eval perf regression fix backports #52925 (for #52849) and additionally skips hashing on every evaluation step

oli-obk assigned RalfJung and unassigned oli-obk Sep 6, 2018

oli-obk added regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. and removed regression-from-stable-to-beta Performance or correctness regression from stable to beta. labels Sep 6, 2018

oli-obk assigned oli-obk and unassigned RalfJung Sep 6, 2018

bors closed this as completed in #52626 Sep 6, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A pretty big performance hit after a recent change (PR #51361) #52849

A pretty big performance hit after a recent change (PR #51361) #52849

ljedrz commented Jul 30, 2018 •

edited

Loading

RalfJung commented Jul 30, 2018

kennytm commented Jul 30, 2018 •

edited

Loading

oli-obk commented Jul 30, 2018

RalfJung commented Jul 30, 2018

oli-obk commented Jul 30, 2018

RalfJung commented Jul 30, 2018

RalfJung commented Jul 30, 2018 •

edited

Loading

RalfJung commented Jul 30, 2018

RalfJung commented Jul 30, 2018

nnethercote commented Jul 30, 2018

nnethercote commented Jul 30, 2018

nnethercote commented Jul 30, 2018

RalfJung commented Jul 31, 2018 •

edited

Loading

RalfJung commented Jul 31, 2018

RalfJung commented Jul 31, 2018

pnkfelix commented Aug 2, 2018

brunocodutra commented Aug 21, 2018

nikomatsakis commented Aug 23, 2018

oli-obk commented Aug 23, 2018

RalfJung commented Aug 23, 2018

pietroalbini commented Aug 25, 2018

RalfJung commented Aug 25, 2018

pietroalbini commented Aug 25, 2018

RalfJung commented Aug 25, 2018

pnkfelix commented Aug 30, 2018

oli-obk commented Sep 6, 2018

RalfJung commented Sep 6, 2018

oli-obk commented Sep 6, 2018

A pretty big performance hit after a recent change (PR #51361) #52849

A pretty big performance hit after a recent change (PR #51361) #52849

Comments

ljedrz commented Jul 30, 2018 • edited Loading

RalfJung commented Jul 30, 2018

kennytm commented Jul 30, 2018 • edited Loading

oli-obk commented Jul 30, 2018

RalfJung commented Jul 30, 2018

oli-obk commented Jul 30, 2018

RalfJung commented Jul 30, 2018

RalfJung commented Jul 30, 2018 • edited Loading

RalfJung commented Jul 30, 2018

RalfJung commented Jul 30, 2018

nnethercote commented Jul 30, 2018

nnethercote commented Jul 30, 2018

nnethercote commented Jul 30, 2018

RalfJung commented Jul 31, 2018 • edited Loading

RalfJung commented Jul 31, 2018

RalfJung commented Jul 31, 2018

pnkfelix commented Aug 2, 2018

brunocodutra commented Aug 21, 2018

nikomatsakis commented Aug 23, 2018

oli-obk commented Aug 23, 2018

RalfJung commented Aug 23, 2018

pietroalbini commented Aug 25, 2018

RalfJung commented Aug 25, 2018

pietroalbini commented Aug 25, 2018

RalfJung commented Aug 25, 2018

pnkfelix commented Aug 30, 2018

oli-obk commented Sep 6, 2018

RalfJung commented Sep 6, 2018

oli-obk commented Sep 6, 2018

ljedrz commented Jul 30, 2018 •

edited

Loading

kennytm commented Jul 30, 2018 •

edited

Loading

RalfJung commented Jul 30, 2018 •

edited

Loading

RalfJung commented Jul 31, 2018 •

edited

Loading