Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NLL performance tracking issue #47879

Closed
2 of 6 tasks
nikomatsakis opened this issue Jan 30, 2018 · 7 comments
Closed
2 of 6 tasks

NLL performance tracking issue #47879

nikomatsakis opened this issue Jan 30, 2018 · 7 comments
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC I-compiletime Issue: Problems and improvements with respect to compile times. NLL-performant Working towards the "performance is good" goal T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Jan 30, 2018

This is a tracking issue dedicated to discussing ideas and techniques for improving compile-time with the NLL feature. It's meant to house as a repository of measurements, benchmarks, etc for the time being to help coordinate efforts.

Benchmarks and timing runs

We can view performance results on perf.rust-lang.org now. Just look for the "NLL" runs. They can be compared against "clean" runs -- the delta is the 'extra work' we are doing when NLL is enabled (note that NLL still runs the old region analysis and borrow check, so it does strictly more work).

Ideas for improvement or measurement

Quick pointers into the source

The main source of time for NLL is probably going to be the code that propagates constraints:

/// Propagate the region constraints: this will grow the values
/// for each region variable until all the constraints are
/// satisfied. Note that some values may grow **too** large to be
/// feasible, but we check this later.
fn propagate_constraints(&mut self, mir: &Mir<'tcx>) {

And in particular the calls to dfs:

let Ok(made_changes) = self.dfs(
mir,
CopyFromSourceToTarget {
source_region: constraint.sub,
target_region: constraint.sup,
inferred_values: &mut inferred_values,
constraint_point: constraint.point,
constraint_span: constraint.span,
},
);

The dfs code must walk the graph to find which points should be added and where:

pub(super) fn dfs<C>(&self, mir: &Mir<'tcx>, mut op: C) -> Result<bool, C::Early>
where
C: DfsOp,
{

cc @rust-lang/wg-compiler-nll

@nikomatsakis nikomatsakis added this to the NLL: Performance is good milestone Jan 30, 2018
@nikomatsakis nikomatsakis added I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-nll labels Jan 30, 2018
@nikomatsakis
Copy link
Contributor Author

nikomatsakis commented Jan 30, 2018

I did some profiling of syn. The top result was actually not what I expected:

+   14.45%    14.45%  rustc    librustc-1dc9e414ba4b7dab.so                  [.] rustc::infer::region_constraints::RegionConstraintCollector::take_and_reset_data                                          

This is type-checker constraint collection; this may imply that some of the refactorings I had in mind for doing this in a smarter way are in order.

@TimNN TimNN added the C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC label Jan 30, 2018
@nikomatsakis
Copy link
Contributor Author

OK, that profiling run was kind of messed up, because I forgot to include debuginfo = 1, so that the framepointers were all wrong etc. Still, when I did a better run, the data was a lot more detailed, but pointed roughly to the same conclusion. The NLL type checker is killing a lot of time doing normalization. This is also responsible for some soundness bugs. This kind of ups the importance of doing the trait refactorings I had in mind to handle this scenario better, but I'd hate to block on this (they are kind of involved). I have to think if we can do them in a kind of lightweight way.

@nikomatsakis
Copy link
Contributor Author

nikomatsakis commented Feb 1, 2018

On a hunch, tried the results of @Aaron1011's #47920. This makes a big difference, bringing us down to 7s. Haven't looked at the detailed profiling yet.

@nikomatsakis
Copy link
Contributor Author

nikomatsakis commented Feb 2, 2018

Did some rough profiling and analysis. For syn, with #47920 applied:

  • mir borrowck accounts for 18% of total runtime
    • nll accounts for 16% of total run time
      • running the MIR type check accounts for 14% of total run time
        • add_drop_live_constraint accounts for 12% of total run time
      • the other 2% is not concentrated in any one place
        • rustc_mir::borrow_check::nll::region_infer::RegionInferenceContext::solve in particular is 0% (7 samples out of 1034)
  • for comparison, AST-based borrowck accounts for 2%

@nikomatsakis
Copy link
Contributor Author

So after #48411 lands, this is the current profile of the syn crate:

lunch-box. perf focus  '{do_mir_borrowck}' --tree-callees --tree-min-percent 2
Matcher    : {do_mir_borrowck}
Matches    : 152
Not Matches: 776
Percentage : 16%

Tree
| matched `{do_mir_borrowck}` (16% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (12% total, 0% self)
: : | rustc_mir::borrow_check::nll::type_check::type_check (10% total, 0% self)
: : : | rustc_mir::borrow_check::nll::type_check::type_check_internal (10% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::type_check::type_check::_$u7b$$u7b$closure$u7d$$u7d$::haa6458083ae8ea7a (7% total, 0% self)
: : : : : | rustc_mir::borrow_check::nll::type_check::liveness::generate (7% total, 0% self)
: : : : : : | rustc_mir::borrow_check::nll::type_check::TypeChecker::fully_perform_op (5% total, 0% self)
: : : : : : : | rustc::infer::InferCtxt::take_and_reset_region_constraints (2% total, 0% self)
: : : : : : : : | rustc::infer::region_constraints::RegionConstraintCollector::take_and_reset_data (2% total, 0% self)
: : : : : : : : : | <rustc_data_structures::unify::UnificationTable<K>>::new_key (2% total, 0% self)
: : : : : : : | rustc::infer::InferCtxt::commit_if_ok (2% total, 0% self)
: : : : : : : : | rustc::traits::query::dropck_outlives::<impl rustc::infer::at::At<'cx, 'gcx, 'tcx>>::dropck_outlives (2% total, 0% self)

This is showing stuff that takes more than 2% of total execution time.

Here is the profile showing stuff that takes more than 1%:

1% profile
lunch-box. perf focus  '{do_mir_borrowck}' --tree-callees --tree-min-percent 1
Matcher    : {do_mir_borrowck}
Matches    : 152
Not Matches: 776
Percentage : 16%

Tree
| matched `{do_mir_borrowck}` (16% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (12% total, 0% self)
: : | rustc_mir::borrow_check::nll::type_check::type_check (10% total, 0% self)
: : : | rustc_mir::borrow_check::nll::type_check::type_check_internal (10% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::type_check::type_check::_$u7b$$u7b$closure$u7d$$u7d$::haa6458083ae8ea7a (7% total, 0% self)
: : : : : | rustc_mir::borrow_check::nll::type_check::liveness::generate (7% total, 0% self)
: : : : : : | rustc_mir::borrow_check::nll::type_check::TypeChecker::fully_perform_op (5% total, 0% self)
: : : : : : : | rustc::infer::InferCtxt::take_and_reset_region_constraints (2% total, 0% self)
: : : : : : : : | rustc::infer::region_constraints::RegionConstraintCollector::take_and_reset_data (2% total, 0% self)
: : : : : : : : : | <rustc_data_structures::unify::UnificationTable<K>>::new_key (2% total, 0% self)
: : : : : : : : : : | <rustc_data_structures::snapshot_vec::SnapshotVec<D>>::push (1% total, 1% self)
: : : : : : : | rustc::infer::InferCtxt::commit_if_ok (2% total, 0% self)
: : : : : : : : | rustc::traits::query::dropck_outlives::<impl rustc::infer::at::At<'cx, 'gcx, 'tcx>>::dropck_outlives (2% total, 0% self)
: : : : : : : : : | rustc::infer::canonical::<impl rustc::infer::InferCtxt<'cx, 'gcx, 'tcx>>::instantiate_query_result (1% total, 0% self)
: : : : : : | <rustc_mir::dataflow::at_location::FlowAtLocation<BD>>::each_state_bit (1% total, 1% self)
: : : : | <rustc_mir::borrow_check::nll::type_check::TypeVerifier<'a, 'b, 'gcx, 'tcx> as rustc::mir::visit::Visitor<'tcx>>::visit_mir (1% total, 0% self)
: : | rustc_mir::borrow_check::nll::region_infer::RegionInferenceContext::solve (1% total, 0% self)

@nikomatsakis
Copy link
Contributor Author

Good news:

The indomitable @Mark-Simulacrum has added a special "NLL mode" to http://perf.rust-lang.org/, so we can now visualize our performance very easily. The NLL point should be compared to "clean" -- the delta is the 'extra work' we are doing when NLL is enabled (note that NLL still runs the old region analysis and borrow check, so it does strictly more work).

Bad news:

We got out work cut out for us! =) See that little orange triangle in the upper right?

screen shot 2018-03-23 at 1 54 21 pm

@nikomatsakis
Copy link
Contributor Author

I'm gonna close this tracking issue. Not that the problem is solved, but I don't see this issue as adding a lot of value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC I-compiletime Issue: Problems and improvements with respect to compile times. NLL-performant Working towards the "performance is good" goal T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

2 participants