compiler performance #14743

JeffBezanson · 2016-01-20T16:58:01Z

This is a tracking issue for work on speeding up the compiler itself. Between LLVM 3.7 and the upcoming jb/functions we have significant slowdowns. Dealing with this is becoming quite urgent. All phases of the system could use improvement.

Front end

Options for improving flisp performance
- Try using Gambit-C again
- Write an flisp bytecode-to-llvm compiler (can be a static compiler)
- Hand compile the front end code to C
~~Clean up lowering passes (julia-syntax.scm). Probably at least 2-3 of them can be combined or removed.~~ (simplify and speed up front end #14997)

IR

~~AST representation needs to be more compact and include better debug info improved IR #15609, Improve inlined line numbers #14949, WIP: overhaul file name info #15583~~
- Eliminate jl_compress_ast
~~More efficient Slot representation (separate Slot into SlotNumber and TypedSlot to save space #15951)~~
(Maybe) make GenSym/SSAVal assignments implicit, i.e. just statement indexes

Type inference

~~Use workqueue instead of recursion type-inference workq #15300~~
~~More efficient lookup structure for cached inferred trees TupleMap type #15779~~
Don't keep as many trees around (e.g. ones too big to be inlined) (compute inline_worthy after inference and cache it #15970)
~~Method-cache-style widening before invoking recursive inference, to cut down workload~~
Fewer specializations
~~Const lattice element~~ (add a lattice element type for constants in inference #15785)
~~combine tfunc and specializations arrays (merge specializations and tfunc #15918)~~
~~allow type inf to always allocate new LambdaInfos to avoid copies in both specializations and method cache~~

Other

Use generated functions less
~~There is sometimes a regression due to precompile+ #15934~~ believed to be largely fixed

Codegen

Run more things in an interpreter, avoiding codegen entirely (WIP: use interpreter to execute functions that are all generic calls #15855)
~~Codegen time might be slightly super-linear in total amount of code (see set JULIA_TEST_MAXRSS_MB=600 on appveyor #14845 (comment)) some codegen tests & fixes #15632~~
~~Quadratic jit debug info registration (jit code debug registration is O(n^2) #14846) gdb bug, not Julia~~
~~Add -O0 option~~
Run fewer optimization passes (basically just mem2reg), e.g. on code with poor type information
When 2 specializations of a function have the same LLVM IR, reuse the native code
~~Use less memory (Approaches for avoiding fragmentation in code memory allocation #14626)~~
Improvements inside LLVM
~~calling convention for constant functions that fully avoids codegen (RFC: specialized calling convention for pure functions that return a constant #16837)~~

Some specific issues:

Very slow compiler performance with long function and many list comprehensions #6685 Slow compilation with many comprehensions
~~function AST in a module limited to 2^16 constants #14113 AST representation with many constants~~
Compilation bottleneck #14556 egal bottleneck

The text was updated successfully, but these errors were encountered:

Keno · 2016-01-20T17:02:55Z

When 2 specializations of a function have the same LLVM IR, reuse the native code

I've been thinking about this. We could try hashing the IR code, but we'd have to do some work to avoid spurious differences due to naming things, etc., e.g. we could name all functions after a hash of their IR. Of course this'll also seriously complicate backtraces/debug info.

JeffBezanson · 2016-01-20T17:07:14Z

this'll also seriously complicate backtraces/debug info

Would this be mitigated if we started by only considering different specializations of the exact same method? I imagine we could do a reasonably quick experiment to see if this might be profitable.

Keno · 2016-01-20T17:18:18Z

Would this be mitigated if we started by only considering different specializations of the exact same method? I imagine we could do a reasonably quick experiment to see if this might be profitable.

Yes for backtraces, no for debug info, but I think it might be fixable.

StefanKarpinski · 2016-01-20T17:59:16Z

The approach I had contemplated was replacing actual debug info with some sort of template values and then when you get a cache hit, use the previously generated code but with the debug info "template" filled in. Not sure how well that could be made to work though.

Keno · 2016-01-20T19:02:59Z

I think the biggest problem is to know which of the specializations you're in while walking the stack. You could potentially do it by looking at the local variables of the parent frame and the trying to figure out which one would have had to have been called.

StefanKarpinski · 2016-01-20T19:37:10Z

What I'm was describing would result in different specialized versions (with different debug info), but would reuse the generated code, so it would save time but not memory. Of course, that's not as good as using the same generated code, but that seems much harder.

Keno · 2016-01-20T19:38:05Z

Ah, I understand

ViralBShah · 2016-01-21T03:48:39Z

Wasn't gambit a bit buggy when we first tried it in the very early days? I guess it should be easy to try it out and run PkgEvaluator.

ViralBShah · 2016-01-21T03:49:55Z

An flisp to llvm bytecode compiler could also be a great JSOC project. We need to announce JSOC soon too.

StefanKarpinski · 2016-01-21T17:01:37Z

I think that compiler performance is a little too important to hang our hopes on a JSoC project.

ViralBShah · 2016-01-23T04:42:11Z

Of course we wouldn't hang our hopes on it, but there is no harm in mentioning it as a potential candidate project - in case we don't get around to doing it.

felipenoris · 2016-03-07T18:40:37Z

Is there any update on which solution will be given to improve flisp performance?

timholy · 2016-03-07T20:42:30Z

Check out https://github.com/JuliaLang/julia/pulls?q=is%3Apr+author%3AJeffBezanson+is%3Aclosed for some of Jeff's PRs which have already implemented some of the solutions.

andreasnoack · 2016-05-26T16:55:38Z

On my laptop, generic_matmatmul! takes 0.5 seconds to compile on 0.5 and 0.3 seconds on 0.3 (and 0.4). Even though the function is ~200 lines, it seems too slow in general and is probably one the main time consumers in the linear algebra tests.

It might be related to #16434 and, therefore, we should probably also look into the effects of splitting up the function. It might be much faster to compile six smaller versions.

JeffBezanson · 2016-06-03T03:56:26Z

With a quick look, a significant amount of compile time for generic_matmatmul! is in alloc_elim_pass.

JeffBezanson · 2016-06-30T16:49:51Z

Still to go here: #16837

JeffBezanson · 2016-06-30T17:52:51Z

There's also an especially bad case in #17137 we should fix.

PallHaraldsson · 2016-09-01T11:45:58Z

"Try using Gambit-C again", "Scheme" wasn't obvious (well I guess implied by flisp..):

https://en.wikipedia.org/wiki/Gambit_(scheme_implementation)

Is FemtoLisp on the way out? If/when this works? I see recent issues on a REPL for it..

ViralBShah · 2016-09-01T13:12:26Z

@PallHaraldsson This is just adding noise by asking such questions here. Best to do it on julia-users.

vtjnash · 2017-05-24T17:26:22Z

Doesn't seem to be anything left on this list worth doing / tracking with a meta issue.

JeffBezanson added performance Must go faster compiler:codegen Generation of LLVM IR and native code labels Jan 20, 2016

JeffBezanson added this to the 0.5.0 milestone Jan 28, 2016

tkelman mentioned this issue Jan 29, 2016

set JULIA_TEST_MAXRSS_MB=600 on appveyor #14845

Merged

JeffBezanson added the priority This should be addressed urgently label Feb 6, 2016

timholy mentioned this issue Mar 27, 2016

Decouple dispatch and specialization #11339

Open

JeffBezanson removed the priority This should be addressed urgently label May 3, 2016

ivirshup mentioned this issue May 11, 2016

Type parameters slowing down compilation with many concrete types #16321

Closed

vtjnash assigned JeffBezanson May 18, 2016

vtjnash mentioned this issue Jun 30, 2016

some generally-applicable changes from #16622 #17214

Merged

JeffBezanson modified the milestones: 0.5.x, 0.5.0 Jul 7, 2016

jeff-regier mentioned this issue Jul 28, 2016

[WIP] Update to ForwardDiff 0.2.x jeff-regier/Celeste.jl#248

Merged

StefanKarpinski added help wanted Indicates that a maintainer wants help on an issue or pull request and removed help wanted Indicates that a maintainer wants help on an issue or pull request labels Oct 27, 2016

ChrisRackauckas mentioned this issue Jan 16, 2017

Response Time SciML/DiffEqOnline#7

Closed

vtjnash closed this as completed May 24, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

compiler performance #14743

compiler performance #14743

JeffBezanson commented Jan 20, 2016 •

edited

Loading

Keno commented Jan 20, 2016

JeffBezanson commented Jan 20, 2016

Keno commented Jan 20, 2016

StefanKarpinski commented Jan 20, 2016

Keno commented Jan 20, 2016

StefanKarpinski commented Jan 20, 2016

Keno commented Jan 20, 2016

ViralBShah commented Jan 21, 2016

ViralBShah commented Jan 21, 2016

StefanKarpinski commented Jan 21, 2016

ViralBShah commented Jan 23, 2016

felipenoris commented Mar 7, 2016

timholy commented Mar 7, 2016

andreasnoack commented May 26, 2016 •

edited

Loading

JeffBezanson commented Jun 3, 2016

JeffBezanson commented Jun 30, 2016

JeffBezanson commented Jun 30, 2016

PallHaraldsson commented Sep 1, 2016 •

edited

Loading

ViralBShah commented Sep 1, 2016

vtjnash commented May 24, 2017

compiler performance #14743

compiler performance #14743

Comments

JeffBezanson commented Jan 20, 2016 • edited Loading

Front end

IR

Type inference

Other

Codegen

Keno commented Jan 20, 2016

JeffBezanson commented Jan 20, 2016

Keno commented Jan 20, 2016

StefanKarpinski commented Jan 20, 2016

Keno commented Jan 20, 2016

StefanKarpinski commented Jan 20, 2016

Keno commented Jan 20, 2016

ViralBShah commented Jan 21, 2016

ViralBShah commented Jan 21, 2016

StefanKarpinski commented Jan 21, 2016

ViralBShah commented Jan 23, 2016

felipenoris commented Mar 7, 2016

timholy commented Mar 7, 2016

andreasnoack commented May 26, 2016 • edited Loading

JeffBezanson commented Jun 3, 2016

JeffBezanson commented Jun 30, 2016

JeffBezanson commented Jun 30, 2016

PallHaraldsson commented Sep 1, 2016 • edited Loading

ViralBShah commented Sep 1, 2016

vtjnash commented May 24, 2017

JeffBezanson commented Jan 20, 2016 •

edited

Loading

andreasnoack commented May 26, 2016 •

edited

Loading

PallHaraldsson commented Sep 1, 2016 •

edited

Loading