compute `inline_worthy` after inference and cache it #15970

JeffBezanson · 2016-04-20T21:51:15Z

This avoids repeating the computation for the same piece of code, and allows rejecting inlining before calling jl_uncompress_ast in more cases.

This should generally be faster. The only reason it might be slower, AFAICT, is if we end up calling inline_worthy for lots of functions that previously never had inlining attempts by call sites. However, inline_worthy itself should be pretty cheap.

Keno · 2016-04-20T22:22:55Z

Does it make sense to instead compute this the first time we ask for it?

JeffBezanson · 2016-04-20T22:43:51Z

Good idea; I'll try that. I guess the main annoyance is that there would be 3 states, which is awkward to represent.
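To illustrate the three states mentioned above, here is a minimal sketch of a lazily computed flag. Everything in it (`Inlineable`, `CodeInfoSketch`, `inline_worthy_sketch`, the threshold of 100) is hypothetical and only stands in for the real data structures and heuristic; it is not code from this PR.

```cpp
#include <cstdint>

// Hypothetical tri-state flag: "not computed yet" needs its own value,
// which is why a plain bool is not enough for the lazy variant.
enum class Inlineable : std::uint8_t { Unknown, No, Yes };

struct CodeInfoSketch {
    int cost;                // stand-in for whatever the heuristic inspects
    Inlineable inlineable;   // cached answer, Unknown until first query
    int evaluations;         // counts how often the heuristic actually ran
};

// Stand-in heuristic; the real inline_worthy lives in base/inference.jl.
// The threshold is an arbitrary illustrative value.
inline bool inline_worthy_sketch(const CodeInfoSketch &ci) {
    return ci.cost <= 100;
}

// Compute the answer on first use, then serve it from the cache.
inline bool is_inlineable(CodeInfoSketch &ci) {
    if (ci.inlineable == Inlineable::Unknown) {
        ++ci.evaluations;
        ci.inlineable = inline_worthy_sketch(ci) ? Inlineable::Yes
                                                 : Inlineable::No;
    }
    return ci.inlineable == Inlineable::Yes;
}
```

Computing the flag eagerly after inference, as the PR does, needs only two states, at the price of running the heuristic for functions no call site ever tries to inline.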

vtjnash · 2016-04-20T23:17:20Z

base/inference.jl

+        if me.linfo.def.module === _topmod(me)
+            name = me.linfo.def.name
+            if me.linfo.isva && (name === :+ || name === :* || name === :min || name === :max)
+                inlineable = true


Maybe just `cost ÷= 20`? I think the previous test was very strict about which method it matched, hopefully to avoid the corner cases where inlining would make things slower.

vtjnash · 2016-04-20T23:21:07Z

I don't think it's going to be worth deferring the test. If we have the information at the ready like this, I believe the system should be able to delete this code as soon as it is converted into native code, thus saving memory.
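A rough sketch of that idea, under the assumption that the inlineable flag is already cached by the time codegen finishes. `LambdaSketch` and `finish_codegen` are hypothetical names used only for illustration, and the `std::string` stands in for the stored IR.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for a LambdaInfo-like record.
struct LambdaSketch {
    std::unique_ptr<std::string> ir;  // stand-in for the (compressed) AST
    bool inlineable;                  // cached result of the inlining heuristic
    bool compiled;                    // true once native code exists
};

// After native code has been emitted, the IR is only needed again if some
// caller might inline this function. If the cached flag says it never will
// be inlined, the IR can be released immediately.
inline void finish_codegen(LambdaSketch &li) {
    li.compiled = true;
    if (!li.inlineable)
        li.ir.reset();  // frees the IR; compiled code stays usable
}
```

With a lazily computed flag the answer might not be known at this point, so the IR would have to be kept around until some call site asked.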

JeffBezanson · 2016-04-22T02:07:28Z

👍 to that

JeffBezanson · 2016-04-22T21:20:06Z

Ok, I have implemented that. It definitely seems to save significant memory during some tests. However it only shaves 1-2MB off the system image, which was disappointing. My best guess is that this is due to having copies of LambdaInfo in both method caches and tfunc caches. #15918 might help towards that (plus we also need to eliminate InferenceState.destination).

The most troublesome point I ran into was the serialize test. Method.roots contained references to codegen'd LambdaInfos whose IR had been deleted. AFAICT, there should be no need to serialize codegen'd LambdaInfos. All of these roots came from the recently added `jl_add_linfo_root(ctx->linfo, (jl_value_t*)li);` in emit_call. I don't understand why this root would be necessary: generated code doesn't reference its LambdaInfo object, and any referenced values should have been added to some Method.roots array anyway.

vtjnash · 2016-04-22T23:39:21Z

IIRC, that link keeps the method alive if it gets replaced. I'm thinking that for #265, method replacement will just mark them rather than deleting them, avoiding that problem.

JeffBezanson · 2016-04-22T23:48:20Z

I see; so indeed that specific LambdaInfo is not needed, but the called function's def.roots probably is. I'll try changing it to root li.def.roots instead.

JeffBezanson · 2016-04-26T03:23:38Z

Here we have an OSX timeout and a crash on 32-bit AppVeyor (AV). The crash is a bit worrying.

vtjnash · 2016-04-26T03:44:07Z

src/codegen.cpp

+                // keep a reference to the called function's roots array in case
+                // the method is replaced (PR #14301)
+                if (ctx->linfo->def && li->def && li->def->roots)
+                    jl_add_linfo_root(ctx->linfo, (jl_value_t*)li->def->roots);


Another possible source of roots is sparam_vals, which emit_sparam assumes has a root (but could be any isbits).

Additionally, let's just stop removing methods from the cache once they've been returned from jl_get_specialization1. This should be a really small number (since we actively discourage overwriting methods), it avoids this issue, and it is going to be needed for #265 anyway.

JeffBezanson · 2016-04-26

Yeah, that sounds reasonable, and much less of a hack than adding roots in callers. How do we reference the old entry in the cache? An old_entry link in TypeMapEntry?

vtjnash · 2016-04-26

I was thinking this would be a decent opportunity to incrementally start adding the entry world counters (min / max). An entry would be invisible if the world counter is not between min (this would be monotonically incrementing on every method add and copied from every method definition to its cache entries) and max (initialized to typemax). Initially, the TLS world counter can just be equal to the method add counter.

vtjnash · 2016-05-03

Maybe just take this out of the diff (it doesn't seem related to the rest of the change), and then it can be merged?

JeffBezanson · 2016-05-03

Ok, sounds good.
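The min / max world counter idea from this thread can be sketched roughly as follows. This is only an illustration under stated assumptions: `EntrySketch`, `CacheSketch`, and the integer `method_id` are hypothetical, the bounds are treated as inclusive, and nothing here reflects how the runtime eventually implemented it.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical cache entry carrying the world range in which it is visible.
struct EntrySketch {
    int method_id;
    std::size_t min_world;  // world in which the entry was added
    std::size_t max_world;  // starts at the maximum value ("typemax")
};

struct CacheSketch {
    std::size_t world = 0;  // incremented on every method add
    std::vector<EntrySketch> entries;

    // Adding a method bumps the counter and stamps the new entry with it.
    std::size_t add_method(int method_id) {
        ++world;
        entries.push_back(EntrySketch{
            method_id, world, std::numeric_limits<std::size_t>::max()});
        return entries.size() - 1;
    }

    // Replacing a method caps the old entry instead of deleting it, so
    // anything still referring to it stays valid.
    std::size_t replace_method(std::size_t old_index, int method_id) {
        entries[old_index].max_world = world;
        return add_method(method_id);
    }

    // An entry is visible only to worlds inside [min_world, max_world].
    static bool visible(const EntrySketch &e, std::size_t w) {
        return e.min_world <= w && w <= e.max_world;
    }
};
```

In this sketch a lookup running at the latest world never sees a replaced entry, while the entry itself stays in the cache, which is the property the thread wanted in place of extra roots in callers.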

JeffBezanson · 2016-05-03T18:57:28Z

Closing but will keep this branch around for the second commit.

vtjnash reviewed Apr 20, 2016
JeffBezanson mentioned this pull request Apr 22, 2016

compiler performance #14743

Closed

compute inline_worthy after inference and cache it

a155326

this avoids repeating the computation for the same piece of code, and allows rejecting inlining before calling jl_uncompress_ast in more cases.

JeffBezanson force-pushed the jb/inlineable branch from 1610cfc to d67912f Compare April 22, 2016 21:10

WIP: delete non-inlineable code after codegen

f855da5

JeffBezanson force-pushed the jb/inlineable branch from d67912f to f855da5 Compare April 25, 2016 14:35

vtjnash reviewed Apr 26, 2016
JeffBezanson mentioned this pull request May 3, 2016

compute inline_worthy after inference and cache it #16186

Merged

JeffBezanson closed this May 3, 2016

vtjnash deleted the jb/inlineable branch June 9, 2016 16:28

vtjnash restored the jb/inlineable branch June 9, 2016 16:28

JeffBezanson deleted the jb/inlineable branch June 14, 2016 14:17
