
compute inline_worthy after inference and cache it #15970

Closed
JeffBezanson wants to merge 2 commits from the jb/inlineable branch

Conversation

JeffBezanson
Member

This avoids repeating the computation for the same piece of code, and allows rejecting inlining before calling jl_uncompress_ast in more cases.

This should be generally faster. The only way it might be slower, AFAICT, is if we end up calling inline_worthy on lots of functions that previously never had inlining attempts from call sites. However, inline_worthy itself should be pretty cheap.
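A minimal sketch of the caching idea, in the spirit of this PR (the type, field, and function names below are illustrative stand-ins, not the actual code this change adds):

# Illustrative sketch: cache the inline_worthy verdict on the code object
# right after inference, so it is computed once per method body rather
# than once per call site.

# Stand-in cost heuristic; the real inline_worthy inspects the lowered IR.
inline_worthy(code::Vector) = length(code) <= 20

mutable struct InferredCode        # hypothetical stand-in for LambdaInfo
    code::Vector{Any}
    inlineable::Bool
end

function finish_inference!(src::InferredCode)
    # ... inference has just completed on src.code ...
    src.inlineable = inline_worthy(src.code)   # computed once, cached
    return src
end

# Call sites check the cached flag first, so non-inlineable methods can be
# rejected without ever calling jl_uncompress_ast on the stored AST.
should_inline(src::InferredCode) = src.inlineable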

@Keno
Member

Keno commented Apr 20, 2016

Does it make sense to instead compute this the first time we ask for it?

@JeffBezanson
Member Author

Good idea; I'll try that. I guess the main annoyance is that there would be 3 states, which is awkward to represent.
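One plausible way to encode those three states lazily (a sketch only, with made-up names; not necessarily how the branch ended up representing it):

# Hypothetical tri-state encoding: not-yet-computed / no / yes.
const INLINE_UNKNOWN = 0x00
const INLINE_NO      = 0x01
const INLINE_YES     = 0x02

mutable struct LazyInlineFlag
    state::UInt8
    LazyInlineFlag() = new(INLINE_UNKNOWN)
end

function inlineable!(flag::LazyInlineFlag, compute::Function)
    if flag.state == INLINE_UNKNOWN
        # The first request pays for the computation; later requests
        # are just a byte comparison.
        flag.state = compute() ? INLINE_YES : INLINE_NO
    end
    return flag.state == INLINE_YES
end

# Usage: inlineable!(flag, () -> inline_worthy(code)) runs the check at most once.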

if me.linfo.def.module === _topmod(me)
name = me.linfo.def.name
if me.linfo.isva && (name === :+ || name === :* || name === :min || name === :max)
inlineable = true
Member

Maybe just cost ÷= 20? I think the previous test seemed very strict about which methods it matched, hopefully to avoid the corner cases where inlining would make it slower.
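A self-contained reading of that suggestion (the threshold name and numbers are made up for illustration): rather than forcing inlineable = true for vararg +, *, min, and max in the top module, discount their cost and let the usual threshold decide.

const INLINE_COST_THRESHOLD = 100   # hypothetical threshold

function worth_inlining(name::Symbol, isva::Bool, cost::Int)
    if isva && (name === :+ || name === :* || name === :min || name === :max)
        cost ÷= 20   # strongly favor, but do not unconditionally force, inlining
    end
    return cost <= INLINE_COST_THRESHOLD
end

# A vararg `+` body costing 800 now passes (800 ÷ 20 = 40 ≤ 100),
# while a huge one costing 3000 is still rejected (3000 ÷ 20 = 150 > 100).
@assert worth_inlining(:+, true, 800)
@assert !worth_inlining(:+, true, 3000)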

@vtjnash
Member

vtjnash commented Apr 20, 2016

I don't think it's going to be worth deferring the test. If we have the information at the ready like this, I believe the system should be able to delete this code as soon as it is converted into native code, thus saving memory.

@JeffBezanson
Member Author

👍 to that

this avoids repeating the computation for the same piece of code,
and allows rejecting inlining before calling jl_uncompress_ast
in more cases.
@JeffBezanson
Member Author

OK, I have implemented that. It definitely seems to save significant memory during some tests. However, it only shaves 1–2 MB off the system image, which was disappointing. My best guess is that this is due to having copies of LambdaInfo in both method caches and tfunc caches. #15918 might help with that (plus we also need to eliminate InferenceState.destination).

The most troublesome point I ran into was the serialize test: Method.roots contained references to codegen'd LambdaInfos whose IR had been deleted. AFAICT, there should be no need to serialize codegen'd LambdaInfos. All of these roots came from the recently added jl_add_linfo_root(ctx->linfo, (jl_value_t*)li); in emit_call. I don't understand why this root would be necessary: generated code doesn't reference its LambdaInfo object, and any referenced values should have been added to some Method.roots array anyway.

@vtjnash
Member

vtjnash commented Apr 22, 2016

IIRC, that link keeps the method alive if it gets replaced. I'm thinking that for #265, method replacement will just mark methods rather than deleting them, avoiding that problem.

@JeffBezanson
Member Author

I see; so indeed that specific LambdaInfo is not needed, but the called function's def.roots probably is. I'll try changing it to root li.def.roots instead.

@JeffBezanson
Member Author

Here we have an OS X timeout and a crash on 32-bit AppVeyor. The crash is a bit worrying.

// keep a reference to the called function's roots array in case
// the method is replaced (PR #14301)
if (ctx->linfo->def && li->def && li->def->roots)
jl_add_linfo_root(ctx->linfo, (jl_value_t*)li->def->roots);
Member

Another possible source of roots is sparam_vals, which emit_sparam assumes is rooted (but it could be any isbits value).

Additionally, let's just stop removing methods from the cache once they've been returned from jl_get_specialization1. This should be a really small number (since we actively discourage overwriting methods), it avoids this issue, and it's going to be needed for #265 anyway.

Member Author

Yeah, that sounds reasonable, and much less of a hack than adding roots in callers. How do we reference the old entry in the cache? An old_entry link in TypeMapEntry?

Member

I was thinking this would be a decent opportunity to incrementally start adding the entry world counters (min/max). An entry would be invisible if the world counter is not between its min (a value that increments monotonically on every method addition and is copied from each method definition to its cache entries) and its max (initialized to typemax). Initially, the TLS world counter can just be equal to the method-add counter.
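A rough sketch of that visibility rule (all names are illustrative; the real mechanism landed in C much later):

# Each cache entry records the world it was added in; max_world starts at
# typemax and is capped when the method is replaced, instead of the entry
# being deleted.
const WORLD_COUNTER = Ref{UInt}(0)

mutable struct WorldEntry
    min_world::UInt
    max_world::UInt
end

function add_method!()             # every method addition bumps the world
    WORLD_COUNTER[] += 1
    return WorldEntry(WORLD_COUNTER[], typemax(UInt))
end

# Cap the old entry *before* adding the replacement: worlds up to the
# current counter still see it, while the new entry is visible from the
# next world onward.
invalidate!(e::WorldEntry) = (e.max_world = WORLD_COUNTER[]; e)

# An entry is visible only to worlds within [min_world, max_world].
isvisible(e::WorldEntry, world::UInt) = e.min_world <= world <= e.max_world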

Member

Maybe just take this out of the diff (it doesn't seem related to the rest of the change), and then it can be merged?

Member Author

OK, sounds good.

@JeffBezanson
Member Author

Closing, but I'll keep this branch around for the second commit.

@vtjnash vtjnash deleted the jb/inlineable branch June 9, 2016 16:28
@vtjnash vtjnash restored the jb/inlineable branch June 9, 2016 16:28
@JeffBezanson JeffBezanson deleted the jb/inlineable branch June 14, 2016 14:17