-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compute inline_worthy
after inference and cache it
#15970
Conversation
Does it make sense to instead compute this the first time we ask for it? |
Good idea; I'll try that. I guess the main annoyance is that there would be 3 states, which is awkward to represent. |
if me.linfo.def.module === _topmod(me) | ||
name = me.linfo.def.name | ||
if me.linfo.isva && (name === :+ || name === :* || name === :min || name === :max) | ||
inlineable = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just cost ÷= 20
? i think the previous test seemed very strict on which method it was matching to hopefully avoid the corner cases where inlining would make it slower.
i don't think it's going to be worth deferring the test. if we have the information at the ready like this, i believe the system should be able to delete this code as soon as it is converted into native code, thus saving memory. |
👍 to that |
this avoids repeating the computation for the same piece of code, and allows rejecting inlining before calling jl_uncompress_ast in more cases.
1610cfc
to
d67912f
Compare
Ok, I have implemented that. It definitely seems to save significant memory during some tests. However it only shaves 1-2MB off the system image, which was disappointing. My best guess is that this is due to having copies of LambdaInfo in both method caches and tfunc caches. #15918 might help towards that (plus we also need to eliminate The most troublesome point I ran into was the serialize test. Method.roots contained references to codegen'd LambdaInfos whose IR had been deleted. AFAICT, there should be no need to serialize codegen'd LambdaInfos. All of these roots came from the recently-added |
Iirc, that link keeps the method alive if it gets replaced. I'm thinking for #265 that method replacement will just mark them rather thnq deleting them, avoiding that problem |
I see; so indeed that specific LambdaInfo is not needed, but probably the called function's def.roots. I'll try changing it to root li.def.roots instead. |
d67912f
to
f855da5
Compare
Here we have an OSX timeout and a crash on 32-bit AV. The crash is a bit worrying. |
// keep a reference to the called function's roots array in case | ||
// the method is replaced (PR #14301) | ||
if (ctx->linfo->def && li->def && li->def->roots) | ||
jl_add_linfo_root(ctx->linfo, (jl_value_t*)li->def->roots); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another possible source of roots is sparam_vals
, which emit_sparam
assumes has a root (but could be any isbits)
additionally, lets just stop removing methods from the cache once they've been returned from jl_get_specialization1
. this should be a really small number (since we actively discourage overwriting methods), avoids this issue, and is a going to be needed for 265 anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that sounds reasonable, and much less of a hack than adding roots in callers. How do we reference the old entry in the cache? An old_entry
link in TypeMapEntry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking this would be a decent opportunity to incrementally start to add the entry world counters (min / max). an entry would be invisible if the world counter is not between min (this would be monotonically incrementing on every method add and copied from every method definition to its cache entries) and max (initialized to typemax). initially, the TLS world counter can just be equal to the method add counter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just take this out of the diff (it doesn't seem related to the rest of the change), and then it can be merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok sounds good.
Closing but will keep this branch around for the second commit. |
This avoids repeating the computation for the same piece of code, and allows rejecting inlining before calling jl_uncompress_ast in more cases.
This should be generally faster. The only reason it might be slower AFAICT would be if we end up calling inline_worthy for lots of functions that previously never had inlining attempts by call sites. However inline_worthy itself should be pretty cheap.