-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lookup metadata for inlined frames for stack traces #41099
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Reopening because I have a different approach that might be okay, need to run CI to check. |
50fc39d
to
676f0c7
Compare
This looks like great foundational work, but just to check my understanding from a quick perusal, does this only add the module info? A full resolution of #41031 would seem to allow, e.g., Cthulhu to directly look up an inlined caller on its own (needs the full Could we get this from the If this is unfamiliar to you, check out some of the code in MethodAnalysis.jl, especially |
Great questions, and thank you. Yes, it only adds module info. I'm probably not appreciating something about the internals. When I first tried to store a dummy MethodInstance for the inlined frames that could be returned in the lookup method, I got an error about the system image being too large during the build.
I suppose you would do what you propose within Although, your approach would not work for terminal inlined frames, right? So it would potentially enable something to be done for intermediate inlined frames but would not a complete solution. I'd think it would be best to store information about inlined frames in the MethodInstance they're inlined into, which avoids an extensive search. Your comment spurred me to look harder for the But... what I'm finding is that if I allow that to be active when Julia compiles, it either leads to a "system image is too large" error or a hang after "namedtuples.jl" for a slimmer variant. I guess there are just way too many inlined methods to make that work. I think perhaps the ideal solution may require deeper study and architecture than I can currently commit to. |
Maybe this will help: a() = k
@noinline b() = a()
c() = b()
d() = c()
e() = d()
e() # note that you have to run the code once in order for the rest of this to work
julia> using MethodAnalysis
julia> mis = methodinstances(a)
1-element Vector{Core.MethodInstance}:
MethodInstance for a()
julia> mi = mis[1]
MethodInstance for a()
julia> direct_backedges(mi)
1-element Vector{Core.MethodInstance}:
MethodInstance for b()
julia> all_backedges(mi)
4-element Vector{Core.MethodInstance}:
MethodInstance for e()
MethodInstance for c()
MethodInstance for d()
MethodInstance for b() |
I don't think you can use the backedges since there might be multiple callsites withing the same function? Recording the information as debuginfo during inlinling might be more beneficial? |
Yeah... and in the current framework, all you have at that point is the name of the method (as symbol), the file name (as symbol), and the line number. Additional information stored in a MethodInstance seems to have to be simple to avoid blowing up the initial build. (Array of LineInfoNodes seem fine, Array of MethodInstances are not). So it seems you'd have to do some sort of dynamic lookup on demand. With code based on MethodAnalysis.jl, I could look up all MethodInstances associated with the inlined method from the name, and then filter based on the file and line number.
But that leaves you with specializations. If you have a following frame, I guess you could do a union of its backedges with this list, but you could still end up with multiple matches if several specializations of the same method are used, as @vchuravy noted. So, what additional information is necessary to store for a specific lookup? Just the list of argument types? Maybe that can be stored in the parent non-inlined MethodInstance. |
The julia/base/compiler/ssair/inlining.jl Line 316 in 50fcb03
@BioTurboNick you want the method right? Not the precise method instance? Then storing the signature (and not atypes ) should suffice, cc: @aviatesk
|
Other stack frames display the specialized types, so it wouldn't be consistent, though it would be easier. |
Codecov Report
@@ Coverage Diff @@
## master #41099 +/- ##
==========================================
- Coverage 89.23% 89.16% -0.08%
==========================================
Files 342 343 +1
Lines 78400 78671 +271
==========================================
+ Hits 69964 70146 +182
- Misses 8436 8525 +89
Continue to review full report at Codecov.
|
Very cool! Does this increase the size of the system image file substantially? Another concern might be serialization; perhaps other than the module, the current LineInfoNode is very portable. It's not so obvious that this will be. Bugs like those hunted down in #37809 and #38127 are the kind of thing I might worry about. (They arose from different code on different nodes, and communication between the nodes led to serialization errors.)
I was imagining that between the LineInfo and the backedges you could probably figure things out, but it might take a dive into the type-inferred code. That is, if the debuginfo says you were called from a specific spot in the code, then you go check the unoptimized inferred code and find the corresponding call, and extract the types from there.. Since inlining will happen only for inferrable calls, I don't think you need to worry about how to handle cases of non-concrete inference. (Maybe 😁 ) But this sounds like quite a lot of work, so if @BioTurboNick's solution doesn't have significant downsides, clearly it's easier. |
How do I check that? There might also be some excess to be trimmed even, as there seemed like there were a lot more linetable entries than inlined frames. With respect to serialization errors, what could the source of that be, in theory? EDIT: Also working through an issue with certain frames from generated functions e.g. |
Probably just look at the size of the sysimage ( |
I implemented the suggestion to use the sysimage (sys-o.a) size is 265 Mb with this PR. On master it is 182 Mb... a pretty significant increase. (Believe it or not, an earlier iteration was closer to 500 Mb). Maybe I can get that down by avoiding creating a new LineInfoNode for the subsequent frames that are associated with a Method instead of MethodInstance. Maybe once I match that line in the lookup function, I can backtrack to find the first one. |
Welp, that didn't reduce the size at all. Not sure how to proceed. EDIT: There are ~1.2 million inlining passes building Juila. Just adding an empty array of LineInfoNodes to all MethodInstances with inlined methods increases the sysimage size from 182 MB to 205 MB. They are occurring across ~70 thousand method instances containing inlined methods, and involve ~14 thousand unique method instances (this last might actually combine some specialiations unintentionally). |
EDIT: D'oh, that's not fully specified then; the parent method instance should be part of the key. Hang on... |
Turns out that version is even worse when done right. It may be impossible to do without grossly inflating the sysimage. So, what about turning this off for sysimage creation, and only running it for code loaded later? Is there a bool somewhere to check? Could just store the module of the func/file/line for those functions cheaply. Shouldn't matter quite as much if type information is left out from those frames, yes? |
Oh. I think I got it? Turns out I was doing ccall wrong. If I figured that out, my original approach would have worked great from the Julia side.
|
The current code works except for certain frames where the An example of this case is seen when running The parent MethodInstance for these two inlined frames is This issue could be resolved by, in these rare cases, relying on one of the other approaches I used. But before I try that, is there another place that a linetable could be found? |
My reasoning is that method errors are usually caused by dynamic dispatch, which often results in the inlining cost for the parent frame being too high for it to be inlined. As a result, we typically see type information for frames that are close to where the error occurs:
The other frames might indeed be inlined, so you are right, it might be an overstatement to say "it's rare to be involved". But I'm still wondering if the type information for those frames are really valuable enough to justify the performance cost associated with it. |
When inlined, you also know that Cthulhu will always recover that type information as well, so this is likely unnecessary to store for the user experience |
Most people do not (and arguably should not) use Cthulhu. Giving more precision in the inlined stack traces is useful from a general usability perspective. Julia often and deeply inlines code and I for one have been frustrated by the stacktraces more than once. |
If you ever have the pleasure of trying to differentiate through a non-trivial function using Zygote, the answer is a resounding "yes". For this particular case, any errors are usually not MethodErrors and the entire call stack is type stable. |
I am still concerned this is an implementation problem here, since the system image contains a fairly small amount of optimized code, yet this PR is allocating a noticeably disproportionate about of extra junk:
|
sys data: +2.07 MB I'm guessing most of that is from storing MethodInstances of inlined functions in the line table entries? There are two places where a MethodInstance is placed in a LineInfoNode - Perhaps there's some point where it could be determined that MethodInstances aren't needed for some entries and could be stripped back out? How did you generate those numbers, so I could check myself? |
This is a PkgCacheInspector feature |
I've poked around a bit and it's not clear how to use it for the sysimage, vs. packages? |
Nice work! 🎉 This PR broke a feature in Pluto.jl that iterates a backtrace. It might be nice to run nanosoldier on this PR to check other packages? (It would have caught our issue 🌝) |
Thanks! And thanks for the heads-up @fonsp. So the issue is just that sometimes a |
Yes, the types of This is how we handle frames: |
"the MethodInstance or CodeInfo containing the execution context (if it could be found), \ | ||
or Module (for macro expansions)" | ||
linfo::Union{MethodInstance, Method, Module, CodeInfo, Nothing} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really necessary to allow linfo::Module
? From what I can tell, this module information is never used or explicitly tested, so there's no need to store it here. I'm concerned that this heavy-union field definition complicates the code unnecessarily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible it's out of date, but at least at one point it was needed to look up the module for macro expansion
frame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have implemented test cases for these new features. Do you have any use case for that information? Otherwise we should clean that up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. But I just described the use case. Is this causing a problem somewhere right now? Otherwise something to look at when the underlying problem #50204 is resolved someway and the full functionality can be restored here.
The work I did in #41099 introduced code which, if method information could not be found for an inlined frame, would fall back to use the module of the next-higher stack frame. This often worked there because the only frames that would not be assigned a module at this point would be e.g., `macro expansion` frames. However, due to the performance impact of the way method roots are currently encoded, the extra method roots were removed in #50546. The result is that inlined frames were being assigned a potentially incorrect module, rather than being left blank. Example: ``` julia> @Btime plot([1 2 3], seriestype = :blah) ... [13] #invokelatest#2 @ BenchmarkTools ./essentials.jl:901 [inlined] [14] invokelatest @ BenchmarkTools ./essentials.jl:896 [inlined] ... ```
The work I did in #41099 introduced code which, if method information could not be found for an inlined frame, would fall back to use the module of the next-higher stack frame. This often worked there because the only frames that would not be assigned a module at this point would be e.g., `macro expansion` frames. However, due to the performance impact of the way method roots are currently encoded, the extra method roots were removed in #50546. The result is that inlined frames were being assigned a potentially incorrect module, rather than being left blank. Example: ``` julia> @Btime plot([1 2 3], seriestype = :blah) ... [13] #invokelatest#2 @ BenchmarkTools ./essentials.jl:901 [inlined] [14] invokelatest @ BenchmarkTools ./essentials.jl:896 [inlined] ... ``` (cherry picked from commit ed891d6)
The work I did in #41099 introduced code which, if method information could not be found for an inlined frame, would fall back to use the module of the next-higher stack frame. This often worked there because the only frames that would not be assigned a module at this point would be e.g., `macro expansion` frames. However, due to the performance impact of the way method roots are currently encoded, the extra method roots were removed in #50546. The result is that inlined frames were being assigned a potentially incorrect module, rather than being left blank. Example: ``` julia> @Btime plot([1 2 3], seriestype = :blah) ... [13] #invokelatest#2 @ BenchmarkTools ./essentials.jl:901 [inlined] [14] invokelatest @ BenchmarkTools ./essentials.jl:896 [inlined] ... ``` (cherry picked from commit ed891d6)
The fallback code that was written for #41099 is causing unintended issues with some inlined stack frames (one previous #51405, new #52709), since the main piece, linetable storage and lookup, was removed in #50546. Probably better to strip it all back to how it was previously, until it can all be revisited more fully. Should be backported to 1.10.
The fallback code that was written for #41099 is causing unintended issues with some inlined stack frames (one previous #51405, new #52709), since the main piece, linetable storage and lookup, was removed in #50546. Probably better to strip it all back to how it was previously, until it can all be revisited more fully. Should be backported to 1.10.
The fallback code that was written for JuliaLang#41099 is causing unintended issues with some inlined stack frames (one previous JuliaLang#51405, new JuliaLang#52709), since the main piece, linetable storage and lookup, was removed in JuliaLang#50546. Probably better to strip it all back to how it was previously, until it can all be revisited more fully. Should be backported to 1.10.
Partially resolves #41031
This PR implements a lookup for information about an inlined stack frame from the parent frame's cached, inferred CodeInfo.Currently, this lookup fails if a parent frame's code has not been inferred. This merely results in the status quo for those frames.I was giving a best guess about populating the MethodInstance and Method fields necessary to make things work. I also stumbled into what might be a nicer display for the inlined methods as well, if the arguments information can't be populated. (Though note how the top frame shows the aforementioned issue:New approach (10/16/21) to avoid freaking out the GC: LineInfoNodes for inlined frames stored directly in the MethodInstance (via new fieldinlined
) for the parent method after the inlining compiler pass.jl_frame_t->linfo
is now a union that either holds a MethodInstance (linfo->mi
), or Module (linfo->module
), modeled after thedef
union in MethodInstance. When the stack frame lookup occurs inlookup_pointer
, the inlined frames are matched with the LineInfoNodes from the parent frame, and the module is stored injl_frame_t->linfo.module
. The stack frame printing code has been adjusted to match.Approach as of 5/19/22: Storing MethodInstance reference in the LineInfoNode instead of just the method name, then doing a multi-stage lookup when generating stack traces, that first tries to find the specific MethodInstance; if that fails, uses the Method; if that fails, uses the Module. This allows inlined stack frames to present more detailed information.
Current:
This PR:
Also added documentation to clarify that
inferred
in ajl_code_instance_t
may contain ajl_array_t
.