-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up return->call address translation for backtraces #33380
Conversation
…_unw_stepn Translate return->call address internally in jl_unw_stepn so that users never have to deal with this subtlety when calling StackTraces.lookup. Also rearrange the jl_unw_stepn API a little so that interpreter stack traces can always be fit in to bt_data when calling jl_unw_stepn incrementally for deep stacks + add test cases for that (Fixes #29695). For the tests, clean up printing of CodeInfo objects wrapped in InterpreterIP.
With return->call address tanslation done inside jl_unw_stepn, there should largely be no need to make this API breaking change, as all backtrace addresses which the user has access to will already be translated and can be passed directly into `StackTraces.lookup()`, as was done previously. There's one possible exception to this rule: If the user were looking up the debug info for a pointer to a C function *not* obtained from a backtrace, they previously would have had to use `StackTraces.lookup(ptr+1)` to "undo" the the `ptr-1` translation that `StackTraces.lookup` used to perform prior to #33190. This seems like an edge case however and probably not worth breaking the API over.
// * The LLVM unwinder functions step() and setInfoBasedOnIPRegister() | ||
// https://github.com/llvm/llvm-project/blob/master/libunwind/src/UnwindCursor.hpp | ||
// * The way that libunwind handles it in `unw_get_proc_name`: | ||
// https://lists.nongnu.org/archive/html/libunwind-devel/2014-06/msg00025.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you can see, I spent a while looking up what people do for this stuff. AFAICT it seems that return_ip - 1
should be sufficient for our use cases. libunwind does something slightly more fancy, but I guess we don't want to trace through signal handler frames. Also libunwind's dwarf.use_prev_instr
field is hidden from us behind an opaque pointer in any case (see unw_get_proc_name
).
The LLVM unwinder seems to have a special case for ARM in thumb mode and I tried to do the same for our ARMv7 support but I don't know how to test that yet. @yuyichao does the above use of _CPU_ARM_
look right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you can see, I spent a while looking up what people do for this stuff. AFAICT it seems that return_ip - 1 should be sufficient for our use cases. libunwind does something slightly more fancy, but I guess we don't want to trace through signal handler frames. Also libunwind's dwarf.use_prev_instr field is hidden from us behind an opaque pointer in any case (see unw_get_proc_name).
As long as we don't give that correct pointer back to libunwind and don't add the offset when we get the backtrace from the signal frame it should be fine.
does the above use of CPU_ARM look right?
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't add the offset when we get the backtrace from the signal frame
Oops, looks like I was doing that part wrong. This should now be cleaned up in the latest commit by avoiding doing the offset for contexts we know have come from a signal handler. I checked that this matches the behavior of libunwind's unw_backtrace()
for those contexts on my machine.
…al frames Add an option to jl_unw_stepn to avoid adjusting the instruction pointer when we know the cursor derives from a signal frame. (We could alternatively try to do this with unw_is_signal_frame, but that wouldn't work for windows and would be an extra function call for each jl_unw_step which seems a bit unnecessary when we already know the top frame is the signal frame.) Also generalize skipping of the first few backtrace frames as needed to hide the internal backtrace machinery itself by adding a `skip` option to jl_unw_stepn/jl_backtrace_from_here/record_backtrace. As part of this also move the workaround for 32-bit windows into the backtrace internals. Move Base.backtrace into error.jl as it doesn't need any lookup functionality from StackTraces to work correctly.
4c6dc81
to
7528266
Compare
I've added another commit here to be more careful to not adjust the pointer in the case of signal frames (@vtjnash I think this will also be more correct for use in your Profiler work) I also generalized the skipping of certain internal frames involved in capturing the backtraces themselves so frames for |
Bump. Any takers for a more detailed look? @JeffBezanson? Perhaps I should just merge this as it allows us to revert a breaking change, passes tests, and also fixes a long standing bug. But having someone read it closely enough to click "Approve" is always appreciated! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me.
Thanks guys! |
Here I translate return->call address internally in
jl_unw_stepn
so that users never have to deal with this subtlety when callingStackTraces.lookup
, as discussed on slack with @vtjnash. This is a prelude to bringing in some larger changes from #33277, but which got conflicted with the merge of #33190.The first commit also rearranges the
jl_unw_stepn
API a little so that interpreter stack traces can always be fit in tobt_data
when callingjl_unw_stepn
incrementally for deep stacks + add test cases for that (fixes #29695). For the tests, clean up printing ofCodeInfo
objects wrapped inInterpreterIP
as well.The second commit reverts the API breaking name change lookupat->lookup from #33190. @timholy I think this resolves your reservations from #33190 (comment). With return->call address translation done inside
jl_unw_stepn
, there should largely be no need to make the API breaking change, as all backtrace addresses which the user has access to will already be translated and can be passed directly intoStackTraces.lookup()
, exactly as was done in julia 1.3. (There's one possible exception to this rule: If the user were looking up the debug info for a pointer to a C function not obtained from a backtrace, they previously would have had to useStackTraces.lookup(ptr+1)
to undo the theptr-1
translation thatStackTraces.lookup
did prior to #33190. This kind of usage is reflected in the test which doesStackTraces.lookup(ctestptr)
but it seems like an edge case and not worth breaking the API over.)