Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up return->call address translation for backtraces #33380

Merged
merged 4 commits into from
Oct 2, 2019

Conversation

c42f
Copy link
Member

@c42f c42f commented Sep 25, 2019

Here I translate return->call address internally in jl_unw_stepn so that users never have to deal with this subtlety when calling StackTraces.lookup, as discussed on slack with @vtjnash. This is a prelude to bringing in some larger changes from #33277, but which got conflicted with the merge of #33190.

The first commit also rearranges the jl_unw_stepn API a little so that interpreter stack traces can always be fit in to bt_data when calling jl_unw_stepn incrementally for deep stacks + add test cases for that (fixes #29695). For the tests, clean up printing of CodeInfo objects wrapped in InterpreterIP as well.

The second commit reverts the API breaking name change lookupat->lookup from #33190. @timholy I think this resolves your reservations from #33190 (comment). With return->call address translation done inside jl_unw_stepn, there should largely be no need to make the API breaking change, as all backtrace addresses which the user has access to will already be translated and can be passed directly into StackTraces.lookup(), exactly as was done in julia 1.3. (There's one possible exception to this rule: If the user were looking up the debug info for a pointer to a C function not obtained from a backtrace, they previously would have had to use StackTraces.lookup(ptr+1) to undo the the ptr-1 translation that StackTraces.lookup did prior to #33190. This kind of usage is reflected in the test which does StackTraces.lookup(ctestptr) but it seems like an edge case and not worth breaking the API over.)

…_unw_stepn

Translate return->call address internally in jl_unw_stepn so that users
never have to deal with this subtlety when calling StackTraces.lookup.

Also rearrange the jl_unw_stepn API a little so that interpreter stack
traces can always be fit in to bt_data when calling jl_unw_stepn
incrementally for deep stacks + add test cases for that (Fixes #29695).
For the tests, clean up printing of CodeInfo objects wrapped in
InterpreterIP.
With return->call address tanslation done inside jl_unw_stepn, there
should largely be no need to make this API breaking change, as all
backtrace addresses which the user has access to will already be
translated and can be passed directly into `StackTraces.lookup()`, as
was done previously.

There's one possible exception to this rule: If the user were looking up
the debug info for a pointer to a C function *not* obtained from a
backtrace, they previously would have had to use
`StackTraces.lookup(ptr+1)` to "undo" the the `ptr-1` translation that
`StackTraces.lookup` used to perform prior to #33190.  This seems like
an edge case however and probably not worth breaking the API over.
@c42f c42f requested a review from vtjnash September 25, 2019 06:32
@c42f c42f added error handling Handling of exceptions by Julia or the user bugfix This change fixes an existing bug labels Sep 25, 2019
// * The LLVM unwinder functions step() and setInfoBasedOnIPRegister()
// https://github.com/llvm/llvm-project/blob/master/libunwind/src/UnwindCursor.hpp
// * The way that libunwind handles it in `unw_get_proc_name`:
// https://lists.nongnu.org/archive/html/libunwind-devel/2014-06/msg00025.html
Copy link
Member Author

@c42f c42f Sep 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you can see, I spent a while looking up what people do for this stuff. AFAICT it seems that return_ip - 1 should be sufficient for our use cases. libunwind does something slightly more fancy, but I guess we don't want to trace through signal handler frames. Also libunwind's dwarf.use_prev_instr field is hidden from us behind an opaque pointer in any case (see unw_get_proc_name).

The LLVM unwinder seems to have a special case for ARM in thumb mode and I tried to do the same for our ARMv7 support but I don't know how to test that yet. @yuyichao does the above use of _CPU_ARM_ look right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you can see, I spent a while looking up what people do for this stuff. AFAICT it seems that return_ip - 1 should be sufficient for our use cases. libunwind does something slightly more fancy, but I guess we don't want to trace through signal handler frames. Also libunwind's dwarf.use_prev_instr field is hidden from us behind an opaque pointer in any case (see unw_get_proc_name).

As long as we don't give that correct pointer back to libunwind and don't add the offset when we get the backtrace from the signal frame it should be fine.

does the above use of CPU_ARM look right?

LGTM.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't add the offset when we get the backtrace from the signal frame

Oops, looks like I was doing that part wrong. This should now be cleaned up in the latest commit by avoiding doing the offset for contexts we know have come from a signal handler. I checked that this matches the behavior of libunwind's unw_backtrace() for those contexts on my machine.

…al frames

Add an option to jl_unw_stepn to avoid adjusting the instruction
pointer when we know the cursor derives from a signal frame. (We could
alternatively try to do this with unw_is_signal_frame, but that
wouldn't work for windows and would be an extra function call for each
jl_unw_step which seems a bit unnecessary when we already know the top
frame is the signal frame.)

Also generalize skipping of the first few backtrace frames as needed to
hide the internal backtrace machinery itself by adding a `skip` option
to jl_unw_stepn/jl_backtrace_from_here/record_backtrace. As part of this
also move the workaround for 32-bit windows into the backtrace
internals.

Move Base.backtrace into error.jl as it doesn't need any lookup
functionality from StackTraces to work correctly.
@c42f c42f force-pushed the cjf/bt-lookup-offset branch from 4c6dc81 to 7528266 Compare September 26, 2019 04:58
@c42f
Copy link
Member Author

c42f commented Sep 26, 2019

I've added another commit here to be more careful to not adjust the pointer in the case of signal frames (@vtjnash I think this will also be more correct for use in your Profiler work)

I also generalized the skipping of certain internal frames involved in capturing the backtraces themselves so frames for jl_backtrace_from_here and jl_throw etc can be systematically elided. (For internal debugging it's possible to put them back in by using a negative skip argument to jl_backtrace_from_here.)

@c42f
Copy link
Member Author

c42f commented Oct 2, 2019

Bump. Any takers for a more detailed look? @JeffBezanson?

Perhaps I should just merge this as it allows us to revert a breaking change, passes tests, and also fixes a long standing bug. But having someone read it closely enough to click "Approve" is always appreciated!

@ararslan ararslan requested a review from JeffBezanson October 2, 2019 04:02
Copy link
Member

@vchuravy vchuravy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me.

@JeffBezanson JeffBezanson merged commit 3f37178 into master Oct 2, 2019
@JeffBezanson JeffBezanson deleted the cjf/bt-lookup-offset branch October 2, 2019 21:56
@c42f
Copy link
Member Author

c42f commented Oct 2, 2019

Thanks guys!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This change fixes an existing bug error handling Handling of exceptions by Julia or the user
Projects
None yet
Development

Successfully merging this pull request may close these issues.

problems with interpreter backtraces
6 participants