-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Profile.jl improvements #33190
Profile.jl improvements #33190
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on a somewhat superficial read, I think this is a nice change.
@@ -98,31 +96,28 @@ end | |||
|
|||
|
|||
""" | |||
lookup(pointer::Union{Ptr{Cvoid}, UInt}) -> Vector{StackFrame} | |||
lookupat(pointer::Union{Ptr{Cvoid}, UInt}) -> Vector{StackFrame} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the name change worth it? I could imagine packages calling lookup
. Or is this about handling the off-by-one and you're deliberately breaking direct calls here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, this change makes me uncomfortable.
@@ -224,12 +217,12 @@ all frames above the specified function). Primarily used to remove `StackTraces` | |||
from the `StackTrace` prior to returning it. | |||
""" | |||
function remove_frames!(stack::StackTrace, name::Symbol) | |||
splice!(stack, 1:something(findlast(frame -> frame.func == name, stack), 0)) | |||
deleteat!(stack, 1:something(findlast(frame -> frame.func == name, stack), 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow. Suggests this needs a test, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They have the same side-effect, splice!
just also returns a copy of the deleted values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right. I was confusing this with our renaming of (un)shift!
continue | ||
end | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like it would be nice to have some way of annotating tree-nodes that got collapsed onto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! What would be most useful to track? I think some sort of max and average depth, or other summary statistics?
import Base.StackTraces: lookupat, UNKNOWN, show_spec_linfo, StackFrame | ||
|
||
# deprecated functions: use `getdict` instead | ||
lookup(ip::UInt) = lookupat(convert(Ptr{Cvoid}, ip) - 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the name change to fix the off-by-one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's to make it easier for consumers to detect the change, if necessary. And the deprecation here is because it's where ProfileView expected it. OTOH, this would now subtract 2 for ProfileView (since fetch is now already returning "corrected" values), so perhaps I should define const lookup = lookupat
? or ProfileView should switch to using getdict
sometime before the v1.4 release?
|
||
- `noisefloor` -- Limits frames that exceed the heuristic noise floor of the sample (only applies to format `:tree`). | ||
A suggested value to try for this is 2.0 (the default is 0). This parameter hides samples for which `n <= noisefloor * √N`, | ||
where `n` is the number of samples on this line, and `N` is the number of samples for the callee. | ||
|
||
- `mincount` -- Limits the printout to only those lines with at least `mincount` occurrences. | ||
|
||
- `recur` -- Controls the recursion handling in `:tree` format. `off` (default) prints the tree as normal. `flat` instead | ||
compresses any recursion (by ip), showing the approximate effect of converting any self-recursion into an iterator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Saying it's "like an iterator" leaves ambiguous the case where f
calls g
calls h
calls f
... and one wonders where this chain gets broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that doesn't really matter (where it gets broken), since the tree should look identical in that case
@@ -299,104 +318,113 @@ error_codes = Dict( | |||
""" | |||
fetch() -> data | |||
|
|||
Returns a reference to the internal buffer of backtraces. Note that subsequent operations, | |||
like [`clear`](@ref), can affect `data` unless you first make a copy. Note that the | |||
Returns a copy of the buffer of profile backtraces. Note that the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember why it was this way before, but this seems like a good change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume it was to be zero-copy, but it's just not that useful given all the other work this needs to do, and simpler to be able to pre-process the buffer here.
933713f
to
5df21c3
Compare
We have long been doing an off-by-one computation in `lookup`. This is often convenient, but a bit confusing in cases where you then have to then reverse the adjustment for the correction. As we continue working towards more precise profile results, this becomes somewhat more significant. So try to cleanup some accumulated code-cruft around it and in related code.
(as in garbage-collected--these values cannot be used)
Prints the truncated repeating unicode pattern indent_s in tree view for easier visual sighting.
Ensure that text widths remains positive, even when the terminal is too small to fit everything.
Overhead time is the time spent in that function itself. It can also be sorted by it.
Under ragged recursion, the tree profile can be difficult to interpret since it becomes unclear whether time is being spent in the recursion or doing work for the function itself. To separate those effects, we can subtract off the cost of the recursion from the accumulated sum by rearranging the tree to only include the unique (non-recursive) subgraphs. In many cases, this should be similar in effect to re-writting all recursive functions as iterators. This is enabled with the new option :recur \in (:off, :flat, :flatc) for the format=:tree view. Implementing it for the :flat view is left as an exercise for the reader (although it is believed to be much easier).
Previously, we'd double-count any recursive function for each repetition of that function in a stack snapshot. Now we properly account for the time spent in any particular frame by only recording a single count for it per stack snapshot, regardless of how often that frame appeared in that stacktrace due to recursion (and independent of inlining).
5df21c3
to
0002f0a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great cleanup and nice improvements to Profile!
As much as I like this PR in general, the fact that the off-by-one related to the lookup()
->lookupat()
is exposed at the julia layer makes me really uncomfortable. Surely no user should have to care about the detail of the platform ABI?
According to #33277 (comment), the -1
is also incorrect for some platform ABIs.
lkups = StackTraces.lookup(frame) | ||
for outer lkup in lkups | ||
for ip in bt | ||
ip isa Base.InterpreterIP || (ip -= 1) # convert from return stack to call stack (for inlining info) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, I think this answers my question at #33277 (comment)
@@ -98,31 +96,28 @@ end | |||
|
|||
|
|||
""" | |||
lookup(pointer::Union{Ptr{Cvoid}, UInt}) -> Vector{StackFrame} | |||
lookupat(pointer::Union{Ptr{Cvoid}, UInt}) -> Vector{StackFrame} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, this change makes me uncomfortable.
@@ -626,9 +626,9 @@ typedef int bt_cursor_t; | |||
// Special marker in backtrace data for encoding interpreter frames | |||
#define JL_BT_INTERP_FRAME (((uintptr_t)0)-1) | |||
size_t rec_backtrace(uintptr_t *data, size_t maxsize) JL_NOTSAFEPOINT; | |||
size_t rec_backtrace_ctx(uintptr_t *data, size_t maxsize, bt_context_t *ctx) JL_NOTSAFEPOINT; | |||
size_t rec_backtrace_ctx(uintptr_t *data, size_t maxsize, bt_context_t *ctx, int add_interp_frames) JL_NOTSAFEPOINT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent change. I was thinking about this exact problem (Profile getting confused by- / having overhead from- interpreter frames) while making #33277.
@@ -668,7 +668,7 @@ static void *signal_listener(void *arg) | |||
if (critical) { | |||
bt_size += rec_backtrace_ctx(bt_data + bt_size, | |||
JL_MAX_BT_SIZE / jl_n_threads - 1, | |||
signal_context); | |||
signal_context, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Over at #33277 I made the critical error printing aware of interpreter frames (printing is marked as JL_NOTSAFEPOINT and passes the static analyzer). So hopefully we could set this 0
to 1
there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There’s no gc root for that value here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, I forgot this version of bt_data
is not rooted and other threads may run the GC.
If doexit
, we could just stop the world to print this backtrace data because we're about to attempt to abort anyway and a perf hit is the least of our worries?
It's a little extra complexity though.
With return->call address tanslation done inside jl_unw_stepn, there should largely be no need to make this API breaking change, as all backtrace addresses which the user has access to will already be translated and can be passed directly into `StackTraces.lookup()`, as was done previously. There's one possible exception to this rule: If the user were looking up the debug info for a pointer to a C function *not* obtained from a backtrace, they previously would have had to use `StackTraces.lookup(ptr+1)` to "undo" the the `ptr-1` translation that `StackTraces.lookup` used to perform prior to #33190. This seems like an edge case however and probably not worth breaking the API over.
With return->call address tanslation done inside jl_unw_stepn, there should largely be no need to make this API breaking change, as all backtrace addresses which the user has access to will already be translated and can be passed directly into `StackTraces.lookup()`, as was done previously. There's one possible exception to this rule: If the user were looking up the debug info for a pointer to a C function *not* obtained from a backtrace, they previously would have had to use `StackTraces.lookup(ptr+1)` to "undo" the the `ptr-1` translation that `StackTraces.lookup` used to perform prior to #33190. This seems like an edge case however and probably not worth breaking the API over.
With return->call address tanslation done inside jl_unw_stepn, there should largely be no need to make this API breaking change, as all backtrace addresses which the user has access to will already be translated and can be passed directly into `StackTraces.lookup()`, as was done previously. There's one possible exception to this rule: If the user were looking up the debug info for a pointer to a C function *not* obtained from a backtrace, they previously would have had to use `StackTraces.lookup(ptr+1)` to "undo" the the `ptr-1` translation that `StackTraces.lookup` used to perform prior to #33190. This seems like an edge case however and probably not worth breaking the API over.
This PR comes with a blog post! JuliaLang/www.julialang.org#447