Profile.jl improvements #33190

Merged: vtjnash merged 11 commits from jn/profile-recurflat into master on Sep 17, 2019

Conversation

@vtjnash (Member) commented Sep 7, 2019:

This PR comes with a blog post! JuliaLang/www.julialang.org#447

@timholy (Member) left a comment:

Based on a somewhat superficial read, I think this is a nice change.

@@ -98,31 +96,28 @@ end


"""
lookup(pointer::Union{Ptr{Cvoid}, UInt}) -> Vector{StackFrame}
lookupat(pointer::Union{Ptr{Cvoid}, UInt}) -> Vector{StackFrame}
Member:

Is the name change worth it? I could imagine packages calling `lookup`. Or is this about handling the off-by-one, and you're deliberately breaking direct calls here?

Member:

Agreed, this change makes me uncomfortable.

@@ -224,12 +217,12 @@ all frames above the specified function). Primarily used to remove `StackTraces`
from the `StackTrace` prior to returning it.
"""
function remove_frames!(stack::StackTrace, name::Symbol)
splice!(stack, 1:something(findlast(frame -> frame.func == name, stack), 0))
deleteat!(stack, 1:something(findlast(frame -> frame.func == name, stack), 0))
Member:

Wow. Suggests this needs a test, no?

@vtjnash (PR author):

They have the same side effect; `splice!` just also returns a copy of the deleted values.

Member:

Oh, right. I was confusing this with our renaming of (un)shift!
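
For readers skimming the thread, a minimal illustration of the point above, using a throwaway vector rather than an actual `StackTrace` (the variable names here are made up):

```julia
v = [:a, :b, :c, :d, :e]

# `splice!` removes the range and returns the removed elements...
removed = splice!(copy(v), 1:3)   # removed == [:a, :b, :c]

# ...while `deleteat!` removes the same range and returns the mutated vector itself.
deleteat!(v, 1:3)                 # v == [:d, :e]
```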

continue
end
end
end
Member:

Seems like it would be nice to have some way of annotating tree-nodes that got collapsed onto.

@vtjnash (PR author):

Good idea! What would be most useful to track? I think some sort of max and average depth, or other summary statistics?

import Base.StackTraces: lookupat, UNKNOWN, show_spec_linfo, StackFrame

# deprecated functions: use `getdict` instead
lookup(ip::UInt) = lookupat(convert(Ptr{Cvoid}, ip) - 1)
Member:

Is the name change to fix the off-by-one?

@vtjnash (PR author):

Yeah, it's to make it easier for consumers to detect the change, if necessary. And the deprecation here is because it's where ProfileView expected it. OTOH, this would now subtract 2 for ProfileView (since `fetch` is now already returning "corrected" values), so perhaps I should define `const lookup = lookupat`? Or ProfileView should switch to using `getdict` sometime before the v1.4 release?
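
One of the alternatives mentioned above, switching consumers to the dict-based API, would look roughly like this (a hedged sketch; `Profile.fetch` and `Profile.getdict` are the functions referenced in this PR, and the workload is arbitrary):

```julia
using Profile

@profile sum(sqrt, rand(10^7))     # arbitrary workload to collect some samples

data    = Profile.fetch()          # per the comment above, already-adjusted addresses
litrace = Profile.getdict(data)    # Dict mapping each instruction pointer to its frames
```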

stdlib/Profile/src/Profile.jl: outdated review comment (resolved)

- `noisefloor` -- Limits frames that exceed the heuristic noise floor of the sample (only applies to format `:tree`).
A suggested value to try for this is 2.0 (the default is 0). This parameter hides samples for which `n <= noisefloor * √N`,
where `n` is the number of samples on this line, and `N` is the number of samples for the callee.

- `mincount` -- Limits the printout to only those lines with at least `mincount` occurrences.

- `recur` -- Controls the recursion handling in `:tree` format. `off` (default) prints the tree as normal. `flat` instead
compresses any recursion (by ip), showing the approximate effect of converting any self-recursion into an iterator.
Member:

Saying it's "like an iterator" leaves ambiguous the case where `f` calls `g` calls `h` calls `f`..., and one wonders where this chain gets broken.

@vtjnash (PR author):

I think it doesn't really matter where it gets broken, since the tree should look identical in that case.
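
To make the options quoted above concrete, a usage sketch (the keyword names are those documented in the diff; the workload is arbitrary):

```julia
using Profile

work(n) = sum(sum(sin, rand(1000)) for _ in 1:n)   # arbitrary workload

Profile.clear()
@profile work(10_000)

# Tree view: hide statistically noisy lines, drop rare frames, collapse self-recursion.
Profile.print(format = :tree, noisefloor = 2.0, mincount = 5, recur = :flat)
```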

@@ -299,104 +318,113 @@ error_codes = Dict(
"""
fetch() -> data

Returns a reference to the internal buffer of backtraces. Note that subsequent operations,
like [`clear`](@ref), can affect `data` unless you first make a copy. Note that the
Returns a copy of the buffer of profile backtraces. Note that the
Member:

I don't remember why it was this way before, but this seems like a good change.

@vtjnash (PR author):

I assume it was to be zero-copy, but it's just not that useful given all the other work this needs to do, and simpler to be able to pre-process the buffer here.
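
A small sketch of what the docstring change above implies for callers: because `fetch` now returns a copy, clearing the internal buffer afterwards no longer invalidates the returned data (workload is arbitrary):

```julia
using Profile

@profile sum(abs2, rand(10^7))

data = Profile.fetch()   # a copy of the backtrace buffer
Profile.clear()          # safe: `data` is unaffected, since it is no longer a view of internals
```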

@vtjnash force-pushed the jn/profile-recurflat branch 2 times, most recently from 933713f to 5df21c3, on September 10, 2019.
We have long been doing an off-by-one computation in `lookup`.
This is often convenient, but a bit confusing in cases where you then have to reverse the adjustment for the correction.
As we continue working towards more precise profile results, this becomes somewhat more significant.
So try to clean up some accumulated code cruft around it and in related code.
(as in garbage-collected--these values cannot be used)
Prints the truncated repeating Unicode pattern `indent_s` in tree view for easier visual scanning.
Ensure that text widths remain positive, even when the terminal is too small to fit everything.
Overhead time is the time spent in that function itself, and the output can also be sorted by it.
Under ragged recursion, the tree profile can be difficult to interpret
since it becomes unclear whether time is being spent in the recursion
or doing work for the function itself. To separate those effects,
we can subtract off the cost of the recursion from the accumulated sum
by rearranging the tree to only include the unique (non-recursive)
subgraphs. In many cases, this should be similar in effect to rewriting
all recursive functions as iterators.

This is enabled with the new option `recur ∈ (:off, :flat, :flatc)`
for the format=:tree view. Implementing it for the :flat view is left
as an exercise for the reader (although it is believed to be much easier).
Previously, we'd double-count any recursive function for each repetition
of that function in a stack snapshot.
Now we properly account for the time spent in any particular
frame by only recording a single count for it per stack snapshot,
regardless of how often that frame appeared in that stacktrace
due to recursion (and independent of inlining).
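
A quick way to see the behaviour described in these commit messages is to profile a deliberately self-recursive function and compare the two tree renderings (a sketch; the workload is arbitrary):

```julia
using Profile

fib(n) = n <= 1 ? n : fib(n - 1) + fib(n - 2)   # self-recursive on purpose

Profile.clear()
@profile fib(40)

Profile.print(format = :tree, recur = :off)    # default: nested repetitions of `fib`
Profile.print(format = :tree, recur = :flat)   # self-recursion collapsed onto a single node
```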
@vtjnash force-pushed the jn/profile-recurflat branch from 5df21c3 to 0002f0a on September 13, 2019.
@vtjnash merged commit c40a93b into master on Sep 17, 2019.
The delete-merged-branch bot deleted the jn/profile-recurflat branch on September 17, 2019.
@c42f (Member) left a comment:

Great cleanup and nice improvements to Profile!

As much as I like this PR in general, the fact that the off-by-one related to the `lookup()` -> `lookupat()` rename is exposed at the Julia layer makes me really uncomfortable. Surely no user should have to care about this detail of the platform ABI?

According to #33277 (comment), the -1 is also incorrect for some platform ABIs.

lkups = StackTraces.lookup(frame)
for outer lkup in lkups
for ip in bt
ip isa Base.InterpreterIP || (ip -= 1) # convert from return stack to call stack (for inlining info)
Member:

Great, I think this answers my question at #33277 (comment)

@@ -626,9 +626,9 @@ typedef int bt_cursor_t;
// Special marker in backtrace data for encoding interpreter frames
#define JL_BT_INTERP_FRAME (((uintptr_t)0)-1)
size_t rec_backtrace(uintptr_t *data, size_t maxsize) JL_NOTSAFEPOINT;
size_t rec_backtrace_ctx(uintptr_t *data, size_t maxsize, bt_context_t *ctx) JL_NOTSAFEPOINT;
size_t rec_backtrace_ctx(uintptr_t *data, size_t maxsize, bt_context_t *ctx, int add_interp_frames) JL_NOTSAFEPOINT;
Member:

Excellent change. I was thinking about this exact problem (Profile getting confused by, or having overhead from, interpreter frames) while making #33277.

@@ -668,7 +668,7 @@ static void *signal_listener(void *arg)
if (critical) {
bt_size += rec_backtrace_ctx(bt_data + bt_size,
JL_MAX_BT_SIZE / jl_n_threads - 1,
signal_context);
signal_context, 0);
Member:

Over at #33277 I made the critical error printing aware of interpreter frames (printing is marked as JL_NOTSAFEPOINT and passes the static analyzer). So hopefully we could set this 0 to 1 there.

@vtjnash (PR author):

There's no GC root for that value here.

Member:

Oh I see, I forgot this version of bt_data is not rooted and other threads may run the GC.

If doexit, we could just stop the world to print this backtrace data because we're about to attempt to abort anyway and a perf hit is the least of our worries?

It's a little extra complexity though.

c42f added a commit that referenced this pull request Sep 25, 2019
With return->call address translation done inside jl_unw_stepn, there
should largely be no need to make this API breaking change, as all
backtrace addresses which the user has access to will already be
translated and can be passed directly into `StackTraces.lookup()`, as
was done previously.

There's one possible exception to this rule: If the user were looking up
the debug info for a pointer to a C function *not* obtained from a
backtrace, they previously would have had to use
`StackTraces.lookup(ptr+1)` to "undo" the `ptr-1` translation that
`StackTraces.lookup` used to perform prior to #33190.  This seems like
an edge case however and probably not worth breaking the API over.
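
A tiny usage sketch of the behaviour this commit message describes: addresses coming out of a backtrace can be passed straight to `StackTraces.lookup`, with no manual ±1 adjustment by the caller:

```julia
bt = backtrace()                     # raw instruction pointers for the current call stack
frames = StackTraces.lookup(bt[1])   # Vector{StackFrame}, including any inlined frames
```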