Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initial implementation of differential flamegraphs #20

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Arkoniak
Copy link
Contributor

Sorry for the late PR, but here it is finally. This is pure FlameGrpahs version of davidanthoff/ProfileVega.jl#9 and it was discussed also in #7

Also sorry for the long read, but I am very unhappy with this PR, but hope that it can be improved. Below is the list of things, that I found unsatisfactory.

First of all, the problem that I was trying to solve looked like this. We have two functions

function f1()
   A = randn(100,100,20)
   Am = mapslices(sum, A; dims=2)
end

function f2()
   A = randn(50,50,20)
   Am = mapslices(sum, A; dims=2)
end

and we are trying to compare their performance. From this definition immediately follows the notion of the "wildcard" nodes, since we have common calls but they happen inside different functions on different lines and we cannot compare them per se. From the backtraces point of view, the node is "wildcard" if it for two sequences of backtraces calls before and after this node are the same.

Another problem that may arise is that some nodes can appear or disappear between target and baseline

function f3()
  A = randn(50,50,20)
  println(A)
  Am = mapslices(sum, A; dims=2)
end

It's basically the same function as f2, only it has an additional child. That is why I am using LCS approach, which is taking into account appearing and disappearing nodes.

Now the main problems, that are not solved in the current approach

  1. LCS can't process transpositions, in such a case only one node will be taken into account. But the addition of transpositions leads to all permutations and LCS can be changed to the simple Set comparison. But I am afraid this approach may result in very strange matching.

  2. LCS approach compares only the same levels of the two flame charts. So if someone does something like this

function f2a()
  randn(50, 50, 20)
end

function f2()
  A = f2a()
  Am = mapslices(sum, A; dims=2)
end

then current matching scheme will be broken.

Other less fundamental issues in the current implementation

  1. DataDiffNode is almost a copy of DataNode. There should be a better way than to copy the existing structure. But I also dislike the idea of adding the delta parameter to the original structure, it looks inappropriate.
  2. eq argument in lcs. I wasn't able to figure out an idiomatic way to compare two nodes. Because they are not equal (at least their span is different) and I do not remember why I didn't define this function externally. Suppose I wanted to have more flexible lcs function
  3. wildcard nodes currently defined through regex of the string representation of their stackframe, which is fragile, I am afraid. But I wasn't able to figure out a better way.
  4. I am also concerned about simplify function. During my profiling I constantly saw very narrow branches and I do not know, whether I did something wrong or is it a bug of current Profile.

@codecov
Copy link

codecov bot commented Feb 11, 2020

Codecov Report

Merging #20 into master will decrease coverage by 5.45%.
The diff coverage is 62.74%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #20      +/-   ##
==========================================
- Coverage   94.73%   89.28%   -5.46%     
==========================================
  Files           5        6       +1     
  Lines         228      280      +52     
==========================================
+ Hits          216      250      +34     
- Misses         12       30      +18
Impacted Files Coverage Δ
src/FlameGraphs.jl 33.33% <ø> (ø) ⬆️
src/diffgraph.jl 62.74% <62.74%> (ø)
src/io.jl 97.95% <0%> (+0.04%) ⬆️
src/graph.jl 98.78% <0%> (+1.21%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8bc761b...a0e13ba. Read the comment docs.

@Arkoniak
Copy link
Contributor Author

I'll add more tests and increase coverage after more important questions will be resolved.

@Arkoniak Arkoniak requested a review from timholy February 12, 2020 06:01
Copy link
Owner

@timholy timholy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're not late at all. Sorry it's taken me a couple of days to clear enough off my plate to get to this.

This looks very good. I recognize that your concerns are real. Perhaps we can start by discussing the transpositions issue; nominally this package should sort the children, but I think if there's an intervening C node then that doesn't happen. Is that the main source of trouble wrt transpositions?

@@ -0,0 +1,141 @@
"""
data = NodeDiffData(sf::StackFrame, status::UInt8, span::UnitRange{Int}, delta::Int)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we instead add a delta::Union{Nothing,Int} field to NodeData, with default value nothing? Alternatively, we could make it a parametric type, if you need to dispatch on it.


# sometimes root node has random narrow branches, in order to compare
# profiles it's easier to remove this spurious branches
function simplify!(node::Node)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably too generic a name for an operation that, AFAICT, discards all but the dominant child. Maybe call it singlechild!?

Copy link
Contributor Author

@Arkoniak Arkoniak Feb 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, but if you do not mind, I'll use name dominantchild!, because it shows exactly what I am trying to achieve here.

As a side note, if put it this way, it looks generic enough, that may be at some point it could be ported to LeftChildRightSiblingTrees, as a special procedure of sibling pruning. Of course, currently it needs some specific knowledge about nodes in order to compare them.

May be even implement filter! function in LeftChildRightSiblingTrees? Then here it would like this

dominant = maximum(span.(node))
filter!(node, x -> span(x) < dominant)

This approach is not needed for this particular problem, I just love high level abstractions :-)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dominantchild! would be fine.

memo[init] = LCS((node1, node2), 1) * lcs(node1.sibling, node2.sibling, eq, is_wildcard, memo)
end
else
seq1 = islastsibling(node2) ? LCS{T}([], 0) : lcs(node1, node2.sibling, eq, is_wildcard, memo)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section could probably use a comment, particularly for seq3 and seq4.

@timholy
Copy link
Owner

timholy commented Feb 14, 2020

For my own reference (you may already know about these and may have considered them):

I doubt we want the heavy dependencies of a full LP solver and LightGraphs, but perhaps for trees (which are a very special class of graphs) there is an easier solution. I haven't begun reading.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants