-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add logic to check orphan nodes and attempt to fix parents of the orphan nodes in best effort to avoid dangling nodes #3
Conversation
…han nodes in best effort to avoid dangling nodes
@JianFangAtRai Could the thing we try to achieve here be described like this (using a little bit of graph lingo): If yes, it feels like iterative filtering shouldn't be needed. If not, maybe we should rethink how we filter out nodes so that we minimize this effort of adding nodes back. |
Thanks for the suggestions. The current sampling logic is not based on graph and it does not preserve the parents, even the uber root could be filtered out based on the filtering function. As we can see, this logic could introduce a huge number of disconnected subgraphs, for example, 14336673 subgraphs for the 11GB snapshot data. The idea in this PR is to add the parent nodes back so that we could have the path to the uber root. The iteration is not to filter out nodes, but to add the parents back. Due to the nature of the graph, the adding back process takes multiple iterations in this PR since one node back in the graph could affect multiple paths due to cross references of objects. Just like you to be aware that I start to work on SPCS projects and might not have time to continue working on this. |
@JianFangAtRai Thanks for explaining. When I tried this approach on my 10 GiB snapshot, It took ~6 minutes to finish, which made me wonder if we could save all this back-and-forth and instead of removing parents in the filter step and re-adding them iteratively later, we could filter out whole subgraphs. It also seems that there is not just a single root -- I see 1405 connected components in the snapshot you added in this PR: using Graphs, HeapSnapshotUtils
nodes, strings, _... = HeapSnapshotUtils.parse_nodes("test/data/julia-test.heapsnapshot");
function heap_graph(nodes)
g = DiGraph{UInt32}()
j = 0
add_vertices!(g, length(nodes))
for i in 1:length(nodes)
for _ in 1:nodes.edge_count[i]
j += 1
e = nodes.edges.to_pos[j]
add_edge!(g, i, e)
end
end
return g
end
ccs = connected_components(heap_graph(nodes))
sort!(ccs, by=length, rev=true)
@info length(ccs)
for cc in first(ccs, 20)
println(length(cc), " => ", strings[nodes.name_index[first(cc, 20)] .+ 1])
end
While the this PR assumed everything is connected to the very first node in the graph, correct? Or maybe I'm missing something? |
Thanks for checking, the test file is a downsampled heapsnapshot using this subsampling logic from the original snapshot, which is over 100MB and is too big to check into the git repository. As a result, it isn't the original full graph. For, 352696 => ["", "GC roots", "GC finalizer list roots", "" is the uber root, "GC roots" and "GC finalizer list roots" are the subroots under the uber root. |
@JianFangAtRai Thanks for clarifying! I went ahead and tried to improve the filtering logic so that no iterative step with re-adding parents is necessary -- can you please review #4? I think its a viable alternative to the approach in this PR and has some nice performance improvements. |
Busy with SPCS project now. Will take a look once I have some free time. Thanks. |
I'm going to close this, since we're using the alternative approach, at least for now. Thanks for looking into this! |
This PR is related to the issue (#1).
From heavy testing and debugging, it seems the current sampling logic could introduce dangling nodes, i.e., orphan nodes without the path to the uber root node. Even we attempted to resolve this issue by removing the orphan nodes' children in this PR (#2), unfortunately, it is not enough due to the fact that the snapshot is actually a very complex graph with many object cross references instead of a tree structure.
To resolve this issue, this PR proposed a best effort way to fix the orphan nodes' parents as follows:
From my tests, the orphan nodes are eventually gone for snapshots that I took use the latest Julia 1.9.2+RAI.
For example, here is the output for a snapshot of an idle Julia process:
The output of 11GB heap snapshot of our RAI engine: