Improving performance of AMR with p4est #638

Merged (24 commits, Jun 12, 2021)


ranocha (Member) commented Jun 10, 2021

You can find an extended description of these changes and my reasoning behind them in my blog post. I'd be happy to get your feedback and improve both this PR and the accompanying blog post.

Using julia --threads=1 --check-bounds=no, I get the following results on the current main branch:

julia> trixi_include(joinpath(examples_dir(), "2d", "elixir_advection_amr.jl"))
[...]
julia> trixi_include(joinpath(examples_dir(), "2d", "elixir_advection_amr.jl"),
          mesh=P4estMesh((1, 1), polydeg=3,
                  coordinates_min=coordinates_min, coordinates_max=coordinates_max,
                  initial_refinement_level=4))
[...]
 ────────────────────────────────────────────────────────────────────────────────────
               Trixi.jl                      Time                   Allocations
                                     ──────────────────────   ───────────────────────
          Tot / % measured:               692ms / 98.6%            179MiB / 100%

 Section                     ncalls     time   %tot     avg     alloc   %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                         1.61k    228ms  33.3%   142μs   53.5MiB  29.9%  34.1KiB
[...]
 AMR                             64    213ms  31.2%  3.33ms    121MiB  67.7%  1.89MiB
   coarsen                       64    110ms  16.2%  1.72ms   53.9MiB  30.1%   862KiB
     solver                      64   97.4ms  14.3%  1.52ms   52.1MiB  29.1%   833KiB
     mesh                        64   12.9ms  1.90%   202μs   1.81MiB  1.01%  29.0KiB
       rebalance                128   11.2ms  1.64%  87.3μs   2.00KiB  0.00%    16.0B
       ~mesh~                    64   1.18ms  0.17%  18.4μs   1.29MiB  0.72%  20.6KiB
       coarsen!                  64    590μs  0.09%  9.22μs    534KiB  0.29%  8.34KiB
     ~coarsen~                   64   39.0μs  0.01%   609ns   1.66KiB  0.00%    26.5B
   refine                        64   91.8ms  13.4%  1.43ms   54.7MiB  30.6%   875KiB
     solver                      64   79.3ms  11.6%  1.24ms   54.2MiB  30.3%   867KiB
     mesh                        64   12.5ms  1.83%   195μs    515KiB  0.28%  8.05KiB
       rebalance                128   11.4ms  1.67%  88.9μs   2.00KiB  0.00%    16.0B
       refine                    64    613μs  0.09%  9.58μs     0.00B  0.00%    0.00B
       ~mesh~                    64    476μs  0.07%  7.44μs    513KiB  0.28%  8.02KiB
     ~refine~                    64   36.6μs  0.01%   572ns   1.66KiB  0.00%    26.5B
   indicator                     64   10.8ms  1.58%   168μs   12.4MiB  6.96%   199KiB
   ~AMR~                         64    252μs  0.04%  3.94μs   2.48KiB  0.00%    39.8B
[...]
 initial condition AMR            1   5.54ms  0.81%  5.54ms   2.59MiB  1.45%  2.59MiB
   AMR                            3   5.16ms  0.76%  1.72ms   2.59MiB  1.45%   885KiB
     refine                       3   4.55ms  0.67%  1.52ms   2.01MiB  1.12%   684KiB
       solver                     3   3.68ms  0.54%  1.23ms   1.98MiB  1.11%   676KiB
       mesh                       3    867μs  0.13%   289μs   24.1KiB  0.01%  8.05KiB
         rebalance                6    780μs  0.11%   130μs     96.0B  0.00%    16.0B
         refine                   3   51.8μs  0.01%  17.3μs     0.00B  0.00%    0.00B
         ~mesh~                   3   35.7μs  0.01%  11.9μs   24.0KiB  0.01%  8.02KiB
       ~refine~                   3   3.23μs  0.00%  1.08μs   1.66KiB  0.00%     565B
     indicator                    3    578μs  0.08%   193μs    503KiB  0.27%   168KiB
     ~AMR~                        3   30.5μs  0.00%  10.2μs   98.5KiB  0.05%  32.8KiB
     coarsen                      3    445ns  0.00%   148ns      240B  0.00%    80.0B
   ~initial condition AMR~        1    383μs  0.06%   383μs      848B  0.00%     848B

With this PR, the results improve to:

julia> trixi_include(joinpath(examples_dir(), "2d", "elixir_advection_amr.jl"),
          mesh=P4estMesh((1, 1), polydeg=3,
                  coordinates_min=coordinates_min, coordinates_max=coordinates_max,
                  initial_refinement_level=4))
[...]
 ────────────────────────────────────────────────────────────────────────────────────
               Trixi.jl                      Time                   Allocations
                                     ──────────────────────   ───────────────────────
          Tot / % measured:               485ms / 98.3%           82.3MiB / 100%

 Section                     ncalls     time   %tot     avg     alloc   %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                         1.61k    208ms  43.7%   130μs   53.5MiB  65.2%  34.1KiB
[...]
 AMR                             64   72.8ms  15.3%  1.14ms   25.8MiB  31.4%   412KiB
   refine                        64   32.1ms  6.75%   502μs   5.95MiB  7.24%  95.1KiB
     solver                      64   20.5ms  4.31%   321μs   5.44MiB  6.63%  87.1KiB
     mesh                        64   11.6ms  2.43%   181μs    515KiB  0.61%  8.05KiB
       rebalance                128   10.6ms  2.22%  82.7μs   2.00KiB  0.00%    16.0B
       refine                    64    547μs  0.11%  8.55μs     0.00B  0.00%    0.00B
       ~mesh~                    64    452μs  0.09%  7.07μs    513KiB  0.61%  8.02KiB
     ~refine~                    64   26.9μs  0.01%   421ns   1.66KiB  0.00%    26.5B
   coarsen                       64   31.4ms  6.59%   490μs   7.36MiB  8.97%   118KiB
     solver                      64   19.7ms  4.13%   308μs   5.55MiB  6.76%  88.8KiB
     mesh                        64   11.7ms  2.45%   182μs   1.81MiB  2.21%  29.0KiB
       rebalance                128   10.1ms  2.12%  79.0μs   2.00KiB  0.00%    16.0B
       ~mesh~                    64   1.06ms  0.22%  16.5μs   1.29MiB  1.57%  20.6KiB
       coarsen!                  64    489μs  0.10%  7.65μs    534KiB  0.63%  8.34KiB
     ~coarsen~                   64   30.0μs  0.01%   469ns   1.66KiB  0.00%    26.5B
   indicator                     64   9.05ms  1.90%   141μs   12.4MiB  15.2%   199KiB
   ~AMR~                         64    201μs  0.04%  3.14μs   2.48KiB  0.00%    39.8B
[...]
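For reference, the ratios implied by the two P4estMesh timer outputs above (main vs. this PR) are worked out below; every number is read directly off the tables.

```latex
\begin{aligned}
\text{AMR time:}\quad & 213\,\text{ms} / 72.8\,\text{ms} \approx 2.9 \\
\text{coarsen:}\quad & 110\,\text{ms} / 31.4\,\text{ms} \approx 3.5 \\
\text{refine:}\quad & 91.8\,\text{ms} / 32.1\,\text{ms} \approx 2.9 \\
\text{AMR allocations:}\quad & 121\,\text{MiB} / 25.8\,\text{MiB} \approx 4.7 \\
\text{total measured time:}\quad & 692\,\text{ms} / 485\,\text{ms} \approx 1.4
\end{aligned}
```

The mesh sub-timers stay roughly unchanged (12.9 ms vs. 11.7 ms for coarsen, 12.5 ms vs. 11.6 ms for refine), so the gain comes from the solver parts of coarsen and refine (97.4 ms to 19.7 ms and 79.3 ms to 20.5 ms, respectively).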

Closes #627

ranocha requested review from sloede and efaulhaber June 10, 2021 19:01

sloede (Member) left a comment

Thanks a lot for this PR. I have only one minor suggestion and a couple of questions to help me understand things better. Before merging this, I'd also like @efaulhaber to approve this PR.

How does the P4estMesh now hold up against the TreeMesh in terms of performance, e.g., for the test setup referenced in the PR description?

src/solvers/dg_curved/containers_2d.jl (outdated, resolved)
src/solvers/dg_curved/containers_2d.jl (resolved)
src/solvers/dg_p4est/containers_2d.jl (resolved)
src/solvers/dg_p4est/containers_2d.jl (resolved)

efaulhaber (Member) commented Jun 10, 2021

Thank you very much! I appreciate the thorough guide on how to approach performance optimizations like that.

A quick (well, longer than expected) question before I look at the code: Have you tried the native @profile output of the newer julia-vscode versions?
Running your benchmark code from step 1 in VS Code without loading ProfileView first gives me this output:

[Screenshot: profiling output in julia-vscode]

There's also an option to show a flame graph:

[Screenshot: flame graph in julia-vscode]

I can see that a lot of time is spent in iterate_faces and the mul! lines of calc_jacobian_matrix!, but I can't see anything in the flame graph.
These two views seem to be reversed compared to ProfileView: they collect all calls to low-level functions (like LinearAlgebra._generic_matmatmul!), which ProfileView shows in the top rows, and display them at the first level. I can then find out which of my functions called them by expanding the entries.
I find ProfileView's output much more useful here: it immediately shows that iterate_faces is responsible for less than half of the time, while calc_jacobian_matrix! and calc_node_coordinates are responsible for more than half. This can't be seen in VS Code because the latter calls are scattered across different low-level function calls.

I'd like to have the first picture in reverse, like a quantitative version of the flame graph by ProfileView. I like these quantitative representations a lot better than the flame graph that only shows me useful information when I hover over it. Do you happen to know if there's any way to show such a quantitative representation of ProfileView's flame graph?

src/mesh/mesh_io.jl (outdated, resolved)
src/solvers/dg_curved/containers_2d.jl (outdated, resolved)
src/solvers/dg_curved/containers_2d.jl (resolved)
src/solvers/dg_p4est/containers_2d.jl (resolved)
src/solvers/dg_p4est/containers_2d.jl (resolved)
src/solvers/dg_p4est/containers_2d.jl (resolved)

ranocha (Member, Author) commented Jun 11, 2021

Have you tried the native @profile output of the newer julia-vscode versions?

No, I have never tried that.

Do you happen to know if there's any way to show such a quantitative representation of ProfileView's flame graph?

What exactly do you mean by that?

ranocha (Member, Author) commented Jun 11, 2021

How does the P4estMesh now hold up against the TreeMesh in terms of performance, e.g., for the test setup referenced in the PR description?

julia> using Trixi

julia> trixi_include(joinpath(examples_dir(), "2d", "elixir_advection_amr.jl"))
[...]
 ─────────────────────────────────────────────────────────────────────────────────────
               Trixi.jl                       Time                   Allocations      
                                      ──────────────────────   ───────────────────────
           Tot / % measured:               108ms / 93.1%           16.0MiB / 99.2%    

 Section                      ncalls     time   %tot     avg     alloc   %tot      avg
 ─────────────────────────────────────────────────────────────────────────────────────
 rhs!                          1.60k   60.1ms  60.1%  37.6μs    586KiB  3.60%     375B
[...]
 AMR                              63   28.1ms  28.0%   445μs   13.6MiB  85.6%   221KiB
   refine                         63   17.8ms  17.8%   282μs   5.52MiB  34.8%  89.7KiB
     solver                       63   15.6ms  15.5%   247μs   5.06MiB  31.8%  82.2KiB
     mesh                         63   2.16ms  2.16%  34.2μs    459KiB  2.82%  7.28KiB
       refine_unbalanced!         63   1.91ms  1.91%  30.4μs   8.59KiB  0.05%     140B
       ~mesh~                     63    191μs  0.19%  3.03μs    348KiB  2.14%  5.53KiB
       rebalance!                 63   54.7μs  0.05%   869ns    102KiB  0.63%  1.62KiB
     ~refine~                     63   57.3μs  0.06%   909ns   17.6KiB  0.11%     285B
   coarsen                        63   8.11ms  8.10%   129μs   7.14MiB  45.0%   116KiB
     solver                       63   5.04ms  5.04%  80.0μs   5.18MiB  32.6%  84.2KiB
     mesh                         63   2.47ms  2.47%  39.3μs    302KiB  1.85%  4.79KiB
     ~coarsen~                    63    597μs  0.60%  9.48μs   1.66MiB  10.5%  27.1KiB
   indicator                      63   1.77ms  1.77%  28.1μs    441KiB  2.71%  7.00KiB
   ~AMR~                          63    393μs  0.39%  6.24μs    515KiB  3.17%  8.18KiB
[...]

julia> trixi_include(joinpath(examples_dir(), "2d", "elixir_advection_amr.jl"),
          mesh=P4estMesh((1, 1), polydeg=3,
                  coordinates_min=coordinates_min, coordinates_max=coordinates_max,
                  initial_refinement_level=4))
[...]
 ────────────────────────────────────────────────────────────────────────────────────
               Trixi.jl                      Time                   Allocations      
                                     ──────────────────────   ───────────────────────
          Tot / % measured:               575ms / 98.3%           69.6MiB / 100%     

 Section                     ncalls     time   %tot     avg     alloc   %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                         1.61k    232ms  41.0%   144μs   53.5MiB  77.1%  34.1KiB
[...]
 AMR                             64   68.7ms  12.2%  1.07ms   13.5MiB  19.5%   217KiB
   refine                        64   34.0ms  6.02%   532μs   5.95MiB  8.57%  95.1KiB
     solver                      64   21.4ms  3.79%   335μs   5.44MiB  7.84%  87.1KiB
     mesh                        64   12.6ms  2.22%   196μs    515KiB  0.73%  8.05KiB
       rebalance                128   11.6ms  2.05%  90.3μs   2.00KiB  0.00%    16.0B
       refine                    64    580μs  0.10%  9.06μs     0.00B  0.00%    0.00B
       ~mesh~                    64    432μs  0.08%  6.75μs    513KiB  0.72%  8.02KiB
     ~refine~                    64   27.2μs  0.00%   424ns   1.66KiB  0.00%    26.5B
   coarsen                       64   32.5ms  5.75%   507μs   7.36MiB  10.6%   118KiB
     solver                      64   20.3ms  3.59%   317μs   5.55MiB  8.00%  88.8KiB
     mesh                        64   12.2ms  2.16%   190μs   1.81MiB  2.61%  29.0KiB
       rebalance                128   10.6ms  1.88%  83.2μs   2.00KiB  0.00%    16.0B
       ~mesh~                    64   1.03ms  0.18%  16.2μs   1.29MiB  1.86%  20.6KiB
       coarsen!                  64    505μs  0.09%  7.90μs    534KiB  0.75%  8.34KiB
     ~coarsen~                   64   28.7μs  0.01%   448ns   1.66KiB  0.00%    26.5B
   indicator                     64   1.96ms  0.35%  30.6μs    224KiB  0.32%  3.50KiB
   ~AMR~                         64    244μs  0.04%  3.82μs   2.48KiB  0.00%    39.8B
[...]

Each example was run twice to exclude compilation time (the timings above are from the second run), on Julia v1.6.1 using julia --threads=1 --check-bounds=no.

ranocha requested review from sloede and efaulhaber June 11, 2021 05:32

sloede (Member) commented Jun 11, 2021

Thanks for the timings! The p4est AMR now looks very competitive with the TreeMesh. If we exclude the solver-specific parts from the AMR timer and cut the time for rebalance in half, we get 7.5ms for TreeMesh and 15.9ms for P4estMesh for the mesh part alone. Given that p4est is much more capable in terms of parallelization, I think this is already a very good result!
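The two mesh-only figures can be checked against the timer outputs in the previous comment: subtract the refine and coarsen solver timings from the total AMR time and, for P4estMesh, additionally remove half of the two rebalance timings (each rebalance section shows 128 calls for 64 AMR steps).

```latex
\begin{aligned}
t_{\text{mesh}}^{\text{TreeMesh}} &= 28.1 - 15.6 - 5.04 = 7.46 \approx 7.5\,\text{ms}, \\
t_{\text{mesh}}^{\text{P4estMesh}} &= 68.7 - 21.4 - 20.3 - \tfrac{1}{2}\,(11.6 + 10.6) = 15.9\,\text{ms}.
\end{aligned}
```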

sloede previously approved these changes Jun 11, 2021

codecov bot commented Jun 11, 2021

Codecov Report

Merging #638 (e379362) into main (f2972d9) will decrease coverage by 0.02%.
The diff coverage is 91.84%.

❗ Current head e379362 differs from pull request most recent head 450a4de. Consider uploading reports for the commit 450a4de to get more accurate results

@@            Coverage Diff             @@
##             main     #638      +/-   ##
==========================================
- Coverage   93.36%   93.33%   -0.03%     
==========================================
  Files         155      156       +1     
  Lines       15353    15465     +112     
==========================================
+ Hits        14334    14434     +100     
- Misses       1019     1031      +12     
Flag Coverage Δ
unittests 93.33% <91.84%> (-0.03%) ⬇️


Impacted Files Coverage Δ
src/Trixi.jl 83.33% <ø> (ø)
src/auxiliary/precompile.jl 0.00% <0.00%> (ø)
src/callbacks_step/analysis_dg2d.jl 96.63% <ø> (ø)
src/solvers/dg_common.jl 91.42% <33.33%> (-2.52%) ⬇️
src/solvers/dg_tree/dg.jl 87.96% <86.66%> (-1.22%) ⬇️
src/solvers/dg_tree/basis_lobatto_legendre.jl 88.62% <88.88%> (-0.59%) ⬇️
src/solvers/dg_tree/dg_2d.jl 96.84% <91.66%> (ø)
src/solvers/dg_tree/dg_3d.jl 97.50% <93.75%> (ø)
src/solvers/dg_tree/containers_2d.jl 96.22% <93.84%> (-0.60%) ⬇️
src/solvers/dg_p4est/containers_2d.jl 96.55% <94.23%> (-1.43%) ⬇️
... and 22 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e0c973d...450a4de.

efaulhaber (Member) commented

Why is rhs! so slow in your benchmark? And why is it allocating so much? Is this related to #628?

ranocha (Member, Author) commented Jun 11, 2021

TreeMesh:

 Section                      ncalls     time   %tot     avg     alloc   %tot      avg
 ─────────────────────────────────────────────────────────────────────────────────────
 rhs!                          1.60k   63.8ms  67.2%  39.8μs    586KiB  3.60%     375B
   volume integral             1.60k   32.8ms  34.6%  20.5μs     0.00B  0.00%    0.00B
   interface flux              1.60k   8.31ms  8.75%  5.19μs     0.00B  0.00%    0.00B
   surface integral            1.60k   6.25ms  6.58%  3.90μs     0.00B  0.00%    0.00B
   prolong2interfaces          1.60k   4.38ms  4.61%  2.74μs     0.00B  0.00%    0.00B
   prolong2mortars             1.60k   3.88ms  4.09%  2.42μs     0.00B  0.00%    0.00B
   mortar flux                 1.60k   3.24ms  3.41%  2.02μs     0.00B  0.00%    0.00B
   ~rhs!~                      1.60k   2.27ms  2.39%  1.42μs    586KiB  3.60%     375B
   Jacobian                    1.60k   1.26ms  1.33%   789ns     0.00B  0.00%    0.00B
   reset ∂u/∂t                 1.60k   1.23ms  1.30%   770ns     0.00B  0.00%    0.00B
   prolong2boundaries          1.60k   74.6μs  0.08%  46.6ns     0.00B  0.00%    0.00B
   boundary flux               1.60k   30.3μs  0.03%  18.9ns     0.00B  0.00%    0.00B
   source terms                1.60k   27.0μs  0.03%  16.8ns     0.00B  0.00%    0.00B

P4estMesh:

Section                     ncalls     time   %tot     avg     alloc   %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────
 rhs!                         1.61k    247ms  47.3%   154μs   53.5MiB  77.1%  34.1KiB
   interface flux             1.61k   70.5ms  13.5%  43.9μs     0.00B  0.00%    0.00B
   prolong2mortars            1.61k   58.5ms  11.2%  36.4μs   52.9MiB  76.3%  33.8KiB
   volume integral            1.61k   48.7ms  9.35%  30.3μs     0.00B  0.00%    0.00B
   prolong2interfaces         1.61k   42.3ms  8.11%  26.3μs     0.00B  0.00%    0.00B
   mortar flux                1.61k   10.1ms  1.94%  6.32μs     0.00B  0.00%    0.00B
   surface integral           1.61k   7.00ms  1.34%  4.36μs     0.00B  0.00%    0.00B
   Jacobian                   1.61k   4.78ms  0.92%  2.98μs     0.00B  0.00%    0.00B
   ~rhs!~                     1.61k   3.16ms  0.61%  1.97μs    588KiB  0.83%     375B
   reset ∂u/∂t                1.61k   1.38ms  0.26%   858ns     0.00B  0.00%    0.00B
   prolong2boundaries         1.61k   70.2μs  0.01%  43.7ns     0.00B  0.00%    0.00B
   source terms               1.61k   37.0μs  0.01%  23.1ns     0.00B  0.00%    0.00B
   boundary flux              1.61k   35.1μs  0.01%  21.8ns     0.00B  0.00%    0.00B

Thus, the allocations (in prolong2mortars) are indeed caused by #628, but that's not the only reason for the lower performance compared to the TreeMesh: the surface-related parts such as interface flux, prolong2interfaces, and prolong2mortars are much more expensive, too. Maybe open an issue? I get very similar rhs! timings on main.
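Summing the surface-related sections of the two tables above (interface flux, surface integral, prolong2interfaces, prolong2mortars, mortar flux) makes the difference explicit. Which sections count as "surface-related" is an editorial grouping, and the P4estMesh prolong2mortars time still includes the allocations tracked in #628.

```latex
\begin{aligned}
\text{TreeMesh:}\quad & 8.31 + 6.25 + 4.38 + 3.88 + 3.24 = 26.06\,\text{ms} \\
\text{P4estMesh:}\quad & 70.5 + 7.00 + 42.3 + 58.5 + 10.1 = 188.4\,\text{ms} \\
\text{ratio:}\quad & 188.4 / 26.06 \approx 7.2 \qquad (\text{volume integral: } 48.7 / 32.8 \approx 1.5)
\end{aligned}
```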

efaulhaber (Member) commented

Do you happen to know if there's any way to show such a quantitative representation of ProfileView's flame graph?

What exactly do you mean by that?

Something like my first screenshot above, with exact timings of each part (instead of a graph that only shows qualitative information), where I could see the lowest row of the flame graph with exact timings, expand it, and then see the row above it with exact timings.
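One text-based option, not mentioned in this thread, is a minimal sketch using only Julia's Profile standard library. Profile.print with format = :tree lists frames from the outermost caller inward, each prefixed by the number of samples in which that frame was on the stack, which roughly corresponds to expanding the lowest row of a flame graph level by level. With format = :flat and sortedby = :count, it lists each function/line once, sorted by sample count. The functions inner_kernel and outer_driver are hypothetical stand-ins for the real benchmark.

```julia
using Profile

# Hypothetical toy workload standing in for the actual benchmark code
inner_kernel(x) = sum(sin, x)

function outer_driver(n)
    x = rand(n)
    s = 0.0
    for _ in 1:100
        s += inner_kernel(x)
    end
    return s
end

outer_driver(10)             # run once so that compilation is not profiled
Profile.clear()              # discard samples from earlier profiling runs
@profile outer_driver(10^6)  # collect samples while the workload runs

# Call tree from the root downwards; the number in front of each frame is the
# count of samples in which that frame was on the stack
Profile.print(format = :tree, maxdepth = 12, mincount = 5)

# Flat listing, sorted by the number of samples
Profile.print(format = :flat, sortedby = :count)
```

Sample counts become rough times when multiplied by the sampling interval, which Profile.init() returns (in seconds) as the second element of its result when called without arguments.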

efaulhaber (Member) left a comment

One small thing and a few comment suggestions, otherwise everything looks good to me.
Good job! It's really fast now!

src/solvers/dg_curved/containers_2d.jl (outdated, resolved)
src/solvers/dg_curved/containers_2d.jl (outdated, resolved)
src/solvers/dg_curved/containers_2d.jl (outdated, resolved)
src/solvers/dg_p4est/containers.jl (outdated, resolved)
- nodes)
+ basis::LobattoLegendreBasis)

Member

Nothing other than the basis' nodes is used here, right? This function will be called by Trixi2Vtk where we only have the nodes and no basis. I should've commented that, sorry!

Member Author

Would you mind changing that part in Trixi2Vtk? I would prefer passing the basis for consistency with other meshes and to enable multiple dispatch. For example, other meshes use something like

  # Get cell length in reference mesh
  dx = 2 / size(mesh, 1)
  dy = 2 / size(mesh, 2)

  # Calculate node coordinates of reference mesh
  cell_x_offset = -1 + (cell_x-1) * dx + dx/2
  cell_y_offset = -1 + (cell_y-1) * dy + dy/2

which is specific to the nodes of the LobattoLegendreBasis. In particular, it won't work for FD methods implemented in #617.
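To illustrate the dispatch argument, here is a minimal 1D sketch under stated assumptions: LGLBasisSketch and reference_node_coordinates are hypothetical names, not the Trixi.jl API, and the cell-offset formula is the one quoted above reduced to one dimension. Since the formula is only valid for nodes on [-1, 1], attaching it to a basis type via dispatch keeps it from being applied to discretizations whose nodes live elsewhere.

```julia
# Hypothetical stand-in for a DG basis whose 1D nodes lie on [-1, 1]
struct LGLBasisSketch
    nodes::Vector{Float64}
end

# Map the nodes of `basis` into cell `cell_x` of a 1D reference mesh that
# splits [-1, 1] into `nx` equal cells. The formula assumes nodes on [-1, 1],
# so it is tied to this basis type via dispatch.
function reference_node_coordinates(basis::LGLBasisSketch, cell_x, nx)
    dx = 2 / nx                                      # cell length in the reference mesh
    cell_x_offset = -1 + (cell_x - 1) * dx + dx / 2  # center of the cell
    return cell_x_offset .+ (dx / 2) .* basis.nodes
end

# A discretization whose nodes live on a different interval would get its own
# method instead of silently reusing the formula above.

basis = LGLBasisSketch([-1.0, 0.0, 1.0])
reference_node_coordinates(basis, 1, 2)  # -> [-1.0, -0.5, 0.0]
reference_node_coordinates(basis, 2, 2)  # -> [0.0, 0.5, 1.0]
```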

What do you think, @sloede?

Member

I have no strong feelings either way. However, I'd suggest following the principle of not implementing something for a feature that isn't there yet (YAGNI). Thus, if changing this for dispatch is only needed for a future version of FD-SBP, I wouldn't change the interface right now.

On the other hand, if continuing to use plain nodes instead of passing the basis would break currently existing features, then we should use the implementation proposed by Hendrik.

Member Author

Well, FD-SBP methods are already in main, so I would like to keep this version here.

ranocha (Member, Author) commented Jun 11, 2021

Do you happen to know if there's any way to show such a quantitative representation of ProfileView's flame graph?

What exactly do you mean by that?

Something like my first screenshot above, with exact timings of each part (instead of a graph that only shows qualitative information), where I could see the lowest row of the flame graph with exact timings, expand it, and then see the row above it with exact timings.

As far as I know, the length of the bars in the output of ProfileView is quantitative (but there is no axis or legend). It might be worth opening an issue there or asking on Discourse. If you get an answer, I would also be interested in it.

ranocha requested review from efaulhaber and sloede June 11, 2021 18:22
sloede previously approved these changes Jun 11, 2021
efaulhaber previously approved these changes Jun 11, 2021

efaulhaber (Member) left a comment

LGTM!

src/mesh/mesh_io.jl (outdated, resolved)
src/solvers/dg_curved/containers_2d.jl (outdated, resolved)
ranocha dismissed stale reviews from efaulhaber and sloede via 450a4de June 12, 2021 05:22
ranocha merged commit c486d37 into trixi-framework:main Jun 12, 2021

ranocha (Member, Author) commented Jun 14, 2021


@efaulhaber: Regarding a quantitative view of the flame graph, you can use PProf.jl. After running @profview, call PProf.pprof() to get another representation of the profiling result. If you choose "VIEW > Flame Graph" there, you get something similar to what you described above.

efaulhaber (Member) commented

That's pretty cool, thanks! A bit chaotic for bigger profiles, but I guess I like it better than the flame graphs.

Successfully merging this pull request may close these issues.

Improve performance of AMR with P4estMesh
3 participants