p4est surface performance 3D #783

Merged (8 commits) on Aug 13, 2021

@ranocha ranocha commented Aug 13, 2021

I ported the 2D surface performance improvements for the P4estMesh of #767 to 3D.

Current main (after compilation, using julia --check-bounds=no --threads=1):

julia> trixi_include("examples/p4est_3d_dgsem/elixir_advection_unstructured_curved.jl",
                     save_solution=TrivialCallback(), save_restart=TrivialCallback())
[...]
 ───────────────────────────────────────────────────────────────────────────────
            Trixi.jl                    Time                   Allocations
                                ──────────────────────   ───────────────────────
        Tot / % measured:            2.43s / 95.8%           7.42MiB / 71.2%

 Section                ncalls     time   %tot     avg     alloc   %tot      avg
 ───────────────────────────────────────────────────────────────────────────────
 rhs!                      206    2.19s  94.4%  10.7ms   84.9KiB  1.57%     422B
   interface flux          206    1.05s  45.3%  5.11ms     0.00B  0.00%    0.00B
   volume integral         206    435ms  18.7%  2.11ms     0.00B  0.00%    0.00B
   prolong2interfaces      206    430ms  18.5%  2.09ms     0.00B  0.00%    0.00B
   surface integral        206    103ms  4.45%   502μs     0.00B  0.00%    0.00B
   Jacobian                206   71.0ms  3.06%   345μs     0.00B  0.00%    0.00B
   boundary flux           206   66.0ms  2.84%   320μs     0.00B  0.00%    0.00B
   reset ∂u/∂t             206   14.0ms  0.60%  68.1μs     0.00B  0.00%    0.00B
   prolong2boundaries      206   12.7ms  0.55%  61.8μs     0.00B  0.00%    0.00B
   ~rhs!~                  206   9.21ms  0.40%  44.7μs   84.9KiB  1.57%     422B
   prolong2mortars         206   82.8μs  0.00%   402ns     0.00B  0.00%    0.00B
   mortar flux             206   47.1μs  0.00%   229ns     0.00B  0.00%    0.00B
   source terms            206   14.4μs  0.00%  70.0ns     0.00B  0.00%    0.00B
 analyze solution            2   72.5ms  3.12%  36.2ms   5.20MiB  98.4%  2.60MiB
 calculate dt               42   57.3ms  2.46%  1.36ms     0.00B  0.00%    0.00B
 ───────────────────────────────────────────────────────────────────────────────

julia> trixi_include("examples/p4est_3d_dgsem/elixir_advection_cubed_sphere.jl",
                     save_solution=TrivialCallback(), save_restart=TrivialCallback())
[...]
 ───────────────────────────────────────────────────────────────────────────────
            Trixi.jl                    Time                   Allocations
                                ──────────────────────   ───────────────────────
        Tot / % measured:           67.7ms / 97.4%            318KiB / 80.7%

 Section                ncalls     time   %tot     avg     alloc   %tot      avg
 ───────────────────────────────────────────────────────────────────────────────
 rhs!                      261   62.3ms  94.4%   239μs    105KiB  40.8%     410B
   interface flux          261   17.5ms  26.6%  67.1μs     0.00B  0.00%    0.00B
   boundary flux           261   12.4ms  18.8%  47.6μs     0.00B  0.00%    0.00B
   prolong2interfaces      261   11.7ms  17.7%  44.8μs     0.00B  0.00%    0.00B
   volume integral         261   11.1ms  16.9%  42.7μs     0.00B  0.00%    0.00B
   ~rhs!~                  261   4.01ms  6.08%  15.4μs    105KiB  40.8%     410B
   surface integral        261   2.15ms  3.26%  8.24μs     0.00B  0.00%    0.00B
   Jacobian                261   1.77ms  2.69%  6.80μs     0.00B  0.00%    0.00B
   prolong2boundaries      261   1.34ms  2.03%  5.14μs     0.00B  0.00%    0.00B
   reset ∂u/∂t             261    211μs  0.32%   808ns     0.00B  0.00%    0.00B
   prolong2mortars         261   14.1μs  0.02%  54.0ns     0.00B  0.00%    0.00B
   mortar flux             261   8.88μs  0.01%  34.0ns     0.00B  0.00%    0.00B
   source terms            261   4.82μs  0.01%  18.5ns     0.00B  0.00%    0.00B
 analyze solution            2   2.69ms  4.08%  1.35ms    152KiB  59.2%  76.0KiB
 calculate dt               53   1.01ms  1.54%  19.1μs     0.00B  0.00%    0.00B
 ───────────────────────────────────────────────────────────────────────────────

julia> trixi_include("examples/p4est_3d_dgsem/elixir_advection_nonconforming.jl",
                     save_solution=TrivialCallback(), save_restart=TrivialCallback())
[...]
 ───────────────────────────────────────────────────────────────────────────────
            Trixi.jl                    Time                   Allocations
                                ──────────────────────   ───────────────────────
        Tot / % measured:           33.7ms / 97.0%            216KiB / 80.1%

 Section                ncalls     time   %tot     avg     alloc   %tot      avg
 ───────────────────────────────────────────────────────────────────────────────
 rhs!                      156   30.1ms  92.0%   193μs   66.9KiB  38.7%     439B
   interface flux          156   10.3ms  31.5%  66.0μs     0.00B  0.00%    0.00B
   prolong2interfaces      156   6.70ms  20.5%  42.9μs     0.00B  0.00%    0.00B
   volume integral         156   5.26ms  16.1%  33.7μs     0.00B  0.00%    0.00B
   ~rhs!~                  156   3.03ms  9.27%  19.4μs   66.9KiB  38.7%     439B
   mortar flux             156   1.73ms  5.30%  11.1μs     0.00B  0.00%    0.00B
   surface integral        156   1.03ms  3.15%  6.59μs     0.00B  0.00%    0.00B
   prolong2mortars         156    988μs  3.02%  6.33μs     0.00B  0.00%    0.00B
   Jacobian                156    928μs  2.84%  5.95μs     0.00B  0.00%    0.00B
   reset ∂u/∂t             156   93.3μs  0.29%   598ns     0.00B  0.00%    0.00B
   prolong2boundaries      156   8.26μs  0.03%  52.9ns     0.00B  0.00%    0.00B
   boundary flux           156   4.67μs  0.01%  29.9ns     0.00B  0.00%    0.00B
   source terms            156   3.03μs  0.01%  19.4ns     0.00B  0.00%    0.00B
 analyze solution            2   2.12ms  6.49%  1.06ms    106KiB  61.3%  53.0KiB
 calculate dt               32    481μs  1.47%  15.0μs     0.00B  0.00%    0.00B
 ───────────────────────────────────────────────────────────────────────────────

This PR:

julia> trixi_include("examples/p4est_3d_dgsem/elixir_advection_unstructured_curved.jl",
                     save_solution=TrivialCallback(), save_restart=TrivialCallback())
[...]
 ───────────────────────────────────────────────────────────────────────────────
            Trixi.jl                    Time                   Allocations
                                ──────────────────────   ───────────────────────
        Tot / % measured:            1.51s / 93.3%           7.42MiB / 71.2%

 Section                ncalls     time   %tot     avg     alloc   %tot      avg
 ───────────────────────────────────────────────────────────────────────────────
 rhs!                      206    1.29s  91.5%  6.24ms   84.9KiB  1.57%     422B
   interface flux          206    526ms  37.4%  2.56ms     0.00B  0.00%    0.00B
   volume integral         206    433ms  30.8%  2.10ms     0.00B  0.00%    0.00B
   surface integral        206   89.3ms  6.36%   434μs     0.00B  0.00%    0.00B
   prolong2interfaces      206   88.2ms  6.27%   428μs     0.00B  0.00%    0.00B
   Jacobian                206   71.6ms  5.09%   347μs     0.00B  0.00%    0.00B
   boundary flux           206   47.0ms  3.35%   228μs     0.00B  0.00%    0.00B
   reset ∂u/∂t             206   14.0ms  0.99%  67.9μs     0.00B  0.00%    0.00B
   ~rhs!~                  206   8.96ms  0.64%  43.5μs   84.9KiB  1.57%     422B
   prolong2boundaries      206   7.55ms  0.54%  36.6μs     0.00B  0.00%    0.00B
   prolong2mortars         206   93.1μs  0.01%   452ns     0.00B  0.00%    0.00B
   mortar flux             206   29.5μs  0.00%   143ns     0.00B  0.00%    0.00B
   source terms            206   3.83μs  0.00%  18.6ns     0.00B  0.00%    0.00B
 analyze solution            2   63.4ms  4.51%  31.7ms   5.20MiB  98.4%  2.60MiB
 calculate dt               42   56.7ms  4.03%  1.35ms     0.00B  0.00%    0.00B
 ───────────────────────────────────────────────────────────────────────────────

julia> trixi_include("examples/p4est_3d_dgsem/elixir_advection_cubed_sphere.jl",
                     save_solution=TrivialCallback(), save_restart=TrivialCallback())
[...]
 ───────────────────────────────────────────────────────────────────────────────
            Trixi.jl                    Time                   Allocations
                                ──────────────────────   ───────────────────────
        Tot / % measured:           39.8ms / 95.4%            317KiB / 80.6%

 Section                ncalls     time   %tot     avg     alloc   %tot      avg
 ───────────────────────────────────────────────────────────────────────────────
 rhs!                      261   34.6ms  91.2%   133μs    105KiB  40.9%     410B
   volume integral         261   11.2ms  29.5%  42.9μs     0.00B  0.00%    0.00B
   boundary flux           261   7.33ms  19.3%  28.1μs     0.00B  0.00%    0.00B
   interface flux          261   5.58ms  14.7%  21.4μs     0.00B  0.00%    0.00B
   ~rhs!~                  261   3.96ms  10.4%  15.2μs    105KiB  40.9%     410B
   surface integral        261   2.16ms  5.68%  8.27μs     0.00B  0.00%    0.00B
   prolong2interfaces      261   1.91ms  5.02%  7.30μs     0.00B  0.00%    0.00B
   Jacobian                261   1.80ms  4.73%  6.88μs     0.00B  0.00%    0.00B
   prolong2boundaries      261    417μs  1.10%  1.60μs     0.00B  0.00%    0.00B
   reset ∂u/∂t             261    227μs  0.60%   868ns     0.00B  0.00%    0.00B
   prolong2mortars         261   13.9μs  0.04%  53.1ns     0.00B  0.00%    0.00B
   mortar flux             261   8.61μs  0.02%  33.0ns     0.00B  0.00%    0.00B
   source terms            261   5.74μs  0.02%  22.0ns     0.00B  0.00%    0.00B
 analyze solution            2   2.33ms  6.15%  1.17ms    151KiB  59.1%  75.5KiB
 calculate dt               53   1.01ms  2.66%  19.1μs     0.00B  0.00%    0.00B
 ───────────────────────────────────────────────────────────────────────────────

julia> trixi_include("examples/p4est_3d_dgsem/elixir_advection_nonconforming.jl",
                     save_solution=TrivialCallback(), save_restart=TrivialCallback())
[...]
 ───────────────────────────────────────────────────────────────────────────────
            Trixi.jl                    Time                   Allocations
                                ──────────────────────   ───────────────────────
        Tot / % measured:           15.5ms / 94.4%            217KiB / 80.2%

 Section                ncalls     time   %tot     avg     alloc   %tot      avg
 ───────────────────────────────────────────────────────────────────────────────
 rhs!                      156   12.5ms  85.3%  80.2μs   66.9KiB  38.5%     439B
   volume integral         156   4.45ms  30.3%  28.5μs     0.00B  0.00%    0.00B
   interface flux          156   2.50ms  17.0%  16.0μs     0.00B  0.00%    0.00B
   ~rhs!~                  156   2.18ms  14.9%  14.0μs   66.9KiB  38.5%     439B
   surface integral        156    860μs  5.87%  5.51μs     0.00B  0.00%    0.00B
   prolong2interfaces      156    814μs  5.56%  5.22μs     0.00B  0.00%    0.00B
   Jacobian                156    698μs  4.76%  4.47μs     0.00B  0.00%    0.00B
   mortar flux             156    554μs  3.78%  3.55μs     0.00B  0.00%    0.00B
   prolong2mortars         156    358μs  2.44%  2.30μs     0.00B  0.00%    0.00B
   reset ∂u/∂t             156   83.4μs  0.57%   535ns     0.00B  0.00%    0.00B
   prolong2boundaries      156   7.74μs  0.05%  49.6ns     0.00B  0.00%    0.00B
   boundary flux           156   4.12μs  0.03%  26.4ns     0.00B  0.00%    0.00B
   source terms            156   2.90μs  0.02%  18.6ns     0.00B  0.00%    0.00B
 analyze solution            2   1.75ms  11.9%   873μs    107KiB  61.5%  53.5KiB
 calculate dt               32    404μs  2.76%  12.6μs     0.00B  0.00%    0.00B
 ───────────────────────────────────────────────────────────────────────────────
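To make the comparison easier to read, the following is a minimal sketch that computes speedup factors from the `rhs!` and `interface flux` totals reported in the tables above. The names `timings_ms` and `speedup` are hypothetical helpers introduced for illustration only; the numbers are copied from the timer output and converted to milliseconds.

```julia
# Total times in milliseconds, copied from the tables above as (main, this PR).
# `timings_ms` and `speedup` are illustrative names, not part of Trixi.jl.
timings_ms = [
    "unstructured_curved" => (rhs = (2190.0, 1290.0), interface_flux = (1050.0, 526.0)),
    "cubed_sphere"        => (rhs = (62.3, 34.6),     interface_flux = (17.5, 5.58)),
    "nonconforming"       => (rhs = (30.1, 12.5),     interface_flux = (10.3, 2.50)),
]

# Speedup factor: time on main divided by time with this PR.
speedup(before, after) = before / after

for (elixir, t) in timings_ms
    println(rpad(elixir, 22),
            "rhs!: ", round(speedup(t.rhs...); digits = 2), "x, ",
            "interface flux: ", round(speedup(t.interface_flux...); digits = 2), "x")
end
```

With these numbers, `rhs!` comes out at about 1.7x, 1.8x, and 2.4x faster for the three elixirs, respectively. These are single runs, so the factors should be read as rough indications rather than precise measurements.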

CC @efaulhaber

Closes #642

@ranocha ranocha added the performance We are greedy label Aug 13, 2021
@ranocha ranocha requested a review from sloede August 13, 2021 09:01
@ranocha ranocha changed the title p4est surface performance p4est surface performance 3D Aug 13, 2021
@sloede sloede left a comment

Overall I think this looks good! The performance improvements are again very impressive - thanks a lot for taking care of this!

My main concern, reflected in my questions/remarks, is that it becomes much harder to understand what's going on compared to before. Thus I think it would be great if we could expand the documentation a little more (especially for the new auxiliary methods), although I am not sure exactly how and where this should go.

I also think it would be good if @andrewwinters5000 could have a look, since he is the expert on the indexing. Although I tried to understand the logic of the new indexing, I was not able to check everything.

src/solvers/dgsem_p4est/dg_2d.jl (outdated, resolved)
src/solvers/dgsem_p4est/dg_3d.jl (outdated, resolved)
src/solvers/dgsem_p4est/dg_3d.jl (resolved)
src/solvers/dgsem_p4est/dg_3d.jl (resolved)
@ranocha ranocha requested a review from sloede August 13, 2021 10:17
@andrewwinters5000 andrewwinters5000 left a comment

It looks quite nice, thanks @ranocha for taking on the 3D cases! The indexing herein should account for all the different flip configurations of the different surfaces. It makes sense that these index flip checks are done on the fly because the mesh is dynamic. So an element's neighbours and their orientations need this extra machinery to be correct. We can keep this in mind when extending the UnstructuredMesh2D to 3D where this index flipping and logic can be done in the mesh constructor once.
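As a rough illustration of what "index flipping on the fly" means for a 3D element face, here is a minimal sketch. It is not the Trixi.jl implementation and does not use p4est's orientation encoding; the function `copy_face!` and the flags `flip_i`, `flip_j`, and `swap` are hypothetical names chosen for this example. The idea is that node `(i, j)` on one side of an interface may correspond to a mirrored and/or transposed node on the neighboring side, and the mapping is evaluated inside the loop instead of being stored in the mesh.

```julia
# Hypothetical sketch: copy nodal face data from `src` to `dest`, applying the
# relative orientation between the two sides while looping over the nodes.
# `swap` exchanges the two face directions; `flip_i`/`flip_j` reverse them.
function copy_face!(dest::AbstractMatrix, src::AbstractMatrix,
                    flip_i::Bool, flip_j::Bool, swap::Bool)
    n = size(src, 1)
    @assert size(src) == (n, n) && size(dest) == (n, n)
    for j in 1:n, i in 1:n
        # Index on the neighboring side that matches node (i, j) on this side
        a, b = swap ? (j, i) : (i, j)
        a = flip_i ? n + 1 - a : a
        b = flip_j ? n + 1 - b : b
        dest[i, j] = src[a, b]
    end
    return dest
end

src = collect(reshape(1:9, 3, 3))
flipped = copy_face!(similar(src), src, true, false, false)
```

In this example, `flipped` equals `src` with its first dimension reversed. For a static mesh, such a mapping could be computed once in the mesh constructor and stored, which is the alternative mentioned above for a possible 3D extension of `UnstructuredMesh2D`.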

src/solvers/dgsem_p4est/dg_3d.jl (resolved)

ranocha commented Aug 13, 2021

Thanks, @andrewwinters5000 👍

@sloede sloede left a comment

LGTM!

@ranocha ranocha enabled auto-merge (squash) August 13, 2021 12:05

codecov bot commented Aug 13, 2021

Codecov Report

Merging #783 (1b6b974) into main (110fd99) will decrease coverage by 0.06%.
The diff coverage is 98.74%.

@@            Coverage Diff             @@
##             main     #783      +/-   ##
==========================================
- Coverage   93.60%   93.54%   -0.06%     
==========================================
  Files         182      182              
  Lines       17797    17932     +135     
==========================================
+ Hits        16658    16774     +116     
- Misses       1139     1158      +19     
| Flag | Coverage Δ |
|---|---|
| unittests | 93.54% <98.74%> (-0.06%) ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| ...s/p4est_3d_dgsem/elixir_advection_nonconforming.jl | 100.00% <ø> (ø) |
| src/solvers/dgsem_p4est/dg.jl | 93.33% <ø> (+2.42%) ⬆️ |
| src/solvers/dgsem_p4est/dg_2d.jl | 99.02% <95.00%> (ø) |
| src/solvers/dgsem_p4est/dg_3d.jl | 98.98% <99.09%> (-0.36%) ⬇️ |
| src/solvers/dgsem_p4est/containers.jl | 85.23% <0.00%> (-6.04%) ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@ranocha ranocha disabled auto-merge August 13, 2021 12:32
@ranocha ranocha merged commit a38f0f5 into main Aug 13, 2021
@ranocha ranocha deleted the hr/p4est_interface_performance branch August 13, 2021 12:35
@sloede sloede mentioned this pull request Aug 15, 2021
Linked issue: Surface stuff is slow for the P4estMesh