curved ec performance #688

ranocha merged 5 commits into trixi-framework:main from hr/curved_ec_performance on Jun 30, 2021

@ranocha ranocha commented Jun 30, 2021

Flux differencing on curved meshes sucked since the resulting EC methods were only 25% faster than Fluxo 😉 To the rescue!

This cuts the RHS evaluation time in joinpath(examples_dir(), "structured_3d_dgsem", "elixir_euler_ec.jl") by about 1/3 for flux_ranocha and about 1/4 for flux_shima_etal. Thus, our EC methods should be roughly twice as fast as Fluxo on curved meshes.

Teaser: Using julia --check-bounds=no --threads=1, I get

julia> using BenchmarkTools, Trixi

julia> begin # RHS
           redirect_stdout(devnull) do
               trixi_include(joinpath(examples_dir(), "structured_3d_dgsem", "elixir_euler_ec.jl"))
           end
           u_ode = copy(sol.u[end])
           du_ode = similar(u_ode)
           @benchmark Trixi.rhs!($du_ode, $u_ode, $semi, $0.0)
       end
BechmarkTools.Trial: 1456 samples with 1 evaluations.
 Range (min … max):  3.321 ms …   4.301 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.359 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.431 ms ± 178.005 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █ ▆▄▄▄▃▂▃▂▁                                                  
  ███████████████▇▇▇▄█▇▇▆▆▇▇▇▆█▇▇▆▄▇▄▅▁▁▁▄▄▄▄▄▆▅▄▄▄▄▆▄▅▆▁▅▆▆▅ █
  3.32 ms      Histogram: log(frequency) by time      4.16 ms <

 Memory estimate: 416 bytes, allocs estimate: 6.

julia> begin # RHS
           redirect_stdout(devnull) do
               trixi_include(joinpath(examples_dir(), "structured_3d_dgsem", "elixir_euler_ec.jl"),
                             volume_flux=flux_shima_etal)
           end
           u_ode = copy(sol.u[end])
           du_ode = similar(u_ode)
           @benchmark Trixi.rhs!($du_ode, $u_ode, $semi, $0.0)
       end
BechmarkTools.Trial: 3334 samples with 1 evaluations.
 Range (min … max):  1.460 ms …  1.930 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.487 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.497 ms ± 38.342 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄    ▃█     ▂▄                                              
  █▄▁▁▁███▆█▇▆██▇▆██▇██▆▆▅▄▄█▆▅▅▅▅▄▇▁▃▁▃▃▁▅▃▃▃▁▃▃▁▁▁▃▁▄▁▅▄▅▆ █
  1.46 ms      Histogram: log(frequency) by time     1.71 ms <

 Memory estimate: 416 bytes, allocs estimate: 6.

julia> begin # RHS
           redirect_stdout(devnull) do
               trixi_include(joinpath(examples_dir(), "structured_3d_dgsem", "elixir_euler_ec.jl"),
                             volume_flux=flux_shima_etal, surface_flux=FluxRotated(flux_shima_etal)) # no direct version available, needs `FluxRotated`
           end
           u_ode = copy(sol.u[end])
           du_ode = similar(u_ode)
           @benchmark Trixi.rhs!($du_ode, $u_ode, $semi, $0.0)
       end
BechmarkTools.Trial: 2999 samples with 1 evaluations.
 Range (min … max):  1.617 ms …  2.334 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.653 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.665 ms ± 54.161 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▆      █▇    ▁▄▃ ▁▁ ▂▁▁                                    ▁
  █▄▄▄▃▃▆████████████████▇▇▇█▇▇▆▆▁▃▃▄▆▆▄▄▃▃▃▃▃▁▅▁▁▁▃▃▁▃▃▁▁▃▄ █
  1.62 ms      Histogram: log(frequency) by time     1.88 ms <

 Memory estimate: 416 bytes, allocs estimate: 6.

on main and

julia> using BenchmarkTools, Trixi

julia> begin # RHS
           redirect_stdout(devnull) do
               trixi_include(joinpath(examples_dir(), "structured_3d_dgsem", "elixir_euler_ec.jl"))
           end
           u_ode = copy(sol.u[end])
           du_ode = similar(u_ode)
           @benchmark Trixi.rhs!($du_ode, $u_ode, $semi, $0.0)
       end
BechmarkTools.Trial: 2368 samples with 1 evaluations.
 Range (min … max):  2.088 ms …  2.514 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.097 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.109 ms ± 32.061 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄▃▆█▄        ▁▂▅     ▁    ▁▁                                
  █████▅▁▃▄▅▅▅▆███▇▆▆▆███▇▇███▆▅▅▄▆▅▄▄▅▇▇▆▃▃▄▁▅▄▃▁▁▆▇▅▃▁▁▃▁▅ █
  2.09 ms      Histogram: log(frequency) by time     2.23 ms <

 Memory estimate: 416 bytes, allocs estimate: 6.

julia> begin # RHS
           redirect_stdout(devnull) do
               trixi_include(joinpath(examples_dir(), "structured_3d_dgsem", "elixir_euler_ec.jl"),
                             volume_flux=flux_shima_etal)
           end
           u_ode = copy(sol.u[end])
           du_ode = similar(u_ode)
           @benchmark Trixi.rhs!($du_ode, $u_ode, $semi, $0.0)
       end
BechmarkTools.Trial: 3903 samples with 1 evaluations.
 Range (min … max):  1.244 ms …  2.337 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.268 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.279 ms ± 49.904 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄     ▄█▁      ▂▃       ▁                                  ▁
  ██▄▄▄▃████▇▇▇▇▇██▇▇▅▇█▇███▇▅▅▇▆▅▆█▆▆▆▄▄▅▄▄▆▃▄▄▅▁▁▃▄▄▁▄▃▁▃▄ █
  1.24 ms      Histogram: log(frequency) by time     1.44 ms <

 Memory estimate: 416 bytes, allocs estimate: 6.

julia> begin # RHS
           redirect_stdout(devnull) do
               trixi_include(joinpath(examples_dir(), "structured_3d_dgsem", "elixir_euler_ec.jl"),
                             volume_flux=flux_shima_etal, surface_flux=flux_shima_etal) # No need for `FluxRotated` anymore
           end
           u_ode = copy(sol.u[end])
           du_ode = similar(u_ode)
           @benchmark Trixi.rhs!($du_ode, $u_ode, $semi, $0.0)
       end
BechmarkTools.Trial: 4035 samples with 1 evaluations.
 Range (min … max):  1.187 ms …  2.787 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.211 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.237 ms ± 82.842 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃  █   ▄▄ ▂▂▂   ▁                                          ▁
  █▃▄███▇███████▇███▇▇▇█▇▆▆▇▇▇▆▆▆▆▆▅▃▆▆▃▅▅▃▅▄▃▄▄▃▄▅▃▁▁▄▁▃▁▄▃ █
  1.19 ms      Histogram: log(frequency) by time     1.57 ms <

 Memory estimate: 416 bytes, allocs estimate: 6.

from this PR.
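
For readers who want to check the claimed savings against the numbers above, here is a minimal sketch that computes the relative reduction in RHS time from the reported medians. It assumes the median is the right summary statistic; the names `medians_main`, `medians_pr`, and `relative_saving` are made up for this illustration and are not part of Trixi.jl or BenchmarkTools.

```julia
# Median RHS times in ms, copied from the benchmark output above.
# `shima_surface` is the run that also sets `surface_flux`.
medians_main = (ranocha = 3.359, shima = 1.487, shima_surface = 1.653)
medians_pr   = (ranocha = 2.097, shima = 1.268, shima_surface = 1.211)

# Fraction of the runtime that is saved: (old - new) / old
relative_saving(old, new) = (old - new) / old

# `map` over two NamedTuples with the same keys returns a NamedTuple
savings = map(relative_saving, medians_main, medians_pr)

for (name, s) in pairs(savings)
    println(rpad(string(name), 14), round(100 * s; digits = 1), " % less time per RHS call")
end
```

With these medians the savings come out at roughly 38 % for flux_ranocha, 15 % for flux_shima_etal with the default surface flux, and 27 % for flux_shima_etal as both volume and surface flux.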

Benchmarks on Rocinante are running. I will add them later when they are finished (in a few hours?).

@ranocha ranocha requested a review from sloede June 30, 2021 12:42
ranocha commented Jun 30, 2021

The curvilinear meshes are still more expensive than the Cartesian TreeMesh, of course.

julia> using BenchmarkTools, Trixi

julia> begin # RHS
           redirect_stdout(devnull) do
               trixi_include(joinpath(examples_dir(), "tree_3d_dgsem", "elixir_euler_ec.jl"))
           end
           u_ode = copy(sol.u[end])
           du_ode = similar(u_ode)
           @benchmark Trixi.rhs!($du_ode, $u_ode, $semi, $0.0)
       end
BechmarkTools.Trial: 1569 samples with 1 evaluations.
 Range (min … max):  2.988 ms …   5.868 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.081 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.182 ms ± 287.739 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃█▅▅▅▄▄▄▂▃▁▁▁▁▁▁                                             
  ████████████████████▇▇█▇▅▇▆▆▆▅▅▆▆▆▄▁▆▁▆▅▆▆▅▆▆▆▅▆▅▅▅▅▅▅▁▄▁▄▅ █
  2.99 ms      Histogram: log(frequency) by time       4.3 ms <

 Memory estimate: 384 bytes, allocs estimate: 6.

julia> begin # RHS
           redirect_stdout(devnull) do
               trixi_include(joinpath(examples_dir(), "tree_3d_dgsem", "elixir_euler_ec.jl"),
                   mesh=StructuredMesh((8, 8, 8), float.(coordinates_min), float.(coordinates_max)))
           end
           u_ode = copy(sol.u[end])
           du_ode = similar(u_ode)
           @benchmark Trixi.rhs!($du_ode, $u_ode, $semi, $0.0)
       end
BechmarkTools.Trial: 1151 samples with 1 evaluations.
 Range (min … max):  3.634 ms …   5.902 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.320 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.330 ms ± 260.993 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                     ▁▁▁▂▆▅▆▄█▆▄▂▅▄▃▅▇▄▆▂▂                     
  ▃▂▂▁▄▄▃▂▄▅▅▄▆▆██▇███████████████████████▇▇▆▆▇▅▄▅▃▃▃▃▄▃▃▃▁▂▃ ▅
  3.63 ms         Histogram: frequency by time         5.3 ms <

 Memory estimate: 368 bytes, allocs estimate: 6.

julia> begin # RHS
           redirect_stdout(devnull) do
               trixi_include(joinpath(examples_dir(), "tree_3d_dgsem", "elixir_euler_ec.jl"),
                   mesh=P4estMesh((1, 1, 1), polydeg=3, initial_refinement_level=3; coordinates_min, coordinates_max))
           end
           u_ode = copy(sol.u[end])
           du_ode = similar(u_ode)
           @benchmark Trixi.rhs!($du_ode, $u_ode, $semi, $0.0)
       end
BechmarkTools.Trial: 765 samples with 1 evaluations.
 Range (min … max):  6.338 ms …   9.456 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.392 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.535 ms ± 332.859 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇▅▃▃▃▂ ▂ ▁▁▁                                                
  █████████▇████▇▆▆▇▇▄█▆▆▇▅▇▇▅█▄▆▄▄▅▄▆▄▁▁▁▄▄▄▁▇▆▄▄▆▅▁▁▁▄▁▁▄▄▄ ▇
  6.34 ms      Histogram: log(frequency) by time      7.92 ms <

 Memory estimate: 736 bytes, allocs estimate: 11.

Note that the P4estMesh has quite some overhead due to issues such as #642 and #628.

@sloede sloede left a comment

LGTM! However, I did not check whether the changes to the curved part make sense numerically - this would be better handled by @andrewwinters5000, if necessary.

@ranocha ranocha enabled auto-merge (squash) June 30, 2021 13:22
sloede commented Jun 30, 2021

Thanks for testing this out!

@andrewwinters5000

Thanks for testing this, @ranocha! Having the volume flux able to accept the pre-averaged contravariant vectors directly makes things cleaner and actually somewhat similar to how FLUXO does it ;)
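
To illustrate what "accepting the pre-averaged contravariant vectors directly" can look like, here is a rough, self-contained sketch for the 2D compressible Euler equations. Everything in it is an assumption made for illustration: the names `euler_flux`, `flux_central_sketch`, and `volume_flux_sketch` are hypothetical, a plain central flux stands in for the EC fluxes, and none of this is the code changed in this PR. The point is only that the two-point flux takes a (possibly non-unit) direction vector, so the averaged contravariant vector can be passed in once instead of rotating states or combining Cartesian fluxes.

```julia
# Hypothetical, simplified stand-ins; NOT Trixi.jl's actual implementation.
const GAMMA = 1.4  # ratio of specific heats, assumed ideal gas

# Physical flux of the 2D compressible Euler equations projected onto `normal`.
# `u` holds the conserved variables (rho, rho*v1, rho*v2, rho*e).
function euler_flux(u, normal)
    rho, rho_v1, rho_v2, rho_e = u
    v1 = rho_v1 / rho
    v2 = rho_v2 / rho
    p = (GAMMA - 1) * (rho_e - 0.5 * rho * (v1^2 + v2^2))
    v_normal = v1 * normal[1] + v2 * normal[2]
    return (rho * v_normal,
            rho_v1 * v_normal + p * normal[1],
            rho_v2 * v_normal + p * normal[2],
            (rho_e + p) * v_normal)
end

# Central two-point flux that takes a direction vector instead of an orientation index
function flux_central_sketch(u_ll, u_rr, normal)
    return 0.5 .* (euler_flux(u_ll, normal) .+ euler_flux(u_rr, normal))
end

# Average the contravariant vectors of the two nodes first, then call the flux once
function volume_flux_sketch(u_ll, u_rr, Ja_ll, Ja_rr)
    Ja_avg = 0.5 .* (Ja_ll .+ Ja_rr)
    return flux_central_sketch(u_ll, u_rr, Ja_avg)
end

# Usage: two nearby states and slightly different contravariant vectors
u_ll = (1.0, 0.5, 0.0, 2.5)
u_rr = (1.1, 0.4, 0.1, 2.6)
flux = volume_flux_sketch(u_ll, u_rr, (1.0, 0.1), (0.9, 0.2))
```

A basic sanity check for any such two-point flux is consistency: for identical states and identical vectors it should reduce to the physical flux in that direction.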

codecov bot commented Jun 30, 2021

Codecov Report

Merging #688 (77a5cda) into main (1525846) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #688      +/-   ##
==========================================
+ Coverage   93.61%   93.63%   +0.02%     
==========================================
  Files         171      171              
  Lines       16482    16535      +53     
==========================================
+ Hits        15429    15482      +53     
  Misses       1053     1053              
Flag Coverage Δ
unittests 93.63% <100.00%> (+0.02%) ⬆️

Impacted Files Coverage Δ
src/equations/compressible_euler_1d.jl 92.30% <ø> (ø)
src/equations/ideal_glm_mhd_3d.jl 96.47% <ø> (ø)
src/equations/compressible_euler_2d.jl 94.29% <100.00%> (+0.32%) ⬆️
src/equations/compressible_euler_3d.jl 95.04% <100.00%> (+0.36%) ⬆️
src/equations/ideal_glm_mhd_2d.jl 93.85% <100.00%> (ø)
src/solvers/dgsem_structured/dg_3d.jl 96.72% <100.00%> (-0.22%) ⬇️
src/solvers/dgsem_unstructured/dg_2d.jl 96.35% <100.00%> (-0.09%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@ranocha ranocha disabled auto-merge June 30, 2021 14:39
@ranocha ranocha merged commit e48f449 into trixi-framework:main Jun 30, 2021
@ranocha ranocha deleted the hr/curved_ec_performance branch June 30, 2021 14:39