curved ec performance #688

ranocha · 2021-06-30T12:42:05Z

Flux differencing on curved meshes sucked since the resulting EC methods were only 25% faster than Fluxo 😉 To the rescue!

This speeds up the RHS evaluation in joinpath(examples_dir(), "structured_3d_dgsem", "elixir_euler_ec.jl") by 1/3 for flux_ranocha and 1/4 for flux_shima_etal. Thus, our EC methods should be roughly twice as fast as Fluxo on curved meshes.

Teaser: Using julia --check-bounds=no --threads=1, I get

julia> using BenchmarkTools, Trixi

julia> begin # RHS
           redirect_stdout(devnull) do
               trixi_include(joinpath(examples_dir(), "structured_3d_dgsem", "elixir_euler_ec.jl"))
           end
           u_ode = copy(sol.u[end])
           du_ode = similar(u_ode)
           @benchmark Trixi.rhs!($du_ode, $u_ode, $semi, $0.0)
       end
BechmarkTools.Trial: 1456 samples with 1 evaluations.
 Range (min … max):  3.321 ms …   4.301 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.359 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.431 ms ± 178.005 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █ ▆▄▄▄▃▂▃▂▁                                                  
  ███████████████▇▇▇▄█▇▇▆▆▇▇▇▆█▇▇▆▄▇▄▅▁▁▁▄▄▄▄▄▆▅▄▄▄▄▆▄▅▆▁▅▆▆▅ █
  3.32 ms      Histogram: log(frequency) by time      4.16 ms <

 Memory estimate: 416 bytes, allocs estimate: 6.

julia> begin # RHS
           redirect_stdout(devnull) do
               trixi_include(joinpath(examples_dir(), "structured_3d_dgsem", "elixir_euler_ec.jl"),
                             volume_flux=flux_shima_etal)
           end
           u_ode = copy(sol.u[end])
           du_ode = similar(u_ode)
           @benchmark Trixi.rhs!($du_ode, $u_ode, $semi, $0.0)
       end
BechmarkTools.Trial: 3334 samples with 1 evaluations.
 Range (min … max):  1.460 ms …  1.930 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.487 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.497 ms ± 38.342 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄    ▃█     ▂▄                                              
  █▄▁▁▁███▆█▇▆██▇▆██▇██▆▆▅▄▄█▆▅▅▅▅▄▇▁▃▁▃▃▁▅▃▃▃▁▃▃▁▁▁▃▁▄▁▅▄▅▆ █
  1.46 ms      Histogram: log(frequency) by time     1.71 ms <

 Memory estimate: 416 bytes, allocs estimate: 6.

julia> begin # RHS
           redirect_stdout(devnull) do
               trixi_include(joinpath(examples_dir(), "structured_3d_dgsem", "elixir_euler_ec.jl"),
                             volume_flux=flux_shima_etal, surface_flux=FluxRotated(flux_shima_etal)) # no direct version available, needs `FluxRotated`
           end
           u_ode = copy(sol.u[end])
           du_ode = similar(u_ode)
           @benchmark Trixi.rhs!($du_ode, $u_ode, $semi, $0.0)
       end
BechmarkTools.Trial: 2999 samples with 1 evaluations.
 Range (min … max):  1.617 ms …  2.334 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.653 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.665 ms ± 54.161 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▆      █▇    ▁▄▃ ▁▁ ▂▁▁                                    ▁
  █▄▄▄▃▃▆████████████████▇▇▇█▇▇▆▆▁▃▃▄▆▆▄▄▃▃▃▃▃▁▅▁▁▁▃▃▁▃▃▁▁▃▄ █
  1.62 ms      Histogram: log(frequency) by time     1.88 ms <

 Memory estimate: 416 bytes, allocs estimate: 6.

on main and

julia> using BenchmarkTools, Trixi

julia> begin # RHS
           redirect_stdout(devnull) do
               trixi_include(joinpath(examples_dir(), "structured_3d_dgsem", "elixir_euler_ec.jl"))
           end
           u_ode = copy(sol.u[end])
           du_ode = similar(u_ode)
           @benchmark Trixi.rhs!($du_ode, $u_ode, $semi, $0.0)
       end
BechmarkTools.Trial: 2368 samples with 1 evaluations.
 Range (min … max):  2.088 ms …  2.514 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.097 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.109 ms ± 32.061 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄▃▆█▄        ▁▂▅     ▁    ▁▁                                
  █████▅▁▃▄▅▅▅▆███▇▆▆▆███▇▇███▆▅▅▄▆▅▄▄▅▇▇▆▃▃▄▁▅▄▃▁▁▆▇▅▃▁▁▃▁▅ █
  2.09 ms      Histogram: log(frequency) by time     2.23 ms <

 Memory estimate: 416 bytes, allocs estimate: 6.

julia> begin # RHS
           redirect_stdout(devnull) do
               trixi_include(joinpath(examples_dir(), "structured_3d_dgsem", "elixir_euler_ec.jl"),
                             volume_flux=flux_shima_etal)
           end
           u_ode = copy(sol.u[end])
           du_ode = similar(u_ode)
           @benchmark Trixi.rhs!($du_ode, $u_ode, $semi, $0.0)
       end
BechmarkTools.Trial: 3903 samples with 1 evaluations.
 Range (min … max):  1.244 ms …  2.337 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.268 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.279 ms ± 49.904 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄     ▄█▁      ▂▃       ▁                                  ▁
  ██▄▄▄▃████▇▇▇▇▇██▇▇▅▇█▇███▇▅▅▇▆▅▆█▆▆▆▄▄▅▄▄▆▃▄▄▅▁▁▃▄▄▁▄▃▁▃▄ █
  1.24 ms      Histogram: log(frequency) by time     1.44 ms <

 Memory estimate: 416 bytes, allocs estimate: 6.

julia> begin # RHS
           redirect_stdout(devnull) do
               trixi_include(joinpath(examples_dir(), "structured_3d_dgsem", "elixir_euler_ec.jl"),
                             volume_flux=flux_shima_etal, surface_flux=flux_shima_etal) # No need for `FluxRotated` anymore
           end
           u_ode = copy(sol.u[end])
           du_ode = similar(u_ode)
           @benchmark Trixi.rhs!($du_ode, $u_ode, $semi, $0.0)
       end
BechmarkTools.Trial: 4035 samples with 1 evaluations.
 Range (min … max):  1.187 ms …  2.787 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.211 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.237 ms ± 82.842 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃  █   ▄▄ ▂▂▂   ▁                                          ▁
  █▃▄███▇███████▇███▇▇▇█▇▆▆▇▇▇▆▆▆▆▆▅▃▆▆▃▅▅▃▅▄▃▄▄▃▄▅▃▁▁▄▁▃▁▄▃ █
  1.19 ms      Histogram: log(frequency) by time     1.57 ms <

 Memory estimate: 416 bytes, allocs estimate: 6.

from this PR.
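
The fractions quoted at the top of this PR can be cross-checked against the median timings listed above. The following is a minimal sketch in plain Julia. The helper `relative_reduction` and the labels in the named tuples are made up for illustration (they are not Trixi identifiers), and the numbers are simply the medians copied from the benchmark output.

```julia
# Median RHS times in milliseconds, copied from the benchmark output above.
# The labels are illustrative names, not Trixi identifiers.
medians_main = (ranocha = 3.359, shima_volume = 1.487, shima_volume_and_surface = 1.653)
medians_pr   = (ranocha = 2.097, shima_volume = 1.268, shima_volume_and_surface = 1.211)

# Fraction of the runtime that was saved (hypothetical helper)
relative_reduction(t_old, t_new) = 1 - t_new / t_old

for name in keys(medians_main)
    t_old = medians_main[name]
    t_new = medians_pr[name]
    println(string(name), ": ",
            round(100 * relative_reduction(t_old, t_new); digits = 1), " % less time, ",
            round(t_old / t_new; digits = 2), "x speedup")
end
```

With these medians, the reduction is a bit more than one third for flux_ranocha and roughly one quarter for flux_shima_etal when it is used as both volume and surface flux. With flux_shima_etal as volume flux only, the reduction is smaller (about 15 %).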

Benchmarks on Rocinante are running. I will add them later when they are finished (in a few hours?).

ranocha · 2021-06-30T13:07:04Z

The curvilinear meshes are still more expensive than the Cartesian TreeMesh, of course.

julia> using BenchmarkTools, Trixi

julia> begin # RHS
           redirect_stdout(devnull) do
               trixi_include(joinpath(examples_dir(), "tree_3d_dgsem", "elixir_euler_ec.jl"))
           end
           u_ode = copy(sol.u[end])
           du_ode = similar(u_ode)
           @benchmark Trixi.rhs!($du_ode, $u_ode, $semi, $0.0)
       end
BechmarkTools.Trial: 1569 samples with 1 evaluations.
 Range (min … max):  2.988 ms …   5.868 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.081 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.182 ms ± 287.739 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃█▅▅▅▄▄▄▂▃▁▁▁▁▁▁                                             
  ████████████████████▇▇█▇▅▇▆▆▆▅▅▆▆▆▄▁▆▁▆▅▆▆▅▆▆▆▅▆▅▅▅▅▅▅▁▄▁▄▅ █
  2.99 ms      Histogram: log(frequency) by time       4.3 ms <

 Memory estimate: 384 bytes, allocs estimate: 6.

julia> begin # RHS
           redirect_stdout(devnull) do
               trixi_include(joinpath(examples_dir(), "tree_3d_dgsem", "elixir_euler_ec.jl"),
                   mesh=StructuredMesh((8, 8, 8), float.(coordinates_min), float.(coordinates_max)))
           end
           u_ode = copy(sol.u[end])
           du_ode = similar(u_ode)
           @benchmark Trixi.rhs!($du_ode, $u_ode, $semi, $0.0)
       end
BechmarkTools.Trial: 1151 samples with 1 evaluations.
 Range (min … max):  3.634 ms …   5.902 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.320 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.330 ms ± 260.993 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                     ▁▁▁▂▆▅▆▄█▆▄▂▅▄▃▅▇▄▆▂▂                     
  ▃▂▂▁▄▄▃▂▄▅▅▄▆▆██▇███████████████████████▇▇▆▆▇▅▄▅▃▃▃▃▄▃▃▃▁▂▃ ▅
  3.63 ms         Histogram: frequency by time         5.3 ms <

 Memory estimate: 368 bytes, allocs estimate: 6.

julia> begin # RHS
           redirect_stdout(devnull) do
               trixi_include(joinpath(examples_dir(), "tree_3d_dgsem", "elixir_euler_ec.jl"),
                   mesh=P4estMesh((1, 1, 1), polydeg=3, initial_refinement_level=3; coordinates_min, coordinates_max))
           end
           u_ode = copy(sol.u[end])
           du_ode = similar(u_ode)
           @benchmark Trixi.rhs!($du_ode, $u_ode, $semi, $0.0)
       end
BechmarkTools.Trial: 765 samples with 1 evaluations.
 Range (min … max):  6.338 ms …   9.456 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.392 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.535 ms ± 332.859 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇▅▃▃▃▂ ▂ ▁▁▁                                                
  █████████▇████▇▆▆▇▇▄█▆▆▇▅▇▇▅█▄▆▄▄▅▄▆▄▁▁▁▄▄▄▁▇▆▄▄▆▅▁▁▁▄▁▁▄▄▄ ▇
  6.34 ms      Histogram: log(frequency) by time      7.92 ms <

 Memory estimate: 736 bytes, allocs estimate: 11.

Note that the P4estMesh has quite some overhead due to issues such as #642 and #628.

sloede

LGTM! However, I did not check whether the changes to the curved part make sense numerically; this would be better handled by @andrewwinters5000, if necessary.

sloede · 2021-06-30T13:35:14Z

Thanks for testing this out!

andrewwinters5000 · 2021-06-30T13:39:54Z

Thanks for testing this, @ranocha! Having the volume flux accept the pre-averaged contravariant vectors directly makes things cleaner and actually somewhat similar to how FLUXO does it ;)
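
To illustrate the difference between a flux that takes a (non-normalized) normal vector directly and a rotated Cartesian flux, here is a small self-contained sketch for the 2D compressible Euler equations. It is not the Trixi implementation: all function names (`flux_normal`, `flux_rotated`, `flux_central`) are hypothetical, the ratio of specific heats is assumed to be 1.4, and a plain central two-point flux stands in for `flux_shima_etal` or `flux_ranocha` to keep the example short.

```julia
const GAMMA = 1.4  # assumed ratio of specific heats (ideal gas)

# Pressure from conservative variables u = (rho, rho*v1, rho*v2, rho*e)
pressure(u) = (GAMMA - 1) * (u[4] - 0.5 * (u[2]^2 + u[3]^2) / u[1])

# Physical Euler flux in the x-direction
function flux_x(u)
    rho, rho_v1, rho_v2, rho_e = u
    v1 = rho_v1 / rho
    p = pressure(u)
    return (rho_v1, rho_v1 * v1 + p, rho_v2 * v1, (rho_e + p) * v1)
end

# Direct version: evaluate the flux along the non-normalized vector n
function flux_normal(u, n)
    rho, rho_v1, rho_v2, rho_e = u
    p = pressure(u)
    v_dot_n = (rho_v1 * n[1] + rho_v2 * n[2]) / rho
    return (rho * v_dot_n,
            rho_v1 * v_dot_n + p * n[1],
            rho_v2 * v_dot_n + p * n[2],
            (rho_e + p) * v_dot_n)
end

# Rotated version: normalize n, rotate the momentum into the normal frame,
# use the x-direction flux, rotate back, and rescale by the length of n
function flux_rotated(u, n)
    len = hypot(n[1], n[2])
    c, s = n[1] / len, n[2] / len
    rho, m1, m2, rho_e = u
    u_rot = (rho, c * m1 + s * m2, -s * m1 + c * m2, rho_e)
    f = flux_x(u_rot)
    return len .* (f[1], c * f[2] - s * f[3], s * f[2] + c * f[3], f[4])
end

# Central two-point flux built from either single-state flux
flux_central(f, u_ll, u_rr, n) = 0.5 .* (f(u_ll, n) .+ f(u_rr, n))

u_ll = (1.0, 0.5, -0.2, 2.5)
u_rr = (1.2, 0.3, 0.1, 3.0)
n = (0.6, -1.7)  # e.g. an averaged contravariant vector, not unit length

f_direct = flux_central(flux_normal, u_ll, u_rr, n)
f_rot = flux_central(flux_rotated, u_ll, u_rr, n)
println(all(isapprox.(f_direct, f_rot; rtol = 1e-12, atol = 1e-14)))
```

Both variants give the same flux up to rounding, but the direct version avoids the square root, the divisions, and the two rotations. That is presumably where part of the speedup over `FluxRotated` comes from.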

codecov · 2021-06-30T14:28:43Z

Codecov Report

Merging #688 (77a5cda) into main (1525846) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #688      +/-   ##
==========================================
+ Coverage   93.61%   93.63%   +0.02%     
==========================================
  Files         171      171              
  Lines       16482    16535      +53     
==========================================
+ Hits        15429    15482      +53     
  Misses       1053     1053

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `93.63% <100.00%> (+0.02%)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/equations/compressible_euler_1d.jl | `92.30% <ø> (ø)` | |
| src/equations/ideal_glm_mhd_3d.jl | `96.47% <ø> (ø)` | |
| src/equations/compressible_euler_2d.jl | `94.29% <100.00%> (+0.32%)` | ⬆️ |
| src/equations/compressible_euler_3d.jl | `95.04% <100.00%> (+0.36%)` | ⬆️ |
| src/equations/ideal_glm_mhd_2d.jl | `93.85% <100.00%> (ø)` | |
| src/solvers/dgsem_structured/dg_3d.jl | `96.72% <100.00%> (-0.22%)` | ⬇️ |
| src/solvers/dgsem_unstructured/dg_2d.jl | `96.35% <100.00%> (-0.09%)` | ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

ranocha added 5 commits June 30, 2021 14:07

make curved EC methods fast

9aa43c8

flux_shima_etal with normal_direction

9eab1c0

flux_kennedy_gruber with normal_direction

6db91f6

test new fluxes vs. FluxRotated

d419ed9

inv_gamma_minus_1 -> inv_gamma_minus_one

77a5cda

ranocha requested a review from sloede June 30, 2021 12:42

sloede approved these changes Jun 30, 2021

ranocha enabled auto-merge (squash) June 30, 2021 13:22

ranocha disabled auto-merge June 30, 2021 14:39

ranocha merged commit e48f449 into trixi-framework:main Jun 30, 2021

ranocha deleted the hr/curved_ec_performance branch June 30, 2021 14:39
