Type instability in prolong2mortars! for P4estMesh #1185

Closed
ranocha opened this issue Jul 20, 2022 · 0 comments · Fixed by #1189
Labels: performance (We are greedy)

ranocha commented Jul 20, 2022

```julia
julia> include(joinpath(examples_dir(), "p4est_2d_dgsem", "elixir_advection_amr_solution_independent.jl"))
[...]
 ──────────────────────────────────────────────────────────────────────────────────────
               Trixi.jl                       Time                    Allocations      
                                     ───────────────────────   ────────────────────────
          Tot / % measured:               1.36s /  91.4%           97.4MiB /  98.2%    

 Section                     ncalls     time    %tot     avg     alloc    %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────────
 rhs!                           726    666ms   53.5%   918μs   26.5MiB   27.7%  37.4KiB
   prolong2mortars              726    575ms   46.2%   792μs   26.2MiB   27.4%  37.0KiB
   volume integral              726   40.1ms    3.2%  55.2μs     0.00B    0.0%    0.00B
   interface flux               726   28.5ms    2.3%  39.2μs     0.00B    0.0%    0.00B
   prolong2interfaces           726   8.13ms    0.7%  11.2μs     0.00B    0.0%    0.00B
   surface integral             726   5.10ms    0.4%  7.03μs     0.00B    0.0%    0.00B
   Jacobian                     726   3.74ms    0.3%  5.15μs     0.00B    0.0%    0.00B
   mortar flux                  726   3.67ms    0.3%  5.05μs     0.00B    0.0%    0.00B
   ~rhs!~                       726   1.41ms    0.1%  1.95μs    270KiB    0.3%     381B
   reset ∂u/∂t                  726    760μs    0.1%  1.05μs     0.00B    0.0%    0.00B
   prolong2boundaries           726   41.1μs    0.0%  56.6ns     0.00B    0.0%    0.00B
   source terms                 726   17.8μs    0.0%  24.6ns     0.00B    0.0%    0.00B
   boundary flux                726   16.9μs    0.0%  23.3ns     0.00B    0.0%    0.00B
 AMR                             28    406ms   32.6%  14.5ms   63.4MiB   66.3%  2.27MiB
   coarsen                       28    214ms   17.2%  7.63ms   30.2MiB   31.6%  1.08MiB
     solver                      28    206ms   16.5%  7.35ms   24.0MiB   25.1%   877KiB
     mesh                        28   7.83ms    0.6%   279μs   6.23MiB    6.5%   228KiB
       rebalance                 28   5.02ms    0.4%   179μs      896B    0.0%    32.0B
       ~mesh~                    28   2.43ms    0.2%  86.8μs   6.09MiB    6.4%   223KiB
       coarsen!                  28    370μs    0.0%  13.2μs    141KiB    0.1%  5.05KiB
     ~coarsen~                   28   25.3μs    0.0%   905ns   1.47KiB    0.0%    53.7B
   refine                        28    189ms   15.2%  6.74ms   27.9MiB   29.2%  1.00MiB
     solver                      28    182ms   14.6%  6.50ms   25.0MiB   26.2%   916KiB
     mesh                        28   6.73ms    0.5%   240μs   2.84MiB    3.0%   104KiB
       rebalance                 28   5.16ms    0.4%   184μs      896B    0.0%    32.0B
       ~mesh~                    28    944μs    0.1%  33.7μs   2.84MiB    3.0%   104KiB
       refine                    28    630μs    0.1%  22.5μs     0.00B    0.0%    0.00B
     ~refine~                    28   25.1μs    0.0%   896ns   1.47KiB    0.0%    53.7B
   indicator                     28   2.64ms    0.2%  94.4μs   2.71MiB    2.8%  99.2KiB
   ~AMR~                         28    860μs    0.1%  30.7μs   2.62MiB    2.7%  95.8KiB
 I/O                              4    153ms   12.3%  38.2ms   1.93MiB    2.0%   493KiB
   save mesh                      3   98.7ms    7.9%  32.9ms   8.88KiB    0.0%  2.96KiB
   ~I/O~                          4   42.6ms    3.4%  10.7ms   7.88KiB    0.0%  1.97KiB
   get element variables          3   9.46ms    0.8%  3.15ms   1.48MiB    1.5%   505KiB
   save solution                  3   2.12ms    0.2%   705μs    440KiB    0.4%   147KiB
 initial condition AMR            1   12.2ms    1.0%  12.2ms   3.49MiB    3.6%  3.49MiB
   AMR                            3   12.0ms    1.0%  3.99ms   3.48MiB    3.6%  1.16MiB
     refine                       3   11.8ms    0.9%  3.94ms   2.81MiB    2.9%   960KiB
       solver                     3   11.3ms    0.9%  3.76ms   2.56MiB    2.7%   874KiB
       mesh                       3    516μs    0.0%   172μs    255KiB    0.3%  85.0KiB
         rebalance                3    438μs    0.0%   146μs     96.0B    0.0%    32.0B
         refine                   3   39.2μs    0.0%  13.1μs     0.00B    0.0%    0.00B
         ~mesh~                   3   38.9μs    0.0%  13.0μs    255KiB    0.3%  84.9KiB
       ~refine~                   3   1.86μs    0.0%   620ns   1.47KiB    0.0%     501B
     indicator                    3    127μs    0.0%  42.2μs    252KiB    0.3%  84.1KiB
     ~AMR~                        3   46.7μs    0.0%  15.6μs    437KiB    0.4%   146KiB
     coarsen                      3    424ns    0.0%   141ns      192B    0.0%    64.0B
   ~initial condition AMR~        1    225μs    0.0%   225μs      752B    0.0%     752B
 analyze solution                 3   4.04ms    0.3%  1.35ms    324KiB    0.3%   108KiB
 calculate dt                   146   3.66ms    0.3%  25.1μs     0.00B    0.0%    0.00B
 ──────────────────────────────────────────────────────────────────────────────────────
```

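`prolong2mortars` takes 46.2% of the total run time and allocates 26.2 MiB over 726 calls (37.0 KiB per call). This seems to be caused by a type instability with `axes` of views in `multiply_dimensionwise!`. Changing `@turbo for i in axes(data_out, 2), v in axes(data_out, 1)` to `@turbo for i in axes(matrix, 2), v in axes(data_in, 1)` fixes this specific case in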

```julia
function multiply_dimensionwise!(data_out::AbstractArray{<:Any, 2}, matrix::AbstractMatrix,
                                 data_in::AbstractArray{<:Any, 2})
    # @tullio threads=false data_out[v, i] = matrix[i, ii] * data_in[v, ii]
    @turbo for i in axes(data_out, 2), v in axes(data_out, 1)
        res = zero(eltype(data_out))
        for ii in axes(matrix, 2)
            res += matrix[i, ii] * data_in[v, ii]
        end
        data_out[v, i] = res
    end
    return nothing
end
```
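
Reading the commented-out `@tullio` line as the definition of the kernel, the loop computes a matrix product, which shows which array can supply each loop range:

```math
\begin{aligned}
\texttt{data\_out}_{v,i} &= \sum_{ii} \texttt{matrix}_{i,ii}\,\texttt{data\_in}_{v,ii}
\qquad\text{i.e.}\qquad \texttt{data\_out} = \texttt{data\_in}\,\texttt{matrix}^{\mathsf{T}}, \\
v &\in \texttt{axes(data\_in, 1)} = \texttt{axes(data\_out, 1)}, \\
i &\in \texttt{axes(matrix, 1)} = \texttt{axes(data\_out, 2)}, \\
ii &\in \texttt{axes(matrix, 2)} = \texttt{axes(data\_in, 2)}.
\end{aligned}
```

So every range can be taken from `matrix` and `data_in` instead of `data_out`. For a square `matrix`, `axes(matrix, 1)` and `axes(matrix, 2)` coincide, so `axes(matrix, 2)` is also a valid range for `i` in that case.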

See also the call site:

```julia
multiply_dimensionwise!(view(cache.mortars.u, 2, :, 1, :, mortar),
                        mortar_l2.forward_lower,
                        u_buffer)
```
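
A minimal, self-contained sketch of the same kernel in plain Julia, taking the loop ranges from `matrix` and `data_in` and calling it on a view indexed like the call site above. It deliberately avoids LoopVectorization's `@turbo`, so it does not reproduce the `@turbo`-specific instability itself. The function name `multiply_dimensionwise_sketch!`, the array sizes, and the 5-D array `u` standing in for `cache.mortars.u` are hypothetical and only mimic the indexing pattern; `@allocated` is one quick way to check whether a kernel called on such a view allocates.

```julia
using LinearAlgebra  # for `transpose`

# Hypothetical plain-loop version of `multiply_dimensionwise!` (no `@turbo`).
# Computes data_out[v, i] = sum over ii of matrix[i, ii] * data_in[v, ii], with all
# loop ranges taken from `matrix` and `data_in` instead of the (possibly view) `data_out`.
# Assumes 1-based arrays, as for `Array` and views of it.
function multiply_dimensionwise_sketch!(data_out::AbstractMatrix, matrix::AbstractMatrix,
                                        data_in::AbstractMatrix)
    size(data_out) == (size(data_in, 1), size(matrix, 1)) ||
        throw(DimensionMismatch("data_out must have size (size(data_in, 1), size(matrix, 1))"))
    size(matrix, 2) == size(data_in, 2) ||
        throw(DimensionMismatch("size(matrix, 2) must equal size(data_in, 2)"))
    @inbounds for i in axes(matrix, 1), v in axes(data_in, 1)
        res = zero(eltype(data_out))
        for ii in axes(matrix, 2)
            res += matrix[i, ii] * data_in[v, ii]
        end
        data_out[v, i] = res
    end
    return nothing
end

# Hypothetical sizes; `u` only mimics the indexing pattern of `cache.mortars.u`
# in the call site, not necessarily its actual layout in Trixi.jl.
nvars, nnodes, nmortars = 3, 4, 2
u = zeros(2, nvars, 2, nnodes, nmortars)
matrix = rand(nnodes, nnodes)   # stands in for `mortar_l2.forward_lower`
u_buffer = rand(nvars, nnodes)
mortar = 2

data_out = view(u, 2, :, 1, :, mortar)   # nvars × nnodes view into `u`
multiply_dimensionwise_sketch!(data_out, matrix, u_buffer)
@show data_out ≈ u_buffer * transpose(matrix)

# Measure allocations inside a function so that untyped globals do not distort the result.
count_allocs(out, mat, inp) = @allocated multiply_dimensionwise_sketch!(out, mat, inp)
count_allocs(data_out, matrix, u_buffer)         # warm-up call (triggers compilation)
@show count_allocs(data_out, matrix, u_buffer)   # bytes allocated by one kernel call
```

Taking the ranges from `matrix` and `data_in` mirrors the change proposed above; whether that also removes the instability under `@turbo` depends on LoopVectorization and is not demonstrated by this plain-loop sketch.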

ranocha added the performance label Jul 20, 2022
ranocha mentioned this issue Jul 28, 2022