WIP: MatrixFree and Assembled GPU operators #766

termi-official · 2023-07-14T17:29:36Z

This PR shows how to implement matrix-free operators on the GPU and provides the necessary infrastructure.

TODOs

Fix bug when block size > 1
Use Float32
Remove problematic string interpolations in exceptions (breaks GPU compilation) - also check out Base.LazyString
Performance comparison
GPUCellValues/MFCellValues or similar
Move adapt stuff into extension
Documentation entries
Document example
Check if AMDGPU.jl works, too - maybe even move to KernelAbstractions.jl
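
For the string-interpolation item, here is a minimal CPU-only sketch of the difference between an eagerly interpolated error message and one built with `Base.LazyString` (via the `lazy"..."` macro, available from Julia 1.8). The function names `check_index_eager` and `check_index_lazy` are hypothetical and not part of Ferrite. Whether the lazy variant alone is enough to make the kernels compile on the GPU is an assumption that still needs checking.

```julia
# Eager variant: the interpolated String is constructed before `throw` runs,
# which is the pattern the TODO identifies as problematic for GPU compilation.
function check_index_eager(i::Int, n::Int)
    1 <= i <= n || throw(ArgumentError("no shape function $i for $n functions"))
    return i
end

# Lazy variant: `lazy"..."` builds a Base.LazyString that stores the pieces and
# only renders the message when the error is actually printed.
function check_index_lazy(i::Int, n::Int)
    1 <= i <= n || throw(ArgumentError(lazy"no shape function $i for $n functions"))
    return i
end
```

Both variants throw the same `ArgumentError` type, so callers catching the exception should not need to change.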

Future work

Constraint handler integration
Move from mass to steady state heat problem
Matrix-Free p-Multigrid example from https://arxiv.org/pdf/2204.01722.pdf with help from https://dl.acm.org/doi/pdf/10.1145/3322813

codecov-commenter · 2023-07-14T17:38:54Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.77%. Comparing base (1a84b77) to head (08256af).
Report is 11 commits behind head on master.

❗ Current head 08256af differs from pull request most recent head 2b7bb24. Consider uploading reports for the commit 2b7bb24 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #766      +/-   ##
==========================================
- Coverage   93.29%   92.77%   -0.53%     
==========================================
  Files          36       33       -3     
  Lines        5235     4952     -283     
==========================================
- Hits         4884     4594     -290     
- Misses        351      358       +7

KnutAM · 2023-07-20T07:58:08Z

docs/src/literate-tutorials/gpu_assembly.jl

+end
+### TODO Adapt dofhandler
+
+# TODO not sure how to do this automatically


Driveby comment: I think we could change

Ferrite.jl/src/interpolations.jl

Line 521 in 3cff601

throw(ArgumentError("no shape function $i for interpolation $ip"))

to @boundscheck throw(ArgumentError("no shape function $i for interpolation $ip"))

(also in all other places) and then in this example use @inbounds?

termi-official · 2023-07-20

Well, this is rather hacky. Give me a bit more time to think about this.

KnutAM · 2023-07-20

Right, this is actually not type stable. It should probably return eltype(ξ)(NaN) after the @boundscheck, and the operations should be updated to not multiply by float literals.

I wouldn't call that hacky, though, since it just allows @inbounds to be used to elide the bounds check?
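
The suggestion in this thread can be sketched as follows. This is a hypothetical stand-in for a shape-function evaluation (`linear_shape` and `sum_shapes` are made-up names, not Ferrite's actual code), assuming a 1D linear interpolation on [-1, 1]. It shows the three ingredients discussed: the throw wrapped in `@boundscheck`, a type-stable `NaN` fallback, and integer instead of `Float64` literals so that `Float32` input stays `Float32`.

```julia
# Hypothetical 1D linear shape functions on the reference interval [-1, 1].
@inline function linear_shape(ξ::T, i::Int) where {T<:AbstractFloat}
    # Dividing by the integer 2 (rather than multiplying by 0.5) avoids
    # promoting Float32 input to Float64.
    i == 1 && return (one(T) - ξ) / 2
    i == 2 && return (one(T) + ξ) / 2
    # Only this block may be elided when the call is inlined under @inbounds.
    @boundscheck throw(ArgumentError("no shape function for this index"))
    # Type-stable fallback that is reached only if the check was elided.
    return T(NaN)
end

# Caller that opts out of the check; the loop indices are valid by construction.
function sum_shapes(ξ::AbstractFloat)
    s = zero(ξ)
    for i in 1:2
        s += @inbounds linear_shape(ξ, i)
    end
    return s
end
```

Note that `@inbounds` can only remove the `@boundscheck` block when the callee is inlined, and it is ignored entirely when Julia runs with `--check-bounds=yes`.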

termi-official · 2023-12-31T02:51:18Z

The first SpMV kernels run, but there are still quite a few features and optimizations to implement. Here are some first benchmarks. cuSPARSE currently wins by a landslide, and the full matrix-free kernel is roughly 2.3x slower than the CPU implementation for now (186 ms vs. 80 ms median in the runs below).

julia> @benchmark CUDA.@sync mul!($u,$Agpu,$bgpu)
BenchmarkTools.Trial: 27 samples with 1 evaluation.
 Range (min … max):  185.123 ms … 186.871 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     186.140 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   186.102 ms ± 469.661 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                      ▃ ▃            █          ▃      █         
  ▇▁▁▁▁▁▁▁▇▁▁▁▁▁▇▇▁▁▁▇█▁█▇▁▁▁▇▇▁▁▁▁▁▇█▁▁▁▁▇▇▁▁▁▁█▁▁▇▁▇▁█▁▁▁▁▇▁▇ ▁
  185 ms           Histogram: frequency by time          187 ms <

 Memory estimate: 46.61 KiB, allocs estimate: 475.

julia> @benchmark mul!($ucpu, $Kcpu, $bcpu)
BenchmarkTools.Trial: 63 samples with 1 evaluation.
 Range (min … max):  79.763 ms …  83.167 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     80.435 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   80.446 ms ± 512.121 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

              █                                                 
  ▆▁▅▆▁▆▅▆▅▁▆██▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▁
  79.8 ms       Histogram: log(frequency) by time        83 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark mul!($ugpu,$Kgpu,$bgpu)
BenchmarkTools.Trial: 8654 samples with 1 evaluation.
 Range (min … max):   18.390 μs …   2.875 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     610.724 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   574.855 μs ± 144.657 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                                              █  
  ▃▂▁▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂█ ▂
  18.4 μs          Histogram: frequency by time          614 μs <

 Memory estimate: 1.44 KiB, allocs estimate: 60.

julia> @benchmark mul!($ugpu,$Aeagpu,$bgpu)
BenchmarkTools.Trial: 201 samples with 1 evaluation.
 Range (min … max):  24.722 ms …  30.899 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     24.823 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   24.871 ms ± 437.805 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

        ▃▃█▅▄▁                                                  
  ▂▂▃▃▄███████▆▅▄▄▄▃▁▁▁▁▂▂▂▂▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▃▂ ▃
  24.7 ms         Histogram: frequency by time         25.4 ms <

 Memory estimate: 4.42 KiB, allocs estimate: 151.
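
For readers unfamiliar with the operator kinds compared above, here is a minimal CPU-only sketch of a matrix-free operator next to its assembled counterpart. It uses a 1D mass matrix with linear elements on a uniform mesh, which is an illustrative assumption and not the problem or code from this PR; `MatrixFreeMass1D` and `assemble_mass` are hypothetical names.

```julia
using LinearAlgebra, SparseArrays

# Matrix-free operator: stores only mesh data, never the global matrix.
struct MatrixFreeMass1D{T}
    nel::Int  # number of elements
    h::T      # uniform element length
end

Base.size(A::MatrixFreeMass1D) = (A.nel + 1, A.nel + 1)

# y = M * x, applying the local mass matrix h/6 * [2 1; 1 2] element by element.
function LinearAlgebra.mul!(y::AbstractVector, A::MatrixFreeMass1D, x::AbstractVector)
    fill!(y, zero(eltype(y)))
    c = A.h / 6
    for e in 1:A.nel
        xl, xr = x[e], x[e + 1]
        y[e]     += c * (2 * xl + xr)
        y[e + 1] += c * (xl + 2 * xr)
    end
    return y
end

# Assembled counterpart: the same local matrices summed into a sparse matrix.
function assemble_mass(nel::Int, h::T) where {T}
    I = Int[]; J = Int[]; V = T[]
    Me = (h / 6) .* T[2 1; 1 2]
    for e in 1:nel, a in 1:2, b in 1:2
        push!(I, e + a - 1); push!(J, e + b - 1); push!(V, Me[a, b])
    end
    return sparse(I, J, V, nel + 1, nel + 1)  # duplicate entries are summed
end
```

Both operators give the same product up to rounding. The matrix-free version trades memory traffic for recomputation of the local contributions on every application.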

MatrixFree GPU heat equation prototype.

6803abf

termi-official added this to the v1.0.0 milestone Jul 14, 2023

KnutAM reviewed Jul 20, 2023

KnutAM mentioned this pull request Aug 1, 2023

Implement FaceQuadratureRule for RefPrism and RefPyramid #779

Merged

termi-official mentioned this pull request Sep 29, 2023

More flexibility for internal types #802

Merged

termi-official added 3 commits October 11, 2023 18:40

Merge branch 'master' into do/cuda-mass-example

9421fba

Temporary solution for sparsity pattern on GPU.

08256af

Add missing tests.

d8456a4

termi-official mentioned this pull request Oct 12, 2023

Batch evaluations #819

Merged

2 tasks

termi-official added 3 commits October 17, 2023 14:18

Merge branch 'master' into do/cuda-mass-example

42fc90a

Make GPU catching 🔥.

7e151bf

Packages.

6f33d09

termi-official mentioned this pull request Oct 17, 2023

Move LTG monodomain solver to GPU termi-official/Thunderbolt.jl#11

Closed

termi-official changed the title ~~MatrixFree GPU operators~~ WIP: MatrixFree GPU operators Oct 25, 2023

termi-official mentioned this pull request Nov 11, 2023

p-multigrid example #840

Open

termi-official added 9 commits December 12, 2023 15:02

Merge master.

f1916fc

...

1e5cb41

GPU grid prototype.

07a0e7a

Initial GPUGrid and GPUDofHandler + performance bugfix. Can now generate NaNs at the speed of light.

0aca027

Derp.

955a233

Back to Float32.

75c6dd7

Fix Float32 2D grid generator.

10d6a51

Fix performance and add element assembly.

778d57b

Benchmark code.

3e9edc5

termi-official changed the title ~~WIP: MatrixFree GPU operators~~ WIP: MatrixFree and Assembled GPU operators Dec 31, 2023

Move small quadrature rule to Ferrite core

7a42d4c

termi-official mentioned this pull request Jan 5, 2024

Element assembly operators termi-official/Thunderbolt.jl#50

Open

manifest

4c56374

termi-official mentioned this pull request Jan 30, 2024

domain decomposition methods as preconditioners #877

Open

Tune runtimes

2b7bb24

termi-official mentioned this pull request Jun 11, 2024

CUDA monodomain solver baseline termi-official/Thunderbolt.jl#104

Merged

7 tasks

termi-official removed this from the v1.0.0 milestone Jun 18, 2024
