WIP: MatrixFree and Assembled GPU operators #766

Draft
wants to merge 19 commits into
base: master

termi-official
Member

@termi-official termi-official commented Jul 14, 2023

This PR shows how to implement matrix-free operators on the GPU and provides the necessary infrastructure.

TODOs

  • Fix bug when block size > 1
  • Use Float32
  • Remove problematic string interpolations in exceptions (breaks GPU compilation) - also check out Base.LazyString
  • Performance comparison
  • GPUCellValues/MFCellValues or similar
  • Move adapt stuff into extension
  • Documentation entries
  • Document example
  • Check if AMDGPU.jl works, too - maybe even move to KernelAbstractions.jl
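A minimal sketch of the `Base.LazyString` idea from the TODO list, assuming Julia 1.8 or newer. The function name `checked_index_sketch` and the message text are hypothetical, and whether this form actually compiles inside a CUDA.jl kernel is not verified here. It only shows that the interpolation is deferred until the message is printed.

```julia
# Hypothetical helper, not part of Ferrite. Requires Julia >= 1.8.
function checked_index_sketch(i::Int, n::Int)
    # lazy"..." builds a Base.LazyString: the pieces are stored and only
    # joined into a String when the message is actually displayed.
    1 <= i <= n || throw(ArgumentError(lazy"no shape function $i (expected 1 ≤ i ≤ $n)"))
    return i
end

# Usage: catch the error and inspect the (still lazy) message.
caught = try
    checked_index_sketch(3, 2)
    nothing
catch err
    err
end
```

The eager form `"no shape function $i"` allocates and formats a `String` at the throw site, which is the part the TODO describes as problematic for GPU compilation.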

Future work

@termi-official termi-official added this to the v1.0.0 milestone Jul 14, 2023
@codecov-commenter

codecov-commenter commented Jul 14, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.77%. Comparing base (1a84b77) to head (08256af).
Report is 11 commits behind head on master.

❗ Current head 08256af differs from pull request most recent head 2b7bb24. Consider uploading reports for the commit 2b7bb24 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #766      +/-   ##
==========================================
- Coverage   93.29%   92.77%   -0.53%     
==========================================
  Files          36       33       -3     
  Lines        5235     4952     -283     
==========================================
- Hits         4884     4594     -290     
- Misses        351      358       +7     


end
### TODO Adapt dofhandler

# TODO not sure how to do this automatically
Member

Driveby comment: I think we could change

throw(ArgumentError("no shape function $i for interpolation $ip"))

to @boundscheck throw(ArgumentError("no shape function $i for interpolation $ip"))

(also in all other places) and then in this example use @inbounds?

Member Author

Well, this is rather hacky. Give me a bit more time to think about this.

Member

Right, this is actually not type stable.
It should probably return eltype(ξ)(NaN) after the @boundscheck, and the operations should be updated so they do not multiply by floats.

I wouldn't call that hacky, though, since it just allows @inbounds to be used to elide the bounds check?
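The suggestion in this thread could be sketched roughly as follows. This is a stand-alone illustration, not Ferrite's actual code: `shape_value_sketch` and `sum_shape_values` are hypothetical names, the two linear 1D shape functions are chosen only for brevity, and the error message is a constant string so that no interpolation is involved.

```julia
# Hypothetical stand-in for a shape-function evaluation (linear 1D element).
# @inline matters: @boundscheck blocks are only elided when the function is
# inlined into a caller that uses @inbounds.
@inline function shape_value_sketch(ξ::T, i::Int) where {T<:AbstractFloat}
    # Dividing by the integer 2 keeps the element type of ξ (e.g. Float32).
    i == 1 && return (one(T) - ξ) / 2
    i == 2 && return (one(T) + ξ) / 2
    @boundscheck throw(ArgumentError("no shape function for the requested index"))
    # Reached only when the check is elided; same type as the other branches.
    return T(NaN)
end

# Caller that opts out of the check, as a kernel might.
function sum_shape_values(ξ, n)
    s = zero(ξ)
    for i in 1:n
        s += @inbounds shape_value_sketch(ξ, i)
    end
    return s
end
```

Called directly, `shape_value_sketch(0.0, 3)` still throws. Note that running Julia with `--check-bounds=yes` (as `Pkg.test` does by default) disables the elision, so the `NaN` branch is not reached there.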

@termi-official termi-official mentioned this pull request Oct 12, 2023
2 tasks
@termi-official termi-official changed the title from "MatrixFree GPU operators" to "WIP: MatrixFree GPU operators" Oct 25, 2023
@termi-official
Member Author

The first SpMV kernels run, but there are still quite a few features and optimizations to implement. Here are some first benchmarks. cuSPARSE currently wins by a landslide, and the full matrix-free kernel is about 3x slower than the CPU implementation for now.

julia> @benchmark CUDA.@sync mul!($u,$Agpu,$bgpu)
BenchmarkTools.Trial: 27 samples with 1 evaluation.
 Range (min … max):  185.123 ms … 186.871 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     186.140 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   186.102 ms ± 469.661 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                      ▃ ▃            █          ▃      █         
  ▇▁▁▁▁▁▁▁▇▁▁▁▁▁▇▇▁▁▁▇█▁█▇▁▁▁▇▇▁▁▁▁▁▇█▁▁▁▁▇▇▁▁▁▁█▁▁▇▁▇▁█▁▁▁▁▇▁▇ ▁
  185 ms           Histogram: frequency by time          187 ms <

 Memory estimate: 46.61 KiB, allocs estimate: 475.

julia> @benchmark mul!($ucpu, $Kcpu, $bcpu)
BenchmarkTools.Trial: 63 samples with 1 evaluation.
 Range (min … max):  79.763 ms …  83.167 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     80.435 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   80.446 ms ± 512.121 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

              █                                                 
  ▆▁▅▆▁▆▅▆▅▁▆██▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅ ▁
  79.8 ms       Histogram: log(frequency) by time        83 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark mul!($ugpu,$Kgpu,$bgpu)
BenchmarkTools.Trial: 8654 samples with 1 evaluation.
 Range (min … max):   18.390 μs …   2.875 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     610.724 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   574.855 μs ± 144.657 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                                              █  
  ▃▂▁▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂█ ▂
  18.4 μs          Histogram: frequency by time          614 μs <

 Memory estimate: 1.44 KiB, allocs estimate: 60.

julia> @benchmark mul!($ugpu,$Aeagpu,$bgpu)
BenchmarkTools.Trial: 201 samples with 1 evaluation.
 Range (min … max):  24.722 ms …  30.899 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     24.823 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   24.871 ms ± 437.805 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

        ▃▃█▅▄▁                                                  
  ▂▂▃▃▄███████▆▅▄▄▄▃▁▁▁▁▂▂▂▂▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▃▂ ▃
  24.7 ms         Histogram: frequency by time         25.4 ms <

 Memory estimate: 4.42 KiB, allocs estimate: 151.
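For readers unfamiliar with the distinction between assembled and matrix-free operators behind these `mul!` timings, here is a small CPU-only sketch. It is not the code in this PR: the type `MatrixFreeLaplace1D`, the helper `assemble_laplace_1d`, and the 1D Laplace problem with linear elements are all assumptions chosen to keep the example short. The matrix-free `mul!` applies each element matrix on the fly instead of reading a stored sparse matrix.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical matrix-free operator: 1D Laplace, uniform mesh, linear elements.
struct MatrixFreeLaplace1D{T}
    nel::Int  # number of elements
    h::T      # element length
end

Base.size(A::MatrixFreeLaplace1D) = (A.nel + 1, A.nel + 1)

function LinearAlgebra.mul!(y::AbstractVector, A::MatrixFreeLaplace1D, x::AbstractVector)
    fill!(y, zero(eltype(y)))
    k = inv(A.h)
    for e in 1:A.nel
        # Element matrix is k * [1 -1; -1 1] acting on dofs (e, e+1).
        d = k * (x[e] - x[e+1])
        y[e]   += d
        y[e+1] -= d
    end
    return y
end

# Assembled reference: the same element matrices scattered into a sparse matrix.
function assemble_laplace_1d(nel::Int, h)
    K = spzeros(typeof(h), nel + 1, nel + 1)
    k = inv(h)
    for e in 1:nel
        K[e, e]     += k
        K[e, e+1]   -= k
        K[e+1, e]   -= k
        K[e+1, e+1] += k
    end
    return K
end

# Usage: both operators should produce the same product.
nel = 8
h = 1 / nel
x = collect(range(0.0, 1.0; length = nel + 1)) .^ 2
A = MatrixFreeLaplace1D(nel, h)
K = assemble_laplace_1d(nel, h)
y = mul!(similar(x), A, x)
```

In a parallel GPU version, the `+=` updates to dofs shared between neighbouring elements would race, so a real kernel needs something like atomics or element coloring. This sketch sidesteps that by running serially.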

@termi-official termi-official changed the title from "WIP: MatrixFree GPU operators" to "WIP: MatrixFree and Assembled GPU operators" Dec 31, 2023
3 participants