
Add GPU support #137

Merged
fjebaker merged 15 commits into main from fergus/gpu on Aug 1, 2023

Conversation

@fjebaker (Member) commented Jul 30, 2023

Closes #8

There's still a lot of work to do here:

  • properly embed the GPU ensemble calls into the tracing pipeline, so that e.g. kwargshandle isn't passed to solve and dt / adaptive are set correctly
  • investigate why GPU tracing is returning poor trajectories
  • bundle into an extension package that loads if the user is also using DiffEqGPU.jl (a Julia 1.9 feature); see the sketch after this list
  • fix type promotion issues with using GPU in rendergeodesics
  • batch solves fail non-deterministically
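
As a sketch of what the extension packaging in the third bullet could look like (module layout and names here are assumptions, not what this PR actually does), a Julia 1.9 package extension would gather the DiffEqGPU-specific dispatches in a module that only loads when DiffEqGPU.jl is present:

# Hypothetical extension module; names are illustrative only.
module GradusDiffEqGPUExt

using Gradus
using DiffEqGPU

# GPU-specific ensemble dispatches would live here, so that the
# EnsembleGPUKernel code paths are only compiled when DiffEqGPU is loaded.

end # module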

This PR currently includes a temporary fix for handling sin / cos duals in ForwardDiff when dispatching on Metal.
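
As an illustration of the kind of workaround involved (this is a sketch of one possible shape, not the actual patch in this PR), the duals can be routed through separate sin and cos calls so the Metal kernel never relies on a fused sincos intrinsic:

# Hedged sketch of a sin/cos workaround for ForwardDiff duals; not the fix in this PR.
using ForwardDiff: Dual, value, partials

function _sincos(d::Dual{T}) where {T}
    s, c = sin(value(d)), cos(value(d))
    # d/dx sin(x) = cos(x), d/dx cos(x) = -sin(x)
    (Dual{T}(s, c * partials(d)), Dual{T}(c, -s * partials(d)))
end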

State of the device

Rudimentary benchmarks look very promising (fully Float32; the comments give timings for 100 and 10,000 geodesics):

sols = @btime tracegeodesics(m, us, vs, 2000.0f0)
# 100:      13.175 ms (98917 allocations: 38.06 MiB)
# 10_000:   1.727 s (9873983 allocations: 3.72 GiB)

sols = @btime tracegeodesics(m, us, vs, 2000.0f0,
    solver = GPUTsit5(), ensemble = EnsembleGPUKernel(Metal.MetalBackend())
)
# 100:      31.596 ms (3095 allocations: 2.17 MiB)
# 10_000:   231.454 ms (187074 allocations: 204.85 MiB)

However, the traces themselves do not look right. On the CPU, we get:
[Screenshot: CPU trajectories (2023-07-30, 22:53:42)]

Whereas on the GPU:
[Screenshot: GPU trajectories (2023-07-30, 22:53:48)]

Clearly there is something very wrong here. Since the impact parameters are set for $\alpha$ between 5 and 10, the lack of spread in the GPU picture might suggest that the initial steps of the integrator are poor, which then propagates further into the integration.

The performance is promising, and provided it doesn't degrade while fixing the numerical issues, the GPU support should be very worthwhile.

@codecov-commenter

Codecov Report

Merging #137 (db2d8d6) into main (dd23b71) will decrease coverage by 0.19%.
The diff coverage is 57.89%.


@@            Coverage Diff             @@
##             main     #137      +/-   ##
==========================================
- Coverage   68.10%   67.91%   -0.19%     
==========================================
  Files          56       56              
  Lines        2414     2425      +11     
==========================================
+ Hits         1644     1647       +3     
- Misses        770      778       +8     
| Files Changed | Coverage Δ |
| --- | --- |
| src/Gradus.jl | 28.57% <0.00%> (-6.73%) ⬇️ |
| src/tracing/tracing.jl | 90.38% <ø> (ø) |
| src/tracing/geodesic-problem.jl | 97.05% <100.00%> (+0.08%) ⬆️ |
| src/tracing/method-implementations/auto-diff.jl | 97.87% <100.00%> (+0.09%) ⬆️ |

@fjebaker (Member Author) commented

I discovered I was only plotting the CPU offload solutions. With adaptive timestepping we don't have anything to plot beyond the start and end points, but with a fixed timestep we actually get (slow) curves to reconstruct:

[Screenshot: fixed-timestep GPU trajectories (2023-07-31, 11:25:09)]
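
For context, forcing a fixed step on the GPU path looks something like the following (assuming tracegeodesics forwards the adaptive and dt keywords to solve; the dt value is illustrative):

# Hedged sketch: fixed timestep so intermediate points are saved; dt is illustrative.
sols = tracegeodesics(m, us, vs, 2000.0f0,
    solver = GPUTsit5(), ensemble = EnsembleGPUKernel(Metal.MetalBackend()),
    adaptive = false, dt = 0.1f0
)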

Integration termination via the callback functions doesn't seem to be working at the moment; similarly, the status codes don't update.
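
For reference, termination on the CPU path goes through DiffEq-style callbacks; a generic sketch (the radial index and threshold below are assumptions, not Gradus internals) would be:

# Illustrative terminate-on-radius callback, not code from this PR.
using OrdinaryDiffEq  # re-exports ContinuousCallback and terminate!

r_min = 1.0f0                                 # hypothetical inner radius
condition(u, t, integrator) = u[2] - r_min    # assumes u[2] is the radial coordinate
affect!(integrator) = terminate!(integrator)  # stop the integration at the root
cb = ContinuousCallback(condition, affect!)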

@fjebaker (Member Author) commented Aug 1, 2023

GPU:

+ Starting trace...
Rendering: 100%[========================================] Time: 0:00:09 (57.58 μs/it)
+ Trace complete.
  9.217382 seconds (3.04 M allocations: 246.754 MiB, 0.43% gc time)
[Screenshot: GPU shadow render (2023-08-01, 10:16:11)]

CPU:

+ Starting trace...
Rendering: 100%[========================================] Time: 0:00:19 ( 0.12 ms/it)
+ Trace complete.
 19.528674 seconds (1.68 M allocations: 202.045 MiB, 0.09% compilation time)

But I think the CPU Float32 implementation is all over the place, with all sorts of hidden conversions going on. This is reinforced by the shadow image it projects:

[Screenshot: CPU Float32 shadow render (2023-08-01, 10:16:16)]
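
(Not from this PR, but as a reminder of how easily such conversions creep in: any Float64 literal in the hot path silently promotes the whole computation.)

# Illustration of hidden promotion; not code from this PR.
x = 1.0f0      # Float32
y = x * 2.0    # 2.0 is a Float64 literal, so y is promoted to Float64
z = x / 3      # integer literals do not promote, z stays Float32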

@fjebaker (Member Author) commented Aug 1, 2023

For reference, the above 400x400 renders are still around a factor of 2 faster on CPU Float64 (6 threads) than on GPU Float32 (Metal).

@fjebaker (Member Author) commented Aug 1, 2023

For a 1000x1000 shadow render:

GPU Float32: 26.861899 seconds (19.00 M allocations: 1.505 GiB, 0.91% gc time)
CPU Float64: 24.358143 seconds (10.14 M allocations: 1.722 GiB, 1.04% gc time)

There are a few things here that aren't quite fair to the GPU, since the point function evaluation is still being done on the CPU.

@fjebaker fjebaker marked this pull request as ready for review August 1, 2023 22:53
@fjebaker (Member Author) commented Aug 1, 2023

I suspect the failing batch solves may be related to fast-math calls, but I'm going to leave that for now and investigate at a later stage.

@fjebaker fjebaker merged commit bd920f6 into main Aug 1, 2023
@fjebaker fjebaker deleted the fergus/gpu branch August 1, 2023 23:02
fjebaker added a commit that referenced this pull request Aug 22, 2023