GPU support #8

fjebaker · 2022-04-21T15:51:36Z

This has been a point of contention for a while. The current state of DiffEqGPU.jl does not support SecondOrderODEProblems out of the box, and a patch I wrote for DiffEqGPU's problematic parse reveals a more fundemental problem:

The full integration batch terminates when a single geodesic terminates. If a single geodesic falls into the black hole or intersects with the disc, the whole simultaneous solve is terminated.

We also need to evaluate to what degree we want to provide GPU support for the integration problems, as it may reduce us to Float32 operations, and be an awful lot of work for marginal performance gain.

A single trace is a relatively cheap problem, in the micro-second solve time, so it should be entirely feasible and logical to chuck this on the GPU -- the question is just how.

As a note, the above issue with SecondOrderODEProblems can also be solved by converting our problem to a first order one and manually updating the position vector.

phajy · 2022-04-21T16:14:35Z

If I remember correctly (caveat: I might not be remembering correctly!) I ran into a similar "problem" when I first wrote the ray tracer in CUDA. The issue was that each thread needed to be doing the same things simultaneously. When rays reached their destination (e.g., disk or horizon) those individual threads would stop integrating (so there was a condition on whether or not to take another step, e.g., theta < pi/2, r > r_h) but not quit until all the threads had finished. Can't remember what I used as a global termination condition though (perhaps some ray getting to +"infinity"), or how I dealt with variable time steps. Hmm.

fjebaker · 2022-04-22T05:27:27Z

I managed to get a sort of fix working with DiffEqGPU, enough to be able to get it to run... as it currently stands, the bottlenecking is highly problematic, and would require sophisticated batching in order to match similar geodesics together (even then I'm not convinced that would resolve the performance). There's also an issue I think I've found which means the DiffEqGPU methods aren't type stable...

In the time it takes for one of the threads to finish a particularly difficult geodesics, and block the warp from terminating, the CPU methods have already churned through a few thousand more.

CPU: 0.041316 seconds (193.04 k allocations: 22.450 MiB)
GPU: 0.755816 seconds (1.25 M allocations: 110.112 MiB)

This isn't a rigorous benchmark, but indicative of the problem we'd be facing. I think for now just chucking more CPU cores at the problem is probably fine, and we'll save the GPU for doing the analysis. With the render caches, we'll probably end up performing more analysis than rendering anyway, so I think this is a bit of a dead end still.

Shame really, as this is one of those embarassingly parallel problems you'd really expect the GPU to excell at :/

phajy · 2022-04-22T10:54:04Z

Yes, throwing more CPUs at the problem seems like the best way to go for now. I agree that it feels like GPUs should be fantastic for this, but perhaps it isn't quite simple enough given that the behaviour of different rays can be qualitatively different.

fjebaker · 2023-02-14T15:06:25Z

Spent a bit of time today checking out how the new GPUTsit5 integrator performs compared to the multi-threaded CPU method.

There's a number of caveats with the GPU solver, namely that the adaptive step size just seems to fall apart on this problem, so fixed step size is required. These benchmarks are also with all callbacks removed, and boil down to:

m = TestMetric(Float32(1.0), Float32(0.0))
u = SVector{4,Float32}(0.0, 1000.0, π/2, 0.0)
t_domain = Float32.((0.0, 2000.0))
αs = range(6.0f0, 20.0f0, N)

ens_prob = make_problem(m, u, αs, t_domain)
cpu = @timed solve(
    ens_prob, 
    Tsit5(), 
    EnsembleThreads(), 
    trajectories = length(αs), 
    save_everystep = false
);
gpu = @timed solve(
    ens_prob, 
    GPUTsit5(), 
    EnsembleGPUKernel(), 
    trajectories = length(αs), 
    dt=1.0f0, 
    adaptive=false, 
    save_everystep = false
);

This is not quite a fair comparison, since they are using different algorithms, but there is no reason why we would want to use the fixed step size CPU solver, so this reflects the state of this issue as of today.

I obtain similar performance on the example problem in the DiffEqGPU.jl readme.

fjebaker · 2023-02-19T20:56:09Z

Should benchmark just the geodesic_equation function and not just the full solver to try and identify if this is an algorithmic overhead, or something more fundemental with our problem formulation.

fjebaker · 2023-02-20T18:05:36Z

Using adaptive is much much faster on GPU (faster than CPU even) but from my tests this only works if the geodesics are sufficiently far from the central singularity, else it timesteps at less than 10^-14 and errors. But this is promising, and merrits trying to get the adaptive GPU working on the full domain.

fjebaker added enhancement New feature or request integrator Issue or pull request related to integration methods optimization Issue or pull request related to performance labels Apr 21, 2022

fjebaker added this to Gradus.jl Apr 21, 2022

fjebaker moved this to Todo 🐣 in Gradus.jl Apr 21, 2022

fjebaker moved this from Todo 🐣 to In Progress 🦆 in Gradus.jl Apr 22, 2022

fjebaker moved this from In Progress 🦆 to In Review 👁️ in Gradus.jl Apr 22, 2022

fjebaker mentioned this issue Apr 25, 2022

Reduce allocations in rendering #10

Closed

fjebaker mentioned this issue Jan 20, 2023

Remove code bloat and improve hygiene #68

Open

6 tasks

fjebaker removed enhancement New feature or request optimization Issue or pull request related to performance labels Feb 14, 2023

fjebaker mentioned this issue Feb 15, 2023

Feat: SVector as common currency #83

Merged

fjebaker mentioned this issue Jun 7, 2023

fix: type stability when using non-Float64 #116

Merged

fjebaker mentioned this issue Jul 30, 2023

Add GPU support #137

Merged

5 tasks

fjebaker closed this as completed in #137 Aug 1, 2023

github-project-automation bot moved this from In Review 👁️ to Done ✅ in Gradus.jl Aug 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GPU support #8

GPU support #8

fjebaker commented Apr 21, 2022

phajy commented Apr 21, 2022

fjebaker commented Apr 22, 2022

phajy commented Apr 22, 2022 •

edited

Loading

fjebaker commented Feb 14, 2023

fjebaker commented Feb 19, 2023

fjebaker commented Feb 20, 2023

GPU support #8

GPU support #8

Comments

fjebaker commented Apr 21, 2022

phajy commented Apr 21, 2022

fjebaker commented Apr 22, 2022

phajy commented Apr 22, 2022 • edited Loading

fjebaker commented Feb 14, 2023

fjebaker commented Feb 19, 2023

fjebaker commented Feb 20, 2023

phajy commented Apr 22, 2022 •

edited

Loading