Add GPU support #137
Codecov Report

| | main | #137 | +/- |
|---|---|---|---|
| Coverage | 68.10% | 67.91% | -0.19% |
| Files | 56 | 56 | |
| Lines | 2414 | 2425 | +11 |
| Hits | 1644 | 1647 | +3 |
| Misses | 770 | 778 | +8 |
For reference, the above 400x400 renders are still roughly 2x faster on CPU Float64 (6 threads) than on GPU Float32 (Metal).

For a 1000x1000 shadow render:

GPU Float32:

A few things here aren't quite fair to the GPU, since the point function evaluation is still being done on the CPU.

I suspect the failing batch solves are related to fast math calls, but I'm going to leave that for now and investigate at a later stage.
Closes #8
There's still a lot of work to do here:
- `kwargs` handle isn't passed to `solve`, and that the `dt` / `adaptive` is set correctly
- `rendergeodesics`
This PR currently includes a temporary fix for handling sin / cos duals in ForwardDiff when dispatching on Metal.
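For context, any `sin` / `cos` method defined on dual numbers has to reproduce the standard first-order rule below. This is the textbook identity, not the code in this PR; the symbols $x$ (value), $\dot{x}$ (partial) and $\varepsilon$ (with $\varepsilon^2 = 0$) are notation assumed here for illustration.

```math
\begin{aligned}
\sin(x + \dot{x}\,\varepsilon) &= \sin x + \dot{x}\,\cos x\;\varepsilon \\
\cos(x + \dot{x}\,\varepsilon) &= \cos x - \dot{x}\,\sin x\;\varepsilon
\end{aligned}
```

Whatever form the temporary fix takes on the device, its values and partials should agree with these expressions, which gives a simple check against the CPU result.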
State of the device
Rudimentary benchmarks look very promising (fully Float32):
However, the traces themselves do not. On the CPU, we get:
Whereas on the GPU:
Clearly there is something very wrong here. Since the impact parameters are set for $\alpha$ between 5 and 10, the lack of spread in the GPU picture might suggest that the initial steps of the integrator are poor, and that this error then propagates through the rest of the integration.
The performance is promising, and provided it doesn't degrade while fixing the numerical issues, GPU support should be very worthwhile.