Reduce allocations in rendering #10
Comments
A small update:
using Gradus, StaticArrays
m = BoyerLindquistAD(M=1.0, a=0.998)
u = @SVector [0.0, 1000.0, deg2rad(90), 0.0]
img = @time rendergeodesics(
    m, u, 2000.0;
    abstol=1e-9, reltol=1e-9,
    image_width=200 * 4,
    image_height=200 * 4,
    fov_factor=13.0 * 4,
    #verbose = false,
)
# 12 threads:
# 55.465356 seconds (1.47 G allocations: 51.276 GiB, 33.66% gc time, 4.16% compilation time)
# without gc 37s
# after some alloc hunting
# 51.923469 seconds (1.47 G allocations: 48.570 GiB, 35.03% gc time, 0.01% compilation time)
# 24 threads:
# 41.287283 seconds (1.47 G allocations: 48.598 GiB, 52.45% gc time)
We're losing about a third of our render time to GC at 12 threads, and more than half at 24 threads. This needs to be optimized.
I rewrote the renderer (PR pending) and the GC is just insane:
using Gradus, StaticArrays
m = BoyerLindquistAD(M=1.0, a=0.998)
u = @SVector [0.0, 1000.0, deg2rad(90), 0.0]
img = @time rendergeodesics(
    m, u, 2000.0;
    abstol=1e-5, reltol=1e-5,
    image_width=200 * 4 + 1,
    image_height=200 * 4 + 1,
    fov_factor=13.0 * 4,
    verbose = true
)
# + Starting trace...
# 8.556839 seconds (454.59 M allocations: 20.855 GiB)
# + Trace complete.
# 8.562259 seconds (454.59 M allocations: 20.859 GiB)
# + Starting trace...
# 12.083763 seconds (454.59 M allocations: 20.855 GiB, 31.28% gc time)
# + Trace complete.
# 12.090315 seconds (454.59 M allocations: 20.859 GiB, 31.26% gc time)
# + Starting trace...
# 25.039618 seconds (454.60 M allocations: 20.855 GiB, 53.35% gc time)
# + Trace complete.
# 25.043922 seconds (454.60 M allocations: 20.859 GiB, 53.34% gc time)
# julia> GC.gc()
# + Starting trace...
# 7.484149 seconds (455.62 M allocations: 20.903 GiB)
# + Trace complete.
# 7.492415 seconds (455.62 M allocations: 20.908 GiB)
If the GC isn't hit, we outperform the hand-written elliptic-integral-based integrator YNOGK, which benchmarked at 12 seconds for the identical problem on the same machine (Typhon, running 12 / 128 threads). If the GC is hit, we can be about twice as slow.
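The gap between runs that hit the GC and runs that don't can be put into a single number with `@timed` from Base, which returns the elapsed time, the bytes allocated and the GC time as fields. A minimal sketch follows, assuming a stand-in workload: `allocating_workload` is hypothetical and only mimics an allocation-heavy loop, it is not Gradus.jl code.

```julia
# Hypothetical stand-in for an allocation-heavy render loop (not Gradus.jl code).
function allocating_workload(n)
    total = 0.0
    for i in 1:n
        v = [Float64(i), 1.0, 2.0, 3.0]  # fresh heap-allocated vector each iteration
        total += sum(v)
    end
    return total
end

allocating_workload(10)  # warm up so compilation is not included in the timing

GC.gc()  # collect first, so the run does not inherit garbage from earlier work
stats = @timed allocating_workload(1_000_000)

# Fraction of wall time spent in the garbage collector.
gc_fraction = stats.gctime / stats.time
println("elapsed:   ", stats.time, " s")
println("allocated: ", stats.bytes / 2^20, " MiB")
println("gc time:   ", round(100 * gc_fraction; digits = 2), " %")
```

Calling `GC.gc()` before the timed region mirrors the last run above and makes repeated timings easier to compare, although it does not stop collections from triggering during the run itself.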
I've been able to get this right down:
using Gradus, StaticArrays
m = BoyerLindquistAD(M=1.0, a=0.998)
u = @SVector [0.0, 1000.0, deg2rad(90), 0.0]
img = @time rendergeodesics(
    m, u, 2000.0;
    abstol=1e-9, reltol=1e-9,
    image_width=200 * 4,
    image_height=200 * 4,
    fov_factor=13.0 * 4,
    verbose = true
)
# + Starting trace...
# Rendering: 100%[========================================] Time: 0:00:14
# + Trace complete.
# 14.672502 seconds (15.57 M allocations: 5.439 GiB)
Note that is at
# + Starting trace...
# Rendering: 100%[========================================] Time: 0:00:04
# + Trace complete.
# 4.442164 seconds (15.60 M allocations: 5.440 GiB)
I noticed that when using SecondOrderODEProblem instead of ODEProblem (either because of the array tooling used under the hood or otherwise), the allocations are far greater. I've converted Gradus.jl to use ODEProblems, which also have the added bonus that position is
This also puts us in a better position for #8 should we ever want to revisit that.
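For context on what such a conversion amounts to: a second-order system x'' = f(x, x') can be integrated as a first-order system on the packed state (x, x'). The sketch below is dependency-free and makes several assumptions: it uses a hand-rolled RK4 step rather than any SciML solver, a plain tuple in place of an `SVector`, and a harmonic oscillator standing in for the geodesic equation. None of the names come from Gradus.jl.

```julia
# Packed first-order state: (position, velocity). A tuple is immutable,
# playing the role an SVector would in real code.
const State = NTuple{2,Float64}

# Stand-in for the geodesic equation: harmonic oscillator x'' = -x.
acceleration(x, v) = -x

# First-order right-hand side: d/dt (x, v) = (v, a(x, v)).
rhs(u::State) = (u[2], acceleration(u[1], u[2]))

# Helper computing u + h * k component-wise.
step_along(u::State, h, k::State) = (u[1] + h * k[1], u[2] + h * k[2])

# One classical fourth-order Runge-Kutta step on the packed state.
function rk4_step(u::State, h)
    k1 = rhs(u)
    k2 = rhs(step_along(u, h / 2, k1))
    k3 = rhs(step_along(u, h / 2, k2))
    k4 = rhs(step_along(u, h, k3))
    return (
        u[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]),
        u[2] + h / 6 * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2]),
    )
end

# Fixed-step integration from t = 0 to t = t_end.
function integrate(u0::State, t_end; steps = 10_000)
    h = t_end / steps
    u = u0
    for _ in 1:steps
        u = rk4_step(u, h)
    end
    return u
end

# One full period: the state should return close to its initial value.
u_final = integrate((1.0, 0.0), 2π)
println(u_final)
```

Position and velocity travel together in one immutable value here, so the stepper never has to split or reassemble a partitioned state.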
Tracking allocations and analyzing with Coverage.jl shows that we allocate enormously during problem setup:
The matrix dispatch already modifies in place, but the dispatch for e.g. vectors of static vectors does not:
Gradus.jl/src/GeodesicTracer/constraints.jl
Lines 26 to 30 in 4356a1a
There is no need to use map here; we should be able to re-use the same container.
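The difference between `map` and re-using the container can be sketched as follows. This is an illustration under stated assumptions, not the code in constraints.jl: `Vec4` is a tuple standing in for a static 4-vector, and `constrain`, `constrain_all_map` and `constrain_all!` are hypothetical names with a made-up constraint.

```julia
# Stand-in for a static 4-vector; real code would use SVector{4,Float64}.
const Vec4 = NTuple{4,Float64}

# Hypothetical constraint: overwrite the first component with the Euclidean
# norm of the remaining three (NOT the constraint used in Gradus.jl).
constrain(v::Vec4) = (sqrt(v[2]^2 + v[3]^2 + v[4]^2), v[2], v[3], v[4])

# Allocating version: map builds a new Vector on every call.
constrain_all_map(vs::Vector{Vec4}) = map(constrain, vs)

# In-place version: overwrite each element of the existing container.
function constrain_all!(vs::Vector{Vec4})
    for i in eachindex(vs)
        vs[i] = constrain(vs[i])
    end
    return vs
end

function compare(n)
    vs = [(0.0, Float64(i), 2.0, 3.0) for i in 1:n]
    expected = constrain_all_map(vs)   # reference result; also warms up map
    constrain_all!(copy(vs))           # warm up the in-place version
    bytes_map = @allocated constrain_all_map(vs)
    bytes_inplace = @allocated constrain_all!(vs)
    return (bytes_map = bytes_map, bytes_inplace = bytes_inplace, same = vs == expected)
end

result = compare(1_000)
println(result)
```

Because the elements are immutable, writing `vs[i] = constrain(vs[i])` replaces the stored value without allocating a new outer vector, which is the saving the in-place variant is after.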