
Reduce allocations in rendering #10

Closed
fjebaker opened this issue Apr 21, 2022 · 3 comments · Fixed by #18
Labels
integrator: Issue or pull request related to integration methods
optimization: Issue or pull request related to performance

Comments

@fjebaker
Member

fjebaker commented Apr 21, 2022

Tracking allocations and analyzing with Coverage.jl shows that we allocate enormously during problem setup:

 CoverageTools.MallocInfo(0, "dev/Gradus/src/metrics/boyer-lindquist-fo.jl.101246.mem", 335)
 CoverageTools.MallocInfo(736, "dev/Gradus/src/Rendering/render.jl.101246.mem", 80)
 CoverageTools.MallocInfo(736, "dev/Gradus/src/Rendering/render.jl.101246.mem", 81)
 CoverageTools.MallocInfo(6528, "dev/Gradus/src/Rendering/render.jl.101246.mem", 77)
 CoverageTools.MallocInfo(7360, "dev/Gradus/src/GeodesicTracer/constraints.jl.101246.mem", 29)
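For reference, a minimal sketch of this workflow: run Julia with `--track-allocation=user`, exercise the code, then analyze the emitted `.mem` files with Coverage.jl's `analyze_malloc` (the source path here is a placeholder, and the field access assumes CoverageTools' `MallocInfo`):

```julia
# Sketch of the allocation-tracking workflow. Start Julia with
#   julia --track-allocation=user
# run the workload, exit, then analyze the generated *.mem files:
using Coverage  # re-exports CoverageTools

mallocs = analyze_malloc("dev/Gradus/src")  # MallocInfo entries, sorted by bytes
for m in last(mallocs, 5)                   # the five worst offenders
    println(m.bytes, "\t", m.filename, ":", m.linenumber)
end
```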

The matrix dispatch already has inplace modifications, but for e.g. vectors of static vectors:

# vectors of vectors
function constrain_all(m::AbstractMetricParams{T}, us, vs, μ) where {T<:Number}
    curried(u, v) = constrain_all(m, u, v, μ)
    curried.(us, vs)
end

There is no need to broadcast into a fresh array here; we should be able to write the results into a preallocated container and reuse it.
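A minimal sketch of the in-place alternative, using a stand-in `constrain` function and a hypothetical `constrain_all!` name (neither is current Gradus API):

```julia
using StaticArrays

# Stand-in for the real velocity constraint; only the reuse pattern matters.
# SVector arithmetic is allocation-free, so storing the result in `out[i]`
# does not hit the heap.
constrain(u, v, μ) = v + μ * u

# Hypothetical in-place variant: writes into a preallocated `out` instead of
# allocating a new array on every call.
function constrain_all!(out, us, vs, μ)
    @inbounds for i in eachindex(out, us, vs)
        out[i] = constrain(us[i], vs[i], μ)
    end
    out
end

us = [@SVector(rand(4)) for _ in 1:8]
vs = [@SVector(rand(4)) for _ in 1:8]
out = similar(vs)
constrain_all!(out, us, vs, 0.0)  # `out` can be reused across renders
```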

@fjebaker fjebaker added integrator Issue or pull request related to integration methods optimization Issue or pull request related to performance labels Apr 21, 2022
@fjebaker fjebaker changed the title Reduce allocations during velocity constraints Reduce allocations in rendering Apr 24, 2022
@fjebaker
Member Author

A small update:

using Gradus, StaticArrays
m = BoyerLindquistAD(M=1.0, a=0.998)

u = @SVector [0.0, 1000.0, deg2rad(90), 0.0]

img = @time rendergeodesics(
    m, u, 2000.0; 
    abstol=1e-9, reltol=1e-9, 
    image_width=200 * 4, 
    image_height=200 * 4, 
    fov_factor=13.0 * 4,
    #verbose = false,
)

# 12 threads:
# 55.465356 seconds (1.47 G allocations: 51.276 GiB, 33.66% gc time, 4.16% compilation time)
# without gc 37s

# after some alloc hunting
# 51.923469 seconds (1.47 G allocations: 48.570 GiB, 35.03% gc time, 0.01% compilation time)

# 24 threads:
# 41.287283 seconds (1.47 G allocations: 48.598 GiB, 52.45% gc time)

We're losing as much as 50+% of our render time just to GC. This needs to be optimized.

@fjebaker
Member Author

I rewrote the renderer (PR pending) and the GC is just insane:

using Gradus, StaticArrays
m = BoyerLindquistAD(M=1.0, a=0.998)

u = @SVector [0.0, 1000.0, deg2rad(90), 0.0]

img = @time rendergeodesics(
    m, u, 2000.0; 
    abstol=1e-5, reltol=1e-5,
    image_width=200 * 4 + 1, 
    image_height=200 * 4 + 1, 
    fov_factor=13.0 * 4,
    verbose = true
)

# + Starting trace...
#   8.556839 seconds (454.59 M allocations: 20.855 GiB)
# + Trace complete.
#   8.562259 seconds (454.59 M allocations: 20.859 GiB)

# + Starting trace...
#  12.083763 seconds (454.59 M allocations: 20.855 GiB, 31.28% gc time)
# + Trace complete.
#  12.090315 seconds (454.59 M allocations: 20.859 GiB, 31.26% gc time)

# + Starting trace...
#  25.039618 seconds (454.60 M allocations: 20.855 GiB, 53.35% gc time)
# + Trace complete.
#  25.043922 seconds (454.60 M allocations: 20.859 GiB, 53.34% gc time)

# julia> GC.gc()

# + Starting trace...
#   7.484149 seconds (455.62 M allocations: 20.903 GiB)
# + Trace complete.
#   7.492415 seconds (455.62 M allocations: 20.908 GiB)

If the GC isn't hit, we outperform YNOGK, a hand-written integrator based on elliptic integrals, which benchmarked at 12 seconds for the identical problem on the same machine (Typhon, running 12 of 128 threads). If the GC is hit, we can be nearly twice as slow.

@fjebaker fjebaker moved this to Todo 🐣 in Gradus.jl Apr 24, 2022
@fjebaker fjebaker moved this from Todo 🐣 to In Progress 🦆 in Gradus.jl Apr 24, 2022
@fjebaker
Member Author

I've been able to get this right down:

using Gradus, StaticArrays
m = BoyerLindquistAD(M=1.0, a=0.998)

u = @SVector [0.0, 1000.0, deg2rad(90), 0.0]

img = @time rendergeodesics(
    m, u, 2000.0; 
    abstol=1e-9, reltol=1e-9,
    image_width=200 * 4, 
    image_height=200 * 4, 
    fov_factor=13.0 * 4,
    verbose = true
)

# + Starting trace...
# Rendering: 100%[========================================] Time: 0:00:14
# + Trace complete.
#  14.672502 seconds (15.57 M allocations: 5.439 GiB)

Note that this is at 1e-9 tolerance! For 1e-5:

# + Starting trace...
# Rendering: 100%[========================================] Time: 0:00:04
# + Trace complete.
#   4.442164 seconds (15.60 M allocations: 5.440 GiB)

I noticed that when using SecondOrderODEProblem instead of ODEProblem (whether because of the array tooling used under the hood or otherwise), the allocations are far greater. I've converted Gradus.jl to use ODEProblem throughout, which has the added bonus that the position is now u[1:4] for both the first- and second-order formulations.
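A minimal sketch of this first-order reformulation (assumes OrdinaryDiffEq and StaticArrays are installed; the right-hand side is a stand-in, not the Gradus geodesic equation):

```julia
using OrdinaryDiffEq, StaticArrays

# State is a single SVector: positions in u[1:4], velocities in u[5:8],
# so u[1:4] is the position regardless of the original equation's order.
function rhs(u, p, t)
    x = SVector{4}(u[1], u[2], u[3], u[4])
    v = SVector{4}(u[5], u[6], u[7], u[8])
    a = -0.1 * v  # stand-in acceleration for the geodesic equation
    vcat(v, a)    # du/dt = (velocity, acceleration)
end

u0 = @SVector [0.0, 1000.0, deg2rad(90), 0.0, 1.0, 0.0, 0.0, 0.0]
prob = ODEProblem(rhs, u0, (0.0, 1.0))
sol = solve(prob, Tsit5())
```

With an immutable SVector state, the solver can keep the whole state on the stack, which is one plausible reason the ODEProblem path allocates so much less.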

This also puts us in a better position for #8, should we ever want to revisit that.

Repository owner moved this from In Progress 🦆 to Done ✅ in Gradus.jl Apr 25, 2022