-
-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allocations when using dualcache #31
Comments
https://github.com/SciML/PreallocationTools.jl/blob/v0.4.0/src/PreallocationTools.jl#L57 The This was supposed to be solved: JuliaLang/julia#34126 JuliaLang/julia#14955 |
If using a simple using ForwardDiff, PreallocationTools
function my_get_tmp(dc::PreallocationTools.DiffCache, u::Vector{T}) where {T <: ForwardDiff.Dual}
return reshape(reinterpret(T, dc.dual_du), size(dc.du))
end
function my_get_tmp(dc::PreallocationTools.DiffCache, u::Vector{T}) where {T <: Float64}
return dc.du
end
function get_tmp2(dc::PreallocationTools.DiffCache, u::AbstractArray{T}) where T<:ForwardDiff.Dual
nelem = div(sizeof(T), sizeof(eltype(dc.dual_du)))*length(dc.du)
if nelem > length(dc.dual_du)
enlargedualcache!(dc, nelem)
end
reshape(reinterpret(T, view(dc.dual_du, 1:nelem)),size(dc.du))
end
function test()
ChunkSize = 12
for n in (10, 20)
println("\nTesting for n=", n)
A = zeros(n);
DA = zeros(ForwardDiff.Dual{ForwardDiff.Tag{nothing, Float64}, Float64, ChunkSize}, 10);
cache = dualcache(A, ChunkSize);
print("... get_tmp on Vector{Float64}\n")
a = @allocated my_get_tmp(cache, A); println(a)
print("... get_tmp on Vector{Dual}\n")
a = @allocated get_tmp(cache, DA); println(a)
print("... my_get_tmp on Vector{Float64}\n")
a = @allocated my_get_tmp(cache, A); println(a)
print("... my_get_tmp on Vector{Dual}\n")
a = @allocated my_get_tmp(cache, DA); println(a)
print("... get_tmp2 on Vector{Dual}\n")
a = @allocated get_tmp2(cache, DA); println(a)
end
end
test() |
Removing the view was the first thing I did when I tried to write my own, it didn't help: using ForwardDiff, PreallocationTools, ArrayInterfaceCore
function my_get_tmp(dc::PreallocationTools.DiffCache, u::Vector{T}) where {T <: ForwardDiff.Dual}
return reshape(reinterpret(T, dc.dual_du), size(dc.du))
end
function get_tmp2(dc::PreallocationTools.DiffCache, u::Vector{T}) where {T <: ForwardDiff.Dual}
ArrayInterfaceCore.restructure(dc.du, reinterpret(T, dc.dual_du))
end
function test()
ChunkSize = 12
n = 10
A = zeros(n);
DA = zeros(ForwardDiff.Dual{ForwardDiff.Tag{nothing, Float64}, Float64, ChunkSize}, 10);
cache = dualcache(A, ChunkSize);
print("... get_tmp on Vector{Dual}\n")
a = @allocated get_tmp(cache, DA); println(a)
print("... get_tmp2 on Vector{Dual}\n")
a = @allocated get_tmp2(cache, DA); println(a)
print("... my_get_tmp on Vector{Dual}\n")
a = @allocated my_get_tmp(cache, DA); println(a)
end
test() gives: ... get_tmp on Vector{Dual}
1168
... get_tmp2 on Vector{Dual}
1168
... my_get_tmp on Vector{Dual}
0 Since I didn't know all of what's going on under the hood in restructure I switched it to reshape and voila. |
Is restructure changing the view to an array? |
I checked with earlier versions of PreallocationTools.jl (down to 0.2.4) where it was still ArrayInterface 5.y.z; the allocations also happen there. |
And is it because of changing a view to an Array? |
Is this the right way to test it? using ArrayInterfaceCore
a = zeros(10)
ArrayInterfaceCore.restructure(a, reinterpret(Float32, view(a, 1:5))) Then yes:
|
There's the allocation then. Does Adapt.jl's |
I’ll have to look some more into it, but it looks like Regarding My proposal right now would be to do a fix for normal Arrays (simply doing a reshape, as I did above; same as dbstein’s version, but keeping the view) and care about the more advanced AbstractArrays later (ArrayPartition and friends). CuArray would still need adapt though, right? (I don’t have a readily accessible Cuda GPU at hand to test). |
Yeah we can create a specialized dispatch on Array for now. I think in general it cannot just be a reshape because yes, it may need to be type matching in which case the view would get converted. |
Just a note here to say that I am playing around on this. I hope to have a PR with a (at least partial) fix today or tomorrow. |
Thanks for the fast work on it. Writing non-allocating Jacobians for complicated functions (or even allocating ones, for that matter) is a huge pain, and getting them for free is nothing short of magical. |
#32 fixes the allocations for Base Arrays now. However, I couldn't get rid of the That said, I don't know whether the wrappers interfere with anything downstream negatively. At least in the cases that I use PreallocationTools for, it didn't cause a performance loss (actually, a small gain due to the allocations being gone). I tested with Adapt, but also with unsafe_wrap (as a last resort): function _restructure(normal_cache::Array, duals)
unsafe_wrap(Array, pointer(duals), size(normal_cache))
end That gets rid of the wrappers and returns a correctly sized Array with duals, but it also causes allocations (because the view() is also removed?). Finally, the more complex AbstractArray cases still allocate. |
@chriselrod do you know if there's a more general solution here? |
I got: julia> test()
... get_tmp on Vector{Dual}
0
... get_tmp2 on Vector{Dual}
1168
... my_get_tmp on Vector{Dual}
0
|
Yes we know. But is it possible to solve? |
Don't make a copy? Like, obviously we can avoid these allocations. SimpleChains uses just a single I'd just need more context on the design and why things are the way they are now to suggest an alternative approach. |
I am intrigued by this comment; it might be simpler to adapt this approach here as well. I don't think it will end up in a general solution for any AbstractArray type, though, but maybe it's a start towards one? |
The main issue is that it's a missing primitive, so it needs something in ArrayInterfaceCore to capture it. There is no primitive for "restructure but non-allocating". I tend to think that this is a more fundamental issue that SubArray -> Array should just be a pointer thing and shouldn't actually allocate a new array, and I'm not sure why that is true in Julia, and so it would probaby be best to just get that fixed. But this notion of restructuring is actually quite important to PreallocationTools, in fact, it's how the current DiffCache works, so having some verbiage to ask for reshaping functionality to do this kind of stuff is likely useful in many places as long as this SubArray -> Array allocation exists. |
Perhaps I'm doing something wrong in how I set things up, but dualcache seems to always generate allocations when requesting the dual portion of the cache. Here's the most minimal example I could think of; I posted a slightly less minimal example here. Writing my own (simpler, less flexible) version of get_tmp eliminates the issue. The following code:
gives:
I.e. results in allocations that scale with the vector size and are slightly larger than what I would think would be needed (8*Chunksize*n=960 for n=10 and 1920 for n=20). Using Julia v1.7.3, PreallocationTools v0.4.0, on Mac OSX 12.5.
The text was updated successfully, but these errors were encountered: