Chunked copy in copyto_unaliased! for nD Cartesian destination #53234
I'm not sure if this is the correct direction. We just need to solve the bootstrap issue.
Part of the performance difference here would be reduced if #53158 is resolved. However, profiling suggests that integer comparisons take up a lot of time when iterating over Cartesian ranges, so if we can replace those with linear ranges, there is some performance gain to be had.
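To make the comparison concrete, here is a minimal sketch of the two iteration patterns being contrasted. The helper names `sum_cartesian` and `sum_nested` are hypothetical and exist only for illustration; whether the nested form is actually faster depends on the array type and the Julia version, so it is worth timing both (for example with `@btime`) rather than assuming a gain.

```julia
# Hypothetical helpers for illustration only; neither is part of Base.

# Traverse the array through a single Cartesian range.
function sum_cartesian(A::AbstractMatrix)
    s = zero(eltype(A))
    @inbounds for I in CartesianIndices(A)
        s += A[I]
    end
    return s
end

# Same traversal order (first dimension fastest), but expressed as
# nested loops over the per-dimension linear ranges.
function sum_nested(A::AbstractMatrix)
    s = zero(eltype(A))
    @inbounds for j in axes(A, 2), i in axes(A, 1)
        s += A[i, j]
    end
    return s
end

A = reshape(collect(1.0:12.0), 3, 4)
V = view(A, 3:-1:1, 4:-1:1)   # reversed view, similar to the benchmarks in this thread

sum_cartesian(V) == sum_nested(V)   # both visit the same elements in the same order
```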
Trying out the suggestion for the shared iterator case:

julia> function copyto_unaliased!(deststyle::IndexStyle, dest::AbstractArray, srcstyle::IndexStyle, src::AbstractArray)
           isempty(src) && return dest
           destinds, srcinds = LinearIndices(dest), LinearIndices(src)
           idf, isf = first(destinds), first(srcinds)
           Δi = idf - isf
           (checkbounds(Bool, destinds, isf+Δi) & checkbounds(Bool, destinds, last(srcinds)+Δi)) ||
               throw(BoundsError(dest, srcinds))
           if deststyle isa IndexLinear
               if srcstyle isa IndexLinear
                   # Single-index implementation
                   @inbounds for i in srcinds
                       if isassigned(src, i)
                           dest[i + Δi] = src[i]
                       else
                           _unsetindex!(dest, i + Δi)
                       end
                   end
               else
                   # Dual-index implementation
                   i = idf - 1
                   @inbounds for a in eachindex(src)
                       i += 1
                       if isassigned(src, a)
                           dest[i] = src[a]
                       else
                           _unsetindex!(dest, i)
                       end
                   end
               end
           else
               iterdest, itersrc = eachindex(dest), eachindex(src)
               if iterdest == itersrc
                   # Shared-iterator implementation
                   @inbounds @simd for I in iterdest
                       if isassigned(src, I)
                           dest[I] = src[I]
                       else
                           _unsetindex!(dest, I)
                       end
                   end
               else
                   # Dual-iterator implementation
                   ret = iterate(iterdest)
                   @inbounds for a in itersrc
                       idx, state = ret::NTuple{2,Any}
                       if isassigned(src, a)
                           dest[idx] = src[a]
                       else
                           _unsetindex!(dest, idx)
                       end
                       ret = iterate(iterdest, state)
                   end
               end
           end
           return dest
       end
copyto_unaliased! (generic function with 1 method)

julia> a = rand(200, 200); b = rand(size(a)...);

julia> @btime $a[reverse.(axes($a))...] .= @view $b[reverse.(axes($b))...];
  544.766 μs (0 allocations: 0 bytes)

julia> @btime copyto_unaliased!(IndexCartesian(), $(view(a, reverse.(axes(a))...)), IndexCartesian(), $(view(b, reverse.(axes(b))...)));
  492.596 μs (0 allocations: 0 bytes)

julia> @btime $a[axes($a)...] .= @view $b[axes($b)...];
  73.170 μs (0 allocations: 0 bytes)

julia> @btime copyto_unaliased!(IndexCartesian(), $(view(a, axes(a)...)), IndexCartesian(), $(view(b, axes(b)...)));
  75.533 μs (0 allocations: 0 bytes)

The performances with and without
Closing this until I understand the reason behind this better.
The idea is that if the indices of the destination are CartesianIndices((r1, r2)), we may treat it as a collection of slices with indices CartesianIndices((r1,)), and possibly dispatch to the more efficient linear-indexing branch for the individual slices. The method is called recursively on the slices.

Performance comparisons:

One concern is that constructing the views may allocate in certain cases (see e.g. #53231):
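As a rough illustration of the slice-wise idea described above (not the code in this PR), the sketch below peels off the trailing dimension, takes a view of each slice, and recurses until it reaches one-dimensional slices that can be copied with a simple loop. The name `slicewise_copyto!` is hypothetical, and the sketch assumes that both arrays have identical axes and that every source element is assigned, so it skips the `isassigned`/`_unsetindex!` handling of the real method.

```julia
# Hypothetical sketch; `slicewise_copyto!` is not a Base function.
# Assumes matching axes and fully assigned source elements.
function slicewise_copyto!(dest::AbstractArray{<:Any,N}, src::AbstractArray{<:Any,N}) where {N}
    axes(dest) == axes(src) || throw(DimensionMismatch("axes of dest and src must match"))
    if N <= 1
        # Base case: a single range of indices shared by both arrays.
        @inbounds for i in eachindex(dest, src)
            dest[i] = src[i]
        end
    else
        # Peel off the trailing dimension and recurse on each slice.
        colons = ntuple(_ -> Colon(), N - 1)
        for k in axes(dest, N)
            slicewise_copyto!(view(dest, colons..., k), view(src, colons..., k))
        end
    end
    return dest
end

src = reshape(collect(1:24), 2, 3, 4)
dest = zeros(Int, 2, 3, 4)
slicewise_copyto!(dest, src)
```

To check whether building the per-slice views allocates for a particular array type, one could wrap a call in a function and inspect it with `@allocated` or `@btime`; the outcome may differ between Julia versions.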