-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CPU-GPU specialization -> dispatching #1863
Conversation
8b1f400
to
9667382
Compare
Do think it is possible to shomehow also provide a two-argument |
The top-level methods are two-argument based: Base.fill!(dest::AbstractData, val) = Base.fill!(dest, val, backed_array(dest))
Base.copyto!(
dest::AbstractData,
bc::Union{AbstractData, Base.Broadcast.Broadcasted},
) = Base.copyto!(dest, bc, backed_array(dest)) but they immediately call the 3-argument methods in order to properly dispatch. |
Perfect! I missed this! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good to me
Ah, some of the abstract type AbstractDispatchToDevice end
struct ToCPU <: AbstractDispatchToDevice end
struct ToCUDA <: AbstractDispatchToDevice end and renamed device_dispatch(dest::AbstractData) = _device_dispatch(dest)
_device_dispatch(x::Array) = ToCPU()
_device_dispatch(x::SubArray) = _device_dispatch(parent(x))
_device_dispatch(x::Base.ReshapedArray) = _device_dispatch(parent(x))
_device_dispatch(x::AbstractData) = _device_dispatch(parent(x))
_device_dispatch(x::SArray) = ToCPU()
_device_dispatch(x::MArray) = ToCPU() And catch CPU/GPU code paths accordingly. This is perhaps just a symptom of not having the ClimaComms context/device not stored in the datalayouts. But, I think doing that would be pretty breaking, so let's just improve the existing design for now, and we can tackle the comms context part separately if we deem it important enough. |
9667382
to
988ca7f
Compare
e6834c0
to
a6a29fe
Compare
a6a29fe
to
adfc216
Compare
Ah, interesting, there is a catastrophic case. On main: function Base.copyto!(
dest::VIFH{S, Nv, Ni},
bc::BroadcastedUnionVIFH{S, Nv, Ni},
) where {S, Nv, Ni}
(_, _, _, _, Nh) = size(bc)
# copy contiguous columns
@inbounds for h in 1:Nh, i in 1:Ni
col_dest = column(dest, i, h)
col_bc = column(bc, i, h)
copyto!(col_dest, col_bc)
end
return dest
end is executed for when running on the CPU and the GPU. It works because, on the CPU, we launch We likely never run into this but, if we do / were to, this fix will be a major performance improvement. |
This PR is an alternative to #1861, and I'm feeling more inclined to move forward with this design. This PR switches from using specialization on
CuArrayBackedTypes
, which required naming specific types:that could be passed in, to using dispatch on either
Array
orCuArray
. The advantage of dispatching is that we no longer have to account for every possible way that these types are composed (e.g., the above definition will break onTSubArray{TSubArray{cu_array}}
andTSubArray{TSubArray{TSubArray{cu_array}}}
). The way that this PR achieves this is by having the top-level methods (Base.copyto!
andBase.fill!
) call (for example)Base.copyto!(dest, bc, backed_array(dest, bc))
.backed_array
does some checking, and then calls an internal_backed_array
which is recursively defined for all array types that we support:This way, every possible combination of CPU/GPU-backed
SubArray
andReshapedArray
are now supported and should "just work".The price we pay is that now we're dispatching on 3 arguments (e.g.,
fill!(::VF, val, ::Array)
), which is not as pretty, but will not involve the existing whack-a-mole situation.