
CPU-GPU specialization -> dispatching #1863

Merged (1 commit) Jul 2, 2024
Conversation


@charleskawczynski charleskawczynski commented Jul 1, 2024

This PR is an alternative to #1861, and I'm feeling more inclined to move forward with this design. This PR switches from using specialization on CuArrayBackedTypes, which required naming specific types:

const cu_array = CUDA.CuArray
const TSubArray{T} = SubArray{<:Any, <:Any, <:T}
const TReshapedArray{T} = Base.ReshapedArray{<:Any, <:Any, <:T}

const CuArrayBackedTypes = Union{
    cu_array,
    TSubArray{cu_array},
    TSubArray{TReshapedArray{cu_array}},
    TSubArray{TReshapedArray{TSubArray{cu_array}}},
    TReshapedArray{cu_array},
    TReshapedArray{TSubArray{cu_array}},
}

that could be passed in, to dispatching on either Array or CuArray. The advantage of dispatching is that we no longer have to account for every possible way these types compose (e.g., the definition above breaks on TSubArray{TSubArray{cu_array}} and TSubArray{TSubArray{TSubArray{cu_array}}}). This PR achieves this by having the top-level methods (Base.copyto! and Base.fill!) call, for example, Base.copyto!(dest, bc, backed_array(dest, bc)). backed_array does some checking and then calls an internal _backed_array, which is recursively defined for all array types that we support:

_backed_array(x::CUDA.CuArray) = x # in extension
_backed_array(x::Array) = x
_backed_array(x::SubArray) = _backed_array(parent(x))
_backed_array(x::Base.ReshapedArray) = _backed_array(parent(x))
_backed_array(x::AbstractData) = _backed_array(parent(x))

This way, every possible combination of CPU/GPU-backed SubArray and ReshapedArray is now supported and should "just work".
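To illustrate the recursion, here is a CPU-only sketch of the unwrapping (the CuArray method lives in the CUDA package extension, and the PR also has an AbstractData method; this standalone version only covers plain Arrays and their wrappers):

```julia
# Recursive unwrapping: peel off SubArray/ReshapedArray wrappers until we
# reach the backing array (CPU-only sketch; the CUDA method is elsewhere).
_backed_array(x::Array) = x
_backed_array(x::SubArray) = _backed_array(parent(x))
_backed_array(x::Base.ReshapedArray) = _backed_array(parent(x))

A = zeros(4, 4)
r = reshape(view(A, :, 1), 2, 2)  # ReshapedArray wrapping a SubArray
w = view(r, 1:2, 1)               # SubArray of a ReshapedArray of a SubArray

_backed_array(r) === A            # unwraps to the backing Array
_backed_array(w) === A            # nesting depth does not matter
```

This is exactly the case (arbitrary nesting of wrappers) that the Union-of-types approach could not cover without enumerating every composition.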

The price we pay is that we're now dispatching on 3 arguments (e.g., fill!(::VF, val, ::Array)), which is not as pretty, but it avoids the existing whack-a-mole situation.


Sbozzolo commented Jul 1, 2024

Do you think it is possible to somehow also provide two-argument fill! and copyto!? I think it would be very nice to do so, because we would follow the same style as Base.


charleskawczynski commented Jul 1, 2024

> Do you think it is possible to somehow also provide two-argument fill! and copyto!? I think it would be very nice to do so, because we would follow the same style as Base.

The top-level methods are two-argument based:

Base.fill!(dest::AbstractData, val) = Base.fill!(dest, val, backed_array(dest))
Base.copyto!(
    dest::AbstractData,
    bc::Union{AbstractData, Base.Broadcast.Broadcasted},
) = Base.copyto!(dest, bc, backed_array(dest))

but they immediately call the 3-argument methods in order to properly dispatch.


Sbozzolo commented Jul 1, 2024

> > Do you think it is possible to somehow also provide two-argument fill! and copyto!? I think it would be very nice to do so, because we would follow the same style as Base.
>
> The top-level methods are two-argument based:
>
> Base.fill!(dest::AbstractData, val) = Base.fill!(dest, val, backed_array(dest))
> Base.copyto!(
>     dest::AbstractData,
>     bc::Union{AbstractData, Base.Broadcast.Broadcasted},
> ) = Base.copyto!(dest, bc, backed_array(dest))
>
> but they immediately call the 3-argument methods in order to properly dispatch.

Perfect! I missed this!

@Sbozzolo Sbozzolo left a comment

Thanks, looks good to me

@charleskawczynski

Ah, some of the backed_array calls lead to MArrays. I can't think of a good name for Union{Array, MArray}, so instead I've defined new types:

abstract type AbstractDispatchToDevice end
struct ToCPU <: AbstractDispatchToDevice end
struct ToCUDA <: AbstractDispatchToDevice end

and renamed backed_array to device_dispatch:

device_dispatch(dest::AbstractData) = _device_dispatch(dest)

_device_dispatch(x::Array) = ToCPU()
_device_dispatch(x::SubArray) = _device_dispatch(parent(x))
_device_dispatch(x::Base.ReshapedArray) = _device_dispatch(parent(x))
_device_dispatch(x::AbstractData) = _device_dispatch(parent(x))
_device_dispatch(x::SArray) = ToCPU()
_device_dispatch(x::MArray) = ToCPU()

and dispatch to the CPU/GPU code paths accordingly.

This is perhaps just a symptom of not having the ClimaComms context/device stored in the datalayouts. But I think doing that would be pretty breaking, so let's just improve the existing design for now; we can tackle the comms-context part separately if we deem it important enough.
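The device-token pattern can be sketched as follows. The two-argument entry point computes the token once and forwards to the three-argument method, which is what the CPU/GPU code paths overload (my_fill! is a hypothetical stand-in for the PR's Base.fill! methods on AbstractData; the ToCUDA method would live in the CUDA extension):

```julia
# Device tokens (names follow the PR).
abstract type AbstractDispatchToDevice end
struct ToCPU <: AbstractDispatchToDevice end
struct ToCUDA <: AbstractDispatchToDevice end

# Recursively resolve the device backing an array (CPU-only sketch).
_device_dispatch(x::Array) = ToCPU()
_device_dispatch(x::SubArray) = _device_dispatch(parent(x))
_device_dispatch(x::Base.ReshapedArray) = _device_dispatch(parent(x))

# Two-argument entry point forwards to the device-specialized method.
my_fill!(dest, val) = my_fill!(dest, val, _device_dispatch(dest))
my_fill!(dest, val, ::ToCPU) = fill!(dest, val)  # serial CPU loop in practice
# my_fill!(dest, val, ::ToCUDA) = ...            # CUDA kernel launch (extension)

col = view(zeros(3, 3), :, 2)
my_fill!(col, 1.0)   # resolves to the ToCPU method via the view's parent
```

The tokens avoid naming array types like Union{Array, MArray} in method signatures: any new wrapper type only needs one _device_dispatch method.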

@charleskawczynski force-pushed the ck/gpu_dispatch branch 2 times, most recently from e6834c0 to a6a29fe on Jul 1, 2024 at 20:26

charleskawczynski commented Jul 2, 2024

Ah, interesting, there is a catastrophic case. On main:

function Base.copyto!(
    dest::VIFH{S, Nv, Ni},
    bc::BroadcastedUnionVIFH{S, Nv, Ni},
) where {S, Nv, Ni}
    (_, _, _, _, Nh) = size(bc)
    # copy contiguous columns
    @inbounds for h in 1:Nh, i in 1:Ni
        col_dest = column(dest, i, h)
        col_bc = column(bc, i, h)
        copyto!(col_dest, col_bc)
    end
    return dest
end

is executed when running on both the CPU and the GPU. It works because column is defined on both CPU and GPU for broadcasted objects, but on the GPU this means Nh*Ni kernel launches, one per column!

We likely never hit this path, but if we do (or were to), this fix will be a major performance improvement.
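A small counting sketch of why the column-wise fallback is catastrophic on the GPU: the loop performs Nh*Ni separate copies, each of which would be a kernel launch. The counter below is a hypothetical stand-in for a launch; the device-dispatched GPU path instead does the whole copy in a single kernel.

```julia
# Count the per-column copies performed by the fallback loop.
launch_count = Ref(0)
function copy_column!(dest_col, src_col)
    launch_count[] += 1        # stands in for one GPU kernel launch
    copyto!(dest_col, src_col)
end

Nv, Ni, Nh = 10, 4, 8
dest = zeros(Nv, Ni, Nh)
src = rand(Nv, Ni, Nh)
for h in 1:Nh, i in 1:Ni
    copy_column!(view(dest, :, i, h), view(src, :, i, h))
end
launch_count[]  # Nh*Ni = 32 "launches" for one logical copy
```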
