Fix gpu dispatching on ReshapedArrays #1861

Status: Closed (wants to merge 1 commit)


charleskawczynski (Member):

This PR fixes GPU dispatching when the parent array falls into one of the following categories:

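```julia
import CUDA

# Match wrappers whose parent (third type parameter) is a subtype of `T`
const cu_array = CUDA.CuArray
const TSubArray{T} = SubArray{<:Any, <:Any, <:T}
const TReshapedArray{T} = Base.ReshapedArray{<:Any, <:Any, <:T}

# Every wrapper nesting that is ultimately backed by a CuArray must be listed explicitly
const CuArrayBackedTypes = Union{
    cu_array,
    TSubArray{cu_array},
    TSubArray{TReshapedArray{cu_array}},
    TSubArray{TReshapedArray{TSubArray{cu_array}}},
    TReshapedArray{cu_array},
    TReshapedArray{TSubArray{cu_array}},
}
```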
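
To check which wrapper nestings the union covers without a GPU, here is a minimal sketch that swaps `CUDA.CuArray` for Base's `Array`. That substitution is an assumption made only so the example runs on the CPU. `ArrayBackedTypes` and the `PermutedDimsArray` case are illustrative additions and are not part of this PR.

```julia
# Same aliases as in the PR, but with `Array` standing in for `CUDA.CuArray`
# (assumption: lets the example run on the CPU).
const TSubArray{T} = SubArray{<:Any, <:Any, <:T}
const TReshapedArray{T} = Base.ReshapedArray{<:Any, <:Any, <:T}

const ArrayBackedTypes = Union{
    Array,
    TSubArray{Array},
    TSubArray{TReshapedArray{Array}},
    TSubArray{TReshapedArray{TSubArray{Array}}},
    TReshapedArray{Array},
    TReshapedArray{TSubArray{Array}},
}

A = rand(4, 4)
v = view(A, :, 1:2)               # view of a Matrix
r = reshape(v, 8)                 # reshape of that view
p = PermutedDimsArray(A, (2, 1))  # a wrapper the union does not list

v isa ArrayBackedTypes  # true
r isa ArrayBackedTypes  # true
p isa ArrayBackedTypes  # false, even though `p` is backed by an Array
```

Any wrapper that is not enumerated, such as `PermutedDimsArray` or a deeper nesting of views and reshapes, falls outside the `Union` even though its data still lives in the same array.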

Together with the switch to `launch_configuration` for the VIJFH datalayouts, this should fix #1854 (xref: #1854 (comment)).

This PR also switches to CUDA's `launch_configuration` for the VF datalayouts; I'll check the stencil benchmarks to see whether and how performance changes.
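
In CUDA.jl, the documented pattern is to compile a kernel with `@cuda launch=false`, query `launch_configuration(kernel.fun)` for an occupancy-based thread count, clamp that count to the problem size, and derive the block count. The sketch below isolates the arithmetic in a hypothetical `launch_shape` helper so it runs on the CPU; the commented lines show how it would plug into CUDA.jl (`my_kernel!` and `args` are placeholders, and that part needs a GPU).

```julia
# Hypothetical helper: clamp the suggested thread count to `n` elements and
# compute how many blocks are needed to cover all of them.
function launch_shape(n::Integer, max_threads::Integer)
    n > 0 && max_threads > 0 || throw(ArgumentError("n and max_threads must be positive"))
    threads = min(n, max_threads)
    blocks = cld(n, threads)
    return (; threads, blocks)
end

# On a GPU, with `using CUDA` (`my_kernel!` and `args` are placeholders):
#   kernel = @cuda launch = false my_kernel!(args...)
#   config = launch_configuration(kernel.fun)
#   (; threads, blocks) = launch_shape(n, config.threads)
#   kernel(args...; threads, blocks)

launch_shape(1000, 256)  # (threads = 256, blocks = 4)
```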

I've added a test only for DataLayouts `fill!`, since we use `CuArrayBackedTypes` for the `copyto!` kernels as well. We should probably add correctness tests for the other cases too, since these composite arrays (and presumably their indexing) are quite complex.
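
A minimal sketch of the kind of correctness test suggested above, assuming it is enough to compare each wrapped operation against the same operation on a plain CPU `Array`. It uses Base `fill!` on raw arrays as a stand-in for DataLayouts `fill!`; `WRAPPERS` and `test_wrapped_fill` are hypothetical names.

```julia
using Test

# Hypothetical wrapper constructors for a 4x4 array, mirroring the nesting
# patterns in `CuArrayBackedTypes` plus one deeper pattern.
const WRAPPERS = (
    X -> view(X, :, 1:2),                                        # view
    X -> reshape(view(X, :, 1:2), 8),                            # reshape of a view
    X -> view(reshape(view(X, :, 1:2), 8), 1:4),                 # view of a reshape of a view
    X -> reshape(view(reshape(view(X, :, 1:2), 8), 1:4), 2, 2),  # one level deeper
)

# Compare `fill!` through each wrapper against the same call on a plain CPU
# `Array`. `to_device` is `identity` on the CPU; with CUDA.jl loaded, one
# could pass `CUDA.CuArray` to exercise the GPU kernels instead.
function test_wrapped_fill(to_device)
    @testset "fill! through wrapper $i" for (i, wrap) in enumerate(WRAPPERS)
        expected = zeros(4, 4)
        fill!(wrap(expected), 1.0)
        data = to_device(zeros(4, 4))
        fill!(wrap(data), 1.0)
        @test Array(data) == expected
    end
end

test_wrapped_fill(identity)
```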

The type declaration for `CuArrayBackedTypes` is rather complicated, and it seems impossible to enumerate every possible combination of wrapper types. We may need to switch to a dispatch-based solution: recurse through the parent array until we reach either an `Array` or a `CuArray`, then dispatch on a third argument.
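
A minimal sketch of that dispatch-based alternative, written in plain Base Julia so it runs on the CPU. The names `unwrap_parent`, `backend`, `backend_of`, `backend_fill!`, `CPUBackend`, and `GPUBackend` are hypothetical; the GPU methods are left as comments because they would require CUDA.jl and a device.

```julia
# Backend tags used as the extra ("third") dispatch argument.
abstract type AbstractBackend end
struct CPUBackend <: AbstractBackend end
struct GPUBackend <: AbstractBackend end  # hypothetical; unused without CUDA.jl

# Follow `parent` until reaching an array that is its own parent. Base's
# fallback `parent(A::AbstractArray) = A` makes plain arrays the base case,
# while SubArray, ReshapedArray, PermutedDimsArray, etc. define `parent`.
function unwrap_parent(A::AbstractArray)
    P = parent(A)
    return P === A ? A : unwrap_parent(P)
end

backend_of(::Array) = CPUBackend()
# With CUDA.jl loaded, one would add: backend_of(::CUDA.CuArray) = GPUBackend()

backend(A::AbstractArray) = backend_of(unwrap_parent(A))

# Hypothetical entry point: pick the implementation from the innermost parent.
backend_fill!(A::AbstractArray, x) = backend_fill!(A, x, backend(A))
backend_fill!(A::AbstractArray, x, ::CPUBackend) = fill!(A, x)
# backend_fill!(A::AbstractArray, x, ::GPUBackend) = ...  # launch a CUDA kernel

B = zeros(4, 4)
backend_fill!(reshape(view(B, :, 1:2), 8), 1.0)
B[:, 1:2] == ones(4, 2)  # true
```

Unlike the `Union`, this needs one `backend_of` method per storage type rather than one entry per nesting pattern. The trade-off is that `unwrap_parent` relies on a runtime `===` check; a more type-stable variant could instead define one `unwrap_parent` method per wrapper type.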

charleskawczynski (Member Author):

Superseded by #1863

charleskawczynski deleted the ck/reshaped_arrays branch on August 22, 2024 01:52
Linked issue (may be closed by merging this pull request): Higher resolution column cases cannot be run on GPU