Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dispatching for some cuda kernels #1787

Merged
merged 1 commit into from
Jun 10, 2024
Merged

Conversation

charleskawczynski
Copy link
Member

@charleskawczynski charleskawczynski commented Jun 5, 2024

This should close #1786, but I think we need more careful testing around this.

This PR:

  • revamps and adds tests for fill!
  • Makes size return the correct number of fields for all datalayouts, instead of returning 1 (this may need to be reverted if we rely on this).
  • Fixes a bug in our setindex! for IFH datalayouts.

cc @trontrytel

@charleskawczynski charleskawczynski force-pushed the ck/fix_dispatch branch 2 times, most recently from 7cce12f to 525be4d Compare June 9, 2024 01:22
@charleskawczynski
Copy link
Member Author

charleskawczynski commented Jun 9, 2024

So, it turns out that I discovered a rather insidious bug in our DataLayouts. Both of our setindex! methods in our IFH DataLayouts use Val(3) (wrong) instead of Val(2) (correct) for the field index:

@inline function Base.setindex!(data::IFH{S}, val, i, _, _, _, h) where {S}
    @inbounds set_struct!(
        parent(data),
        convert(S, val),
        Val(3),
        CartesianIndex(i, 1, h),
    )
end
@inline function Base.setindex!(
    data::IFH{S},
    val,
    I::CartesianIndex{5},
) where {S}
    i, _, _, _, h = I.I
    @inbounds set_struct!(
        parent(data),
        convert(S, val),
        Val(3),
        CartesianIndex(i, 1, h),
    )
end

The (very simple) test that I added for fill is what caught this. I looked at all of the other datalayouts, and it seems like this was the only place where we made this mistake.

This impacts both the CPU and gpu. Looking around at ClimaCore internals, this may impact:

@charleskawczynski charleskawczynski force-pushed the ck/fix_dispatch branch 5 times, most recently from 5187e13 to da396fe Compare June 9, 2024 14:26
@charleskawczynski charleskawczynski marked this pull request as ready for review June 10, 2024 14:36
@charleskawczynski
Copy link
Member Author

Ok, CI is currently broken:

  • The latest SciMLBase is now calling ndims(::Type{<:Field})/ndims(::Type{<:AbstractData}), so we need to define it.
  • Also, the docs are not running with the latest version of ClimaCore

I've also joined a few docs from DataLayouts, because we cannot document a struct and outer constructors separately (at least Documenter doesn't seem to be picking it up).

There's a lot more that we can do, but I'd like to split off additional work so that this PR doesn't spiral out of control.

ext/ClimaCoreCUDAExt.jl Outdated Show resolved Hide resolved
@charleskawczynski charleskawczynski force-pushed the ck/fix_dispatch branch 3 times, most recently from efb0865 to abae02c Compare June 10, 2024 16:30
@charleskawczynski
Copy link
Member Author

charleskawczynski commented Jun 10, 2024

Alright, I think that this should (now) work..

Fix

Update test filenames, ndims on DataLayouts - fix CI

Build latest version in docs

Fixes for cuda kernels with empty fields
@charleskawczynski
Copy link
Member Author

Alright, the only failure is due to a fix needed in multibroadcast field copyto!s, I just fixed and checked it locally, I'm going to manually merge instead of re-running CI.

@charleskawczynski charleskawczynski merged commit 25792d4 into main Jun 10, 2024
12 of 14 checks passed
@charleskawczynski charleskawczynski deleted the ck/fix_dispatch branch June 10, 2024 17:27
@Sbozzolo
Copy link
Member

Alright, the only failure is due to a fix needed in multibroadcast field copyto!s, I just fixed and checked it locally, I'm going to manually merge instead of re-running CI.

Okay, I'd like to merge #1788 next (once we find a better name), so maybe you can add a commit to that PR that bumps the minimum version of MBF with the fix, once you release it.

@charleskawczynski
Copy link
Member Author

Okay, I'd like to merge #1788 next (once we find a better name), so maybe you can add a commit to that PR that bumps the minimum version of MBF with the fix, once you release it.

It's already done, the fix was already incorporated in this commit

@Sbozzolo
Copy link
Member

Perfect, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dispatch logic in copyto!/fill! is broken for SubArray's with parent CuArrays
2 participants