-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor DSS to better leverage DataLayouts #1958
Conversation
5e3f3ed
to
82c9e28
Compare
515b92b
to
f859925
Compare
One benefit of this change, aside from deleting hundreds of lines of code, is that it also strictly and explicitly follows conversion rules (which already exist). Our previous implementation bypasses these checks since it accesses the parent arrays. |
@charleskawczynski Even with the modifications to use the existing |
I've reached out to others on slack for additional review.
I think we should be able to get good performance from this. |
f859925
to
a46a660
Compare
Hi @sriharshakandala, I ran the GPU target pipeline, and these are the preliminary results: main (build)
This PR (build)
I don't think that this implementation is necessarily responsible for the speedup, but I also don't think that it's slowed anything down. Also, I think that we can actually make it faster by avoiding the At this point, I think I'm ready to merge if people are happy with the changes. |
Hi @akshaysridhar, I've looked into this a bit more, and it seems that DSS does have storage for the vertical component, so the 3-component case should work just fine. The behavior that I didn't expect was coming from: ClimaCore.jl/src/Topologies/dss_transform.jl Lines 90 to 125 in f36251c
which transforms a |
a46a660
to
a2bc514
Compare
It looks good to me overall. I am not sure if |
We should also remove the comments indicating that we are treating the vertical component in a special way, since this is no longer true! |
a2bc514
to
907bb8f
Compare
Alright, moving forward with this. |
Remove unused indices and methods Fix some conversions
907bb8f
to
580394f
Compare
This PR refactors DSS to avoid using DataLayout internals, and reduces code duplication. We can apply a similar pattern to the cuda code (wip).
This should make fixing #1910 easier.
What this PR does:
(I, J, F, V, H)
, which recursively indexes into the field variables viaget_struct
/set_struct
.dss_transform
to transform the data.drop_vert_dim
inGeometry.UVWVector
due to a comment indss.jl
:For DSS of Covariant123Vector, the third component is treated like a scalar and is not transformed
. I'm not 100% sure if what I did was correct hereThe last bullet point still confuses me because it seems like
N
variables come in andN - M
are defined. Can someone confirm that this is correct?Also, I should note that I purposefully did not apply changes to other similar kernels, like
dss_local_kernel!
, because they do not loop over field variables. In all the kernels refactored in this PR, a single thread loops over field variables, so memory access wasn't significantly changed. It's not clear that the same refactoring pattern would perform as well withdss_local_kernel!
because, currently, all of the field variables are processed in parallel.