This PR adds `Nh` to the type parameter space. This seems to be needed for GPU performance, as indicated by the results of the recently added performance benchmark script:

Where `us` and `uss` are the "universal sizes" and "static universal sizes", respectively. The only difference is how `Nh` is stored (dynamically vs. statically). It's clear from this benchmark that, by simply moving `Nh` into the type domain, some of our kernels can improve.

Not all of them will because, for example, our `copyto!` kernel is using static ranges since it only accesses `size(dest, 4)`:

This is likely why the "flat" implementation:

was being outperformed.
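For readers unfamiliar with the distinction, here is a minimal sketch of what storing `Nh` dynamically vs. statically could look like. The types `DynamicSizes` and `StaticSizes`, and the helpers `get_Nh` and `count_elements`, are hypothetical stand-ins for illustration only; they are not the actual `us`/`uss` definitions in this PR.

```julia
# Hypothetical stand-ins for "universal sizes" (us) and
# "static universal sizes" (uss); not the package's real definitions.
struct DynamicSizes
    Nh::Int                  # Nh stored as a runtime field
end

struct StaticSizes{Nh} end   # Nh stored as a type parameter

get_Nh(s::DynamicSizes) = s.Nh
get_Nh(::StaticSizes{Nh}) where {Nh} = Nh   # known to the compiler from the type

# A toy loop whose bounds depend on Nh. With StaticSizes, the compiler
# can treat the horizontal bound as a constant when specializing.
function count_elements(sizes, Nv)
    n = 0
    for h in 1:get_Nh(sizes), v in 1:Nv
        n += 1
    end
    return n
end

us = DynamicSizes(30)
uss = StaticSizes{30}()
count_elements(us, 10) == count_elements(uss, 10)   # same result either way
```

Both variants compute the same thing; the difference is only whether `Nh` is available to the compiler at specialization time, which is presumably what lets some kernels improve.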
Also, I noticed that our `single_field_solve_kernel!` can improve on this:

Rather than using `Val` everywhere (which will result in uglier interfaces and the creation of many runtime types on the CPU), I think it's worth just moving this into the type space, so that we get this performance improvement by default.

Closes #11 for good.
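As a rough illustration of the `Val` trade-off described above, here is a small sketch. The types `LayoutDyn` and `LayoutStatic` and the functions `total_a`, `call_a`, and `total_b` are hypothetical names made up for this example and are not part of the codebase.

```julia
# Hypothetical toy layouts; names are illustrative only.
struct LayoutDyn
    Nh::Int
end

struct LayoutStatic{Nh} end

# Option A: thread `Val` through the interface. Every caller must pass an
# extra argument, and `Val(layout.Nh)` builds a type from a runtime value.
total_a(layout::LayoutDyn, ::Val{Nh}) where {Nh} = sum(1:Nh)
call_a(layout::LayoutDyn) = total_a(layout, Val(layout.Nh))

# Option B: Nh is already part of the layout's type, so the call signature
# stays unchanged and the static value is available by default.
total_b(::LayoutStatic{Nh}) where {Nh} = sum(1:Nh)

call_a(LayoutDyn(4)) == total_b(LayoutStatic{4}())
```

With option B, no call site needs to know about `Val`, which is the interface benefit argued for here.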