-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Templated MD arrays #39
Conversation
ports-of-call/portable_arrays.hpp
Outdated
// set_value end-case | ||
template <std::size_t Ind, typename NX> | ||
constexpr void set_value(narr &ndat, NX value) { | ||
ndat[Ind] = value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is operator[]
of std::array
marked constexpr
? I thought this was something that preventing more widespread usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I intended to replace all these with appropriate PORTABLE_
s but missed a few
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you probably meant something else. We can use a raw array, if std::array
is messing with things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should check that it runs on device, but I thought std::array
worked with --expt-relaxed-constexpr
but I'm not sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mauneyc-LANL you may be able to use std::get<Ind>(arr) = value;
. But we really need to check and make sure it works on device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using --expt-relaxed-constexpr
this runs smoothly using volta-x86
partition, kokkos+cuda cuda_arch=70 +wrapper
, cmake .. -DPORTS_OF_CALL_BUILD_TESTING -DPORTABILITY_STRATEGY_KOKKOS=ON
ports-of-call/portable_arrays.hpp
Outdated
PORTABLE_FORCEINLINE_FUNCTION int GetSize() const { | ||
return nx1_ * nx2_ * nx3_ * nx4_ * nx5_ * nx6_; | ||
return std::accumulate(nxs_.cbegin(), nxs_.cend(), 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested this on GPUs? I know Brendan had an issue and we had to write a custom accumulate. One could still use binary operators provided by the STL, but the accumulate itself wouldn't run. Perhaps we did something wrong, but I want to make sure this is still GPU callable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also be concerned about std::accumulate
. Might be better to hardcode this one, or write a a recursive thing ourselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It "worked" but gave warnings (even with --expt-relaxed-constexpr
). I rewrote it as an explicit loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice cleanup. Before merging, I'd like to:
a) Get a sense of compile time differences
b) Know for sure it all works on device
ports-of-call/portable_arrays.hpp
Outdated
// maximum number of dimensions | ||
constexpr std::size_t MAXDIM = 6; | ||
// array type of dimensions/strides | ||
using narr = std::array<std::size_t, MAXDIM>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't love that this is lower-case. I'd find this type easier to interpret if it were, e.g.,
using narr = std::array<std::size_t, MAXDIM>; | |
using Narr_t = std::array<std::size_t, MAXDIM>; |
ports-of-call/portable_arrays.hpp
Outdated
// compute_index base case, i.e. fastest moving index | ||
template <std::size_t Ind> | ||
PORTABLE_INLINE_FUNCTION size_t compute_index(const narr &nd, | ||
const size_t index) { | ||
return index; | ||
} | ||
|
||
// compute_index general case, computing slower moving index strides | ||
template <std::size_t Ind, typename... Tail> | ||
PORTABLE_INLINE_FUNCTION size_t compute_index(const narr &nd, | ||
const size_t index, | ||
const Tail... tail) { | ||
return index * nd[Ind] + compute_index<Ind + 1>(nd, tail...); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice---assuming the array on device issues work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests complete on volta-x86
, though we need to put together some more tests for ports-of-call
|
||
target_compile_options(${POCLIB} | ||
INTERFACE | ||
$<${with_cxx}:$<${ps_kokkos}:--expt-relaxed-constexpr>>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the event of +kokkos~cuda, wouldn't this result in a compile error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it, but POC doesn't have a USE_CUDA
or similar like singularity-eos
does, so I need to probe which compiler is being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how I've done it in other projects:
get_target_property(kokkos_link_libs Kokkos::kokkoscore
INTERFACE_LINK_LIBRARIES)
string(REGEX MATCH "CUDA"
kokkos_has_cuda_libs "${kokkos_link_libs}")
...
if(kokkos_has_cuda_libs)
do cuda stuff...
endif()
I've made some changes to push a lot of the indexing/building functions into the object. I don't know that I love the result, which lead to the elimination of some constexpr initialization (not that it couldnt be done, just saved time to get a working code). One question to ask is how much time should
The options are:
|
I don't know the answer to this a priori. I'm not married to either option. Which leads to the cleaner code? Probably 2? And do we see a performance hit in a simple test case? |
Would be work considering the context that I've included a simple test with the PR. As originally there wasn't very much being tested directly with |
Yeah fair questions:
No, almost never.
Also no.
Like you said, 48 bytes is basically nothing, so I lean towards (2). |
Previously we recalculated at every access it seems. It seems like the number of adds will be the same and the number of multiplies saved will only be noticeable with high rank arrays. Performance data for ranks 2-6 would helpful in this case. |
The more I think about this the more I think we should recompute the strides. Even so, I'd like to see performance differences. Integer operations can be noticeable on GPUs so pre-computed may be more performant, but a larger object may use more registers. There are a lot of competing factors so data is the best. |
now that singularity-eos 1.9.0 release has been cut, I think we should up to c++17 and get this merged in prep for the next release. |
I am in favor of moving to C++17 at this time. However, I would prefer to decouple the shift to 17 from this MR. Let's just update the cmake builds to require 17 and then merge other MRs when they are ready. |
@jonahm-LANL @dholladay00 I've pushed some benchmarks using I can take off |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this---it's a significant cleanup from the original implementation, which was quite low-level. I want @dholladay00 's build system concerns addressed before merge. Also, just to confirm, this doesn't change the API forfunctionality at all right? It's just more general.
ports-of-call/portable_arrays.hpp
Outdated
return r; | ||
} | ||
PORTABLE_FORCEINLINE_FUNCTION | ||
decltype(auto) vp_prod() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the decltype
needed? Why not just an auto
return value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eh, I'm returning a generic lambda and that's my habbit. But it's not necessary here (tho I don't think it's harmful either)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the decltype
adds anything and it's a bit harder to read but I won't push back too much on this.
ports-of-call/portable_arrays.hpp
Outdated
PORTABLE_FORCEINLINE_FUNCTION auto make_nxs_array(NX... nxs) { | ||
std::array<std::size_t, MAXDIM> a; | ||
std::array<std::size_t, N> t{static_cast<std::size_t>(nxs)...}; | ||
for (auto i = 0; i < N; ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm gonna insist this be int
or size_t
. IMO this is not a style thing but a self-documenting thing. It's an index it's not a double.
If C++17 gets in before this is merged, then I do want to do another draft - it will make things cleaner and clearer. |
@mauneyc-LANL let us know when this is ready for re-review |
@mauneyc-LANL what's the status of this? |
@jonahm-LANL Code-proper is good but I'm fleshing out the test, benchmarks and making sure they work on other machines. That could be separated into another PR. |
Sounds good. I think 1 test on github is enough to merge this, but I'd like at least one. We can do the more broad test set later down the line. |
Ready for review @jonahm-LANL @jhp-lanl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor changes below. Also one high-level question: Given that this infrastructure assumes continuous data and only supports contiguous slices, is there a reason to include both sizes and strides? I think we only need to carry around sizes.
As discussed, we can skip benchmarking tests for now. Though I am somewhat concerned about performance.
test/test_utilities.hpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 these are nice
#ifndef _PORTSOFCALL_UTILITY_INDEX_ALGO_HPP_ | ||
#define _PORTSOFCALL_UTILITY_INDEX_ALGO_HPP_ | ||
|
||
#include "../portability.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be best to use <ports-of-call/portability.hpp>
ports-of-call/portable_arrays.hpp
Outdated
class PortableMDArray { | ||
public: | ||
static constexpr int MAXDIM = 6; | ||
using this_type = PortableMDArray<T, D>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have some introspection on the maximum dimension of PortableMDArray
. Can we add a member field that reports D
?
PORTABLE_INLINE_FUNCTION size_type GetRank() const noexcept { return rank_; } | ||
template <typename... NXs> | ||
PORTABLE_INLINE_FUNCTION void Reshape(NXs... nxs) { | ||
assert(util::array_reduce(IArray<D>{nxs...}, 1, std::multiplies<size_type>{}) == |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this was an assert previously, but now we have PORTABLE_REQUIRE
. Let's use that instead as it will provide more useful/verbose output.
Ah regarding strides, never mind. I see @dholladay00 's point above. I forgot about it. |
@mauneyc-LANL I tried updating spiner to use your branch and spiner fails with a huge swath of test failures, including several segfaults. We need to resolve that before this branch can be merged. Branch |
I think the issue is |
OK I fixed the segfault but several tests still failing. @mauneyc-LANL these need to be fixed before merge. That said, the SAP integration effort is a higher priority. If you're ok with it let's punt on this for now. |
ports-of-call/portable_arrays.hpp
Outdated
if constexpr (sizeof...(Is) == 0) { | ||
idx = 0; | ||
} else if constexpr (sizeof...(Is) == 1) { | ||
idx = get_first(idxs...); | ||
} else { | ||
idx = util::fast_findex({static_cast<size_type>(idxs)...}, nxs_, strides_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fast_findex
already checks the size of the input indices. It's missing the empty case but it should be added there.
What is the difference between
fast_findex(A const &ijk, A const &dim, A const &stride) {
constexpr auto N = get_size(A{});
if constexpr (N == 1) {
return ijk[0];
}
//...
}
and get_first
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could instead modify fast index instead of adding get_first I think. Either should be fine.
Sure--I'll share later today. But you can also just try updating ports of call in spiner and running the tests yourself locally. |
Need to rethink and reconfigure some of this work. Will try to split out the ideas into more focused PRs. |
PR Summary
This is a prototype for "templated MD", that is the reduction of code of the type
To
The single-value dimension variables
int nxN_
have been replaced with a staticly sized containerstd::array<size_t, MAXDIM> nxs_
. Also addedPortableMDArray::rank_
member to avoid some reevaluations, though I'm not sure it's necessary.Currently WIP, so documentation is wanting and a few function names/namespaces are silly/inconsistent/gibberish. I know there can be some reluctance about this coding style, so I wanted to get a "at least compiles" version up and gauge interest in iterating on it. (Actually passes a few tests! but haven't tried
spiner
with this yet).It was working fine, why change it?
What are the downsides?
std::apply()
, nostd::tuple
, no fold expressions). C++14 can be "recursion heavy" in this regard, which may degrade performance.Any suggestions, comments, questions welcome! @jonahm-LANL @chadmeyer @dholladay00 @jhp-lanl @jdolence
PR Checklist