Memory abstraction tags #1362
Would this tie in with our efforts on accessors? #1249 I think those have much of the functionality you ask for.
I'm not sure if this is planned/implemented in accessors. We have to ask @bernhardmgruber
What you probably want is a hierarchy of memory spaces. However, it is not easy to tell where a specific memory allocation will ever be accessible.
@bernhardmgruber any ideas on this?
IMO an accessor specifies what kind of operations are allowed on a memory resource and how loads and stores are handled. Accessors can be produced on top of buffers. Whether they carry the information about where the backing memory resource resides is an open question, but I would prefer if they did not. So you can have e.g. a read-only accessor independently of whether you access a GPU or CPU buffer.

Alpaka buffers are tied to something like a memory space (Kokkos terminology) inside which they can be validly accessed. These memory spaces are related to acceleration technologies, but not the same. Having such memory spaces is complicated, however, as @BenjaminW3 said, because there are increasingly wild ways in which these span devices. Some time ago we had GPU-only buffers; now we can share buffers between CPU and GPU and have memory pages migrate on the fly. I believe there might be more such changes in the future, so I would not like to see alpaka paint itself into a corner by choosing a memory space system.

I also think you might be approaching the problem from the wrong side. It is more important what kind of accelerator a computation runs on than where a buffer resides. So taking your example:

```cpp
Vector<int, 5> v1;
Vector<int, 5> v2;
auto unevaluatedResult = v1 + v2; // build expression tree
Vector result = eval(unevaluatedResult, acc); // evaluate expression tree on a specific accelerator
...
template<>
Vector operator+<alpaka::AccCPU>(Vector a, Vector b){
    // use eigen3 for highly optimized vector addition on CPU
}
template<>
Vector operator+<alpaka::AccOmp>(Vector a, Vector b){
    // use eigen3 for highly optimized vector addition on CPU
}
template<>
Vector operator+<alpaka::CudaRt>(Vector a, Vector b){
    // use cuBLAS for highly optimized vector addition on CUDA GPU
}
template<>
Vector operator+<alpaka::HipRt>(Vector a, Vector b){
    // use hipBLAS if such a thing exists
}
```

You need to bring in the accelerator.
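A minimal, self-contained sketch of what an `eval` that dispatches on an accelerator tag could look like; the `AccCpu`/`AccCuda` tags, the `Sum` node, and the `Vector` type here are hypothetical stand-ins and not alpaka types:

```cpp
#include <cstddef>

// Hypothetical accelerator tags standing in for alpaka::AccCpuSerial, alpaka::AccGpuCudaRt, ...
struct AccCpu {};
struct AccCuda {};

template<typename T, std::size_t N>
struct Vector { T data[N]; };

// Unevaluated expression node built by operator+ (lazy evaluation).
template<typename L, typename R>
struct Sum { L const& lhs; R const& rhs; };

template<typename T, std::size_t N>
Sum<Vector<T, N>, Vector<T, N>> operator+(Vector<T, N> const& a, Vector<T, N> const& b)
{
    return {a, b}; // build the expression tree, do not compute yet
}

// eval() selects the backend from the accelerator tag, not from where the buffers live.
template<typename T, std::size_t N>
Vector<T, N> eval(Sum<Vector<T, N>, Vector<T, N>> const& e, AccCpu)
{
    Vector<T, N> r{};
    for(std::size_t i = 0; i < N; ++i) // here e.g. eigen3 could be used instead
        r.data[i] = e.lhs.data[i] + e.rhs.data[i];
    return r;
}

// Declaration only: a CUDA-tagged overload could e.g. forward to cuBLAS.
template<typename T, std::size_t N>
Vector<T, N> eval(Sum<Vector<T, N>, Vector<T, N>> const& e, AccCuda);

int main()
{
    Vector<int, 5> v1{{1, 2, 3, 4, 5}};
    Vector<int, 5> v2{{5, 4, 3, 2, 1}};
    auto unevaluated = v1 + v2;                 // build expression tree
    auto result = eval(unevaluated, AccCpu{});  // backend chosen by the accelerator tag
    (void)result;
}
```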
@SimeonEhrig what needs to be done here? Do you want to have tags similar to the recently added tags for accelerators from #1804?
Yes. But I think the memory tags are more dedicated to checking your code. I want to have a trait, something like this: for CUDA without managed memory, this is a 1:1 relation (CudaRT to CudaMem). For CPU, it is many-to-many. For example, if you create memory with the serial backend, you can also access it with the OpenMP 2 blocks backend without a memory copy.
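A compile-time sketch of what such a trait could look like; the tag names (`MemCpu`, `MemCudaGpu`) and the simplified accelerator tags are hypothetical placeholders for the real alpaka types, not an existing API:

```cpp
#include <type_traits>

// Hypothetical memory tags.
struct MemCpu {};
struct MemCudaGpu {};

// Hypothetical accelerator tags standing in for alpaka::AccCpuSerial,
// alpaka::AccCpuOmp2Blocks and alpaka::AccGpuCudaRt.
struct AccCpuSerial {};
struct AccCpuOmp2Blocks {};
struct AccGpuCudaRt {};

// Trait: which memory tag belongs to which accelerator tag.
template<typename TAcc>
struct MemTag;

// CPU backends share the same allocator, so they map to the same memory tag (many-to-many in general).
template<> struct MemTag<AccCpuSerial>     { using type = MemCpu; };
template<> struct MemTag<AccCpuOmp2Blocks> { using type = MemCpu; };
// Without managed memory, CUDA is a 1:1 relation.
template<> struct MemTag<AccGpuCudaRt>     { using type = MemCudaGpu; };

// Compile-time check: can a kernel running on TAcc access memory tagged TMem?
template<typename TAcc, typename TMem>
inline constexpr bool isAccessible = std::is_same_v<typename MemTag<TAcc>::type, TMem>;

static_assert(isAccessible<AccCpuOmp2Blocks, MemCpu>); // serial allocation usable from OpenMP 2 blocks
static_assert(!isAccessible<AccGpuCudaRt, MemCpu>);    // host memory not (directly) usable on the GPU
```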
What about a CUDA GPU vs a pinned host memory buffer?
At the moment, I am developing a prototype for a lazily evaluated linear algebra library based on alpaka and vikunja: https://github.com/SimeonEhrig/lazyVikunjaVector

One of the design decisions is that I use mathematical objects like `vector` and `matrix` with implemented operators to do the mathematics. The vector also contains the data. Therefore, the vector needs to know on which device it is located, e.g. `CPU 0` or `CUDA GPU 1`. This information is used for memory allocation and for preventing operations between vectors on different devices, e.g. adding a vector located on `CPU 0` to a vector located on `CUDA GPU 1` (it's just a design decision to keep the library "simple").

At the moment I use `alpaka::AccType<TDim, TIdx>` and the device id to decide the memory owner of the mathematical object (e.g. a vector), but @psychocoderHPC points out that the memory is not bound to the parallelization strategy. For example, memory allocated with `alpaka::AccCpuSerial<Dim, std::size_t>` can also be used in a kernel executed with the parallelization strategy `alpaka::AccCpuOmp2Blocks<Dim, std::size_t>`, because both use the same allocator.

My question is: are there template tags for the memory, similar to `alpaka::AccCpuSerial` and `alpaka::AccCpuOmp2Blocks` for accelerators, e.g. `alpaka::BufCPU` or `alpaka::BufCUDAGPU`, and if yes, how can I use them for memory allocation? These tags would also be pretty helpful for the specialization of certain functions, e.g.:
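For illustration, a sketch of what such a specialization on memory tags could look like; `BufCpu` and `BufCudaGpu` are hypothetical names chosen here and not existing alpaka types:

```cpp
// Hypothetical memory tags: where the buffer backing a mathematical object lives.
struct BufCpu {};      // host RAM
struct BufCudaGpu {};  // CUDA device memory

// A vector that carries its memory tag as a template parameter.
template<typename TMemTag>
struct Vector
{
    float* data;
    int size;
};

// Generic declaration; specialized per memory tag below.
template<typename TMemTag>
void axpy(float a, Vector<TMemTag> const& x, Vector<TMemTag>& y);

// CPU memory: e.g. plain loops or a call into Eigen.
template<>
void axpy<BufCpu>(float a, Vector<BufCpu> const& x, Vector<BufCpu>& y)
{
    for(int i = 0; i < x.size; ++i)
        y.data[i] += a * x.data[i];
}

// CUDA memory: declaration only; could e.g. forward to cuBLAS.
template<>
void axpy<BufCudaGpu>(float a, Vector<BufCudaGpu> const& x, Vector<BufCudaGpu>& y);
```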