Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AlpakaCore/HistoContainer.h + HistoContainer_t, OneHistoContainer_t and OneToManyAssoc_t tests #165

Merged
merged 32 commits into from
Feb 2, 2021
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
faaede0
Start porting HistoContainer to AlpakaCore
ghugo83 Jan 19, 2021
adb66eb
Finish first-try porting of HistoContainer and its test
ghugo83 Jan 19, 2021
f548e27
HistoContainer and its test now compiles properly. Still segfaults is…
ghugo83 Jan 20, 2021
c4878d5
Tests now run smoothly and pass all assertions in serial and CUDA tes…
ghugo83 Jan 21, 2021
d8c9ad6
Seems to need to add alpaka::wait::wait(queue) around host function c…
ghugo83 Jan 21, 2021
5bacd9c
Can also place them inside the host function.
ghugo83 Jan 21, 2021
46805ad
Remove commented code
ghugo83 Jan 21, 2021
5b7b5ab
[alpaka] Also offload h->off initialization, as is done for Kokkos. h…
ghugo83 Jan 25, 2021
0cd5c71
minor cleaning
ghugo83 Jan 25, 2021
e8c66b6
[alpaka] Add OneToManyAssoc_t test
ghugo83 Jan 26, 2021
b465663
Change index variables names for strided access.
ghugo83 Jan 26, 2021
c6e30b8
Minor fixes
ghugo83 Jan 26, 2021
995af06
[alpaka] Add OneHistoContainer_t test
ghugo83 Jan 26, 2021
82ffd14
Fixes in OneHistoContainer_t
ghugo83 Jan 26, 2021
c6477b6
[alpaka] Important to initlize pointers properly, especially when loc…
ghugo83 Jan 26, 2021
94f94e6
Important fix in OneHistoContainer_t test: correct strided access, no…
ghugo83 Jan 27, 2021
2ff31f9
Also correct strided access in serial and TBB cases in HistoContainer…
ghugo83 Jan 27, 2021
8e9e776
Simplify Vec construction
ghugo83 Jan 27, 2021
8aaa1f3
clang-format
ghugo83 Jan 27, 2021
b6e9adc
Remove alpaka::wait::wait(queue); before host function end of scope. …
ghugo83 Jan 28, 2021
94105db
Remove using namespace ALPAKA_ACCELERATOR_NAMESPACE, and directly pre…
ghugo83 Jan 28, 2021
c08d205
OneHistoContainer: Add 1-kernel version
ghugo83 Jan 29, 2021
88545ea
Add cms::alpakatools::element_global_index_range_uncut to avoid havin…
ghugo83 Jan 29, 2021
ba641ee
Include endElementIdx within loop. NB: Will need to add a dedicated h…
ghugo83 Jan 29, 2021
5c9a8b9
Remove psws from HistoContainer class, never used. It corresponds to …
ghugo83 Jan 29, 2021
94176e9
Indices with += gridDimension stride: add endElementIdx within loop, …
ghugo83 Jan 29, 2021
cbb609f
clang-format
ghugo83 Jan 29, 2021
d759b1f
[alpaka] PrefixScan: use dynamic shared memory (as in CUDA version) i…
ghugo83 Feb 1, 2021
c1824b8
Renaming: element_global_index_range to compute non-truncated range, …
ghugo83 Feb 1, 2021
52ccf49
clang-format
ghugo83 Feb 1, 2021
788556d
minor cleaning
ghugo83 Feb 1, 2021
548b4b9
Forgot to remove device, now that memory allocation is removed
ghugo83 Feb 2, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 15 additions & 13 deletions src/alpaka/AlpakaCore/HistoContainer.h
Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,6 @@
#include "AlpakaCore/alpakastdAlgorithm.h"
#include "AlpakaCore/prefixScan.h"

using namespace ALPAKA_ACCELERATOR_NAMESPACE;

namespace cms {
namespace alpakatools {

Expand Down Expand Up @@ -91,9 +89,10 @@ namespace cms {
};

template <typename Histo>
ALPAKA_FN_HOST ALPAKA_FN_INLINE __attribute__((always_inline)) void launchFinalize(Histo *__restrict__ h,
const DevAcc1 &device,
Queue &queue) {
ALPAKA_FN_HOST ALPAKA_FN_INLINE __attribute__((always_inline)) void launchFinalize(
Histo *__restrict__ h,
const ALPAKA_ACCELERATOR_NAMESPACE::DevAcc1 &device,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After removing the memory allocation, the device parameter is not needed anymore

Suggested change
const ALPAKA_ACCELERATOR_NAMESPACE::DevAcc1 &device,

ALPAKA_ACCELERATOR_NAMESPACE::Queue &queue) {
uint32_t *poff = (uint32_t *)((char *)(h) + offsetof(Histo, off));

const int num_items = Histo::totbins();
Expand All @@ -108,19 +107,19 @@ namespace cms {

alpaka::queue::enqueue(
queue,
alpaka::kernel::createTaskKernel<Acc1>(
alpaka::kernel::createTaskKernel<ALPAKA_ACCELERATOR_NAMESPACE::Acc1>(
WorkDiv1{Vec1::all(1u), Vec1::all(1u), Vec1::all(1u)}, storePrefixScanWorkingSpace(), h, nblocks));

const WorkDiv1 &workDiv = cms::alpakatools::make_workdiv(blocksPerGrid, threadsPerBlockOrElementsPerThread);
alpaka::queue::enqueue(queue,
alpaka::kernel::createTaskKernel<Acc1>(
alpaka::kernel::createTaskKernel<ALPAKA_ACCELERATOR_NAMESPACE::Acc1>(
workDiv, multiBlockPrefixScanFirstStep<uint32_t>(), poff, poff, psum_d, num_items));

const WorkDiv1 &workDivWith1Block =
cms::alpakatools::make_workdiv(Vec1::all(1), threadsPerBlockOrElementsPerThread);
alpaka::queue::enqueue(
queue,
alpaka::kernel::createTaskKernel<Acc1>(
alpaka::kernel::createTaskKernel<ALPAKA_ACCELERATOR_NAMESPACE::Acc1>(
workDivWith1Block, multiBlockPrefixScanSecondStep<uint32_t>(), poff, poff, psum_d, num_items, nblocks));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this comment is not really in the scope of this PR)

I'm curious why the multiBlockPrefixScan had to be split into two. I see now that it was done already in #31. From the code I'm guessing that the point is that

  • in CUDA code there are two parts separated by a synchronization point (isLastBlockDone): first part is run by all blocks, and the second part is run only by the "last block"
  • for alpaka the two parts were split in two kernels

I'm mostly wondering if this split was a design choice, or if there were any constraints that prevented the CUDA-style implementation. Let me tag @felicepantaleo in case he would remember.

}

Expand All @@ -132,21 +131,24 @@ namespace cms {
uint32_t const *__restrict__ offsets,
uint32_t totSize,
unsigned int nthreads,
const DevAcc1 &device,
Queue &queue) {
const ALPAKA_ACCELERATOR_NAMESPACE::DevAcc1 &device,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

device parameter can be removed from here as well.

Suggested change
const ALPAKA_ACCELERATOR_NAMESPACE::DevAcc1 &device,

ALPAKA_ACCELERATOR_NAMESPACE::Queue &queue) {
const unsigned int nblocks = (totSize + nthreads - 1) / nthreads;
const Vec1 blocksPerGrid(nblocks);
const Vec1 threadsPerBlockOrElementsPerThread(nthreads);
const WorkDiv1 &workDiv = cms::alpakatools::make_workdiv(blocksPerGrid, threadsPerBlockOrElementsPerThread);

alpaka::queue::enqueue(queue, alpaka::kernel::createTaskKernel<Acc1>(workDiv, launchZero(), h));
alpaka::queue::enqueue(
queue, alpaka::kernel::createTaskKernel<ALPAKA_ACCELERATOR_NAMESPACE::Acc1>(workDiv, launchZero(), h));

alpaka::queue::enqueue(queue,
alpaka::kernel::createTaskKernel<Acc1>(workDiv, countFromVector(), h, nh, v, offsets));
alpaka::kernel::createTaskKernel<ALPAKA_ACCELERATOR_NAMESPACE::Acc1>(
workDiv, countFromVector(), h, nh, v, offsets));
launchFinalize(h, device, queue);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following removal of device.

Suggested change
launchFinalize(h, device, queue);
launchFinalize(h, queue);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha yes true, this argument is not necessary anymore now, thanks. I just removed it.


alpaka::queue::enqueue(queue,
alpaka::kernel::createTaskKernel<Acc1>(workDiv, fillFromVector(), h, nh, v, offsets));
alpaka::kernel::createTaskKernel<ALPAKA_ACCELERATOR_NAMESPACE::Acc1>(
workDiv, fillFromVector(), h, nh, v, offsets));
}

struct finalizeBulk {
Expand Down
11 changes: 6 additions & 5 deletions src/alpaka/test/alpaka/HistoContainer_t.cc
Original file line number Diff line number Diff line change
Expand Up @@ -8,10 +8,10 @@
#include "AlpakaCore/alpakaWorkDivHelper.h"
#include "AlpakaCore/HistoContainer.h"

using namespace ALPAKA_ACCELERATOR_NAMESPACE;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but actually here (and in other unit test files) the using namespace ... is ok. It's just in headers where it causes problems (although it does make the namespace of the various types more clear in source files as well).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but at this stage, found it more consistent to remove it everywhere, rather than having a mix of 'using namespace' and ALPAKA_ACCELERATOR_NAMESPACE::

template <typename T>
void go(const DevHost& host, const DevAcc1& device, Queue& queue) {
void go(const DevHost& host,
const ALPAKA_ACCELERATOR_NAMESPACE::DevAcc1& device,
ALPAKA_ACCELERATOR_NAMESPACE::Queue& queue) {
std::mt19937 eng;
std::uniform_int_distribution<T> rgen(std::numeric_limits<T>::min(), std::numeric_limits<T>::max());

Expand Down Expand Up @@ -162,8 +162,9 @@ void go(const DevHost& host, const DevAcc1& device, Queue& queue) {

int main() {
const DevHost host(alpaka::pltf::getDevByIdx<PltfHost>(0u));
const DevAcc1 device(alpaka::pltf::getDevByIdx<PltfAcc1>(0u));
Queue queue(device);
const ALPAKA_ACCELERATOR_NAMESPACE::DevAcc1 device(
alpaka::pltf::getDevByIdx<ALPAKA_ACCELERATOR_NAMESPACE::PltfAcc1>(0u));
ALPAKA_ACCELERATOR_NAMESPACE::Queue queue(device);

go<int16_t>(host, device, queue);
go<int8_t>(host, device, queue);
Expand Down
53 changes: 30 additions & 23 deletions src/alpaka/test/alpaka/OneHistoContainer_t.cc
Original file line number Diff line number Diff line change
Expand Up @@ -156,7 +156,9 @@ struct forEachInWindow {
};

template <typename T, int NBINS = 128, int S = 8 * sizeof(T), int DELTA = 1000>
void go(const DevHost &host, const DevAcc1 &device, Queue &queue) {
void go(const DevHost &host,
const ALPAKA_ACCELERATOR_NAMESPACE::DevAcc1 &device,
ALPAKA_ACCELERATOR_NAMESPACE::Queue &queue) {
std::mt19937 eng;

int rmin = std::numeric_limits<T>::min();
Expand Down Expand Up @@ -202,35 +204,38 @@ void go(const DevHost &host, const DevAcc1 &device, Queue &queue) {
const Vec1 blocksPerGrid(1u);
const WorkDiv1 &workDiv = cms::alpakatools::make_workdiv(blocksPerGrid, threadsPerBlockOrElementsPerThread);

alpaka::queue::enqueue(
queue, alpaka::kernel::createTaskKernel<Acc1>(workDiv, setZero(), alpaka::mem::view::getPtrNative(hist_dbuf)));
alpaka::queue::enqueue(queue,
alpaka::kernel::createTaskKernel<ALPAKA_ACCELERATOR_NAMESPACE::Acc1>(
workDiv, setZero(), alpaka::mem::view::getPtrNative(hist_dbuf)));

alpaka::queue::enqueue(
queue,
alpaka::kernel::createTaskKernel<Acc1>(workDiv, setZeroBins(), alpaka::mem::view::getPtrNative(hist_dbuf)));
alpaka::queue::enqueue(queue,
alpaka::kernel::createTaskKernel<ALPAKA_ACCELERATOR_NAMESPACE::Acc1>(
workDiv, setZeroBins(), alpaka::mem::view::getPtrNative(hist_dbuf)));

alpaka::queue::enqueue(
queue,
alpaka::kernel::createTaskKernel<Acc1>(
alpaka::kernel::createTaskKernel<ALPAKA_ACCELERATOR_NAMESPACE::Acc1>(
workDiv, count(), alpaka::mem::view::getPtrNative(hist_dbuf), alpaka::mem::view::getPtrNative(v_dbuf), N));

alpaka::mem::view::copy(queue, hist_hbuf, hist_dbuf, 1u);
alpaka::wait::wait(queue);
assert(0 == hist->size());

alpaka::queue::enqueue(
queue, alpaka::kernel::createTaskKernel<Acc1>(workDiv, finalize(), alpaka::mem::view::getPtrNative(hist_dbuf)));
alpaka::queue::enqueue(queue,
alpaka::kernel::createTaskKernel<ALPAKA_ACCELERATOR_NAMESPACE::Acc1>(
workDiv, finalize(), alpaka::mem::view::getPtrNative(hist_dbuf)));

alpaka::mem::view::copy(queue, hist_hbuf, hist_dbuf, 1u);
alpaka::wait::wait(queue);
assert(N == hist->size());

alpaka::queue::enqueue(
queue, alpaka::kernel::createTaskKernel<Acc1>(workDiv, verify(), alpaka::mem::view::getPtrNative(hist_dbuf)));
alpaka::queue::enqueue(queue,
alpaka::kernel::createTaskKernel<ALPAKA_ACCELERATOR_NAMESPACE::Acc1>(
workDiv, verify(), alpaka::mem::view::getPtrNative(hist_dbuf)));

alpaka::queue::enqueue(
queue,
alpaka::kernel::createTaskKernel<Acc1>(
alpaka::kernel::createTaskKernel<ALPAKA_ACCELERATOR_NAMESPACE::Acc1>(
workDiv, fill(), alpaka::mem::view::getPtrNative(hist_dbuf), alpaka::mem::view::getPtrNative(v_dbuf), N));

alpaka::mem::view::copy(queue, hist_hbuf, hist_dbuf, 1u);
Expand All @@ -240,26 +245,28 @@ void go(const DevHost &host, const DevAcc1 &device, Queue &queue) {

alpaka::queue::enqueue(
queue,
alpaka::kernel::createTaskKernel<Acc1>(
alpaka::kernel::createTaskKernel<ALPAKA_ACCELERATOR_NAMESPACE::Acc1>(
workDiv, bin(), alpaka::mem::view::getPtrNative(hist_dbuf), alpaka::mem::view::getPtrNative(v_dbuf), N));

alpaka::queue::enqueue(queue,
alpaka::kernel::createTaskKernel<Acc1>(workDiv,
forEachInWindow(),
alpaka::mem::view::getPtrNative(hist_dbuf),
alpaka::mem::view::getPtrNative(v_dbuf),
N,
NBINS,
DELTA));
alpaka::queue::enqueue(
queue,
alpaka::kernel::createTaskKernel<ALPAKA_ACCELERATOR_NAMESPACE::Acc1>(workDiv,
forEachInWindow(),
alpaka::mem::view::getPtrNative(hist_dbuf),
alpaka::mem::view::getPtrNative(v_dbuf),
N,
NBINS,
DELTA));

alpaka::wait::wait(queue);
}
}

int main() {
const DevHost host(alpaka::pltf::getDevByIdx<PltfHost>(0u));
const DevAcc1 device(alpaka::pltf::getDevByIdx<PltfAcc1>(0u));
Queue queue(device);
const ALPAKA_ACCELERATOR_NAMESPACE::DevAcc1 device(
alpaka::pltf::getDevByIdx<ALPAKA_ACCELERATOR_NAMESPACE::PltfAcc1>(0u));
ALPAKA_ACCELERATOR_NAMESPACE::Queue queue(device);

go<int16_t>(host, device, queue);
go<uint8_t, 128, 8, 4>(host, device, queue);
Expand Down
Loading