Added 3 64-bit MPI_Gatherv implementations #2405

Closed
37 changes: 37 additions & 0 deletions source/adios2/helper/adiosComm.h
@@ -218,6 +218,23 @@ class Comm
const size_t *recvcounts, const size_t *displs, int root,
const std::string &hint = std::string()) const;

template <typename TSend, typename TRecv>
void Gatherv64(const TSend *sendbuf, size_t sendcount, TRecv *recvbuf,
const size_t *recvcounts, const size_t *displs, int root,
const std::string &hint = std::string()) const;

template <typename TSend, typename TRecv>
void Gatherv64OneSidedPush(const TSend *sendbuf, size_t sendcount,
TRecv *recvbuf, const size_t *recvcounts,
const size_t *displs, int root,
const std::string &hint = std::string()) const;

template <typename TSend, typename TRecv>
void Gatherv64OneSidedPull(const TSend *sendbuf, size_t sendcount,
TRecv *recvbuf, const size_t *recvcounts,
const size_t *displs, int root,
const std::string &hint = std::string()) const;

template <typename T>
void Reduce(const T *sendbuf, T *recvbuf, size_t count, Op op, int root,
const std::string &hint = std::string()) const;
@@ -400,6 +417,26 @@ class CommImpl
Datatype recvtype, int root,
const std::string &hint) const = 0;

virtual void Gatherv64(const void *sendbuf, size_t sendcount,
Datatype sendtype, void *recvbuf,
const size_t *recvcounts, const size_t *displs,
Datatype recvtype, int root,
const std::string &hint) const = 0;

Collaborator

The helper::Comm abstraction's methods use templating so that callers don't need to pass sendtype or recvtype explicitly. If we add Gatherv64, that should be done here too instead of using void*.

However, we shouldn't need to add Gatherv64. The sendcount argument in plain Gatherv is already a size_t. Its existing implementation casts that to int, which is the cause of the existing limitation. Instead of adding Gatherv64, you just need to update the implementation of CommImplMPI::Gatherv to use the block approach. IIRC some of the other methods in CommImplMPI already do that.
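For context, a minimal sketch of the "block approach" mentioned above: splitting a 64-bit count into chunks that each fit in a signed 32-bit int so that every underlying MPI call receives a valid int count. This is illustrative only; the function and parameter names are not ADIOS2 API.

    #include <algorithm>
    #include <cstddef>
    #include <limits>

    // Invoke issueChunk(offset, count) once per chunk of at most INT_MAX
    // elements; issueChunk stands in for whatever MPI call is being wrapped.
    template <typename IssueChunk>
    void ForEachIntChunk(size_t count, IssueChunk issueChunk)
    {
        size_t offset = 0;
        while (count > 0)
        {
            const size_t chunk = std::min<size_t>(
                count, static_cast<size_t>(std::numeric_limits<int>::max()));
            issueChunk(offset, static_cast<int>(chunk)); // one <= INT_MAX piece
            offset += chunk;
            count -= chunk;
        }
    }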

Member Author (@JasonRuonanWang, Aug 3, 2020)

> However, we shouldn't need to add Gatherv64. The sendcount argument in plain Gatherv is already a size_t. Its existing implementation casts that to int, which is the cause of the existing limitation. Instead of adding Gatherv64, you just need to update the implementation of CommImplMPI::Gatherv to use the block approach. IIRC some of the other methods in CommImplMPI already do that.

The reason to add extra Gatherv64 implementations rather than modify the existing Gatherv is that, for use cases which do not exceed 2 GB, the original Gatherv is still faster. If we completely remove the original Gatherv, these use cases will see worse performance.

Due to the fundamental mechanism of MPI, it's also undesirable to choose between 32-bit and 64-bit at runtime, because in some cases only a few of the senders actually exceed the 32-bit limit. If we make this decision locally, it causes mismatched MPI operations across ranks. If we make it globally, it introduces extra MPI collective operations beyond what is necessary to perform the Gatherv itself.

So currently the best approach I can think of is to keep both the 32-bit and 64-bit implementations, and let specific engines determine which to use based on their scope, use cases, assumptions, etc.

Member Author

> The helper::Comm abstraction's methods use templating so that callers don't need to pass sendtype or recvtype explicitly. If we add Gatherv64, that should be done here too instead of using void*.

The missing templated functions were added.

Collaborator

The existing Gatherv takes a size_t count and should not fail when given a size larger than 32 bits. Therefore Gatherv's implementation should be fixed to support 64-bit sizes somehow. If a non-MPI backend is added in the future, it might not need any special help or chunking.

To retain efficiency for callers that know their sizes fit in 32 bits, we could offer a Gatherv32 that takes a uint32_t as a size (or, if MPI is actually limited by a signed int, then maybe even a Gatherv31 taking an int32_t).

Member Author

> The existing Gatherv takes a size_t count and should not fail when given a size larger than 32 bits. Therefore Gatherv's implementation should be fixed to support 64-bit sizes somehow. If a non-MPI backend is added in the future, it might not need any special help or chunking.

> To retain efficiency for callers that know their sizes fit in 32 bits, we could offer a Gatherv32 that takes a uint32_t as a size (or, if MPI is actually limited by a signed int, then maybe even a Gatherv31 taking an int32_t).

The original MPI_Gatherv is definitely limited by a signed int, which is exactly why I am doing all of this. Not only MPI_Gatherv, but every MPI function that takes an int size is limited by a signed int. If you think that way, then you will need to re-implement EVERY SINGLE MPI function in the Comm class to support 64-bit. Is that what you think we are supposed to do? Even if the answer is yes, it's far beyond the scope of this PR...

Contributor

Yeah, I guess that's a good argument; mismatched calls are obviously a bad outcome. The one thing I'd definitely still suggest is to make the version that uses MPI_Gatherv throw an error if any arguments are out of range. But it does look like it's not easily possible to have a single interface and decide at runtime.

The iffy thing with this is that errors will only occur in rare circumstances, so they will be difficult to test for. If one has Gatherv and Gatherv64, the question is: can you ever use Gatherv safely, i.e., guarantee at compile time that the arguments will never exceed 32 bits?

Not sure what the best solution is. This would all be much easier if they had settled on a 64-bit solution within the MPI standard...
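A minimal sketch of the out-of-range error suggested above, assuming the 32-bit path simply refuses counts that do not fit in a signed int; the helper name and the use of std::range_error are illustrative, since ADIOS2's actual error-reporting mechanism may differ.

    #include <cstddef>
    #include <limits>
    #include <stdexcept>
    #include <string>

    // Hypothetical guard for an int-based Gatherv path: reject any count
    // that cannot be represented as a signed 32-bit int.
    inline void CheckFitsInInt(size_t count, const std::string &what)
    {
        if (count > static_cast<size_t>(std::numeric_limits<int>::max()))
        {
            throw std::range_error(what +
                                   " exceeds the 32-bit limit of MPI_Gatherv; "
                                   "use a 64-bit variant instead");
        }
    }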

Member Author

@germasch If an engine blindly uses Gatherv, then it's certainly unsafe. But the engine can handle this in several ways. If it does not care about the overhead, it can do an MPI collective operation before calling Gatherv and then call Gatherv or Gatherv64 accordingly. If it does care, it can ask users to pass a static engine parameter specifying whether the > 32-bit case can ever happen. For most applications, this should be statically known.

The reason I am thinking about it in such an uncommon way is also that the applications that actually need 64-bit are very rare, and they probably only need Gatherv for aggregating metadata; they don't need other MPI functions to operate at 64-bit. Re-implementing the whole MPI software stack just for one or two such apps does not sound like a smart thing to do. If all the other functions are actually 32-bit but wrapped in 64-bit interfaces, why should we make Gatherv the exception by having Gatherv32 and Gatherv? In the current state of the Comm class, having a Gatherv and a Gatherv64 is the natural thing, unless all the other 32-bit MPI functions with 64-bit wrappers are already addressed correctly.
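As an illustration of the first option above (decide globally with one extra collective, so every rank takes the same branch), here is a hedged sketch. It is not this PR's code: the function name is made up, it takes both the helper Comm and the raw MPI communicator purely for illustration, and it uses a raw MPI_Allreduce for the max reduction since it is not assumed that helper::Comm exposes one.

    #include <limits>
    #include <mpi.h>
    #include "adios2/helper/adiosComm.h"

    // Hypothetical dispatch helper: one extra collective lets all ranks learn
    // the global maximum count, so they all pick the same gather variant and
    // mismatched MPI calls cannot occur.
    template <typename TSend, typename TRecv>
    void GathervAuto(adios2::helper::Comm &comm, MPI_Comm mpiComm,
                     const TSend *sendbuf, size_t sendcount, TRecv *recvbuf,
                     const size_t *recvcounts, const size_t *displs, int root)
    {
        unsigned long long localCount = sendcount;
        unsigned long long globalMax = 0;
        MPI_Allreduce(&localCount, &globalMax, 1, MPI_UNSIGNED_LONG_LONG,
                      MPI_MAX, mpiComm);
        if (globalMax >
            static_cast<unsigned long long>(std::numeric_limits<int>::max()))
        {
            comm.Gatherv64(sendbuf, sendcount, recvbuf, recvcounts, displs,
                           root);
        }
        else
        {
            comm.Gatherv(sendbuf, sendcount, recvbuf, recvcounts, displs,
                         root);
        }
    }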

Collaborator

> you will need to re-implement EVERY SINGLE MPI function in the Comm class to support 64-bit. Is that what you think we are supposed to do?

Yes, eventually. That was my intention when introducing the Comm abstraction. IIRC some of the methods have the chunking already.

> if I follow your suggestions, we will immediately see a performance drop for all applications.

My suggestion is to re-implement the existing Gatherv using your 64-bit implementation, and introduce a new Gatherv31 method that has the int-based limitation built into its interface. This change is inside our internal API and does not need to affect applications. Just update all our internal call sites (there are only a couple of them) to switch from Gatherv to Gatherv31.
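A hypothetical sketch of what the Gatherv31 interface described above could look like (not part of this PR): the int-based limit is made explicit in the signature, while Gatherv keeps size_t counts and chunks internally.

    // Hypothetical declaration only; parameter types are illustrative.
    template <typename TSend, typename TRecv>
    void Gatherv31(const TSend *sendbuf, int32_t sendcount, TRecv *recvbuf,
                   const int32_t *recvcounts, const int32_t *displs, int root,
                   const std::string &hint = std::string()) const;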

Member Author

@bradking Sounds reasonable. Do you want me to do it in this PR?

Collaborator

I suggest opening a new PR to rename the existing Gatherv to Gatherv31, change its signature, and update all its call sites in one step. That way we can be sure no call sites accidentally use the 64-bit variant. After that is merged, rebase this PR to restore the Gatherv with its original signature and use the chunking implementation.

virtual void Gatherv64OneSidedPush(const void *sendbuf, size_t sendcount,
Datatype sendtype, void *recvbuf,
const size_t *recvcounts,
const size_t *displs, Datatype recvtype,
int root,
const std::string &hint) const = 0;

virtual void Gatherv64OneSidedPull(const void *sendbuf, size_t sendcount,
Datatype sendtype, void *recvbuf,
const size_t *recvcounts,
const size_t *displs, Datatype recvtype,
int root,
const std::string &hint) const = 0;

virtual void Reduce(const void *sendbuf, void *recvbuf, size_t count,
Datatype datatype, Comm::Op op, int root,
const std::string &hint) const = 0;
32 changes: 32 additions & 0 deletions source/adios2/helper/adiosComm.inl
@@ -217,6 +217,38 @@ void Comm::Gatherv(const TSend *sendbuf, size_t sendcount, TRecv *recvbuf,
CommImpl::GetDatatype<TRecv>(), root, hint);
}

template <typename TSend, typename TRecv>
void Comm::Gatherv64(const TSend *sendbuf, size_t sendcount, TRecv *recvbuf,
const size_t *recvcounts, const size_t *displs, int root,
const std::string &hint) const
{
return m_Impl->Gatherv64(sendbuf, sendcount, CommImpl::GetDatatype<TSend>(),
recvbuf, recvcounts, displs,
CommImpl::GetDatatype<TRecv>(), root, hint);
}

template <typename TSend, typename TRecv>
void Comm::Gatherv64OneSidedPush(const TSend *sendbuf, size_t sendcount,
TRecv *recvbuf, const size_t *recvcounts,
const size_t *displs, int root,
const std::string &hint) const
{
return m_Impl->Gatherv64OneSidedPush(
sendbuf, sendcount, CommImpl::GetDatatype<TSend>(), recvbuf, recvcounts,
displs, CommImpl::GetDatatype<TRecv>(), root, hint);
}

template <typename TSend, typename TRecv>
void Comm::Gatherv64OneSidedPull(const TSend *sendbuf, size_t sendcount,
TRecv *recvbuf, const size_t *recvcounts,
const size_t *displs, int root,
const std::string &hint) const
{
return m_Impl->Gatherv64OneSidedPull(
sendbuf, sendcount, CommImpl::GetDatatype<TSend>(), recvbuf, recvcounts,
displs, CommImpl::GetDatatype<TRecv>(), root, hint);
}

template <typename T>
void Comm::Reduce(const T *sendbuf, T *recvbuf, size_t count, Op op, int root,
const std::string &hint) const
64 changes: 64 additions & 0 deletions source/adios2/helper/adiosCommDummy.cpp
@@ -80,6 +80,23 @@ class CommImplDummy : public CommImpl
Datatype recvtype, int root,
const std::string &hint) const override;

void Gatherv64(const void *sendbuf, size_t sendcount, Datatype sendtype,
void *recvbuf, const size_t *recvcounts,
const size_t *displs, Datatype recvtype, int root,
const std::string &hint) const override;

void Gatherv64OneSidedPush(const void *sendbuf, size_t sendcount,
Datatype sendtype, void *recvbuf,
const size_t *recvcounts, const size_t *displs,
Datatype recvtype, int root,
const std::string &hint) const override;

void Gatherv64OneSidedPull(const void *sendbuf, size_t sendcount,
Datatype sendtype, void *recvbuf,
const size_t *recvcounts, const size_t *displs,
Datatype recvtype, int root,
const std::string &hint) const override;

void Reduce(const void *sendbuf, void *recvbuf, size_t count,
Datatype datatype, Comm::Op op, int root,
const std::string &hint) const override;
@@ -211,6 +228,53 @@ void CommImplDummy::Gatherv(const void *sendbuf, size_t sendcount,
recvtype, root, hint);
}

void CommImplDummy::Gatherv64(const void *sendbuf, size_t sendcount,
Datatype sendtype, void *recvbuf,
const size_t *recvcounts, const size_t *displs,
Datatype recvtype, int root,
const std::string &hint) const
{
const size_t recvcount = recvcounts[0];
if (recvcount != sendcount)
{
return CommDummyError("send and recv counts differ");
}
CommImplDummy::Gather(sendbuf, sendcount, sendtype, recvbuf, recvcount,
recvtype, root, hint);
}

void CommImplDummy::Gatherv64OneSidedPush(const void *sendbuf, size_t sendcount,
Datatype sendtype, void *recvbuf,
const size_t *recvcounts,
const size_t *displs,
Datatype recvtype, int root,
const std::string &hint) const
{
const size_t recvcount = recvcounts[0];
if (recvcount != sendcount)
{
return CommDummyError("send and recv counts differ");
}
CommImplDummy::Gather(sendbuf, sendcount, sendtype, recvbuf, recvcount,
recvtype, root, hint);
}

void CommImplDummy::Gatherv64OneSidedPull(const void *sendbuf, size_t sendcount,
Datatype sendtype, void *recvbuf,
const size_t *recvcounts,
const size_t *displs,
Datatype recvtype, int root,
const std::string &hint) const
{
const size_t recvcount = recvcounts[0];
if (recvcount != sendcount)
{
return CommDummyError("send and recv counts differ");
}
CommImplDummy::Gather(sendbuf, sendcount, sendtype, recvbuf, recvcount,
recvtype, root, hint);
}

void CommImplDummy::Reduce(const void *sendbuf, void *recvbuf, size_t count,
Datatype datatype, Comm::Op, int,
const std::string &) const
206 changes: 206 additions & 0 deletions source/adios2/helper/adiosCommMPI.cpp
@@ -153,6 +153,23 @@ class CommImplMPI : public CommImpl
Datatype recvtype, int root,
const std::string &hint) const override;

void Gatherv64(const void *sendbuf, size_t sendcount, Datatype sendtype,
void *recvbuf, const size_t *recvcounts,
const size_t *displs, Datatype recvtype, int root,
const std::string &hint) const override;

void Gatherv64OneSidedPull(const void *sendbuf, size_t sendcount,
Datatype sendtype, void *recvbuf,
const size_t *recvcounts, const size_t *displs,
Datatype recvtype, int root,
const std::string &hint) const override;

void Gatherv64OneSidedPush(const void *sendbuf, size_t sendcount,
Datatype sendtype, void *recvbuf,
const size_t *recvcounts, const size_t *displs,
Datatype recvtype, int root,
const std::string &hint) const override;

void Reduce(const void *sendbuf, void *recvbuf, size_t count,
Datatype datatype, Comm::Op op, int root,
const std::string &hint) const override;
@@ -344,6 +361,195 @@ void CommImplMPI::Gatherv(const void *sendbuf, size_t sendcount,
hint);
}

void CommImplMPI::Gatherv64(const void *sendbuf, size_t sendcount,
Datatype sendtype, void *recvbuf,
const size_t *recvcounts, const size_t *displs,
Datatype recvtype, int root,
const std::string &hint) const
{

const int chunksize = std::numeric_limits<int>::max();

int mpiSize;
int mpiRank;
MPI_Comm_size(m_MPIComm, &mpiSize);
MPI_Comm_rank(m_MPIComm, &mpiRank);

int recvTypeSize;
int sendTypeSize;

MPI_Type_size(ToMPI(recvtype), &recvTypeSize);
MPI_Type_size(ToMPI(sendtype), &sendTypeSize);

std::vector<MPI_Request> requests;
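// The root posts one non-blocking receive per chunk of at most INT_MAX
// elements for each rank's contribution, placing every chunk at its
// element offset within recvbuf.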
if (mpiRank == root)
{
for (int i = 0; i < mpiSize; ++i)
{
size_t recvcount = recvcounts[i];
while (recvcount > 0)
{
requests.emplace_back();
if (recvcount > chunksize)
{
MPI_Irecv(reinterpret_cast<char *>(recvbuf) +
(displs[i] + recvcounts[i] - recvcount) *
recvTypeSize,
chunksize, ToMPI(recvtype), i, 0, m_MPIComm,
&requests.back());
recvcount -= chunksize;
}
else
{
MPI_Irecv(reinterpret_cast<char *>(recvbuf) +
(displs[i] + recvcounts[i] - recvcount) *
recvTypeSize,
static_cast<int>(recvcount), ToMPI(recvtype), i,
0, m_MPIComm, &requests.back());
recvcount = 0;
}
}
}
}

size_t sendcountvar = sendcount;
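// Every rank, including the root, sends its own contribution to the root
// in chunks of at most INT_MAX elements.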

while (sendcountvar > 0)
{
requests.emplace_back();
if (sendcountvar > chunksize)
{
MPI_Isend(reinterpret_cast<const char *>(sendbuf) +
(sendcount - sendcountvar) * sendTypeSize,
chunksize, ToMPI(sendtype), root, 0, m_MPIComm,
&requests.back());
sendcountvar -= chunksize;
}
else
{
MPI_Isend(reinterpret_cast<const char *>(sendbuf) +
(sendcount - sendcountvar) * sendTypeSize,
static_cast<int>(sendcountvar), ToMPI(sendtype), root, 0,
m_MPIComm, &requests.back());
sendcountvar = 0;
}
}

MPI_Waitall(static_cast<int>(requests.size()), requests.data(),
MPI_STATUSES_IGNORE);
}

void CommImplMPI::Gatherv64OneSidedPush(const void *sendbuf, size_t sendcount,
Datatype sendtype, void *recvbuf,
const size_t *recvcounts,
const size_t *displs, Datatype recvtype,
int root, const std::string &hint) const
{
const int chunksize = std::numeric_limits<int>::max();

int mpiSize;
int mpiRank;
MPI_Comm_size(m_MPIComm, &mpiSize);
MPI_Comm_rank(m_MPIComm, &mpiRank);

int recvTypeSize;
int sendTypeSize;

MPI_Type_size(ToMPI(recvtype), &recvTypeSize);
MPI_Type_size(ToMPI(sendtype), &sendTypeSize);

size_t recvsize = displs[mpiSize - 1] + recvcounts[mpiSize - 1];
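// Create an RMA window over recvbuf (only the root's window is actually
// targeted); every rank then pushes its own contribution into it with
// chunked MPI_Put calls.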

MPI_Win win;
MPI_Win_create(recvbuf, recvsize * recvTypeSize, recvTypeSize,
MPI_INFO_NULL, m_MPIComm, &win);
// Open an access/exposure epoch; the matching fence below completes all
// MPI_Put transfers before the window is freed.
MPI_Win_fence(0, win);

size_t sendcountvar = sendcount;

while (sendcountvar > 0)
{
if (sendcountvar > chunksize)
{
MPI_Put(reinterpret_cast<const char *>(sendbuf) +
(sendcount - sendcountvar) * sendTypeSize,
chunksize, ToMPI(sendtype), root,
displs[mpiRank] + sendcount - sendcountvar, chunksize,
ToMPI(sendtype), win);
sendcountvar -= chunksize;
}
else
{
MPI_Put(reinterpret_cast<const char *>(sendbuf) +
(sendcount - sendcountvar) * sendTypeSize,
static_cast<int>(sendcountvar), ToMPI(sendtype), root,
displs[mpiRank] + sendcount -
sendcountvar,
static_cast<int>(sendcountvar), ToMPI(sendtype), win);
sendcountvar = 0;
}
}

MPI_Win_fence(0, win);
MPI_Win_free(&win);
}

void CommImplMPI::Gatherv64OneSidedPull(const void *sendbuf, size_t sendcount,
Datatype sendtype, void *recvbuf,
const size_t *recvcounts,
const size_t *displs, Datatype recvtype,
int root, const std::string &hint) const
{

const int chunksize = std::numeric_limits<int>::max();

int mpiSize;
int mpiRank;
MPI_Comm_size(m_MPIComm, &mpiSize);
MPI_Comm_rank(m_MPIComm, &mpiRank);

int recvTypeSize;
int sendTypeSize;

MPI_Type_size(ToMPI(recvtype), &recvTypeSize);
MPI_Type_size(ToMPI(sendtype), &sendTypeSize);
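// Create an RMA window over each rank's send buffer; the root then pulls
// every contribution into recvbuf with chunked MPI_Get calls.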

MPI_Win win;
MPI_Win_create(const_cast<void *>(sendbuf), sendcount * sendTypeSize,
sendTypeSize, MPI_INFO_NULL, m_MPIComm, &win);
// Open an access/exposure epoch; the matching fence below completes all
// MPI_Get transfers before the window is freed.
MPI_Win_fence(0, win);

if (mpiRank == root)
{
for (int i = 0; i < mpiSize; ++i)
{
size_t recvcount = recvcounts[i];
while (recvcount > 0)
{
if (recvcount > chunksize)
{
MPI_Get(reinterpret_cast<char *>(recvbuf) +
(displs[i] + recvcounts[i] - recvcount) *
recvTypeSize,
chunksize, ToMPI(recvtype), i,
recvcounts[i] - recvcount, chunksize,
ToMPI(recvtype), win);
recvcount -= chunksize;
}
else
{
MPI_Get(reinterpret_cast<char *>(recvbuf) +
(displs[i] + recvcounts[i] - recvcount) *
recvTypeSize,
static_cast<int>(recvcount), ToMPI(recvtype), i,
recvcounts[i] - recvcount,
static_cast<int>(recvcount), ToMPI(recvtype), win);
recvcount = 0;
}
}
}
}

MPI_Win_fence(0, win);
MPI_Win_free(&win);
}

void CommImplMPI::Reduce(const void *sendbuf, void *recvbuf, size_t count,
Datatype datatype, Comm::Op op, int root,
const std::string &hint) const