This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
How To Use MPIX_Stream? #6517
Comments
cc @jczhang07 |
1.1 The MPIX_Stream must outlive the stream communicators that use it: create it before them and free it after them. 2. Each MPIX_Stream is associated with a distinct internal network endpoint to isolate communication resources. The network endpoints need to be initialized for connections. It is the connection-establishment step that prevents dynamic allocation. |
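The ordering described in 1.1 maps naturally onto C++ member destruction order. Below is a minimal sketch, assuming hypothetical stand-in types (`FakeStream`, `FakeStreamComm`) rather than real MPI handles so that it compiles without MPICH; in real code the constructors and destructors would presumably wrap the stream and stream-comm create/free calls.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which the stand-in objects are created and destroyed.
static std::vector<std::string> lifetime_log;

// Hypothetical stand-in for an MPIX_Stream handle (not a real MPI type).
struct FakeStream {
  FakeStream() { lifetime_log.push_back("stream created"); }
  ~FakeStream() { lifetime_log.push_back("stream freed"); }
  FakeStream(const FakeStream &) = delete;
  FakeStream &operator=(const FakeStream &) = delete;
};

// Hypothetical stand-in for a stream communicator built on a FakeStream.
struct FakeStreamComm {
  explicit FakeStreamComm(const FakeStream &) { lifetime_log.push_back("scomm created"); }
  ~FakeStreamComm() { lifetime_log.push_back("scomm freed"); }
  FakeStreamComm(const FakeStreamComm &) = delete;
  FakeStreamComm &operator=(const FakeStreamComm &) = delete;
};

// Members are destroyed in reverse declaration order, so declaring the stream
// first guarantees that the communicator is freed before the stream.
struct StreamWithComm {
  FakeStream stream;
  FakeStreamComm scomm{stream};
};

// Creates and destroys one pair, returning the recorded order of events.
std::vector<std::string> run_lifetime_demo()
{
  lifetime_log.clear();
  {
    StreamWithComm pair;
    (void)pair;
  }
  return lifetime_log;
}
```

Calling `run_lifetime_demo()` yields the stream being created first and freed last, with the communicator's lifetime nested inside it.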
To be precise, the
It's not so much necessary as it is a convenience :). Consider the following use-case:
PetscErrorCode VecDoSomethingCollective(Vec v, ...)
{
  MPI_Comm comm;
  MPI_Comm scomm;
  MPIX_Stream mpi_stream;
  // every PETSc object (e.g. Vec) holds a corresponding communicator
  PetscObjectGetComm(v, &comm);
  PetscGetMPIXStream(cuda_stream, &mpi_stream);
  MPIX_Stream_comm_create(comm, mpi_stream, &scomm);
  MPIX_Allreduce_enqueue(..., scomm);
  // Performant code probably should now cache scomm somewhere, likely needs reference counting to ensure
  // it is properly managed. This adds more complexity
}
We already have a
PetscErrorCode VecNorm_CUDA(Vec v, ..., PetscScalar *norm)
{
  MPI_Comm comm;
  MPIX_Stream mpi_stream;
  PetscObjectGetComm(v, &comm);
  PetscGetMPIXStream(cuda_stream, &mpi_stream);
  MPIX_Comm_set_stream(comm, mpi_stream);
  MPIX_Allreduce_enqueue(..., comm);
  // no need to deal with extra comm object!
}
Can you explain this limitation more directly? I am not at all familiar with networking. |
Also, is there an invalid
struct PetscMPIXStream
{
  bool init{};
  MPIX_Stream stream{};
};
void foo(PetscMPIXStream &strm)
{
  if (!strm.init) {
    MPIX_Stream_create(..., &strm.stream);
    strm.init = true;
  }
  // use strm.stream...
}
Ideally we would want something like
void foo(MPIX_Stream &strm)
{
  if (strm == MPIX_STREAM_INVALID) { // for example
    MPIX_Stream_create(..., &strm);
  }
  // use strm...
}
|
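Until a sentinel value is available, the flag-plus-handle struct could be folded into `std::optional`, where an empty optional plays the role of the "invalid" value. This is only a sketch: `FakeStream` and `create_fake_stream()` are hypothetical stand-ins for `MPIX_Stream` and `MPIX_Stream_create()`, so the block compiles without MPI. It assumes C++17.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for an MPIX_Stream handle (not a real MPI type).
struct FakeStream {
  int id;
};

// Counts how many times the stand-in "create" call has run.
static int streams_created = 0;

// Hypothetical stand-in for MPIX_Stream_create(); hands out increasing ids.
FakeStream create_fake_stream()
{
  ++streams_created;
  return FakeStream{streams_created};
}

// An empty optional acts as the "invalid" sentinel, so no separate init flag
// is needed. The stream is created on first use and reused afterwards.
FakeStream &get_or_create(std::optional<FakeStream> &strm)
{
  if (!strm) strm = create_fake_stream();
  return *strm;
}
```

Repeated calls to `get_or_create()` on the same optional return the same object and trigger only one creation.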
PETSc currently caches an internal duplicated comm, and I think it will need to cache a separate stream communicator. After all, the regular comm and the stream comm are used very differently, and it is cleaner not to mix them up.
|
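A separate, reference-counted cache of stream communicators could be sketched as follows. The types here are assumptions made for illustration: plain integer ids stand in for `cudaStream_t` and `MPI_Comm`, and `FakeStreamComm` stands in for the stream communicator, so nothing below is PETSc's or MPICH's actual implementation. `std::shared_ptr` supplies the reference counting and `std::weak_ptr` lets the cache forget entries nobody uses anymore.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>

// Hypothetical stand-ins: integer ids instead of cudaStream_t and MPI_Comm.
using StreamId = int;
using CommId = int;

// Hypothetical stand-in for a stream communicator; a real destructor would
// release the underlying communicator.
struct FakeStreamComm {
  StreamId stream;
  CommId parent;
};

class StreamCommCache {
public:
  // Returns the cached stream comm for this (stream, comm) pair, creating it
  // on first use or after all previous users have released it.
  std::shared_ptr<FakeStreamComm> get(StreamId stream, CommId comm)
  {
    const auto key = std::make_pair(stream, comm);
    if (auto live = cache_[key].lock()) return live;
    auto fresh = std::make_shared<FakeStreamComm>(FakeStreamComm{stream, comm});
    cache_[key] = fresh;
    ++created_;
    return fresh;
  }

  // Number of stream comms created so far (for illustration and testing).
  int created() const { return created_; }

private:
  // weak_ptr entries do not keep a comm alive once every user has dropped it.
  std::map<std::pair<StreamId, CommId>, std::weak_ptr<FakeStreamComm>> cache_;
  int created_ = 0;
};
```

Two lookups with the same pair share one object, while a different stream id produces a new one. Once all holders release an entry, the next lookup recreates it.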
Hmmm, this implies you cannot use a stream comm in place of a regular comm? I.e.
MPIX_Stream_comm_create(..., &scomm);
MPI_Allreduce(..., scomm);
Indeed. The code snippet linked in the description stores all created objects in a
std::unordered_map<cudaStreamId_t, std::unordered_map<MPI_Comm, MPI_Comm>> stream_comm_cache;
MPI_Comm scomm = stream_comm_cache[cuda_stream.get_id()][src_comm];
What should the value of
Suppose however that we have decided on a value for it. How should external code (i.e. PETSc) go about setting this variable? It leaks the abstraction to ask users to set
setenv("MPIR_CVAR_CH4_RESERVE_VCIS", "whatever", 1);
MPI_Init(&argc, &argv);
is also unsavory. It means that PETSc must be the one to initialize MPI. We currently allow users to initialize it themselves prior to |
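If the variable does have to be set from code, one option is to set it only when the user has not already exported it, and to do so before MPI is initialized. The sketch below shows just the environment handling. `set_env_default()` is a hypothetical helper, and the value passed to it would be an arbitrary placeholder, not a recommendation. No MPI calls are made, so the block compiles without MPI.

```cpp
#include <cassert>
#include <cstring>
#include <stdlib.h> // setenv/unsetenv are POSIX, declared in <stdlib.h>

// Sets `name` to `fallback` unless the user already exported it.
// Returns true on success. To have any effect on MPI start-up, this would
// have to run before MPI is initialized.
bool set_env_default(const char *name, const char *fallback)
{
  // overwrite = 0 leaves an existing value untouched.
  return setenv(name, fallback, 0) == 0;
}
```

With `overwrite` set to 0, a value chosen by the user on the command line takes precedence over the library's fallback, which avoids silently overriding their configuration.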
That is correct. Actually, there are also regular stream comms, created with regular streams (not CUDA streams). These regular stream comms can be used in place of a regular comm. The stream comms that are attached to CUDA streams have different semantics, i.e. all operations need to be asynchronously enqueued to the CUDA stream, so they are not interchangeable with regular comms.
We are well aware of the inconvenience, and I think it is possible to make it a bit more implicit or dynamic. For now we are focusing on usability and functionality; we will then address the convenience aspects, especially those that require more effort. |
Ah, that is much easier to handle then!
Of course. On this note, we would be more than happy to collaborate closely to help iron out any kinks. I am sure a good stress test of |
I am getting
what does this error indicate? I am doing (roughly)
MPIX_Irecv_enqueue(...);
...
MPIX_Waitall_enqueue(...); // error fires here
(@jczhang07 this error is firing from
Note that I do not get this error if I |
In |
The default is using
This approach involves a wait kernel: a CUDA kernel that busy-waits on an external atomic variable. We are hitting some deadlock issues. This is likely because the CUDA runtime is unaware of the dependency and may have extra locks or synchronizations that are causing the deadlock. Effort is needed to pin down the issue and work out a mechanism to ensure robustness, as well as to verify the performance. |
OK I've also tried this, but am running into errors. For reference I am calling
|
I am a developer for PETSc (down the hall), and am working on integrating an experimental GPU stream-aware MPI layer. We are using MPICH’s experimental MPIX_Stream extension, but are running into some difficulties/have some questions. For reference, the relevant PETSc code for this new feature can be found in the function here
I have summarized the biggest ones below, but I am sure there will be many more...
Just to refresh everyone's memory, you must do
———
1.1. What is the lifetime of the MPIX_Stream vis-a-vis the stream comm (scomm)? Can it outlive scomm? If not, what order should we destroy mpi_stream and scomm?
1.2. What is the lifetime of the MPIX_Stream vis-a-vis the underlying device stream? I assume that the MPIX_Stream must be destroyed immediately before the CUDA stream is destroyed.
1.3. Do we need a brand new scomm for each cudaStream_t-MPI_Comm pairing? Reading through MPIR_Stream_comm_create_impl() this appears to be the case.
1.4. Following on from 1.3, can we reuse the same MPIX_Stream to create multiple separate src_comms (but same underlying CUDA stream)?
1.5. Following on again from 1.3, any way to make a non-stream communicator into a stream comm in-place? I.e. something like MPIX_Set_stream(PETSC_COMM_WORLD, some_mpix_stream)?
———
Which I dutifully rectify by doing
But this seems clunky, and leads me to believe I have not done some other necessary setup. Why the environment variable? To me this implies some one-time setup, likely in MPI_Init(). Surely there is a way to dynamically allocate more streams while the application is running?