I was wondering how I can run cuML with NCCL and MPI.
I am able to run a distributed sample (the nearest neighbors example) on two workers.
It looks like this code is using point-to-point (pt2pt) UCX communication. Are there any algorithms that use NCCL collectives?
All of the distributed algorithms that use the C++ communications layer API currently use NCCL. For example, the distributed nearest neighbors implementation uses it to broadcast batches to other ranks. UCX is used there to gather the results of each batch, since the NCCL public API doesn't have a gather.
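To make the broadcast-then-gather pattern concrete, here is a minimal single-process sketch in plain Python. It only simulates the data flow: the "ranks" are list entries, and no NCCL, UCX, or cuML calls are made. The function names (`local_topk`, `simulated_distributed_knn`) and the toy data are hypothetical and chosen for illustration; this is not how cuML implements it internally.

```python
import heapq
import math


def local_topk(shard, offset, query, k):
    """Return the k nearest (distance, global_index) pairs within one rank's shard."""
    dists = [(math.dist(point, query), offset + i) for i, point in enumerate(shard)]
    return heapq.nsmallest(k, dists)


def simulated_distributed_knn(shards, query_batch, k):
    """Simulate broadcast-then-gather k-NN; each entry of `shards` plays one rank."""
    # Global index offset of each rank's shard, so results can be merged.
    offsets, total = [], 0
    for shard in shards:
        offsets.append(total)
        total += len(shard)

    results = []
    for query in query_batch:
        # "Broadcast" step: every rank sees the same query and searches
        # only its own shard (a collective in the real setting).
        partials = [
            local_topk(shard, offset, query, k)
            for shard, offset in zip(shards, offsets)
        ]
        # "Gather" step: the root collects every rank's partial top-k
        # and merges them into the global top-k.
        merged = heapq.nsmallest(k, (pair for part in partials for pair in part))
        results.append([index for _, index in merged])
    return results


# Two simulated ranks, each holding two 2-D points.
shards = [
    [(0.0, 0.0), (5.0, 5.0)],
    [(1.0, 0.0), (9.0, 9.0)],
]
neighbors = simulated_distributed_knn(shards, [(0.9, 0.0)], k=2)
print(neighbors)  # [[2, 0]]
```

The merge step is why a gather is needed: no single rank knows the global top-k until the per-rank partial results are collected in one place.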
Do you want to run the cuML distributed algorithms directly in MPI without going through Python / Dask? We have a couple of examples that can be executed using the MPI/NCCL backend, which uses MPI to bootstrap an NCCL clique and then uses NCCL for collectives. If you use a CUDA-aware MPI implementation that was built with UCX support, you'll get the equivalent of our NCCL/UCX backend.
You should be able to run the examples above in MPI by creating a cumlHandle and injecting an instance of the MPI communicator into it. Finally, if you are interested in using this from the Python layer and Dask, this pytest might be a good place to start.
Let us know if you encounter any issues. It's also worth mentioning that our communications layer is in the beginning stages of being transferred to a new project named raft.
@aj-prime, assuming the above answers your question, I'm going to close this for now. Please feel free to reopen it if you have further questions.