Add FP16 and FP64 Support to PureNcclCommunicator #187
Conversation
Despite a few nitpicks, the code looks good to me.
chainermn/communicators/__init__.py (Outdated)

```diff
@@ -41,6 +45,10 @@ def create_communicator(
     import mpi4py.MPI
     mpi_comm = mpi4py.MPI.COMM_WORLD

+    if communicator_name != 'pure_nccl' and allreduce_grad_dtype is not None:
+        raise ValueError(
+            'allreduce_grad_dtype is not supported except for \'pure_nccl\'.')
```
Suggested message: 'allreduce_grad_dtype is only available for the pure_nccl communicator.'
Thank you. I will fix this error message according to your comment.
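For concreteness, here is a minimal sketch of how the revised check could read after the suggested wording change. The abbreviated `create_communicator` signature and the final message text are assumptions, not the merged code:

```python
# Sketch only: signature abbreviated; the exact merged wording may differ.
def create_communicator(communicator_name='hierarchical', mpi_comm=None,
                        allreduce_grad_dtype=None):
    if mpi_comm is None:
        import mpi4py.MPI
        mpi_comm = mpi4py.MPI.COMM_WORLD

    # Reject allreduce_grad_dtype for every communicator except 'pure_nccl',
    # using the error message suggested in the review.
    if communicator_name != 'pure_nccl' and allreduce_grad_dtype is not None:
        raise ValueError(
            'allreduce_grad_dtype is only available for the '
            'pure_nccl communicator.')
```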
chainermn/communicators/__init__.py (Outdated)

```diff
@@ -31,6 +34,7 @@ def create_communicator(
         ``hierarchical``, ``two_dimensional``, ``pure_nccl``, or
         ``single_node``)
     mpi_comm: MPI4py communicator
+    allreduce_grad_dtype: Data type of gradient used in All-Reduce
```
I think the only allowed values here are np.float32, np.float16, or None. Maybe we can add a value check somewhere around here, before actually creating the communicator?
Sorry, there is already a value check in PureNcclCommunicator, so maybe we can just add documentation here, like "Allowed types are numpy.float16, 32, 64, or None....".
So the behaviour when None is passed should also be described here, which is to use the float type of the model.
OK. I will add the description.
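A docstring entry along these lines would cover both points raised above (the allowed dtypes and the None behaviour). This is a sketch of possible wording, not the exact text that landed:

```python
def create_communicator(communicator_name='hierarchical', mpi_comm=None,
                        allreduce_grad_dtype=None):
    """Creates a ChainerMN communicator.

    Args:
        ...
        allreduce_grad_dtype: Data type of gradient used in All-Reduce.
            Allowed types are numpy.float16, numpy.float32, numpy.float64,
            or None. If None, the dtype of the model's parameters is used.
    """
```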
```diff
@@ -70,8 +70,8 @@ def ptr(self):
     def buffer(self, size):
         return self.ffi.buffer(self.ffi.cast('void *', self.memory.ptr), size)

-    def array(self, shape, offset=0):
-        return cp.ndarray(shape, memptr=self.memory + offset, dtype=cp.float32)
+    def array(self, shape, offset=0, dtype=np.float32):
```
So if we don't want cupy's default behaviour of setting the dtype to float64 when None is passed, we had better add a non-None assertion here.
I also think so; I will add the assertion.
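A sketch of what the guarded method could look like; the enclosing class is a minimal stand-in for the one in the diff above, and its name and allocation are assumptions:

```python
import numpy as np
import cupy as cp

class DeviceMemory(object):
    """Minimal stand-in for the class in the diff above (name assumed)."""

    def __init__(self, nbytes):
        self.memory = cp.cuda.alloc(nbytes)  # cupy MemoryPointer

    def array(self, shape, offset=0, dtype=np.float32):
        # Guard against cupy's default behaviour: dtype=None would make
        # cp.ndarray fall back to float64, which callers never want here.
        assert dtype is not None, 'dtype must not be None'
        return cp.ndarray(shape, memptr=self.memory + offset, dtype=dtype)
```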
This PR adds FP16 and FP64 support to allreduce_grad of PureNcclCommunicator.
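With this in place, a pure_nccl communicator can be asked to run the gradient All-Reduce in half precision. A hypothetical usage sketch (to be launched under mpiexec):

```python
import numpy as np
import chainermn

# FP16 all-reduce of gradients; only valid with the 'pure_nccl' communicator.
comm = chainermn.create_communicator('pure_nccl',
                                     allreduce_grad_dtype=np.float16)

# Passing allreduce_grad_dtype with any other communicator_name raises
# ValueError, per the check added in this PR.
```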