Add FP16 and FP64 Support to PureNcclCommunicator #187
Conversation
Despite a few nitpicks, the code looks good to me.
chainermn/communicators/__init__.py (Outdated)

```diff
@@ -41,6 +45,10 @@ def create_communicator(
     import mpi4py.MPI
     mpi_comm = mpi4py.MPI.COMM_WORLD

+    if communicator_name != 'pure_nccl' and allreduce_grad_dtype is not None:
+        raise ValueError(
+            'allreduce_grad_dtype is not supported except for \'pure_nccl\'.')
```
Suggested message: 'allreduce_grad_dtype is only available for the pure_nccl communicator.'
Thank you. I will fix this error message according to your comment.
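For concreteness, here is a minimal sketch of how the revised check could read after the suggested wording change. The abbreviated `create_communicator` signature and the final message text are assumptions, not the merged code:

```python
# Sketch only: signature abbreviated; the exact merged wording may differ.
def create_communicator(communicator_name='hierarchical', mpi_comm=None,
                        allreduce_grad_dtype=None):
    if mpi_comm is None:
        import mpi4py.MPI
        mpi_comm = mpi4py.MPI.COMM_WORLD

    # Reject allreduce_grad_dtype for every communicator except 'pure_nccl',
    # using the error message suggested in the review.
    if communicator_name != 'pure_nccl' and allreduce_grad_dtype is not None:
        raise ValueError(
            'allreduce_grad_dtype is only available for the '
            'pure_nccl communicator.')
```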
chainermn/communicators/__init__.py (Outdated)

```diff
@@ -31,6 +34,7 @@ def create_communicator(
         ``hierarchical``, ``two_dimensional``, ``pure_nccl``, or
         ``single_node``)
     mpi_comm: MPI4py communicator
+    allreduce_grad_dtype: Data type of gradient used in All-Reduce
```
I think the only allowed values here are np.float32, np.float16, or None. Maybe we can add a value check somewhere around here, before actually creating the communicator?
Sorry, there is already a value check in PureNcclCommunicator, so maybe we can just add documentation here, like "Allowed types are numpy.float16, 32, 64, or None....".
So the behaviour when None is passed should also be described here, which is to use the float type of the model.
OK. I will add the description.
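A docstring entry along these lines would cover both points raised above (the allowed dtypes and the None behaviour). This is a sketch of possible wording, not the exact text that landed:

```python
def create_communicator(communicator_name='hierarchical', mpi_comm=None,
                        allreduce_grad_dtype=None):
    """Creates a ChainerMN communicator.

    Args:
        ...
        allreduce_grad_dtype: Data type of gradient used in All-Reduce.
            Allowed types are numpy.float16, numpy.float32, numpy.float64,
            or None. If None, the dtype of the model's parameters is used.
    """
```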
```diff
@@ -70,8 +70,8 @@ def ptr(self):
     def buffer(self, size):
         return self.ffi.buffer(self.ffi.cast('void *', self.memory.ptr), size)

-    def array(self, shape, offset=0):
-        return cp.ndarray(shape, memptr=self.memory + offset, dtype=cp.float32)
+    def array(self, shape, offset=0, dtype=np.float32):
```
So if we don't want cupy's default behaviour of setting the dtype to float64 when None is passed, we had better add a non-None assertion here.
I also think so; I will add the assertion.
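A sketch of what the guarded method could look like; the enclosing class is a minimal stand-in for the one in the diff above, and its name and allocation are assumptions:

```python
import numpy as np
import cupy as cp

class DeviceMemory(object):
    """Minimal stand-in for the class in the diff above (name assumed)."""

    def __init__(self, nbytes):
        self.memory = cp.cuda.alloc(nbytes)  # cupy MemoryPointer

    def array(self, shape, offset=0, dtype=np.float32):
        # Guard against cupy's default behaviour: dtype=None would make
        # cp.ndarray fall back to float64, which callers never want here.
        assert dtype is not None, 'dtype must not be None'
        return cp.ndarray(shape, memptr=self.memory + offset, dtype=dtype)
```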
This PR adds FP16 and FP64 support to allreduce_grad of PureNcclCommunicator.
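With this in place, a pure_nccl communicator can be asked to run the gradient All-Reduce in half precision. A hypothetical usage sketch (to be launched under mpiexec):

```python
import numpy as np
import chainermn

# FP16 all-reduce of gradients; only valid with the 'pure_nccl' communicator.
comm = chainermn.create_communicator('pure_nccl',
                                     allreduce_grad_dtype=np.float16)

# Passing allreduce_grad_dtype with any other communicator_name raises
# ValueError, per the check added in this PR.
```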