Conversation
@@ -479,6 +479,11 @@ static inline __device__ void atomicAdd(mshadow::half::half_t *address,
  } while (assumed != old);
}

// Overload atomicAdd to work for signed int64 on all architectures
static inline __device__ void atomicAdd(int64_t *address, int64_t val) {
  atomicAdd(reinterpret_cast<unsigned long long*>(address),
            static_cast<unsigned long long>(val));  // NOLINT
Are you sure this works for negative values?
It should be safe as long as CUDA represents signed long long in two's complement.
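The two's-complement argument can be sanity-checked outside CUDA: adding the unsigned 64-bit reinterpretations modulo 2^64 and reinterpreting the result as signed yields the same value as signed addition. A minimal Python sketch of that arithmetic (not CUDA code; the helper names here are illustrative, not from the PR):

```python
# Model 64-bit two's-complement reinterpretation between signed and unsigned.
MASK = (1 << 64) - 1

def to_u64(x):
    # Reinterpret a signed 64-bit value as unsigned (two's complement bit pattern).
    return x & MASK

def to_i64(u):
    # Reinterpret an unsigned 64-bit value back as signed.
    return u - (1 << 64) if u >= (1 << 63) else u

a, b = 2123162361283621, -31231236374787
# Unsigned addition with wraparound matches signed addition exactly.
assert to_i64((to_u64(a) + to_u64(b)) & MASK) == a + b
```

This is why the `reinterpret_cast` to `unsigned long long` in the overload above is value-preserving on hardware that uses two's complement.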
data = mx.nd.array([2123162361283621, -31231236374787,
                    -112372937128970, -1378278798172378], dtype=dtype)
idx = mx.nd.array([[0, 0, 0, 0]], dtype='int32')
assert (mx.nd.scatter_nd_acc(data, idx, shape=(1,)).asnumpy()[0] == data.asnumpy().sum())
@piiswrong I've added another test case for the signed int64 case.
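For reference, the duplicate-index accumulation that `scatter_nd_acc` performs can be modeled in plain NumPy with `np.add.at`, which (unlike fancy-index assignment) does not drop repeated indices. A sketch mirroring the test above (NumPy model only, not the MXNet op):

```python
import numpy as np

data = np.array([2123162361283621, -31231236374787,
                 -112372937128970, -1378278798172378], dtype=np.int64)
idx = np.array([0, 0, 0, 0])

# np.add.at applies the add unbuffered, so repeated indices all accumulate.
out = np.zeros(1, dtype=np.int64)
np.add.at(out, idx, data)
assert out[0] == data.sum()

# Plain fancy-index assignment keeps only the last write instead:
last = np.zeros(1, dtype=np.int64)
last[idx] = data
assert last[0] == data[-1]
```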
src/operator/tensor/indexing_op.cc
Outdated
const DType* data,
const IType* indices,
mshadow::Stream<cpu> *s) {
  for (int i = 0; i < N; i++) {
This is single-threaded. Can we use `#pragma omp critical` or `#pragma omp atomic` for the cpu kernel?
Yes, we can use OpenMP. Let me give it a try.
src/operator/tensor/indexing_op.cu
Outdated
@@ -209,6 +240,9 @@ NNVM_REGISTER_OP(gather_nd)
NNVM_REGISTER_OP(scatter_nd)
.set_attr<FCompute>("FCompute<gpu>", ScatterNDForward<gpu>);

NNVM_REGISTER_OP(scatter_nd_acc)
The string `acc` looks ambiguous. I thought it stood for `accurate` at first, but later realized it means `accumulate`. It's named `scatter_nd_add` in TF, where there are also `scatter_nd_sub`, `scatter_nd_mul`, and `scatter_nd_div`. Shall we also call it `scatter_nd_add` to be precise?
There is a slight difference between `scatter_nd_add` and `scatter_nd_acc`. In `scatter_nd_add`, the results are added to an existing array, while in `scatter_nd_acc` the values are added to an all-zero array. The two ops therefore take different numbers of arguments.
src/operator/tensor/indexing_op.cu
Outdated
mshadow::Stream<gpu> *s) {
  using namespace mshadow::cuda;
  int ngrid = std::min(kMaxGridNum, (N + kBaseThreadNum - 1) / kBaseThreadNum);
  ScatterNDAccForwardImplKernel
Does `Kernel::Launch` not fit here?
It does not fit because of the `atomicAdd`.
Why does `atomicAdd` prevent `Kernel::Launch` from being used?
Okay, I can still use `Kernel::Launch`, but only for the GPU.
src/operator/tensor/indexing_op.cu
Outdated
@@ -179,6 +179,37 @@ inline void SparseEmbeddingOpBackwardRspImpl<gpu>(const OpContext& ctx,
  });
}

template<typename DType, typename IType>
__global__ void ScatterNDAccForwardImplKernel(int N, int M, int K,
                                              const mshadow::Shape<10> strides,
Indent.
}
for (int j = 0; j < K; ++j) {
  #pragma omp atomic
  out[offset + j] += data[i * K + j];
You can consolidate this with the GPU kernel by using `#ifdef __CUDACC__ ... #else` in the header file, since this line is the only difference between the CPU and GPU kernels. Then, in the `FCompute` function, you can use `Kernel::Launch` for both CPU and GPU kernels. That would make the implementation less verbose.
@reminisce I've specialized the implementation for half_t and now it passes the tests.
It's very strange. The CI test fails on all Windows machines.

@reminisce I find I cannot use omp atomic. Also, using omp critical will not give any parallelism. I've reverted to the original version.

What is the error when using omp atomic?

@reminisce I think it's caused by

I see. Is this a runtime error? If it's only float16 that is not supported, I suggest we'd better use

@piiswrong @reminisce Can it be merged?
assert (mx.nd.scatter_nd(data, idx, shape=(2, 2)).asnumpy() == [[0, 0], [2, 3]]).all()
assert (mx.nd.scatter_nd_acc(y, idx, shape=data.shape).asnumpy() == data.grad.asnumpy()).all()
for dtype in ['int32', 'int64', 'float16', 'float32', 'float64']:
It seems that only `int64` has been tested for `scatter_nd_acc` in the same-index case. Could you confirm?
data_npy = np.random.randint(0, 10, (100,))
data = mx.nd.array(data_npy, dtype=dtype)
idx = mx.nd.zeros(shape=(1, 100), dtype='int32')
assert (mx.nd.scatter_nd_acc(data, idx, shape=(1,)).asscalar() == data_npy.sum())
@reminisce I've added another test for all the dtypes.
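As a side note on why that test is exact for every dtype in the list: 100 values drawn from [0, 10) sum to at most 900, which is exactly representable even in float16 (all integers up to 2048 are exact in float16). A NumPy model of the same all-dtypes check (illustrative, not the MXNet test itself):

```python
import numpy as np

for dtype in ['int32', 'int64', 'float16', 'float32', 'float64']:
    data_npy = np.random.randint(0, 10, (100,)).astype(dtype)
    out = np.zeros((1,), dtype=dtype)
    idx = np.zeros((100,), dtype=np.int32)  # every update targets index 0
    np.add.at(out, idx, data_npy)           # unbuffered accumulation, like atomicAdd
    assert out[0] == data_npy.sum()
```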
Should I merge it in?

rename to _backward_gather_nd

@piiswrong I've renamed accordingly.
src/operator/tensor/indexing_op.cc
Outdated
@@ -510,6 +548,10 @@ The elements in output is defined as follows::

all other entries in output are 0.

WARNING!!! If the indices have duplicates, the result will be non-deterministic and
This looks ugly. The standard warning message format is `.. Warning:: xxx`.
* try to implement scatter_nd_acc fix fix fix update only support real_type update update try to fix update fix update revise test fix lint
* fix
* mark line as no lint
* fix test
* revise test
* fix test case
* revise
* remove openmp
* update
* update
* update
* update test
* Revert "update test" This reverts commit 3eb3ac6.
* Revert "update" This reverts commit a28fa53.
* Revert "update" This reverts commit e99ffd0.
* Revert "update" This reverts commit 399ba02.
* add atomic and specialize the behavior of half_t
* use "!" instead of not
* add test
* fix test
* fix test
* fix test
* rename to backward_gather_nd
* fix
* fix
* fix doc
Description
Add _backward_gather_nd, which accumulates the values when the indices are the same. Should solve #9172.
Checklist
- Passed code style checking (make lint)
Changes
Comments
I use atomicAdd to implement the operator. The current CPU implementation does not use OpenMP. Also, int8 and uint8 are not supported.