
[oneDNN] Added Elementwise Mul grad fp32/bf16 #31647

Merged
merged 6 commits on Mar 19, 2021

Conversation

@jczaja (Contributor) commented Mar 15, 2021

PR types

Performance optimization

PR changes

OPs

Describe

Added implementation of elementwise_mul grad (fp32 & bf16)

- compilable

- working elementwise_mul fp32 without broadcasting

- Some more changes

- change of format setting

- fix

- lint and disabling not working UT
@jczaja added the Intel label on Mar 15, 2021
@jczaja requested a review from wozna on March 16, 2021 at 12:22
@jczaja (Contributor, Author) commented Mar 16, 2021

@arlesniak, @arogowie-intel Could you please review?

@jczaja (Contributor, Author) commented Mar 16, 2021

@jakpiase Please review

@arogowie-intel (Contributor) left a comment

Great job. I have some comments.

Comment on lines +17 to +25
namespace paddle {
namespace framework {
class ExecutionContext;
} // namespace framework
namespace platform {
class CPUDeviceContext;
struct CPUPlace;
} // namespace platform
} // namespace paddle
Contributor

Why are you using forward declarations instead of including an appropriate header file?

Contributor Author

I copied that from elementwise_add, which this implementation is based on. What is the advantage of not using forward declarations? I have always seen them as a way to speed up compilation.
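
For context on the trade-off discussed above, here is a minimal, self-contained C++ sketch (not Paddle code; the ExecutionContext below is a stand-in): a forward declaration is enough while the type is only passed by pointer or reference, so the header avoids the heavy include and rebuilds can be faster; the full class definition is needed only in the translation unit that actually touches the type's members.

// consumer.h -- a forward declaration is sufficient here, no heavy #include
namespace framework {
class ExecutionContext;  // forward declaration
}  // namespace framework

void LogContext(const framework::ExecutionContext& ctx);  // used by reference only

// consumer.cc -- the full definition (normally pulled in via the real header)
// is required only where members are accessed
#include <iostream>
#include <string>
#include <utility>

namespace framework {
class ExecutionContext {
 public:
  explicit ExecutionContext(std::string name) : name_(std::move(name)) {}
  const std::string& name() const { return name_; }

 private:
  std::string name_;
};
}  // namespace framework

void LogContext(const framework::ExecutionContext& ctx) {
  std::cout << "op context: " << ctx.name() << std::endl;
}

int main() {
  framework::ExecutionContext ctx("elementwise_mul_grad");
  LogContext(ctx);
  return 0;
}

The usual counterargument, which the reviewer may have in mind, is that forward-declaring types owned by another component duplicates their declarations and can silently break if the owning header changes, so including the owning header is often preferred for maintainability.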

user_defined_grad_outputs=[self.x_bf16])


class TestElementwiseMulBroadCastingBf16MklDNNOp(
Contributor

Suggested change
class TestElementwiseMulBroadCastingBf16MklDNNOp(
class TestElementwiseMulBroadcastingBf16MklDNNOp(

Comment on lines 75 to 76
// Handler should have dy passed but for broadcasting
// it is not good. So we pass x (for dims)
Contributor

This comment is a bit mysterious IMHO. Please explain why dy should have been passed and why it actually isn't, and what the consequences are of passing nullptr as the z tensor for this primitive.

Contributor Author

OK, I rephrased it.
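
For readers following this thread, the shape bookkeeping behind the comment may be easier to see from the math: for Z = X * Y with Y broadcast against X, the gradients are dX = dOut * Y (which has X's shape) and dY = dOut * X reduced over the broadcast axes, so the intermediate product lives in X's larger shape rather than dY's. A minimal, library-free sketch (plain C++, not the actual oneDNN kernel) for X of shape [2, 3] and Y of shape [3]:

#include <array>
#include <cstdio>

int main() {
  constexpr int kRows = 2, kCols = 3;
  std::array<std::array<float, kCols>, kRows> x = {{{1, 2, 3}, {4, 5, 6}}};
  std::array<float, kCols> y = {10, 20, 30};
  std::array<std::array<float, kCols>, kRows> dout = {{{1, 1, 1}, {1, 1, 1}}};

  std::array<std::array<float, kCols>, kRows> dx{};  // same shape as X
  std::array<float, kCols> dy{};                     // reduced accumulator

  for (int i = 0; i < kRows; ++i) {
    for (int j = 0; j < kCols; ++j) {
      dx[i][j] = dout[i][j] * y[j];   // dX = dOut * Y, keeps X's shape
      dy[j] += dout[i][j] * x[i][j];  // dY: product in X's shape, then reduced
    }
  }

  for (int j = 0; j < kCols; ++j) {
    std::printf("dY[%d] = %g\n", j, dy[j]);
  }
  return 0;
}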

Comment on lines -54 to 56
def test_check_grad_ingore_x(self):
pass

def test_check_grad_ingore_y(self):
pass
Contributor

Why remove ignore_x and leave ignore_y? Shouldn't grad_normal, ignore_x, and ignore_y all work?

Contributor Author

Very nice catch. The reason is that something may be broken in the reference implementation behind test_check_grad_ingore_y, because that UT failed even without oneDNN being used. So I enabled only the test that I verified works and left the other one commented out.

@@ -276,16 +276,15 @@ class ElementwiseOpGrad : public framework::OperatorWithKernel {

#ifdef PADDLE_WITH_MKLDNN
// If broadcasting is needed, use native implementation
- auto CanMKLDNNElementwiseAddGradBeUsed = [&]() {
+ auto CanMKLDNNElementwiseGradBeUsed = [&]() {
auto dx_dims = ctx.Input<Tensor>("X")->dims();
auto dy_dims = ctx.Input<Tensor>("Y")->dims();
// No broadcast or broadcasting of data on inner dims is supported
return (dx_dims[dx_dims.size() - 1] == dy_dims[dy_dims.size() - 1]);
Contributor

What about a case like this: dx_dims = [2, 3, 4, 5] and dy_dims = [3, 3, 5, 5]? These couldn't be broadcast together, yet they would pass this condition.

Contributor Author

Yes, but the operator itself will never be given such values. If it were given such shapes, the op's InferShape should reject them.
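
To make the scope of this check concrete: the lambda on the fast path compares only the innermost dimension, while full broadcast compatibility is a right-aligned, per-dimension check; the reviewer's example passes the former and fails the latter, and as noted it relies on the op's shape inference having rejected such inputs earlier. A standalone sketch (hypothetical helper names, not Paddle code):

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Mirrors the condition in the diff: only the last dims are compared.
bool InnerDimsMatch(const std::vector<int64_t>& dx,
                    const std::vector<int64_t>& dy) {
  return dx.back() == dy.back();
}

// Full NumPy-style rule: right-aligned dims must be equal or 1.
bool CanBroadcast(const std::vector<int64_t>& a,
                  const std::vector<int64_t>& b) {
  const size_t n = std::max(a.size(), b.size());
  for (size_t i = 0; i < n; ++i) {
    const int64_t da = i < a.size() ? a[a.size() - 1 - i] : 1;
    const int64_t db = i < b.size() ? b[b.size() - 1 - i] : 1;
    if (da != db && da != 1 && db != 1) return false;
  }
  return true;
}

int main() {
  const std::vector<int64_t> dx = {2, 3, 4, 5};
  const std::vector<int64_t> dy = {3, 3, 5, 5};
  // Reviewer's example: inner dims match (5 == 5) but the shapes are not
  // broadcastable, so the guard depends on InferShape rejecting them first.
  std::printf("inner dims match: %d, broadcastable: %d\n",
              static_cast<int>(InnerDimsMatch(dx, dy)),
              static_cast<int>(CanBroadcast(dx, dy)));
  return 0;
}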

@jczaja (Contributor, Author) commented Mar 18, 2021

@arogowie-intel Please continue your review

@jczaja (Contributor, Author) commented Mar 19, 2021

@luotao1 Could you please start your review? PR-CI-APPROVAL needs your approval.
