[oneDNN] Added Elementwise Mul grad fp32/bf16 #31647
Conversation
- compilable
- working elementwise_mul fp32 without broadcasting
- Some more changes
- change of format setting
- fix
- lint and disabling non-working UT
@arlesniak, @arogowie-intel Could you please review?
@jakpiase Please review
Great job. Have some comments.
namespace paddle {
namespace framework {
class ExecutionContext;
}  // namespace framework
namespace platform {
class CPUDeviceContext;
struct CPUPlace;
}  // namespace platform
}  // namespace paddle
Why are you using forward declarations instead of including an appropriate header file?
I copied that from elementwise_add, which this implementation is based on. What is the advantage of not using a forward declaration? I have always seen forward declarations as a way of speeding up compilation.
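For context, a minimal sketch of the trade-off being discussed (illustrative only, not code from this PR; the header path mentioned in the comment is an assumption): a forward declaration is sufficient when the type only appears as a pointer or reference, and it avoids pulling in the full header and its transitive includes, which is where the compile-time saving comes from.

```cpp
// Hypothetical illustration, not code from this PR.
namespace paddle {
namespace framework {
class ExecutionContext;  // forward declaration: the name exists, no definition yet
}  // namespace framework
}  // namespace paddle

// A forward declaration is enough here because the type only appears as a
// reference, so this header does not need the full class definition.
void RunKernel(const paddle::framework::ExecutionContext& ctx);

// Including the defining header instead (assumed to be
// paddle/fluid/framework/operator.h) gives access to ExecutionContext's
// members, but also drags its transitive includes into every file that
// includes this one, which is what slows recompilation down.
```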
user_defined_grad_outputs=[self.x_bf16])
class TestElementwiseMulBroadCastingBf16MklDNNOp( |
Suggested change:
- class TestElementwiseMulBroadCastingBf16MklDNNOp(
+ class TestElementwiseMulBroadcastingBf16MklDNNOp(
// Handler should have dy passed but for broadcasting
// it is not good. So we pass x (for dims)
This comment is a bit mysterious IMHO. Please explain why dy should have been passed and why it actually is not. Please also explain what the consequences are of passing nullptr as the z tensor for this primitive.
OK, I rephrased it.
def test_check_grad_ingore_x(self):
    pass

def test_check_grad_ingore_y(self):
    pass
Why remove ignore_x and leave ignore_y? Shouldn't grad_normal, ignore_x, and ignore_y all work?
Very nice catch. The reason is that something may be broken in the reference implementation used by test_check_grad_ingore_y, because that UT failed even without oneDNN being used. So I enabled only the test that I verified works and left the other one commented out.
@@ -276,16 +276,15 @@ class ElementwiseOpGrad : public framework::OperatorWithKernel { | |||
|
|||
#ifdef PADDLE_WITH_MKLDNN | |||
// If broadcasting is needed, use native implementation | |||
auto CanMKLDNNElementwiseAddGradBeUsed = [&]() { | |||
auto CanMKLDNNElementwiseGradBeUsed = [&]() { | |||
auto dx_dims = ctx.Input<Tensor>("X")->dims(); | |||
auto dy_dims = ctx.Input<Tensor>("Y")->dims(); | |||
// No broadcast or broadcasting of data on inner dims is supported | |||
return (dx_dims[dx_dims.size() - 1] == dy_dims[dy_dims.size() - 1]); |
What about a case like this: dx_dims = [2, 3, 4, 5] and dy_dims = [3, 3, 5, 5]? These shapes cannot be broadcast together, yet they pass this condition.
Yes, but the operator itself will never be given such values. If it were given such shapes, the op's infershape should reject them.
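To make the concern concrete, here is a minimal standalone sketch (the simplified signature and shapes are illustrative, not the PR's actual code) showing that the last-dimension comparison does accept the reviewer's non-broadcastable example, so correctness relies on infershape having rejected such shapes earlier:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical reconstruction of the check discussed above: compare only the
// innermost dimension, assuming the shapes have already passed infershape.
bool CanMKLDNNElementwiseGradBeUsed(const std::vector<int64_t>& dx_dims,
                                    const std::vector<int64_t>& dy_dims) {
  return dx_dims.back() == dy_dims.back();
}

int main() {
  // A broadcastable case (Y broadcast over the outer dims): check passes.
  assert(CanMKLDNNElementwiseGradBeUsed({2, 3, 4, 5}, {4, 5}));

  // The reviewer's counterexample: {2, 3, 4, 5} vs {3, 3, 5, 5} share the
  // innermost dim (5 == 5), so the check passes even though the shapes are
  // not broadcastable -- the author relies on infershape rejecting them first.
  assert(CanMKLDNNElementwiseGradBeUsed({2, 3, 4, 5}, {3, 3, 5, 5}));
  return 0;
}
```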
@arogowie-intel Please continue your review
@luotao1 Could you please start your review? PR-CI-APPROVAL needs your approval.
PR types
Performance optimization
PR changes
OPs
Describe
Added implementation of elementwise_mul grad (fp32 & bf16)