
[oneDNN] Added Elementwise Mul grad fp32/bf16 #31647

Merged
merged 6 commits on Mar 19, 2021

Conversation

@jczaja (Contributor) commented Mar 15, 2021

PR types

Performance optimization

PR changes

OPs

Describe

Added implementation of elementwise_mul grad (fp32 & bf16)

- compilable

- working elementwise_mul fp32 without broadcasting

- Some more changes

- change of format setting

- fix

- lint and disabling not working UT
@jczaja added the Intel label on Mar 15, 2021
@jczaja requested a review from wozna on March 16, 2021 at 12:22
@jczaja (Contributor, Author) commented Mar 16, 2021

@arlesniak, @arogowie-intel Could you please review?

@jczaja (Contributor, Author) commented Mar 16, 2021

@jakpiase Please review

@arogowie-intel (Contributor) left a comment

Great job. I have some comments.

Comment on lines +17 to +25
namespace paddle {
namespace framework {
class ExecutionContext;
} // namespace framework
namespace platform {
class CPUDeviceContext;
struct CPUPlace;
} // namespace platform
} // namespace paddle
Contributor

Why are you using forward declarations instead of including an appropriate header file?

Contributor Author

I copied that from elementwise_add, which this implementation is based on. What is the advantage of not using forward declarations? I have always seen them as a way to speed up compilation.
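
For context on the trade-off discussed above, here is a minimal, self-contained C++ sketch (not Paddle code; the ExecutionContext below is a stand-in): a forward declaration is enough while the type is only passed by pointer or reference, so the header avoids the heavy include and rebuilds can be faster; the full class definition is needed only in the translation unit that actually touches the type's members.

// consumer.h -- a forward declaration is sufficient here, no heavy #include
namespace framework {
class ExecutionContext;  // forward declaration
}  // namespace framework

void LogContext(const framework::ExecutionContext& ctx);  // used by reference only

// consumer.cc -- the full definition (normally pulled in via the real header)
// is required only where members are accessed
#include <iostream>
#include <string>
#include <utility>

namespace framework {
class ExecutionContext {
 public:
  explicit ExecutionContext(std::string name) : name_(std::move(name)) {}
  const std::string& name() const { return name_; }

 private:
  std::string name_;
};
}  // namespace framework

void LogContext(const framework::ExecutionContext& ctx) {
  std::cout << "op context: " << ctx.name() << std::endl;
}

int main() {
  framework::ExecutionContext ctx("elementwise_mul_grad");
  LogContext(ctx);
  return 0;
}

The usual counterargument, which the reviewer may have in mind, is that forward-declaring types owned by another component duplicates their declarations and can silently break if the owning header changes, so including the owning header is often preferred for maintainability.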

user_defined_grad_outputs=[self.x_bf16])


class TestElementwiseMulBroadCastingBf16MklDNNOp(
Contributor

Suggested change
class TestElementwiseMulBroadCastingBf16MklDNNOp(
class TestElementwiseMulBroadcastingBf16MklDNNOp(

Comment on lines 75 to 76
// Handler should have dy passed but for broadcasting
// it is not good. So we pass x (for dims)
Contributor

This comment is a bit mysterious IMHO. Please explain why dy should have been passed and why it actually isn't, and what the consequences are of passing nullptr as the z tensor for this primitive.

Contributor Author

OK, I rephrased it.
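
For readers following this thread, the shape bookkeeping behind the comment may be easier to see from the math: for Z = X * Y with Y broadcast against X, the gradients are dX = dOut * Y (which has X's shape) and dY = dOut * X reduced over the broadcast axes, so the intermediate product lives in X's larger shape rather than dY's. A minimal, library-free sketch (plain C++, not the actual oneDNN kernel) for X of shape [2, 3] and Y of shape [3]:

#include <array>
#include <cstdio>

int main() {
  constexpr int kRows = 2, kCols = 3;
  std::array<std::array<float, kCols>, kRows> x = {{{1, 2, 3}, {4, 5, 6}}};
  std::array<float, kCols> y = {10, 20, 30};
  std::array<std::array<float, kCols>, kRows> dout = {{{1, 1, 1}, {1, 1, 1}}};

  std::array<std::array<float, kCols>, kRows> dx{};  // same shape as X
  std::array<float, kCols> dy{};                     // reduced accumulator

  for (int i = 0; i < kRows; ++i) {
    for (int j = 0; j < kCols; ++j) {
      dx[i][j] = dout[i][j] * y[j];   // dX = dOut * Y, keeps X's shape
      dy[j] += dout[i][j] * x[i][j];  // dY: product in X's shape, then reduced
    }
  }

  for (int j = 0; j < kCols; ++j) {
    std::printf("dY[%d] = %g\n", j, dy[j]);
  }
  return 0;
}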

Comment on lines -54 to 56
def test_check_grad_ingore_x(self):
pass

def test_check_grad_ingore_y(self):
pass
Contributor

Why remove ignore_x and leave ignore_y? Shouldn't grad_normal, ignore_x, and ignore_y all work?

Contributor Author

Very nice catch. The reason is that something may be broken in the reference implementation behind test_check_grad_ingore_y, because that UT failed even without oneDNN being used. So I enabled only the test that I verified works and left the other one commented out.

@@ -276,16 +276,15 @@ class ElementwiseOpGrad : public framework::OperatorWithKernel {

#ifdef PADDLE_WITH_MKLDNN
// If broadcasting is needed, use native implementation
- auto CanMKLDNNElementwiseAddGradBeUsed = [&]() {
+ auto CanMKLDNNElementwiseGradBeUsed = [&]() {
auto dx_dims = ctx.Input<Tensor>("X")->dims();
auto dy_dims = ctx.Input<Tensor>("Y")->dims();
// No broadcast or broadcasting of data on inner dims is supported
return (dx_dims[dx_dims.size() - 1] == dy_dims[dy_dims.size() - 1]);
Contributor

What about a case like this: dx_dims = [2, 3, 4, 5] and dy_dims = [3, 3, 5, 5]? These couldn't be broadcast together, yet they would pass this condition.

Contributor Author

Yes, but the operator itself will never be given such values. If it were given such shapes, the op's InferShape should reject them.
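
To make the scope of this check concrete: the lambda on the fast path compares only the innermost dimension, while full broadcast compatibility is a right-aligned, per-dimension check; the reviewer's example passes the former and fails the latter, and as noted it relies on the op's shape inference having rejected such inputs earlier. A standalone sketch (hypothetical helper names, not Paddle code):

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Mirrors the condition in the diff: only the last dims are compared.
bool InnerDimsMatch(const std::vector<int64_t>& dx,
                    const std::vector<int64_t>& dy) {
  return dx.back() == dy.back();
}

// Full NumPy-style rule: right-aligned dims must be equal or 1.
bool CanBroadcast(const std::vector<int64_t>& a,
                  const std::vector<int64_t>& b) {
  const size_t n = std::max(a.size(), b.size());
  for (size_t i = 0; i < n; ++i) {
    const int64_t da = i < a.size() ? a[a.size() - 1 - i] : 1;
    const int64_t db = i < b.size() ? b[b.size() - 1 - i] : 1;
    if (da != db && da != 1 && db != 1) return false;
  }
  return true;
}

int main() {
  const std::vector<int64_t> dx = {2, 3, 4, 5};
  const std::vector<int64_t> dy = {3, 3, 5, 5};
  // Reviewer's example: inner dims match (5 == 5) but the shapes are not
  // broadcastable, so the guard depends on InferShape rejecting them first.
  std::printf("inner dims match: %d, broadcastable: %d\n",
              static_cast<int>(InnerDimsMatch(dx, dy)),
              static_cast<int>(CanBroadcast(dx, dy)));
  return 0;
}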

@jczaja (Contributor, Author) commented Mar 18, 2021

@arogowie-intel Please continue your review

@jczaja (Contributor, Author) commented Mar 19, 2021

@luotao1 Could you please start your review? PR-CI-APPROVAL needs your approval.
