
Support BFloat16 in convolution_backward #7807

Merged: swolchok merged 1 commit into main from gh/swolchok/158/head on Jan 23, 2025.

Conversation

swolchok (Contributor):

Partial fix for #7748.

[ghstack-poisoned]
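The diff itself is not reproduced in this extract. In general, adding a dtype to an ExecuTorch portable kernel amounts to widening the kernel's dtype dispatch; below is a minimal sketch, assuming an ET_SWITCH-style dispatch macro covering float, Half, and BFloat16. The macro and operator names here are illustrative, not taken from the actual diff.

```cpp
// Illustrative sketch only; the real convolution_backward change may differ.
// The idea: dispatch over a type set that now includes BFloat16, so the
// same templated kernel body is instantiated for the new dtype.
ET_SWITCH_FLOATHBF16_TYPES(
    input.scalar_type(), ctx, "convolution_backward.out", CTYPE, [&] {
      // Compute grad_input, grad_weight, and grad_bias with CTYPE as the
      // element type; BFloat16 takes the same path as float and Half.
    });
```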
swolchok (Contributor, Author) commented on Jan 21, 2025:

Stack from ghstack (oldest at bottom):

pytorch-bot commented on Jan 21, 2025:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7807

Note: Links to docs will display an error until the docs builds have completed.

❌ 1 New Failure

As of commit 11f4d2d with merge base 466d98f, one new job failure was reported.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label on Jan 21, 2025.

swolchok added a commit that referenced this pull request on Jan 21, 2025:
Partial fix for #7748.

ghstack-source-id: f51e6f2acd84901aaf7a997658a7d02c93b958e7
ghstack-comment-id: 2605752234
Pull Request resolved: #7807
swolchok added the release notes: ops & kernels label on Jan 21, 2025.
Review thread on the updated tolerance handling in the test:

```cpp
auto expected_grad_weight = tf.make({4, 3, 4, 2}, expected_grad_weight_data);
auto expected_grad_bias = tf.make({4}, expected_grad_bias_data);
if (DTYPE == ScalarType::Half || DTYPE == ScalarType::BFloat16) {
  EXPECT_TENSOR_CLOSE_WITH_TOL(grad_input, expected_grad_input, 1e-2, 1e-8);
```
Contributor:

Why not use defaults here? EXPECT_TENSOR_CLOSE_WITH_TOL should apply the right tolerance given the type.

swolchok (Contributor, Author):

Because the default rtol is 1e-5; rtol and atol are different.
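The distinction matters because the usual element-wise closeness formula combines both tolerances: a value passes when |actual - expected| <= atol + rtol * |expected|. Here is a minimal sketch of that conventional check; the exact formula EXPECT_TENSOR_CLOSE_WITH_TOL applies lives in ExecuTorch's testing utilities and may differ in detail.

```cpp
#include <cmath>

// Conventional closeness test: atol bounds the absolute error (dominant
// near zero), while rtol scales with the magnitude of the expected value.
// Because rtol grows with |expected|, raising atol alone cannot compensate
// for large-magnitude values, which is why the two are not interchangeable.
bool is_close(double actual, double expected, double rtol, double atol) {
  return std::fabs(actual - expected) <= atol + rtol * std::fabs(expected);
}
```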

Contributor:

Right, but in the same way that we have kDefaultHalfAtol and kDefaultBFloat16Atol, I think we should have kDefaultHalfRtol and kDefaultBFloat16Rtol and set them to a proper value. You seem to be using 1e-2 for most of these tests. Why not introduce kDefaultHalfRtol and kDefaultBFloat16Rtol with value 1e-2?
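For illustration, the constants proposed above might look like the sketch below. The names mirror the existing kDefaultHalfAtol and kDefaultBFloat16Atol mentioned in the comment, and the 1e-2 value is the reviewer's suggestion, not something confirmed to exist in the codebase.

```cpp
// Hypothetical companions to kDefaultHalfAtol / kDefaultBFloat16Atol.
// BFloat16 keeps only 8 significand bits, so its relative rounding error
// is on the order of 2^-8 (about 4e-3); an rtol of 1e-2 sits comfortably
// above that, and above Half's roughly 2^-11 (about 5e-4) as well.
constexpr double kDefaultHalfRtol = 1e-2;
constexpr double kDefaultBFloat16Rtol = 1e-2;
```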

swolchok (Contributor, Author):

> Why not introduce kDefaultHalfRtol and kDefaultBFloat16Rtol with value 1e-2?

Because not all operators require the higher rtol.

swolchok (Contributor, Author):

It is not particularly uncommon to need to set rtol in PyTorch core: https://github.com/search?q=repo%3Apytorch%2Fpytorch+%2Frtol%3D%5B1-9%5D%2F&type=code
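To round out the thread, here is a usage sketch of the pattern the discussion settles on, assuming ExecuTorch's EXPECT_TENSOR_CLOSE applies type-appropriate default tolerances as described above; the macro behavior is inferred from this discussion, not verified against the sources.

```cpp
if (DTYPE == ScalarType::Half || DTYPE == ScalarType::BFloat16) {
  // Reduced precision: loosen rtol per test rather than baking 1e-2 into
  // a global default, since not all operators need the higher rtol.
  EXPECT_TENSOR_CLOSE_WITH_TOL(grad_input, expected_grad_input, 1e-2, 1e-8);
} else {
  // Full precision: the defaults (rtol on the order of 1e-5) suffice.
  EXPECT_TENSOR_CLOSE(grad_input, expected_grad_input);
}
```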

swolchok merged commit dabd72f into main on Jan 23, 2025, with 44 of 47 checks passed.

swolchok deleted the gh/swolchok/158/head branch on Jan 23, 2025 at 17:40.
YIWENX14 pushed a commit that referenced this pull request on Jan 28, 2025.
zonglinpeng pushed a commit to zonglinpeng/executorch that referenced this pull request on Jan 30, 2025.
Labels: CLA Signed, release notes: ops & kernels