Refactor MX4 Kernel to operate on flat tensors #2836
Conversation
✅ Deploy Preview for pytorch-fbgemm-docs ready!
This pull request was exported from Phabricator. Differential Revision: D59653809
Summary: X-link: facebookresearch/FBGEMM#20 Pull Request resolved: pytorch#2816 As noted in [this doc](https://docs.google.com/document/d/156Du0hBRH6umG_i-OrYC574XhpQMUU5SJYG0RTS2tTg/edit#heading=h.akfcp7xpg8cr), using a ceiling round for the scale calculation does a better job of not truncating mantissa bits. This diff switches Triton's floor rounding to ceil rounding. Note that mx4_test currently doesn't pass, as the CUDA kernel now behaves differently from Triton; once we rebase this diff onto a similar change to the CUDA kernel, outputs should match exactly again.

Differential Revision: D59527463

Reviewed By: jianyuh
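For illustration, a minimal sketch of the floor-to-ceil switch that commit describes, assuming the shared scale is a power of two chosen from the group's maximum magnitude (the function name, the FP4 max of 6.0, and the exact formula are assumptions for this sketch, not the kernel's actual code):

```python
import math

# Assumed for illustration: FP4 (E2M1) has a max representable magnitude of 6.0.
FP4_MAX = 6.0

def shared_scale_exponent(group_max_abs: float) -> int:
    """Exponent e such that values divided by 2**e fit in FP4 range."""
    raw = math.log2(group_max_abs / FP4_MAX)
    # Floor rounding can pick a scale that is too small, forcing the largest
    # elements to clamp and lose mantissa bits; ceil rounding guarantees
    # group_max_abs / 2**e <= FP4_MAX for every element in the group.
    return math.ceil(raw)
```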
This pull request has been merged in f5906e0.
Summary:
Rather than trying to reshape inputs into 2D matrices with each thread operating on one row, this refactor uses 1D inputs and has each thread operate on an offset of the flat array.
The main benefit is that it avoids ragged tensors, which arise when an input cannot be divided into even-sized rows. This should make the kernel compatible with more shapes.
Reviewed By: sryap
Differential Revision: D59653809
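For illustration, a minimal Triton sketch of the thread-per-offset pattern the summary describes (the kernel name, block size, and placeholder copy body are assumptions; the actual MX4 kernel quantizes each group rather than copying):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def flat_kernel(in_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    # Each program instance handles one BLOCK-sized offset of the flat
    # array instead of one row of a reshaped 2D matrix, so inputs whose
    # sizes don't divide evenly into rows need no padding or reshaping.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements  # guard the ragged tail
    x = tl.load(in_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x, mask=mask)  # placeholder elementwise op

def run(x: torch.Tensor) -> torch.Tensor:
    # Assumes a contiguous CUDA tensor; any shape works once flattened.
    x = x.contiguous().flatten()
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    flat_kernel[grid](x, out, n, BLOCK=1024)
    return out
```

The masked tail is what makes the 1D formulation shape-agnostic: the last program simply loads and stores fewer than BLOCK elements, instead of requiring the input to reshape into even-sized rows.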