Refactor MX4 Kernel to operate on flat tensors #2836
Conversation
✅ Deploy Preview for pytorch-fbgemm-docs ready!
This pull request was exported from Phabricator. Differential Revision: D59653809
Summary: X-link: facebookresearch/FBGEMM#20 Pull Request resolved: pytorch#2816 As noted in [this doc](https://docs.google.com/document/d/156Du0hBRH6umG_i-OrYC574XhpQMUU5SJYG0RTS2tTg/edit#heading=h.akfcp7xpg8cr), using a ceiling round for the scale calculation does a better job of not truncating mantissa bits. This diff switches Triton's floor rounding to ceil rounding. Note that mx4_test currently doesn't pass, as the CUDA kernel now behaves differently from Triton; once we rebase this diff onto a similar change to the CUDA kernel, outputs should match exactly again.

Differential Revision: D59527463

Reviewed By: jianyuh
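For illustration, a minimal sketch of the floor-to-ceil switch that commit describes, assuming the shared scale is a power of two chosen from the group's maximum magnitude (the function name, the FP4 max of 6.0, and the exact formula are assumptions for this sketch, not the kernel's actual code):

```python
import math

# Assumed for illustration: FP4 (E2M1) has a max representable magnitude of 6.0.
FP4_MAX = 6.0

def shared_scale_exponent(group_max_abs: float) -> int:
    """Exponent e such that values divided by 2**e fit in FP4 range."""
    raw = math.log2(group_max_abs / FP4_MAX)
    # Floor rounding can pick a scale that is too small, forcing the largest
    # elements to clamp and lose mantissa bits; ceil rounding guarantees
    # group_max_abs / 2**e <= FP4_MAX for every element in the group.
    return math.ceil(raw)
```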
This pull request has been merged in f5906e0.
Summary:
Rather than trying to reshape inputs into 2D matrices with each thread operating on one row, this refactor uses 1D inputs and has each thread operate on an offset of the flat array.
The main benefit is that it avoids ragged tensors, which arise when an input cannot be divided into even-sized rows. This should make the kernel compatible with more shapes.
Reviewed By: sryap
Differential Revision: D59653809
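For illustration, a minimal Triton sketch of the thread-per-offset pattern the summary describes (the kernel name, block size, and placeholder copy body are assumptions; the actual MX4 kernel quantizes each group rather than copying):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def flat_kernel(in_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    # Each program instance handles one BLOCK-sized offset of the flat
    # array instead of one row of a reshaped 2D matrix, so inputs whose
    # sizes don't divide evenly into rows need no padding or reshaping.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements  # guard the ragged tail
    x = tl.load(in_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x, mask=mask)  # placeholder elementwise op

def run(x: torch.Tensor) -> torch.Tensor:
    # Assumes a contiguous CUDA tensor; any shape works once flattened.
    x = x.contiguous().flatten()
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    flat_kernel[grid](x, out, n, BLOCK=1024)
    return out
```

The masked tail is what makes the 1D formulation shape-agnostic: the last program simply loads and stores fewer than BLOCK elements, instead of requiring the input to reshape into even-sized rows.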