Triton MX4 Quantize Rounding Mode Support. #2821
Conversation
This pull request was exported from Phabricator. Differential Revision: D59562029
Summary:
Pull Request resolved: pytorch#2821
X-link: facebookresearch/FBGEMM#22

This diff adds the `rounding_mode` argument to triton quantize. We support (almost) all the rounding modes described in the [best practices doc](https://docs.google.com/document/d/156Du0hBRH6umG_i-OrYC574XhpQMUU5SJYG0RTS2tTg/edit#heading=h.akfcp7xpg8cr). Stochastic rounding is coming soon.

Differential Revision: D59562029
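To make the rounding modes concrete, here is a minimal, hedged sketch of how a group's shared exponent could be derived from its max magnitude under a few of the modes. This is an illustration only: the helper name, the exact set of mode strings, and the math on `log2` of the group max are assumptions, not the actual FBGEMM/triton implementation.

```python
import math

def shared_exponent(group_max: float, rounding_mode: str = "floor") -> int:
    """Illustrative only: derive a group's shared exponent from its max
    absolute value. The real kernel works on raw float bits inside triton;
    the names and mode semantics here are assumptions."""
    if group_max <= 0:
        return 0
    raw = math.log2(group_max)
    if rounding_mode == "floor":    # round the exponent down (truncate)
        return math.floor(raw)
    if rounding_mode == "ceil":     # round the exponent up
        return math.ceil(raw)
    if rounding_mode == "nearest":  # round to the nearest integer exponent
        return round(raw)
    raise ValueError(f"unsupported rounding_mode: {rounding_mode}")

# Example: for a group whose largest magnitude is 3.0 (log2(3) ~= 1.585),
# floor -> 1, ceil -> 2, nearest -> 2.
```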
Summary:
Pull Request resolved: pytorch#2836

Rather than try to reshape inputs to 2D matrices with each thread operating on one row, this refactor uses 1D inputs and has each thread operate on an offset of the array. The main benefit is that it avoids ragged tensors where we can't divide an input into even-sized rows. This should enable us to be compatible with more shapes.

Differential Revision: D59653809
Reviewed By: sryap
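A minimal sketch of the 1D-offset pattern described above, with illustrative names (this is not the actual FBGEMM kernel): each program computes a flat offset into the flattened input and masks the ragged tail, so no even division into rows is required.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def flat_copy_kernel(in_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    # Each program handles one contiguous BLOCK-sized slice of the flat array.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    # The mask covers the ragged tail, so inputs need not split evenly.
    mask = offsets < n_elements
    x = tl.load(in_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x, mask=mask)

def flat_copy(x: torch.Tensor) -> torch.Tensor:
    flat = x.flatten().contiguous()
    out = torch.empty_like(flat)
    grid = (triton.cdiv(flat.numel(), 1024),)
    flat_copy_kernel[grid](flat, out, flat.numel(), BLOCK=1024)
    return out.view_as(x)
```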
Summary:
We previously had to use python to unravel values from exponents and feed them to triton as two separate tensors. This introduced a lot of overhead due to large copies. This diff does a bunch of fancy indexing to directly operate on a single tensor containing both elements and exponents. The result is that triton dequantize is now slightly faster than the CUDA kernel. My hope is that this allows us to standardize on a single implementation.

Differential Revision: D59661776
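A hedged sketch of the indexing idea, assuming for illustration that each group's packed payload bytes are stored contiguously followed by its single shared-exponent byte; the constants, helper name, and layout below are assumptions, not the actual FBGEMM format or code.

```python
import torch

# Assumed layout for illustration only: per group of 32 fp4 elements,
# 16 packed payload bytes are followed by 1 shared-exponent byte.
PAYLOAD_BYTES = 16
GROUP_BYTES = PAYLOAD_BYTES + 1

def group_offsets(num_groups: int, device: str = "cuda"):
    """Compute the flat byte offsets of payload and exponent bytes per group.
    Inside a kernel, the same arithmetic lets loads/stores address one mixed
    tensor directly, instead of splitting it into two tensors in python."""
    group_starts = torch.arange(num_groups, device=device) * GROUP_BYTES
    payload_idx = group_starts[:, None] + torch.arange(PAYLOAD_BYTES, device=device)
    exponent_idx = group_starts + PAYLOAD_BYTES
    return payload_idx, exponent_idx
```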
Summary:
We apply a similar technique as we did to dequantize in D59661776 to MX4 quantization. Specifically, we do fancy indexing to be able to write both exponents and values to the same output tensor within the triton kernel. This allows us to allocate only a single output and do no extra copies, giving a sizeable 40% performance boost.

Before this change:

```
INFO:root:input size: 1073741824 group size: 32
INFO:root:Start to benchmark ...
INFO:root:Start to benchmark ...
input_size=1073741824 MX4 quantized time per iter: 7563us
input_size=1073741824 MX4 dequantized time per iter: 2756us
INFO:root:Start to benchmark ...
INFO:root:Start to benchmark ...
input_size=1073741824 MX4 triton quantized time per iter: 5110us
input_size=1073741824 MX4 triton dequantized time per iter: 2417us
INFO:root:Start to benchmark ...
INFO:root:Start to benchmark ...
input_size=1073741824 FP8 quantized time per iter: 6274us
input_size=1073741824 FP8 dequantized time per iter: 4223us
```

After this change:

```
INFO:root:input size: 1073741824 group size: 32
INFO:root:Start to benchmark ...
INFO:root:Start to benchmark ...
input_size=1073741824 MX4 quantized time per iter: 7560us
input_size=1073741824 MX4 dequantized time per iter: 2758us
INFO:root:Start to benchmark ...
INFO:root:Start to benchmark ...
input_size=1073741824 MX4 triton quantized time per iter: 3138us
input_size=1073741824 MX4 triton dequantized time per iter: 2418us
INFO:root:Start to benchmark ...
INFO:root:Start to benchmark ...
input_size=1073741824 FP8 quantized time per iter: 6274us
input_size=1073741824 FP8 dequantized time per iter: 4226us
```

Differential Revision: D59688150
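For the triton quantize path this is a drop from 5110us to 3138us per iteration, i.e. (5110 - 3138) / 5110 ≈ 39% less time, consistent with the ~40% figure above; the dequantize and non-triton paths are unchanged within noise.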
Summary:
Pull Request resolved: pytorch#2821
X-link: facebookresearch/FBGEMM#22

This diff adds the `rounding_mode` argument to triton quantize. We support all the rounding modes described in the [best practices doc](https://docs.google.com/document/d/156Du0hBRH6umG_i-OrYC574XhpQMUU5SJYG0RTS2tTg/edit#heading=h.akfcp7xpg8cr).

Reviewed By: summerdengfb

Differential Revision: D59562029
This pull request has been merged in 5bf8ce9.
Summary: This diff adds the `rounding_mode` argument to triton quantize. We support (almost) all the rounding modes described in the best practices doc. Stochastic rounding is coming soon.

Differential Revision: D59562029