
Triton MX4 Quantize Rounding Mode Support. #2821

Closed

jwfromm wants to merge 4 commits

Conversation

@jwfromm (Contributor) commented Jul 10, 2024

Summary: This diff adds the `rounding_mode` argument to Triton quantize. We support (almost) all of the rounding modes described in the best practices doc. Stochastic rounding is coming soon.

Differential Revision: D59562029
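For intuition, the rounding modes come down to how `log2` of each group's max magnitude is snapped to an integer shared exponent. The sketch below is illustrative only: the enum values and helper name are assumptions for exposition, not the fbgemm_gpu API, and the stochastic branch is included just to show the idea even though the PR description defers it.

```python
# Illustrative sketch only; names and semantics are assumptions, not fbgemm_gpu.
from enum import Enum

import torch


class RoundingMode(Enum):
    NEAREST = "nearest"
    FLOOR = "floor"
    CEIL = "ceil"
    STOCHASTIC = "stochastic"


def round_shared_exponent(group_amax: torch.Tensor, mode: RoundingMode) -> torch.Tensor:
    """Round log2 of each group's absolute max to an integer shared exponent."""
    # Clamp away zeros so log2 stays finite.
    raw_exp = torch.log2(group_amax.clamp_min(torch.finfo(torch.float32).tiny))
    if mode is RoundingMode.FLOOR:
        return torch.floor(raw_exp)
    if mode is RoundingMode.CEIL:
        return torch.ceil(raw_exp)
    if mode is RoundingMode.STOCHASTIC:
        # Stochastic rounding: floor(x + uniform[0, 1)) rounds up with
        # probability equal to the fractional part of x.
        return torch.floor(raw_exp + torch.rand_like(raw_exp))
    # Default: round to nearest (torch.round breaks ties to even).
    return torch.round(raw_exp)


# Example: one group of 32 values, rounded with each mode.
group_amax = torch.randn(32).abs().max().unsqueeze(0)
for m in RoundingMode:
    print(m.value, round_shared_exponent(group_amax, m).item())
```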

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D59562029

netlify bot commented Jul 10, 2024

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
|------|------|
| 🔨 Latest commit | a044115 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/66944289a1fb13000870a244 |
| 😎 Deploy Preview | https://deploy-preview-2821--pytorch-fbgemm-docs.netlify.app |

Josh Fromm added 3 commits July 12, 2024 13:04
Summary:
Pull Request resolved: pytorch#2836

Rather than trying to reshape inputs into 2D matrices with each thread operating on one row, this refactor uses 1D inputs and has each thread operate on an offset into the array.

The main benefit is that it avoids ragged tensors when an input cannot be divided into evenly sized rows. This should make us compatible with more shapes.

Differential Revision: D59653809

Reviewed By: sryap
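For reference, the 1D offset pattern described above looks roughly like the standard Triton elementwise idiom below. This is a minimal sketch (a simple scaling kernel with assumed names), not the actual MX4 kernel; the point is that masked, offset-based loads handle any input length without reshaping into rows.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def _scale_1d_kernel(x_ptr, out_ptr, n_elements, scale, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE slice of the flat input.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    # Masking handles ragged tails (lengths not divisible by BLOCK_SIZE)
    # without reshaping the input into even-sized rows.
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x * scale, mask=mask)


def scale_1d(x: torch.Tensor, scale: float = 2.0) -> torch.Tensor:
    # Flatten to 1D: any input shape works because indexing is offset based.
    x_flat = x.contiguous().view(-1)
    out = torch.empty_like(x_flat)
    BLOCK_SIZE = 1024
    grid = (triton.cdiv(x_flat.numel(), BLOCK_SIZE),)
    _scale_1d_kernel[grid](x_flat, out, x_flat.numel(), scale, BLOCK_SIZE=BLOCK_SIZE)
    return out.view_as(x)


# Usage (requires a GPU): y = scale_1d(torch.randn(1_000_003, device="cuda"))
```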
Summary:
We previously had to use Python to unravel values from exponents and feed them to Triton as two separate tensors. This introduced a lot of overhead due to the large copies involved.

This diff does a bunch of fancy indexing to operate directly on a tensor with mixed elements and exponents. The result is that Triton dequantize is now slightly faster than the CUDA kernel. My hope is that this allows us to standardize on a single implementation.

Differential Revision: D59661776
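Concretely, a mixed buffer like this can be addressed with index arithmetic such as the following. The layout here (16 packed int4 bytes followed by a 1-byte shared exponent per group of 32) is an assumption for illustration; the actual FBGEMM packing may differ.

```python
import torch

# Assumed layout for illustration only (the real FBGEMM packing may differ):
# each group of 32 MX4 values is stored as 16 bytes of packed int4 pairs
# followed by 1 byte of shared exponent, all in one flat uint8 tensor.
GROUP_SIZE = 32
PACKED_BYTES = GROUP_SIZE // 2       # two int4 elements per byte
BYTES_PER_GROUP = PACKED_BYTES + 1   # plus the shared-exponent byte


def value_byte_offsets(num_groups: int) -> torch.Tensor:
    """Offsets of every packed-value byte, skipping over exponent bytes."""
    group_starts = torch.arange(num_groups).unsqueeze(1) * BYTES_PER_GROUP
    within_group = torch.arange(PACKED_BYTES).unsqueeze(0)
    return (group_starts + within_group).flatten()


def exponent_byte_offsets(num_groups: int) -> torch.Tensor:
    """Offsets of each group's shared-exponent byte."""
    return torch.arange(num_groups) * BYTES_PER_GROUP + PACKED_BYTES


# Gather values and exponents straight out of the mixed buffer, with no
# Python-side unravel into two separate tensors.
num_groups = 4
packed = torch.randint(0, 256, (num_groups * BYTES_PER_GROUP,), dtype=torch.uint8)
values = packed[value_byte_offsets(num_groups)]
shared_exps = packed[exponent_byte_offsets(num_groups)].to(torch.int32) - 127
```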
Summary:
We apply the same technique used for dequantize in D59661776 to MX4 quantization. Specifically, we use fancy indexing to write both exponents and values to the same output tensor within the Triton kernel. This lets us allocate a single output and avoid extra copies, giving a sizeable 40% performance boost; see the sketch after the benchmarks below.

Before this change:
```
INFO:root:input size: 1073741824 group size: 32
INFO:root:Start to benchmark ...
INFO:root:Start to benchmark ...
input_size=1073741824 MX4 quantized time per iter: 7563us
input_size=1073741824 MX4 dequantized time per iter: 2756us
INFO:root:Start to benchmark ...
INFO:root:Start to benchmark ...
input_size=1073741824 MX4 triton quantized time per iter: 5110us
input_size=1073741824 MX4 triton dequantized time per iter: 2417us
INFO:root:Start to benchmark ...
INFO:root:Start to benchmark ...
input_size=1073741824 FP8 quantized time per iter: 6274us
input_size=1073741824 FP8 dequantized time per iter: 4223us
```

After this change:
```
INFO:root:input size: 1073741824 group size: 32
INFO:root:Start to benchmark ...
INFO:root:Start to benchmark ...
input_size=1073741824 MX4 quantized time per iter: 7560us
input_size=1073741824 MX4 dequantized time per iter: 2758us
INFO:root:Start to benchmark ...
INFO:root:Start to benchmark ...
input_size=1073741824 MX4 triton quantized time per iter: 3138us
input_size=1073741824 MX4 triton dequantized time per iter: 2418us
INFO:root:Start to benchmark ...
INFO:root:Start to benchmark ...
input_size=1073741824 FP8 quantized time per iter: 6274us
input_size=1073741824 FP8 dequantized time per iter: 4226us
```

Differential Revision: D59688150
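The quantize side mirrors the dequantize indexing: one uint8 output holds both the packed values and each group's shared exponent, so only a single allocation is needed. A minimal PyTorch sketch under the same assumed layout as above (the kernel does the equivalent with per-thread offsets):

```python
import torch

GROUP_SIZE = 32
PACKED_BYTES = GROUP_SIZE // 2       # two int4 elements per byte
BYTES_PER_GROUP = PACKED_BYTES + 1   # plus the shared-exponent byte


def pack_groups(value_bytes: torch.Tensor, exponents: torch.Tensor) -> torch.Tensor:
    """Scatter packed int4 bytes and per-group exponents into one output buffer.

    value_bytes: (num_groups, 16) uint8 tensor of already-packed int4 pairs.
    exponents:   (num_groups,) uint8 tensor of biased shared exponents.
    """
    num_groups = value_bytes.shape[0]
    out = torch.empty(num_groups * BYTES_PER_GROUP, dtype=torch.uint8)
    group_starts = torch.arange(num_groups).unsqueeze(1) * BYTES_PER_GROUP
    value_idx = (group_starts + torch.arange(PACKED_BYTES)).flatten()
    exp_idx = torch.arange(num_groups) * BYTES_PER_GROUP + PACKED_BYTES
    out[value_idx] = value_bytes.flatten()
    out[exp_idx] = exponents
    return out


# Example: four groups of random packed nibbles with exponent bias 127.
vals = torch.randint(0, 256, (4, PACKED_BYTES), dtype=torch.uint8)
exps = torch.full((4,), 127, dtype=torch.uint8)
mx4 = pack_groups(vals, exps)  # single 68-byte buffer, no extra copies
```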
jwfromm added a commit to jwfromm/FBGEMM that referenced this pull request Jul 12, 2024
Summary:
Pull Request resolved: pytorch#2821

X-link: facebookresearch/FBGEMM#22

This diff adds the `rounding_mode` argument to Triton quantize. We support all of the rounding modes described in the [best practices doc](https://docs.google.com/document/d/156Du0hBRH6umG_i-OrYC574XhpQMUU5SJYG0RTS2tTg/edit#heading=h.akfcp7xpg8cr).

Reviewed By: summerdengfb

Differential Revision: D59562029
@facebook-github-bot (Contributor)

This pull request has been merged in 5bf8ce9.
