
Feat: Implementation of the DeepSeek blockwise quantization for fp8 tensors #1763

Open
wants to merge 5 commits into main

Conversation


@Degnel Degnel commented Feb 22, 2025

This PR is the first step towards addressing issue #1594. It includes the following implementations:

  • fp8 triton gemm for blockwise quantization
  • quant, dequant and linear utilities
  • time & precision benchmarks
  • basic tests

If the code is validated, it would be great to benchmark it on an H100.
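
For context, blockwise quantization keeps one scale per tile of the tensor rather than one per tensor or per row. Below is a minimal PyTorch sketch of the quant/dequant step, assuming 128×128 blocks as in DeepSeek-V3; the function names and details are illustrative and not necessarily those used in this PR:

```python
import torch

def blockwise_fp8_quant(x: torch.Tensor, block_size: int = 128):
    # Quantize a 2D tensor to float8_e4m3fn with one scale per
    # (block_size x block_size) block. Illustrative sketch only.
    M, N = x.shape
    assert M % block_size == 0 and N % block_size == 0
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    blocks = x.reshape(M // block_size, block_size, N // block_size, block_size)
    # Per-block absolute max sets the scale that maps each block onto the fp8 range.
    amax = blocks.abs().amax(dim=(1, 3), keepdim=True).clamp(min=1e-12)
    scale = amax / fp8_max
    x_fp8 = (blocks / scale).to(torch.float8_e4m3fn).reshape(M, N)
    return x_fp8, scale.reshape(M // block_size, N // block_size)

def blockwise_fp8_dequant(x_fp8: torch.Tensor, scale: torch.Tensor, block_size: int = 128):
    # Inverse operation: rescale each block back to (approximately) its original range.
    M, N = x_fp8.shape
    blocks = x_fp8.to(torch.float32).reshape(M // block_size, block_size, N // block_size, block_size)
    return (blocks * scale[:, None, :, None]).reshape(M, N)

if __name__ == "__main__":
    w = torch.randn(256, 512)
    w_fp8, w_scale = blockwise_fp8_quant(w)
    w_hat = blockwise_fp8_dequant(w_fp8, w_scale)
    print((w - w_hat).abs().max())  # round-trip quantization error
```

A blockwise GEMM kernel then takes the fp8 tensors plus their per-block scales and applies the scales inside the accumulation loop, which is the role of the Triton GEMM listed above.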

Degnel and others added 2 commits February 22, 2025 14:13
- fp8 triton gemm
- quant, dequant and linear utils
- time & precision benchmarks
- basic tests

pytorch-bot bot commented Feb 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1763

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 22, 2025
@danielvegamyhre
Contributor

Thanks for your work on this! I'll take a closer look next week.

cc @vkuzo @drisspg

@Degnel
Author

Degnel commented Feb 25, 2025

Thanks for running the tests. I have two questions regarding the errors:

  • Where should I add Triton to allow the tests to run successfully without introducing unnecessary dependencies in dev-requirements.txt?
  • Does torchao provide any utility to check the available FP8 types for each GPU architecture?

@danielvegamyhre
Contributor

danielvegamyhre commented Feb 27, 2025

> Thanks for running the tests. I have two questions regarding the errors:
>
> • Where should I add Triton to allow the tests to run successfully without introducing unnecessary dependencies in dev-requirements.txt?

Can you clarify what you mean? Are the tests failing in CI due to a missing Triton installation? That shouldn't be happening; please share the link/logs if so.

> • Does torchao provide any utility to check the available FP8 types for each GPU architecture?

We just use helpers that skip tests if the GPU architecture is not at least SM 89:

def is_sm_at_least_89():

You can find examples in the float8 tests (example).
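
For reference, such a check can be done with `torch.cuda.get_device_capability`; a minimal sketch (the actual helper lives in torchao and may differ in detail):

```python
import torch

def is_sm_at_least_89() -> bool:
    # True on CUDA GPUs with compute capability >= 8.9 (Ada/Hopper),
    # the minimum for native float8 support.
    return (
        torch.cuda.is_available()
        and torch.version.cuda is not None
        and torch.cuda.get_device_capability() >= (8, 9)
    )
```

Tests can then be guarded with something like `@unittest.skipIf(not is_sm_at_least_89(), "requires SM 89+")` or the equivalent pytest marker.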

@danielvegamyhre danielvegamyhre self-assigned this Feb 27, 2025
@danielvegamyhre danielvegamyhre self-requested a review February 27, 2025 17:45
Labels
CLA Signed
3 participants