update readme for fp8 #1979

Merged 2 commits on Aug 15, 2024
README.md: 4 changes (2 additions, 2 deletions)
@@ -71,7 +71,7 @@ pip install "neural-compressor>=2.3" "transformers>=4.34.0" torch torchvision
```
After successfully installing these packages, try your first quantization program.

-### [FP8 Quantization](./examples/3.x_api/pytorch/cv/fp8_quant/)
+### [FP8 Quantization](./docs/source/3x/PT_FP8Quant.md)
The following example code demonstrates FP8 Quantization, which is supported by the Intel Gaudi2 AI Accelerator.

To try it on Intel Gaudi2, a Docker image with the Gaudi Software Stack is recommended; please refer to the following script for environment setup. More details can be found in the [Gaudi Guide](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#launch-docker-image-that-was-built).
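
As a rough sketch of the FP8 example referenced above (assuming neural-compressor 3.x with the PyTorch extension on a Gaudi host; the `FP8Config`/`prepare`/`convert` entry points follow PT_FP8Quant.md, and the model and calibration loop below are illustrative placeholders, not part of this PR):

```python
# Hedged sketch: FP8 post-training quantization on Intel Gaudi2.
# Entry points follow docs/source/3x/PT_FP8Quant.md; the model and the
# calibration data below are placeholders.
import torch
import torchvision.models as models
from neural_compressor.torch.quantization import FP8Config, prepare, convert

model = models.resnet18()

# E4M3 is the commonly used FP8 format on Gaudi2.
qconfig = FP8Config(fp8_config="E4M3")
model = prepare(model, qconfig)

# User-defined calibration: run a few representative batches so FP8 scales can be observed.
with torch.no_grad():
    for _ in range(4):
        model(torch.randn(1, 3, 224, 224))

model = convert(model)
```

The prepare/calibrate/convert split mirrors the static-quantization flow elsewhere in the 3.x API, so FP8 on Gaudi slots into the same workflow.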
@@ -147,7 +147,7 @@ Intel Neural Compressor will convert the model format from auto-gptq to hpu format
</tr>
<tr>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_WeightOnlyQuant.md">Weight-Only Quantization</a></td>
-<td colspan="2" align="center"><a href="./docs/3x/PT_FP8Quant.md">FP8 Quantization</a></td>
+<td colspan="2" align="center"><a href="./docs/source/3x/PT_FP8Quant.md">FP8 Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_MXQuant.md">MX Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_MixedPrecision.md">Mixed Precision</a></td>
</tr>
docs/3x/PT_FP8Quant.md → docs/source/3x/PT_FP8Quant.md: 2 changes (1 addition, 1 deletion)
@@ -108,6 +108,6 @@ model = convert(model)
| Task | Example |
|----------------------|---------|
| Computer Vision (CV) | [Link](../../examples/3.x_api/pytorch/cv/fp8_quant/) |
-| Large Language Model (LLM) | [Link](https://github.com/HabanaAI/optimum-habana-fork/tree/habana-main/examples/text-generation#running-with-fp8) |
+| Large Language Model (LLM) | [Link](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation#running-with-fp8) |

> Note: For LLMs, Optimum-habana provides higher performance based on modified modeling files, so the LLM link above points to Optimum-habana, which uses Intel Neural Compressor for FP8 quantization internally.
docs/source/3x/PyTorch.md: 13 changes (9 additions, 4 deletions)
@@ -176,16 +176,21 @@ def load(output_dir="./saved_results", model=None):
<td class="tg-9wq8"><a href="PT_SmoothQuant.md">link</a></td>
</tr>
<tr>
<td class="tg-9wq8" rowspan="2">Static Quantization</td>
<td class="tg-9wq8" rowspan="2"><a href=https://pytorch.org/docs/master/quantization.html#post-training-static-quantization>Post-traning Static Quantization</a></td>
<td class="tg-9wq8">intel-extension-for-pytorch</td>
<td class="tg-9wq8" rowspan="3">Static Quantization</td>
<td class="tg-9wq8" rowspan="3"><a href=https://pytorch.org/docs/master/quantization.html#post-training-static-quantization>Post-traning Static Quantization</a></td>
<td class="tg-9wq8">intel-extension-for-pytorch (INT8)</td>
<td class="tg-9wq8">&#10004</td>
<td class="tg-9wq8"><a href="PT_StaticQuant.md">link</a></td>
</tr>
<tr>
<td class="tg-9wq8"><a href=https://pytorch.org/docs/stable/torch.compiler_deepdive.html>TorchDynamo</a></td>
<td class="tg-9wq8"><a href=https://pytorch.org/docs/stable/torch.compiler_deepdive.html>TorchDynamo (INT8)</a></td>
<td class="tg-9wq8">&#10004</td>
<td class="tg-9wq8"><a href="PT_StaticQuant.md">link</a></td>
+<tr>
+<td class="tg-9wq8"><a href=https://docs.habana.ai/en/latest/index.html>Intel Gaudi AI accelerator (FP8)</a></td>
+<td class="tg-9wq8">&#10004</td>
+<td class="tg-9wq8"><a href="PT_FP8Quant.md">link</a></td>
+</tr>
</tr>
<tr>
<td class="tg-9wq8">Dynamic Quantization</td>
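
For the INT8 static-quantization backends listed in the table above (intel-extension-for-pytorch and TorchDynamo), a minimal sketch with the 3.x torch API could look like the following. The `get_default_static_config` helper and the `example_inputs` argument are assumptions based on the linked PT_StaticQuant.md, so check that document for the exact signatures.

```python
# Hedged sketch: post-training static INT8 quantization with the 3.x torch API.
# Helper names are assumed from PT_StaticQuant.md; the model and data are placeholders.
import torch
import torchvision.models as models
from neural_compressor.torch.quantization import get_default_static_config, prepare, convert

model = models.resnet18().eval()
example_inputs = torch.randn(1, 3, 224, 224)

quant_config = get_default_static_config()
prepared = prepare(model, quant_config=quant_config, example_inputs=example_inputs)

# Calibration pass with representative data.
with torch.no_grad():
    prepared(example_inputs)

q_model = convert(prepared)
```

The Intel Gaudi (FP8) row added by this PR maps to the `FP8Config` flow shown earlier, documented in PT_FP8Quant.md.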