Update for API 3.0 online doc #1940

Merged · 6 commits · Jul 23, 2024
52 changes: 26 additions & 26 deletions README.md
@@ -39,21 +39,21 @@
pip install neural-compressor[pt]
# Install 2.X API + Framework extension API + TensorFlow dependency
pip install neural-compressor[tf]
```
> **Note**:
> Further installation methods can be found in the [Installation Guide](https://github.com/intel/neural-compressor/blob/master/docs/source/installation_guide.md). Check out our [FAQ](https://github.com/intel/neural-compressor/blob/master/docs/source/faq.md) for more details.
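A quick import check confirms the installation — a minimal sketch, assuming the package exposes the standard `__version__` attribute:

```python
# Sanity-check the install: import the package and print its version.
import neural_compressor

print(neural_compressor.__version__)
```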

## Getting Started

Setting up the environment:
```bash
pip install "neural-compressor>=2.3" "transformers>=4.34.0" torch torchvision
```
After successfully installing these packages, try your first quantization program.

### Weight-Only Quantization (LLMs)
The following example code demonstrates Weight-Only Quantization on LLMs; it supports Intel CPU, Intel Gaudi2 AI Accelerator, and NVIDIA GPU, and the best available device is selected automatically.

To try it on Intel Gaudi2, a Docker image with the Gaudi Software Stack is recommended; refer to the following script for environment setup. More details can be found in the [Gaudi Guide](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#launch-docker-image-that-was-built).
```bash
# Run a container with an interactive shell
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.14.0/ubuntu22.04/habanalabs/pytorch-installer-2.1.1:latest
@@ -91,9 +91,9 @@
woq_conf = PostTrainingQuantConfig(
)
quantized_model = fit(model=float_model, conf=woq_conf, calib_dataloader=dataloader)
```
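For orientation, a self-contained version of this truncated snippet might look like the sketch below. The model name, the dummy calibration data, and the `approach="weight_only"` argument are illustrative assumptions inferred from the `PostTrainingQuantConfig` and `fit` calls shown above, not a verbatim copy of the file.

```python
# Minimal Weight-Only Quantization sketch (2.X API) -- illustrative, not verbatim.
from transformers import AutoModelForCausalLM
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.data import DataLoader, Datasets
from neural_compressor.quantization import fit

float_model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")  # assumed model
# Token-id-shaped dummy calibration inputs (assumed parameters).
dataset = Datasets("pytorch")["dummy"](shape=(1, 512), low=0, high=100, dtype="int64")
dataloader = DataLoader(framework="pytorch", dataset=dataset)

woq_conf = PostTrainingQuantConfig(approach="weight_only")
quantized_model = fit(model=float_model, conf=woq_conf, calib_dataloader=dataloader)
```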
**Note:**

To try INT4 model inference, use [Intel Extension for Transformers](https://github.com/intel/intel-extension-for-transformers) directly; it leverages Intel Neural Compressor for model quantization.

### Static Quantization (Non-LLMs)

@@ -121,10 +121,10 @@ quantized_model = fit(model=float_model, conf=static_quant_conf, calib_dataloade
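The static-quantization snippet is truncated in this view; a minimal sketch of the flow it names could look like the following, where the ResNet model and the dummy dataset are assumptions for illustration.

```python
# Minimal post-training static quantization sketch (2.X API) -- illustrative only.
from torchvision import models
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.data import DataLoader, Datasets
from neural_compressor.quantization import fit

float_model = models.resnet18()  # assumed example model
dataset = Datasets("pytorch")["dummy"](shape=(1, 3, 224, 224))  # dummy calibration inputs
calib_dataloader = DataLoader(framework="pytorch", dataset=dataset)

static_quant_conf = PostTrainingQuantConfig()  # static post-training quantization by default
quantized_model = fit(model=float_model, conf=static_quant_conf, calib_dataloader=calib_dataloader)
```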
</thead>
<tbody>
<tr>
<td colspan="2" align="center"><a href="./docs/3x/design.md#architecture">Architecture</a></td>
<td colspan="2" align="center"><a href="./docs/3x/design.md#workflow">Workflow</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/design.md#architecture">Architecture</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/design.md#workflow">Workflow</a></td>
<td colspan="2" align="center"><a href="https://intel.github.io/neural-compressor/latest/docs/source/api-doc/apis.html">APIs</a></td>
<td colspan="1" align="center"><a href="./docs/3x/llm_recipes.md">LLMs Recipes</a></td>
<td colspan="1" align="center"><a href="./docs/source/3x/llm_recipes.md">LLMs Recipes</a></td>
<td colspan="1" align="center">Examples</td>
</tr>
</tbody>
@@ -135,15 +135,15 @@ quantized_model = fit(model=float_model, conf=static_quant_conf, calib_dataloade
</thead>
<tbody>
<tr>
<td colspan="2" align="center"><a href="./docs/3x/PyTorch.md">Overview</a></td>
<td colspan="2" align="center"><a href="./docs/3x/PT_StaticQuant.md">Static Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/3x/PT_DynamicQuant.md">Dynamic Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/3x/PT_SmoothQuant.md">Smooth Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PyTorch.md">Overview</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_StaticQuant.md">Static Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_DynamicQuant.md">Dynamic Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_SmoothQuant.md">Smooth Quantization</a></td>
</tr>
<tr>
<td colspan="4" align="center"><a href="./docs/3x/PT_WeightOnlyQuant.md">Weight-Only Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/3x/PT_MXQuant.md">MX Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/3x/PT_MixedPrecision.md">Mixed Precision</a></td>
<td colspan="4" align="center"><a href="./docs/source/3x/PT_WeightOnlyQuant.md">Weight-Only Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_MXQuant.md">MX Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_MixedPrecision.md">Mixed Precision</a></td>
</tr>
</tbody>
<thead>
@@ -153,9 +153,9 @@ quantized_model = fit(model=float_model, conf=static_quant_conf, calib_dataloade
</thead>
<tbody>
<tr>
<td colspan="3" align="center"><a href="./docs/3x/TensorFlow.md">Overview</a></td>
<td colspan="3" align="center"><a href="./docs/3x/TF_Quant.md">Static Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/3x/TF_SQ.md">Smooth Quantization</a></td>
<td colspan="3" align="center"><a href="./docs/source/3x/TensorFlow.md">Overview</a></td>
<td colspan="3" align="center"><a href="./docs/source/3x/TF_Quant.md">Static Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/TF_SQ.md">Smooth Quantization</a></td>
</tr>
</tbody>
<thead>
@@ -165,24 +165,24 @@ quantized_model = fit(model=float_model, conf=static_quant_conf, calib_dataloade
</thead>
<tbody>
<tr>
<td colspan="4" align="center"><a href="./docs/3x/autotune.md">Auto Tune</a></td>
<td colspan="4" align="center"><a href="./docs/3x/benchmark.md">Benchmark</a></td>
<td colspan="4" align="center"><a href="./docs/source/3x/autotune.md">Auto Tune</a></td>
<td colspan="4" align="center"><a href="./docs/source/3x/benchmark.md">Benchmark</a></td>
</tr>
</tbody>
</table>

> **Note**:
> Starting from the 3.0 release, we recommend using the 3.X API. Compression techniques applied during training, such as QAT, Pruning, and Distillation, are currently only available in the [2.X API](https://github.com/intel/neural-compressor/blob/master/docs/source/2x_user_guide.md).

## Selected Publications/Events
* Blog by Intel: [Neural Compressor: Boosting AI Model Efficiency](https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/Neural-Compressor-Boosting-AI-Model-Efficiency/post/1604740) (June 2024)
* Blog by Intel: [Optimization of Intel AI Solutions for Alibaba Cloud’s Qwen2 Large Language Models](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-ai-solutions-accelerate-alibaba-qwen2-llms.html) (June 2024)
* Blog by Intel: [Accelerate Meta* Llama 3 with Intel AI Solutions](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-meta-llama3-with-intel-ai-solutions.html) (Apr 2024)
* EMNLP'2023 (Under Review): [TEQ: Trainable Equivalent Transformation for Quantization of LLMs](https://openreview.net/forum?id=iaI8xEINAf&referrer=%5BAuthor%20Console%5D) (Sep 2023)
* arXiv: [Efficient Post-training Quantization with FP8 Formats](https://arxiv.org/abs/2309.14592) (Sep 2023)
* arXiv: [Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs](https://arxiv.org/abs/2309.05516) (Sep 2023)

> **Note**:
> View [Full Publication List](https://github.com/intel/neural-compressor/blob/master/docs/source/publication_list.md).

## Additional Content
@@ -192,8 +192,8 @@ quantized_model = fit(model=float_model, conf=static_quant_conf, calib_dataloade
* [Legal Information](./docs/source/legal_information.md)
* [Security Policy](SECURITY.md)

## Communication
- [GitHub Issues](https://github.com/intel/neural-compressor/issues): mainly for bug reports, new feature requests, and questions.
- [Email](mailto:[email protected]): welcome to raise any interesting research ideas on model compression techniques by email for collaboration.
- [Discord Channel](https://discord.com/invite/Wxk3J3ZJkU): join the discord channel for more flexible technical discussion.
- [WeChat group](/docs/source/imgs/wechat_group.jpg): scan the QR code to join the technical discussion.
88 changes: 0 additions & 88 deletions docs/3x/get_started.md

This file was deleted.

14 changes: 9 additions & 5 deletions docs/build_docs/build.sh
@@ -84,17 +84,18 @@
cp -rf ../docs/ ./source
cp -f "../README.md" "./source/docs/source/Welcome.md"
cp -f "../SECURITY.md" "./source/docs/source/SECURITY.md"


all_md_files=`find ./source/docs -name "*.md"`
for md_file in ${all_md_files}
do
sed -i 's/.md/.html/g' ${md_file}
done


-sed -i 's/.\/docs\/source\/_static/./g' ./source/docs/source/Welcome.md ./source/docs/source/user_guide.md
-sed -i 's/.md/.html/g; s/.\/docs\/source\//.\//g' ./source/docs/source/Welcome.md ./source/docs/source/user_guide.md
-sed -i 's/\/examples\/README.html/https:\/\/github.com\/intel\/neural-compressor\/blob\/master\/examples\/README.md/g' ./source/docs/source/user_guide.md
-sed -i 's/https\:\/\/intel.github.io\/neural-compressor\/lates.\/api-doc\/apis.html/https\:\/\/intel.github.io\/neural-compressor\/latest\/docs\/source\/api-doc\/apis.html/g' ./source/docs/source/Welcome.md ./source/docs/source/user_guide.md
+# sed -i 's/.\/docs\/source\/_static/./g' ./source/docs/source/Welcome.md ./source/docs/source/user_guide.md
+#sed -i 's/.md/.html/g; s/.\/docs\/source\//.\//g' ./source/docs/source/Welcome.md ./source/docs/source/user_guide.md
+#sed -i 's/\/examples\/README.html/https:\/\/github.com\/intel\/neural-compressor\/blob\/master\/examples\/README.md/g' ./source/docs/source/user_guide.md
+#sed -i 's/https\:\/\/intel.github.io\/neural-compressor\/lates.\/api-doc\/apis.html/https\:\/\/intel.github.io\/neural-compressor\/latest\/docs\/source\/api-doc\/apis.html/g' ./source/docs/source/Welcome.md ./source/docs/source/user_guide.md

sed -i 's/examples\/README.html/https:\/\/github.com\/intel\/neural-compressor\/blob\/master\/examples\/README.md/g' ./source/docs/source/Welcome.md
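The sed calls in this script rewrite relative `.md` links to `.html` for the rendered site. A rough Python equivalent of that rewriting step — the regex and file set here are assumptions for illustration, not part of the build script — is:

```python
# Rough Python equivalent of the .md -> .html rewriting that build.sh does with sed.
# Illustrative only; the real script edits files in the copied ./source tree.
import pathlib
import re

for md_file in pathlib.Path("./source/docs").rglob("*.md"):
    text = md_file.read_text(encoding="utf-8")
    # Unlike sed's 's/.md/.html/g' (where the unescaped dot also matches e.g. "amd"),
    # escape the dot and anchor the suffix to link-like endings.
    text = re.sub(r"\.md(?=[)#\s]|$)", ".html", text)
    md_file.write_text(text, encoding="utf-8")
```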

@@ -130,6 +131,8 @@
if [[ ${UPDATE_VERSION_FOLDER} -eq 1 ]]; then
cp -r ${SRC_FOLDER}/* ${DST_FOLDER}
python update_html.py ${DST_FOLDER} ${VERSION}
cp -r ./source/docs/source/imgs ${DST_FOLDER}/docs/source
+cp -r ./source/docs/source/3x/imgs ${DST_FOLDER}/docs/source/3x


cp source/_static/index.html ${DST_FOLDER}
else
@@ -143,6 +146,7 @@
if [[ ${UPDATE_LATEST_FOLDER} -eq 1 ]]; then
cp -r ${SRC_FOLDER}/* ${LATEST_FOLDER}
python update_html.py ${LATEST_FOLDER} ${VERSION}
cp -r ./source/docs/source/imgs ${LATEST_FOLDER}/docs/source
+cp -r ./source/docs/source/3x/imgs ${LATEST_FOLDER}/docs/source/3x
cp source/_static/index.html ${LATEST_FOLDER}
else
echo "skip to create ${LATEST_FOLDER}"
@@ -152,7 +156,7 @@
echo "Create document is done"

if [[ ${CHECKOUT_GH_PAGES} -eq 1 ]]; then
git clone -b gh-pages --single-branch https://github.com/intel/neural-compressor.git ${RELEASE_FOLDER}

if [[ ${UPDATE_VERSION_FOLDER} -eq 1 ]]; then
python update_version.py ${ROOT_DST_FOLDER} ${VERSION}
cp -rf ${DST_FOLDER} ${RELEASE_FOLDER}
22 files renamed without changes.
29 changes: 29 additions & 0 deletions docs/source/api-doc/api_2.rst
@@ -0,0 +1,29 @@
2.0 API
#######

**User facing APIs:**

.. toctree::
   :maxdepth: 1

   quantization.rst
   mix_precision.rst
   training.rst
   benchmark.rst
   config.rst
   objective.rst


**Advanced APIs:**

.. toctree::
   :maxdepth: 1

   compression.rst
   strategy.rst
   model.rst

**API document example:**

.. toctree::
   api_doc_example.rst
27 changes: 27 additions & 0 deletions docs/source/api-doc/api_3.rst
@@ -0,0 +1,27 @@
3.0 API
#######

**PyTorch Extension API:**

.. toctree::
   :maxdepth: 1

   torch_quantization_common.rst
   torch_quantization_config.rst
   torch_quantization_autotune.rst

**Tensorflow Extension API:**

.. toctree::
   :maxdepth: 1

   tf_quantization_common.rst
   tf_quantization_config.rst
   tf_quantization_autotune.rst

**Other Modules:**

.. toctree::
   :maxdepth: 1

   benchmark.rst
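These pages document the new 3.X extension APIs. For orientation, the PyTorch prepare/convert flow that `torch_quantization_common.rst` and `torch_quantization_config.rst` cover looks roughly like the sketch below; the `RTNConfig` choice and the toy model are assumptions based on the module names above, not content from this PR.

```python
# Rough sketch of the 3.X PyTorch extension quantization flow -- illustrative only.
import torch
from neural_compressor.torch.quantization import RTNConfig, convert, prepare

# A toy model standing in for a real network (assumption for illustration).
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())

quant_config = RTNConfig()  # round-to-nearest weight-only config
model = prepare(model, quant_config=quant_config)
model = convert(model)      # returns the quantized model
```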
21 changes: 2 additions & 19 deletions docs/source/api-doc/apis.rst
@@ -1,29 +1,12 @@
APIs
####

-**User facing APIs:**
-
.. toctree::
   :maxdepth: 1

-   quantization.rst
-   mix_precision.rst
-   training.rst
-   benchmark.rst
-   config.rst
-   objective.rst
-
-
-**Advanced APIs:**
+   api_3.rst

.. toctree::
   :maxdepth: 1

-   compression.rst
-   strategy.rst
-   model.rst
-
-**API document example:**
-
-.. toctree::
-   api_doc_example.rst
+   api_2.rst
6 changes: 6 additions & 0 deletions docs/source/api-doc/tf_quantization_autotune.rst
@@ -0,0 +1,6 @@
Tensorflow Quantization AutoTune
================================

.. autoapisummary::

   neural_compressor.tensorflow.quantization.autotune
6 changes: 6 additions & 0 deletions docs/source/api-doc/tf_quantization_common.rst
@@ -0,0 +1,6 @@
Tensorflow Quantization Base API
#################################

.. autoapisummary::

   neural_compressor.tensorflow.quantization.quantize
6 changes: 6 additions & 0 deletions docs/source/api-doc/tf_quantization_config.rst
@@ -0,0 +1,6 @@
Tensorflow Quantization Config
==============================

.. autoapisummary::

   neural_compressor.tensorflow.quantization.config
6 changes: 6 additions & 0 deletions docs/source/api-doc/torch_quantization_autotune.rst
@@ -0,0 +1,6 @@
Pytorch Quantization AutoTune
=============================

.. autoapisummary::

   neural_compressor.torch.quantization.autotune
6 changes: 6 additions & 0 deletions docs/source/api-doc/torch_quantization_common.rst
@@ -0,0 +1,6 @@
Pytorch Quantization Base API
#################################

.. autoapisummary::

   neural_compressor.torch.quantization.quantize
6 changes: 6 additions & 0 deletions docs/source/api-doc/torch_quantization_config.rst
@@ -0,0 +1,6 @@
Pytorch Quantization Config
===========================

.. autoapisummary::

   neural_compressor.torch.quantization.config