Update for API 3.0 online doc #1940

Merged · 6 commits · Jul 23, 2024
52 changes: 26 additions & 26 deletions README.md
@@ -39,21 +39,21 @@
pip install neural-compressor[pt]
# Install 2.X API + Framework extension API + TensorFlow dependency
pip install neural-compressor[tf]
```
> **Note**:
> Further installation methods can be found in the [Installation Guide](https://github.com/intel/neural-compressor/blob/master/docs/source/installation_guide.md). Check out our [FAQ](https://github.com/intel/neural-compressor/blob/master/docs/source/faq.md) for more details.
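A quick import check confirms the installation — a minimal sketch, assuming the package exposes the standard `__version__` attribute:

```python
# Sanity-check the install: import the package and print its version.
import neural_compressor

print(neural_compressor.__version__)
```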

## Getting Started

Setting up the environment:
```bash
pip install "neural-compressor>=2.3" "transformers>=4.34.0" torch torchvision
```
After successfully installing these packages, try your first quantization program.

### Weight-Only Quantization (LLMs)
The following example code demonstrates Weight-Only Quantization on LLMs; it supports Intel CPU, Intel Gaudi2 AI Accelerator, and NVIDIA GPU, and the best available device is selected automatically.

To try it on Intel Gaudi2, a Docker image with the Gaudi Software Stack is recommended; refer to the following script for environment setup. More details can be found in the [Gaudi Guide](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#launch-docker-image-that-was-built).
```bash
# Run a container with an interactive shell
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.14.0/ubuntu22.04/habanalabs/pytorch-installer-2.1.1:latest
@@ -91,9 +91,9 @@
woq_conf = PostTrainingQuantConfig(
)
quantized_model = fit(model=float_model, conf=woq_conf, calib_dataloader=dataloader)
```
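For orientation, a self-contained version of this truncated snippet might look like the sketch below. The model name, the dummy calibration data, and the `approach="weight_only"` argument are illustrative assumptions inferred from the `PostTrainingQuantConfig` and `fit` calls shown above, not a verbatim copy of the file.

```python
# Minimal Weight-Only Quantization sketch (2.X API) -- illustrative, not verbatim.
from transformers import AutoModelForCausalLM
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.data import DataLoader, Datasets
from neural_compressor.quantization import fit

float_model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")  # assumed model
# Token-id-shaped dummy calibration inputs (assumed parameters).
dataset = Datasets("pytorch")["dummy"](shape=(1, 512), low=0, high=100, dtype="int64")
dataloader = DataLoader(framework="pytorch", dataset=dataset)

woq_conf = PostTrainingQuantConfig(approach="weight_only")
quantized_model = fit(model=float_model, conf=woq_conf, calib_dataloader=dataloader)
```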
**Note:**

To try INT4 model inference, use [Intel Extension for Transformers](https://github.com/intel/intel-extension-for-transformers) directly; it leverages Intel Neural Compressor for model quantization.

### Static Quantization (Non-LLMs)

@@ -121,10 +121,10 @@ quantized_model = fit(model=float_model, conf=static_quant_conf, calib_dataloade
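The static-quantization snippet is truncated in this view; a minimal sketch of the flow it names could look like the following, where the ResNet model and the dummy dataset are assumptions for illustration.

```python
# Minimal post-training static quantization sketch (2.X API) -- illustrative only.
from torchvision import models
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.data import DataLoader, Datasets
from neural_compressor.quantization import fit

float_model = models.resnet18()  # assumed example model
dataset = Datasets("pytorch")["dummy"](shape=(1, 3, 224, 224))  # dummy calibration inputs
calib_dataloader = DataLoader(framework="pytorch", dataset=dataset)

static_quant_conf = PostTrainingQuantConfig()  # static post-training quantization by default
quantized_model = fit(model=float_model, conf=static_quant_conf, calib_dataloader=calib_dataloader)
```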
</thead>
<tbody>
<tr>
<td colspan="2" align="center"><a href="./docs/3x/design.md#architecture">Architecture</a></td>
<td colspan="2" align="center"><a href="./docs/3x/design.md#workflow">Workflow</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/design.md#architecture">Architecture</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/design.md#workflow">Workflow</a></td>
<td colspan="2" align="center"><a href="https://intel.github.io/neural-compressor/latest/docs/source/api-doc/apis.html">APIs</a></td>
<td colspan="1" align="center"><a href="./docs/3x/llm_recipes.md">LLMs Recipes</a></td>
<td colspan="1" align="center"><a href="./docs/source/3x/llm_recipes.md">LLMs Recipes</a></td>
<td colspan="1" align="center">Examples</td>
</tr>
</tbody>
@@ -135,15 +135,15 @@ quantized_model = fit(model=float_model, conf=static_quant_conf, calib_dataloade
</thead>
<tbody>
<tr>
<td colspan="2" align="center"><a href="./docs/3x/PyTorch.md">Overview</a></td>
<td colspan="2" align="center"><a href="./docs/3x/PT_StaticQuant.md">Static Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/3x/PT_DynamicQuant.md">Dynamic Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/3x/PT_SmoothQuant.md">Smooth Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PyTorch.md">Overview</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_StaticQuant.md">Static Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_DynamicQuant.md">Dynamic Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_SmoothQuant.md">Smooth Quantization</a></td>
</tr>
<tr>
<td colspan="4" align="center"><a href="./docs/3x/PT_WeightOnlyQuant.md">Weight-Only Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/3x/PT_MXQuant.md">MX Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/3x/PT_MixedPrecision.md">Mixed Precision</a></td>
<td colspan="4" align="center"><a href="./docs/source/3x/PT_WeightOnlyQuant.md">Weight-Only Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_MXQuant.md">MX Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/PT_MixedPrecision.md">Mixed Precision</a></td>
</tr>
</tbody>
<thead>
@@ -153,9 +153,9 @@ quantized_model = fit(model=float_model, conf=static_quant_conf, calib_dataloade
</thead>
<tbody>
<tr>
<td colspan="3" align="center"><a href="./docs/3x/TensorFlow.md">Overview</a></td>
<td colspan="3" align="center"><a href="./docs/3x/TF_Quant.md">Static Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/3x/TF_SQ.md">Smooth Quantization</a></td>
<td colspan="3" align="center"><a href="./docs/source/3x/TensorFlow.md">Overview</a></td>
<td colspan="3" align="center"><a href="./docs/source/3x/TF_Quant.md">Static Quantization</a></td>
<td colspan="2" align="center"><a href="./docs/source/3x/TF_SQ.md">Smooth Quantization</a></td>
</tr>
</tbody>
<thead>
@@ -165,24 +165,24 @@ quantized_model = fit(model=float_model, conf=static_quant_conf, calib_dataloade
</thead>
<tbody>
<tr>
<td colspan="4" align="center"><a href="./docs/3x/autotune.md">Auto Tune</a></td>
<td colspan="4" align="center"><a href="./docs/3x/benchmark.md">Benchmark</a></td>
<td colspan="4" align="center"><a href="./docs/source/3x/autotune.md">Auto Tune</a></td>
<td colspan="4" align="center"><a href="./docs/source/3x/benchmark.md">Benchmark</a></td>
</tr>
</tbody>
</table>

> **Note**:
> Starting from the 3.0 release, we recommend using the 3.X API. Compression techniques applied during training, such as QAT, Pruning, and Distillation, are currently only available in the [2.X API](https://github.com/intel/neural-compressor/blob/master/docs/source/2x_user_guide.md).

## Selected Publications/Events
* Blog by Intel: [Neural Compressor: Boosting AI Model Efficiency](https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/Neural-Compressor-Boosting-AI-Model-Efficiency/post/1604740) (June 2024)
* Blog by Intel: [Optimization of Intel AI Solutions for Alibaba Cloud’s Qwen2 Large Language Models](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-ai-solutions-accelerate-alibaba-qwen2-llms.html) (June 2024)
* Blog by Intel: [Accelerate Meta* Llama 3 with Intel AI Solutions](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-meta-llama3-with-intel-ai-solutions.html) (Apr 2024)
* EMNLP'2023 (Under Review): [TEQ: Trainable Equivalent Transformation for Quantization of LLMs](https://openreview.net/forum?id=iaI8xEINAf&referrer=%5BAuthor%20Console%5D) (Sep 2023)
* arXiv: [Efficient Post-training Quantization with FP8 Formats](https://arxiv.org/abs/2309.14592) (Sep 2023)
* arXiv: [Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs](https://arxiv.org/abs/2309.05516) (Sep 2023)

> **Note**:
> View [Full Publication List](https://github.com/intel/neural-compressor/blob/master/docs/source/publication_list.md).

## Additional Content
@@ -192,8 +192,8 @@ quantized_model = fit(model=float_model, conf=static_quant_conf, calib_dataloade
* [Legal Information](./docs/source/legal_information.md)
* [Security Policy](SECURITY.md)

## Communication
- [GitHub Issues](https://github.com/intel/neural-compressor/issues): mainly for bug reports, new feature requests, and questions.
- [Email](mailto:[email protected]): welcome to raise any interesting research ideas on model compression techniques by email for collaboration.
- [Discord Channel](https://discord.com/invite/Wxk3J3ZJkU): join the discord channel for more flexible technical discussion.
- [WeChat group](/docs/source/imgs/wechat_group.jpg): scan the QR code to join the technical discussion.
88 changes: 0 additions & 88 deletions docs/3x/get_started.md

This file was deleted.

14 changes: 9 additions & 5 deletions docs/build_docs/build.sh
@@ -84,17 +84,18 @@
cp -rf ../docs/ ./source
cp -f "../README.md" "./source/docs/source/Welcome.md"
cp -f "../SECURITY.md" "./source/docs/source/SECURITY.md"


all_md_files=`find ./source/docs -name "*.md"`
for md_file in ${all_md_files}
do
sed -i 's/.md/.html/g' ${md_file}
done


-sed -i 's/.\/docs\/source\/_static/./g' ./source/docs/source/Welcome.md ./source/docs/source/user_guide.md
-sed -i 's/.md/.html/g; s/.\/docs\/source\//.\//g' ./source/docs/source/Welcome.md ./source/docs/source/user_guide.md
-sed -i 's/\/examples\/README.html/https:\/\/github.com\/intel\/neural-compressor\/blob\/master\/examples\/README.md/g' ./source/docs/source/user_guide.md
-sed -i 's/https\:\/\/intel.github.io\/neural-compressor\/lates.\/api-doc\/apis.html/https\:\/\/intel.github.io\/neural-compressor\/latest\/docs\/source\/api-doc\/apis.html/g' ./source/docs/source/Welcome.md ./source/docs/source/user_guide.md
+# sed -i 's/.\/docs\/source\/_static/./g' ./source/docs/source/Welcome.md ./source/docs/source/user_guide.md
+#sed -i 's/.md/.html/g; s/.\/docs\/source\//.\//g' ./source/docs/source/Welcome.md ./source/docs/source/user_guide.md
+#sed -i 's/\/examples\/README.html/https:\/\/github.com\/intel\/neural-compressor\/blob\/master\/examples\/README.md/g' ./source/docs/source/user_guide.md
+#sed -i 's/https\:\/\/intel.github.io\/neural-compressor\/lates.\/api-doc\/apis.html/https\:\/\/intel.github.io\/neural-compressor\/latest\/docs\/source\/api-doc\/apis.html/g' ./source/docs/source/Welcome.md ./source/docs/source/user_guide.md

sed -i 's/examples\/README.html/https:\/\/github.com\/intel\/neural-compressor\/blob\/master\/examples\/README.md/g' ./source/docs/source/Welcome.md
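The sed calls in this script rewrite relative `.md` links to `.html` for the rendered site. A rough Python equivalent of that rewriting step — the regex and file set here are assumptions for illustration, not part of the build script — is:

```python
# Rough Python equivalent of the .md -> .html rewriting that build.sh does with sed.
# Illustrative only; the real script edits files in the copied ./source tree.
import pathlib
import re

for md_file in pathlib.Path("./source/docs").rglob("*.md"):
    text = md_file.read_text(encoding="utf-8")
    # Unlike sed's 's/.md/.html/g' (where the unescaped dot also matches e.g. "amd"),
    # escape the dot and anchor the suffix to link-like endings.
    text = re.sub(r"\.md(?=[)#\s]|$)", ".html", text)
    md_file.write_text(text, encoding="utf-8")
```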

@@ -130,6 +131,8 @@
if [[ ${UPDATE_VERSION_FOLDER} -eq 1 ]]; then
cp -r ${SRC_FOLDER}/* ${DST_FOLDER}
python update_html.py ${DST_FOLDER} ${VERSION}
cp -r ./source/docs/source/imgs ${DST_FOLDER}/docs/source
+cp -r ./source/docs/source/3x/imgs ${DST_FOLDER}/docs/source/3x


cp source/_static/index.html ${DST_FOLDER}
else
@@ -143,6 +146,7 @@
if [[ ${UPDATE_LATEST_FOLDER} -eq 1 ]]; then
cp -r ${SRC_FOLDER}/* ${LATEST_FOLDER}
python update_html.py ${LATEST_FOLDER} ${VERSION}
cp -r ./source/docs/source/imgs ${LATEST_FOLDER}/docs/source
+cp -r ./source/docs/source/3x/imgs ${LATEST_FOLDER}/docs/source/3x
cp source/_static/index.html ${LATEST_FOLDER}
else
echo "skip to create ${LATEST_FOLDER}"
@@ -152,7 +156,7 @@
echo "Create document is done"

if [[ ${CHECKOUT_GH_PAGES} -eq 1 ]]; then
git clone -b gh-pages --single-branch https://github.com/intel/neural-compressor.git ${RELEASE_FOLDER}

if [[ ${UPDATE_VERSION_FOLDER} -eq 1 ]]; then
python update_version.py ${ROOT_DST_FOLDER} ${VERSION}
cp -rf ${DST_FOLDER} ${RELEASE_FOLDER}
22 files renamed without changes.
29 changes: 29 additions & 0 deletions docs/source/api-doc/api_2.rst
@@ -0,0 +1,29 @@
2.0 API
#######

**User facing APIs:**

.. toctree::
   :maxdepth: 1

   quantization.rst
   mix_precision.rst
   training.rst
   benchmark.rst
   config.rst
   objective.rst


**Advanced APIs:**

.. toctree::
   :maxdepth: 1

   compression.rst
   strategy.rst
   model.rst

**API document example:**

.. toctree::
   api_doc_example.rst
27 changes: 27 additions & 0 deletions docs/source/api-doc/api_3.rst
@@ -0,0 +1,27 @@
3.0 API
#######

**PyTorch Extension API:**

.. toctree::
   :maxdepth: 1

   torch_quantization_common.rst
   torch_quantization_config.rst
   torch_quantization_autotune.rst

**Tensorflow Extension API:**

.. toctree::
   :maxdepth: 1

   tf_quantization_common.rst
   tf_quantization_config.rst
   tf_quantization_autotune.rst

**Other Modules:**

.. toctree::
   :maxdepth: 1

   benchmark.rst
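These pages document the new 3.X extension APIs. For orientation, the PyTorch prepare/convert flow that `torch_quantization_common.rst` and `torch_quantization_config.rst` cover looks roughly like the sketch below; the `RTNConfig` choice and the toy model are assumptions based on the module names above, not content from this PR.

```python
# Rough sketch of the 3.X PyTorch extension quantization flow -- illustrative only.
import torch
from neural_compressor.torch.quantization import RTNConfig, convert, prepare

# A toy model standing in for a real network (assumption for illustration).
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())

quant_config = RTNConfig()  # round-to-nearest weight-only config
model = prepare(model, quant_config=quant_config)
model = convert(model)      # returns the quantized model
```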
21 changes: 2 additions & 19 deletions docs/source/api-doc/apis.rst
@@ -1,29 +1,12 @@
APIs
####

-**User facing APIs:**
-
.. toctree::
   :maxdepth: 1

-   quantization.rst
-   mix_precision.rst
-   training.rst
-   benchmark.rst
-   config.rst
-   objective.rst
-
-
-**Advanced APIs:**
+   api_3.rst

.. toctree::
   :maxdepth: 1

-   compression.rst
-   strategy.rst
-   model.rst
-
-**API document example:**
-
-.. toctree::
-   api_doc_example.rst
+   api_2.rst
6 changes: 6 additions & 0 deletions docs/source/api-doc/tf_quantization_autotune.rst
@@ -0,0 +1,6 @@
Tensorflow Quantization AutoTune
================================

.. autoapisummary::

   neural_compressor.tensorflow.quantization.autotune
6 changes: 6 additions & 0 deletions docs/source/api-doc/tf_quantization_common.rst
@@ -0,0 +1,6 @@
Tensorflow Quantization Base API
#################################

.. autoapisummary::

   neural_compressor.tensorflow.quantization.quantize
6 changes: 6 additions & 0 deletions docs/source/api-doc/tf_quantization_config.rst
@@ -0,0 +1,6 @@
Tensorflow Quantization Config
==============================

.. autoapisummary::

   neural_compressor.tensorflow.quantization.config
6 changes: 6 additions & 0 deletions docs/source/api-doc/torch_quantization_autotune.rst
@@ -0,0 +1,6 @@
Pytorch Quantization AutoTune
=============================

.. autoapisummary::

   neural_compressor.torch.quantization.autotune
6 changes: 6 additions & 0 deletions docs/source/api-doc/torch_quantization_common.rst
@@ -0,0 +1,6 @@
Pytorch Quantization Base API
#################################

.. autoapisummary::

   neural_compressor.torch.quantization.quantize
6 changes: 6 additions & 0 deletions docs/source/api-doc/torch_quantization_config.rst
@@ -0,0 +1,6 @@
Pytorch Quantization Config
===========================

.. autoapisummary::

   neural_compressor.torch.quantization.config