add Hyper-parameter Optimization algorithm (#786)
* add hpo

* add ut&example, update code

* fix

Signed-off-by: Guo, Heng <[email protected]>

* add requirement

Signed-off-by: Guo, Heng <[email protected]>

* pylint

* add readme

* requir

* modify readme

Signed-off-by: Guo, Heng <[email protected]>

* modify api

* logger

* modify example and ut

* add hpo api&config

Signed-off-by: Guo, Heng <[email protected]>

* fix

* modify api

Signed-off-by: Guo, Heng <[email protected]>

* modify readme

Signed-off-by: Guo, Heng <[email protected]>

* spell

Signed-off-by: Guo, Heng <[email protected]>

* sync readme

Signed-off-by: Guo, Heng <[email protected]>

* modify readme

Signed-off-by: Guo, Heng <[email protected]>

---------

Signed-off-by: Guo, Heng <[email protected]>
n1ck-guo authored Jul 27, 2023
1 parent 888b3bf commit 6613cfa
Showing 14 changed files with 1,525 additions and 6 deletions.
3 changes: 2 additions & 1 deletion .azure-pipelines/scripts/codeScan/pylint/pylint.sh
@@ -31,7 +31,8 @@ pip install torch==1.12.0 \
onnxruntime_extensions \
tf_slim \
transformers \
flask==2.1.3
flask==2.1.3 \
xgboost

if [ "${scan_module}" = "neural_solution" ]; then
cd /neural-compressor
4 changes: 4 additions & 0 deletions .azure-pipelines/scripts/codeScan/pyspelling/inc_dict.txt
@@ -2698,3 +2698,7 @@ Vanhoucke
ONNXCommunityMeetup
luYBWA
pQ
xgb
xgboost
hpo
HPO
9 changes: 7 additions & 2 deletions docs/source/pruning.md
@@ -52,8 +52,9 @@ Pruning

4. [Sparse Model Deployment](#sparse-model-deployment)

5. [Pruning With HPO](#pruning-with-hyperparameter-optimization)

5. [Reference](#reference)
6. [Reference](#reference)


## Introduction
@@ -104,7 +105,7 @@ Pruning patterns defines the rules of pruned weights' arrangements in space. Int
</div>


- Multi-head Attention Pruning (Work in progress)
- Multi-head Attention Pruning

The multi-head attention mechanism boosts transformer models' capability to analyze contextual information. However, different heads contribute unevenly to the final output, and in most situations a number of heads can be removed without causing an accuracy drop. Head pruning can be applied to a wide range of models, including BERT, GPT, and other large language models. **We have not supported it in pruning yet, but an experimental feature is provided in Model Auto Slim**; a minimal head-pruning sketch follows below. Please refer to the [multi-head attention auto slim examples](https://github.com/intel/neural-compressor/blob/master/examples/pytorch/nlp/huggingface_models/question-answering/model_slim).
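
As a purely illustrative aside (not the Model Auto Slim API), the sketch below shows the mechanics of head pruning using the Hugging Face Transformers `prune_heads` utility; the model name and the layer/head indices are arbitrary placeholders.

```python
# Illustrative head pruning with Hugging Face Transformers (not Model Auto Slim).
# The layer/head indices are arbitrary placeholders, not a recommendation.
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")

# {layer_index: [head indices to remove]} -- hypothetical choices for demonstration.
model.prune_heads({0: [0, 2], 5: [1]})

# Layer 0 now keeps 10 of its original 12 attention heads.
print(model.encoder.layer[0].attention.self.num_attention_heads)
```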

@@ -386,6 +387,10 @@ Please refer to [pruning examples](../../examples/README.md#Pruning-1) for more

Particular hardware/software such as [Intel Extension for Transformers](https://github.com/intel/intel-extension-for-transformers) is required to obtain inference speed and footprint optimizations for most sparse models. However, using [model slim](#click) for some special structures can deliver significant inference speed improvements and footprint reduction without this post-pruning deployment step. In other words, you can achieve model acceleration directly within your training framework (PyTorch, etc.).

## Pruning with Hyperparameter Optimization
Intel® Neural Compressor currently supports grid search, random search, Bayesian optimization, and XGBoost search algorithms for pruning with HPO; a conceptual sketch of two of these strategies appears below.
For more details, please refer to the [HPO document](../../neural_compressor/compression/hpo/README.md).
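
The snippet below is a conceptual sketch only, not the Intel® Neural Compressor HPO API: it assumes a hypothetical `evaluate_pruned_model` objective and illustrates how random search and an XGBoost-based surrogate search could explore a single pruning hyperparameter.

```python
# Conceptual sketch of two of the search strategies named above (random search and
# an XGBoost surrogate search) over a single pruning hyperparameter. This is NOT
# the Intel Neural Compressor HPO API; `evaluate_pruned_model` is a hypothetical
# stand-in for "prune at this sparsity, fine-tune, then measure accuracy".
import random

import numpy as np
import xgboost as xgb

def evaluate_pruned_model(sparsity: float) -> float:
    # Hypothetical objective: accuracy degrades once sparsity exceeds ~0.7.
    return 0.85 - 0.3 * max(0.0, sparsity - 0.7) ** 2

# Random search: sample configurations uniformly and keep the best one.
trials = []
for _ in range(20):
    sparsity = random.uniform(0.1, 0.95)
    trials.append((sparsity, evaluate_pruned_model(sparsity)))
best_sparsity, best_accuracy = max(trials, key=lambda t: t[1])

# XGBoost search: fit a surrogate on the trials so far, then pick the candidate
# the surrogate predicts to be best before spending another real evaluation.
X = np.array([[s] for s, _ in trials])
y = np.array([acc for _, acc in trials])
surrogate = xgb.XGBRegressor(n_estimators=50).fit(X, y)

candidates = np.linspace(0.1, 0.95, 200).reshape(-1, 1)
next_sparsity = float(candidates[surrogate.predict(candidates).argmax(), 0])

print(f"best so far: sparsity={best_sparsity:.2f}, accuracy={best_accuracy:.3f}")
print(f"surrogate suggests trying sparsity={next_sparsity:.2f} next")
```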

## Reference

[1] Namhoon Lee, Thalaiyasingam Ajanthan, and Philip Torr. SNIP: Single-shot network pruning based on connection sensitivity. In International Conference on Learning Representations, 2019.
@@ -0,0 +1,29 @@
Step-by-Step
============

This document presents step-by-step instructions for pruning Hugging Face models with the HPO feature of Intel® Neural Compressor.

# Prerequisite
## 1. Environment
Python 3.6 or a higher version is recommended.
The required packages are listed in `requirements.txt`; install them as follows:
```shell
cd examples/pytorch/nlp/huggingface_models/text-classification/pruning/hpo/
pip install -r requirements.txt
```
## 2. Prepare Dataset

The dataset will be downloaded automatically from the Hugging Face datasets Hub; a minimal loading sketch is shown below.
See more about loading a [Hugging Face dataset](https://huggingface.co/docs/datasets/loading_datasets.html).
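
As a minimal sketch (the example script downloads and preprocesses the data itself), loading the GLUE MRPC data used by the run command in the next section could look like this:

```python
# Minimal sketch of loading the GLUE MRPC dataset from the Hugging Face Hub.
# The example script handles this automatically; this only shows the underlying call.
from datasets import load_dataset

raw_datasets = load_dataset("glue", "mrpc")
print(raw_datasets)              # DatasetDict with train/validation/test splits
print(raw_datasets["train"][0])  # sentence1, sentence2, label, idx fields
```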

# Run
To get the tuned model and its accuracy:
```shell
python run_glue_no_trainer.py \
--model_name_or_path M-FAC/bert-mini-finetuned-mrpc \
--task_name mrpc \
--per_device_eval_batch_size 18 \
--per_device_train_batch_size 18 \
--do_prune
```
@@ -0,0 +1,12 @@
accelerate
datasets
sentencepiece
scipy
scikit-learn
protobuf
torch
evaluate
transformers
tqdm
xgboost
