notebook example update to INC2.0 API (#772)
Signed-off-by: Vishnu Madhu <[email protected]>
Co-authored-by: u110737 <[email protected]>
(cherry picked from commit a028192)
vishnumadhu365 authored and chensuyue committed May 9, 2023
1 parent c11fbeb commit 54d2f58
Showing 26 changed files with 255 additions and 259 deletions.
3 changes: 2 additions & 1 deletion .azure-pipelines/scripts/codeScan/pyspelling/inc_dict.txt
@@ -2593,4 +2593,5 @@ instancenorm
leakyrelu
llamanorm
nbias
pc
cdrdv
2 changes: 1 addition & 1 deletion examples/README.md
@@ -17,7 +17,7 @@ Intel® Neural Compressor validated examples with multiple compression technique

* [BERT Mini SST2 performance boost with INC](/examples/notebook/bert_mini_distillation): train a BERT-Mini model on the SST-2 dataset through distillation, and leverage quantization to accelerate inference while maintaining accuracy using Intel® Neural Compressor.
* [Performance of FP32 Vs. INT8 ResNet50 Model](/examples/notebook/perf_fp32_int8_tf): compare existing FP32 & INT8 ResNet50 models directly.
* *[Intel® Neural Compressor Sample for PyTorch*](/examples/notebook/pytorch/alexnet_fashion_mnist): an End-To-End pipeline to build up a CNN model by PyTorch to recognize fashion image and speed up AI model by Intel® Neural Compressor.
* [Intel® Neural Compressor Sample for PyTorch*](/examples/notebook/pytorch/alexnet_fashion_mnist): an End-To-End pipeline to build up a CNN model by PyTorch to recognize fashion image and speed up AI model by Intel® Neural Compressor.
* [Intel® Neural Compressor Sample for TensorFlow*](/examples/notebook/tensorflow/alexnet_mnist): an End-To-End pipeline to build up a CNN model by TensorFlow to recognize handwriting number and speed up AI model by Intel® Neural Compressor.
* [Accelerate VGG19 Inference on Intel® Gen4 Xeon® Sapphire Rapids](/examples/notebook/tensorflow/vgg19_ibean): an End-To-End pipeline to train VGG19 model by transfer learning based on pre-trained model from [TensorFlow Hub](https://tfhub.dev); quantize it by Intel® Neural Compressor on Intel® Gen4 Xeon® Sapphire Rapids.

95 changes: 50 additions & 45 deletions examples/notebook/pytorch/alexnet_fashion_mnist/README.md
@@ -3,28 +3,28 @@

## Background

Low-precision inference can speed up inference obviously, by converting the fp32 model to int8 or bf16 model. Intel provides Intel® Deep Learning Boost technology in the Second Generation Intel® Xeon® Scalable Processors and newer Xeon®, which supports to speed up int8 and bf16 model by hardware.
Low-precision inference can significantly speed up inference pipelines. This is achieved by converting an FP32 model to a quantized INT8 or BF16 model. Second Generation Intel® Xeon® Scalable Processors (and newer) have Intel® Deep Learning Boost technology, which provides dedicated silicon for speeding up INT8 and BF16 operations.

Intel® Neural Compressor helps the user to simplify the processing to convert the fp32 model to int8/bf16.
Intel® Neural Compressor (INC for short) helps developers quantize models, converting an FP32 model into lower precisions such as INT8 and BF16.

At the same time, Intel® Neural Compressor tunes the quantization method to reduce the accuracy loss, which is a major blocker for low-precision inference.

Intel® Neural Compressor is released in Intel® AI Analytics Toolkit and works with Intel® Optimization of PyTorch*.
Intel® Neural Compressor is packaged into Intel® AI Analytics Toolkit and works with Intel® Optimization for PyTorch*.

Please refer to the official website for detailed info and news: [https://github.com/intel/neural-compressor](https://github.com/intel/neural-compressor)
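
As a taste of the INC 2.0 interface this sample targets, below is a minimal sketch of post-training quantization; the model, dataloader, and evaluation function names are placeholders, not objects from this sample:

```
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

# Static post-training quantization; the tuner searches quantization
# configurations until the accuracy criterion is met.
conf = PostTrainingQuantConfig(approach="static")
q_model = fit(
    model=fp32_model,                # a trained torch.nn.Module (placeholder)
    conf=conf,
    calib_dataloader=calib_loader,   # small calibration dataloader (placeholder)
    eval_func=evaluate,              # returns a scalar accuracy (placeholder)
)
q_model.save("./saved_model")
```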

## Introduction

This is a demo to show an End-To-End pipeline to build up a CNN model by Pytorch to recognize fashion image and speed up AI model by Intel® Neural Compressor.
This sample is an End-To-End pipeline which demonstrates how to use Intel® Neural Compressor. The pipeline does the following:

1. Train a CNN AlexNet model by PyTorch based on dataset Fashion-MNIST.
1. Using PyTorch, **train** an AlexNet model (CNN) on the Fashion-MNIST dataset (a condensed sketch follows this list).

2. Quantize the frozen PB model file by Intel® Neural Compressor to INT8 model.
2. Using Intel® Neural Compressor, **quantize** the FP32 PyTorch model file (.pth) to an INT8 model.

3. Compare the performance of FP32 and INT8 model by same script.
3. **Compare** the inference performance of the FP32 and INT8 models.
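
A condensed sketch of step 1 is shown below; the sample's actual training script may differ, and the stock torchvision AlexNet plus the input transforms here are illustrative assumptions:

```
import torch
from torch import nn, optim
from torchvision import datasets, transforms
from torchvision.models import alexnet

# Fashion-MNIST images are 1x28x28; resize and replicate channels so the
# stock AlexNet topology (224x224, 3-channel) accepts them.
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.Grayscale(num_output_channels=3),
    transforms.ToTensor(),
])
train_set = datasets.FashionMNIST("data", train=True, download=True, transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

model = alexnet(num_classes=10)      # 10 Fashion-MNIST classes
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

model.train()
for images, labels in loader:        # one epoch, for brevity
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()

torch.save(model.state_dict(), "alexnet_mnist_fp32_mod.pth")
```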


We will learn the acceleration of AI inference by Intel AI technology:
The sample showcases AI inference performance optimizations delivered by:

1. Intel® Deep Learning Boost

@@ -39,21 +39,24 @@ We will learn the acceleration of AI inference by Intel AI technology:
|Test performance|profiling_inc.py|alexnet_mnist_fp32_mod.pth<br>alexnet_mnist_int8_mod|32.json<br>8.json|
|Compare the performance|compare_perf.py|32.json<br>8.json|stdout/stderr<br>log file<br>fp32_int8_absolute.png<br>fp32_int8_times.png|

**run_sample.sh** will call above python scripts to finish the demo.
**run_sample.sh** calls the above Python scripts to run the demo end to end.<br>
Bash scripts are placed in the 'scripts' directory.<br>
Python files are placed in the 'scripts/python_src' directory.<br>
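
At its core, the measurement that profiling_inc.py performs amounts to a timed inference loop. A simplified sketch follows; the warmup/iteration counts and the JSON keys are assumptions, not the script's exact contents:

```
import json
import time
import torch

def profile(model, batch, warmup=10, iters=100):
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):          # warm up before timing
            model(batch)
        start = time.perf_counter()
        for _ in range(iters):
            model(batch)
        elapsed = time.perf_counter() - start
    return {
        "latency_ms": elapsed / iters * 1000.0,
        "throughput_fps": iters * batch.shape[0] / elapsed,
    }

# e.g. write the FP32 results for compare_perf.py to pick up:
# json.dump(profile(fp32_model, sample_batch), open("32.json", "w"))
```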


## Hardware Environment

This demo can be executed on any Intel CPU, but it is recommended to use 2nd Generation Intel® Xeon® Scalable Processors or newer, which include:

1. AVX512 instruction to speed up training & inference AI model.

2. Intel® Deep Learning Boost: Vector Neural Network Instruction (VNNI) to accelerate AI/DL Inference with INT8/BF16 Model.
1. AVX-512 instructions to speed up training & inference of AI models.

With Intel® Deep Learning Boost, the performance will be increased obviously.
2. Intel® Deep Learning Boost: Vector Neural Network Instructions (VNNI) & [Intel® AMX](https://www.intel.in/content/www/in/en/products/docs/accelerator-engines/advanced-matrix-extensions/overview.html) (Advanced Matrix Extensions) to accelerate AI/DL inference of INT8/BF16 models.

3. Intel® DevCloud

If you have no such CPU support Intel® Deep Learning Boost, you could register to Intel® DevCloud and try this example on new Xeon with Intel® Deep Learning Boost freely. To learn more about working with Intel® DevCloud, please refer to [Intel® DevCloud](https://www.intel.com/content/www/us/en/developer/tools/devcloud/overview.html)
In case you don't have access to the latest Intel® Xeon® CPUs, you can use Intel® DevCloud to run this sample.<br>
Intel® DevCloud offers free access to newer Intel® hardware.<br>
To learn more about working with Intel® DevCloud, please refer to [Intel® DevCloud](https://devcloud.intel.com/oneapi/home/).


## Running Environment
@@ -68,8 +71,8 @@ This article assumes you are familiar with Intel® DevCloud environment. To lear
Specifically, this article assumes:

1. You have an Intel® DevCloud account.
2. You are familiar with usage of Intel® DevCloud, like login by SSH client..
3. Developers are familiar with Python, AI model training and inference based on PyTorch*.
2. You are familiar with the usage of Intel® DevCloud, such as logging in via an SSH client or using the Jupyter* Lab interface.
3. You are familiar with Python and with AI model training and inference based on PyTorch*.

#### Setup based on Intel® oneAPI AI Analytics Toolkit

@@ -78,37 +81,40 @@ Specifically, this article assumes:
2. Create virtual environment **env_inc**:

```
./devcloud_setup_env.sh
cd neural-compressor/examples/notebook/pytorch/alexnet_fashion_mnist
chmod +x -R scripts/*
bash scripts/devcloud_setup_env.sh
```
Note: If you are running this for the first time, it could take a while to download all the required packages.

#### Run in Jupyter Notebook in Intel® DevCloud for oneAPI
#### Run the Jupyter Notebook in Intel® DevCloud for oneAPI

Please open **inc_sample_for_pytorch.ipynb** in Jupyter Notebook.
Open **inc_sample_for_pytorch.ipynb** in Jupyter Notebook. Follow the steps in the notebook to complete the sample.

Following the guide to run this demo.

#### Run via SSH Login on Intel® DevCloud for oneAPI

This demo will show the obviously acceleration by VNNI. In Intel® DevCloud, please choose compute node with the property 'clx' or 'icx' or 'spr' which support VNNI.
This demo is intended to show the performance acceleration provided by:
1. [Intel® VNNI](https://cdrdv2-public.intel.com/727804/dl-boost-product-overview.pdf) (Vector Neural Network Instructions). On Intel® DevCloud, choose a compute node with the property 'clx', 'icx', or 'spr'; these node types support Intel® VNNI.
2. [Intel® AMX](https://www.intel.in/content/www/in/en/products/docs/accelerator-engines/advanced-matrix-extensions/overview.html) (Advanced Matrix Extensions). On Intel® DevCloud, choose a compute node with the property 'spr'; this node type supports Intel® AMX.

##### Job Submit
```
!qsub run_in_intel_devcloud.sh -d `pwd` -l nodes=1:icx:ppn=2
28029.v-qsvr-nda.aidevcloud
qsub scripts/run_in_intel_devcloud.sh -d `pwd` -l nodes=1:icx:ppn=2 -o output/ -e output/
```

Note, please run above command in login node. There will be error as below if run it on compute node:
Note: You have to run the above command on the login node. If you run it on a compute node by mistake, the system will throw an error message like the one below.
```
qsub: submit error (Bad UID for job execution MSG=ruserok failed validating uXXXXX/uXXXXX from s001-n054.aidevcloud)
```

##### Check job status

```
qstat
qstat -a
```

After the job is over (successfully or fault), there will be log files, like:
Once the job execution completes (either successfully or with an error), look for log files in the 'output' directory. Below are two log file names for reference:

1. **run_in_intel_devcloud.sh.o28029**
2. **run_in_intel_devcloud.sh.e28029**
@@ -119,6 +125,7 @@ After the job is over (successfully or fault), there will be log files, like:

```
tail -23 `ls -lAtr run_in_intel_devcloud.sh.o* | tail -1 | awk '{print $9}'`
```
Or
Check the result in a log file, like **run_in_intel_devcloud.sh.o28029**:
@@ -128,16 +135,16 @@
Model FP32 INT8
throughput(fps) 572.4982883964987 3030.70552731285
latency(ms) 2.8339174329018104 2.128233714979522
accuracy(%) 0.9799 0.9796
throughput(fps) xxx.4982883964987 xxx.70552731285
latency(ms) x.8339174329018104 x.128233714979522
accuracy(%) 0.x799 0.x796
Save to fp32_int8_absolute.png
Model FP32 INT8
throughput_times 1 5.293824608282245
latency_times 1 0.7509864932092611
accuracy_times 1 0.9996938463108482
throughput_times 1 x.293824608282245
latency_times 1 x.7509864932092611
accuracy_times 1 0.x996938463108482
Save to fp32_int8_times.png
Please check the PNG files to see the performance!
@@ -153,11 +160,11 @@ Thank you!
```

We will see the performance and accuracy of FP32 and INT8 model. The performance could be obviously increased if running on Xeon with VNNI.
The output shows the performance and accuracy of the FP32 and INT8 models.

##### Check Result in PNG file

The demo creates figure files: fp32_int8_absolute.png, fp32_int8_times.png to show performance bar. They could be used in report.
The demo saves the performance comparison as PNG files: fp32_int8_absolute.png and fp32_int8_times.png.
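
For reference, here is a rough sketch of how such comparison charts can be produced from the two profiling result files; the JSON keys and chart details are assumptions, and the actual compare_perf.py may differ:

```
import json
import matplotlib.pyplot as plt

# 32.json / 8.json are written by profiling_inc.py (keys assumed for illustration)
fp32 = json.load(open("32.json"))
int8 = json.load(open("8.json"))

metrics = ["throughput_fps", "latency_ms", "accuracy"]
relative = [int8[m] / fp32[m] for m in metrics]  # FP32 baseline = 1

plt.bar(metrics, relative)
plt.axhline(1.0, color="gray", linestyle="--")   # FP32 reference line
plt.ylabel("INT8 relative to FP32")
plt.savefig("fp32_int8_times.png")
```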

Copy the files from DevCloud to your host machine:

@@ -172,15 +179,15 @@ Set up own running environment in local server, cloud (including Intel® DevClou

#### Install by PyPI

Create virtual environment **env_inc**:
Create virtual environment **pip_env_inc**:

```
bash scripts/pip_set_env.sh
```
Activate it by:

```
source env_inc/bin/activate
source pip_env_inc/bin/activate
```

#### Install by Conda
@@ -200,24 +207,24 @@ conda activate env_inc
#### Run by SSH

```
./run_sample.sh
bash scripts/run_sample.sh
```

1. Check the result in the screen printout:
```
...
Model FP32 INT8
throughput(fps) 572.4982883964987 3030.70552731285
latency(ms) 2.8339174329018104 2.128233714979522
accuracy(%) 0.9799 0.9796
throughput(fps) xxx.4982883964987 xxx.70552731285
latency(ms) x.8339174329018104 x.128233714979522
accuracy(%) 0.x799 0.x796
Save to fp32_int8_absolute.png
Model FP32 INT8
throughput_times 1 5.293824608282245
latency_times 1 0.7509864932092611
accuracy_times 1 0.9996938463108482
throughput_times 1 x.293824608282245
latency_times 1 x.7509864932092611
accuracy_times 1 x.9996938463108482
Save to fp32_int8_times.png
Please check the PNG files to see the performance!
@@ -238,8 +245,6 @@ Please open **inc_sample_for_pytorch.ipynb** in Jupyter Notebook.

Follow the guide in chapter **Run in Customer Server or Cloud** to run this demo.



## License

Code samples are licensed under the MIT license. See
17 changes: 0 additions & 17 deletions examples/notebook/pytorch/alexnet_fashion_mnist/alexnet.yaml

This file was deleted.

10 changes: 0 additions & 10 deletions examples/notebook/pytorch/alexnet_fashion_mnist/conda_set_env.sh

This file was deleted.

