Neural Insights: step by step debug example docs (#1103)
Signed-off-by: aradys-intel <[email protected]>
Co-authored-by: chen, suyue <[email protected]>
Showing 14 changed files with 243 additions and 5 deletions.

> Note that the above example uses dummy data, which is only meant to illustrate the usage of Neural Insights. For diagnosis purposes you should use a real dataset specific to your use case.

## Step by Step Diagnosis Example
Refer to [Step by Step Diagnosis Example with TensorFlow](https://github.com/intel/neural-compressor/tree/master/neural_insights/docs/source/tf_accuracy_debug.md) and [Step by Step Diagnosis Example with ONNXRT](https://github.com/intel/neural-compressor/tree/master/neural_insights/docs/source/onnx_accuracy_debug.md) to get started with some basic quantization accuracy diagnostic skills.

## Research Collaborations

You are welcome to raise any interesting research ideas on model compression techniques, and feel free to reach us ([email protected]). We look forward to collaborating with you on Neural Insights!

# Step by step example of how to debug accuracy with Neural Insights
1. [Introduction](#introduction)
2. [Preparation](#preparation)
3. [Running the quantization](#running-the-quantization)
4. [Analyzing the result of quantization](#analyzing-the-result-of-quantization)

# Introduction
In this guide an accuracy issue is debugged with Neural Insights, using the ONNX LayoutLMv3 model as an example. The model is quantized and the results are analyzed to find the cause of the accuracy loss.

# Preparation
## Requirements
First you need to install Intel® Neural Compressor and other requirements.
```shell
pip install neural-compressor
pip install datasets transformers torch torchvision
pip install onnx onnxruntime onnxruntime-extensions
pip install accelerate seqeval tensorboard sentencepiece timm fvcore Pillow einops textdistance shapely protobuf setuptools optimum
```

## Model
Get the LayoutLMv3 model used in the Intel® Neural Compressor [LayoutLMv3 example](https://github.com/intel/neural-compressor/tree/master/examples/onnxrt/nlp/huggingface_model/token_classification/layoutlmv3/quantization/ptq_static) by exporting it to ONNX:
```shell
optimum-cli export onnx --model HYPJUDY/layoutlmv3-base-finetuned-funsd layoutlmv3-base-finetuned-funsd-onnx/ --task=token-classification
```

# Running the quantization
Generate a quantized model.
```python
import onnx
from neural_compressor import quantization, PostTrainingQuantConfig
from neural_compressor.data import DataLoader

# input_model, eval_dataset and IncDataset are assumed to be defined earlier in the example script.
onnx_model = onnx.load(input_model)
calib_dataset = IncDataset(eval_dataset, onnx_model)
config = PostTrainingQuantConfig(approach="static", quant_format="QOperator")
q_model = quantization.fit(onnx_model,
                           config,
                           calib_dataloader=DataLoader(framework="onnxruntime", dataset=calib_dataset))
```

Execute the benchmark to get the F1 score of both the FP32 and INT8 models, then compute the relative accuracy ratio.
The output indicates that the quantized model's accuracy is noticeably poor.

```
fp32 f1 = 0.9049, int8 f1 = 0.2989, accuracy ratio = -66.9631%
```
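
The accuracy ratio above is the relative deviation of the INT8 F1 score from the FP32 baseline. A minimal sketch of the arithmetic (using the rounded scores from the output above, so the last digits differ slightly from the exact report) is:

```python
# Relative accuracy ratio: deviation of the INT8 F1 score from the FP32 baseline, in percent.
fp32_f1, int8_f1 = 0.9049, 0.2989  # F1 scores reported by the benchmark above
accuracy_ratio = (int8_f1 - fp32_f1) / fp32_f1 * 100
print(f"accuracy ratio = {accuracy_ratio:.4f}%")  # ≈ -66.97%
```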

# Analyzing the result of quantization
In this section the diagnosis tool is used to debug the quantization and reach a higher INT8 model accuracy.
We need to set the `diagnosis` parameter to `True`, as shown below.
```python
config = PostTrainingQuantConfig(approach="static", quant_format="QOperator", quant_level=1, diagnosis=True)  # set 'diagnosis' to True
q_model = quantization.fit(onnx_model,
                           config,
                           eval_func=eval_func,
                           calib_dataloader=DataLoader(framework="onnxruntime", dataset=calib_dataset))
```
The diagnosis tool prints an `Activations summary` and a `Weights summary` in the terminal.

For easier inspection, we load the .csv files dumped to the workspace, as shown below.
```python
import glob
import os  # needed for os.path below

import pandas as pd

pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)

# Pick the most recent workspace folder created by the quantization run.
subfolders = glob.glob("./nc_workspace" + "/*/")
subfolders.sort(key=os.path.getmtime, reverse=True)
if subfolders:
    activations_table = os.path.join(subfolders[0], "activations_table.csv")
    weights_table = os.path.join(subfolders[0], "weights_table.csv")

    activations_table = pd.read_csv(activations_table)
    weights_table = pd.read_csv(weights_table)

    print("Activations summary")
    display(activations_table)  # display() is available in Jupyter/IPython

    print("\nWeights summary")
    display(weights_table)
```

## Weights summary
These are the top 10 rows of the weights summary table:

 

## Activations summary
These are the top 10 rows of the activations summary table:

 

In the activations summary table, some nodes show a dispersed activation data range. Therefore, we calculate the `Min-Max data range` of the activation data and sort the results in descending order.

```python
activations_table["Min-Max data range"] = activations_table["Activation max"] - activations_table["Activation min"]
sorted_data = activations_table.sort_values(by="Min-Max data range", ascending=False)
display(sorted_data)
```

The results should look like the following:

 

According to the results above, nodes of type `/layoutlmv3/encoder/layer.\d+/output/Add` and `/layoutlmv3/encoder/layer.\d+/output/dense/MatMul` have significantly higher `Min-Max data range` values than the other node types. This indicates that they may have caused the accuracy loss, so we can try to fall back these nodes to FP32.

Refer to [diagnosis.md](https://github.com/intel/neural-compressor/blob/master/docs/source/diagnosis.md) for more diagnosis tips.

```python
from neural_compressor.utils.constant import FP32

config = PostTrainingQuantConfig(approach="static",
                                 quant_format="QOperator",
                                 op_name_dict={r"/layoutlmv3/encoder/layer.\d+/output/dense/MatMul": FP32,
                                               r"/layoutlmv3/encoder/layer.\d+/output/Add": FP32})
q_model = quantization.fit(onnx_model,
                           config,
                           calib_dataloader=DataLoader(framework="onnxruntime", dataset=calib_dataset))
q_model.save(output_model)
```

Execute the benchmark on the new quantized model again; the accuracy ratio improves to within 1% of the FP32 baseline.
```
fp32 f1 = 0.9049, int8 f1 = 0.8981, accuracy ratio = -0.7502%
```

# Step by step example of how to debug accuracy with Neural Insights
1. [Introduction](#introduction)
2. [Preparation](#preparation)
3. [Running the quantization](#running-the-quantization)
4. [Analyzing the result of quantization](#analyzing-the-result-of-quantization)
5. [Analyzing weight histograms](#analyzing-weight-histograms)

# Introduction
In this guide an accuracy issue is debugged with Neural Insights, using the TensorFlow Inception_v3 model as an example. The model is quantized and the results are analyzed to find the cause of the accuracy loss.

# Preparation
## Source
First you need to install Intel® Neural Compressor.
```shell
# Install Neural Compressor
git clone https://github.com/intel/neural-compressor.git
cd neural-compressor
pip install -r requirements.txt
python setup.py install

# Install Neural Insights
pip install -r neural_insights/requirements.txt
python setup.py install neural_insights
```

## Requirements
```shell
cd examples/tensorflow/image_recognition/tensorflow_models/inception_v3/quantization/ptq
pip install -r requirements.txt
```

## Model
Download the pre-trained PB model file.
```shell
wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v1_6/inceptionv3_fp32_pretrained_model.pb
```

## Prepare the dataset
Download the ImageNet dataset and convert the data to the TensorFlow Record format.
```shell
cd examples/tensorflow/image_recognition/tensorflow_models/
bash prepare_dataset.sh --output_dir=./inception_v3/quantization/ptq/data --raw_dir=/PATH/TO/img_raw/val/ --subset=validation
bash prepare_dataset.sh --output_dir=./inception_v3/quantization/ptq/data --raw_dir=/PATH/TO/img_raw/train/ --subset=train
```

# Running the quantization
Before applying quantization, modify some code to enable Neural Insights:
1. Set the `diagnosis` argument to `True` in `PostTrainingQuantConfig` so that Neural Insights will dump the weights and activations of the quantizable Ops in this model.
2. Delete the `op_name_dict` argument, since finding the right value for it is the goal of this investigation.
```python
conf = PostTrainingQuantConfig(calibration_sampling_size=[50, 100], diagnosis=True)
```
3. Quantize the model with the following command:
```shell
bash run_tuning.sh --input_model=/PATH/TO/inceptionv3_fp32_pretrained_model.pb --output_model=./nc_inception_v3.pb --dataset_location=/path/to/ImageNet/
```

The accuracy of this model decreases a lot if all Ops are quantized to int8 by the default strategy:

 

# Analyzing the result of quantization
When you run the quantization, you will find the following table:

 

The MSE (Mean Squared Error) of the Ops' activations is listed from high to low, together with the min-max values.
MSE is usually one of the typical indicators of accuracy loss.
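
As a rough illustration of the metric (this is not the exact Neural Insights implementation), the activation MSE of an Op can be thought of as the mean squared difference between its FP32 activation and the dequantized INT8 activation:

```python
import numpy as np

def activation_mse(fp32_activation: np.ndarray, dequantized_int8_activation: np.ndarray) -> float:
    """Mean squared error between an Op's FP32 activation and its dequantized INT8 counterpart."""
    return float(np.mean((fp32_activation - dequantized_int8_activation) ** 2))
```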

 

There is also relevant information about the Ops' weights.
Often the Op with the highest MSE causes the largest accuracy loss, but that is not always the case.

Experimenting with disabling the quantization of some of the Ops with the top 5 highest MSE in both tables is not satisfactory in this example, as the following results show:

 
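
Such a fallback experiment can be configured with `op_name_dict`, in the same way as the final fix later in this document. A minimal sketch is shown below; the op names are placeholders standing in for the top-MSE Ops read from the tables above, not values taken from this example.

```python
# Placeholder op names: substitute the Ops with the highest MSE from the tables above.
op_name_dict = {
    "v0/cg/conv1/conv2d/Conv2D": {"activation": {"dtype": ["fp32"]}},
    "v0/cg/conv2/conv2d/Conv2D": {"activation": {"dtype": ["fp32"]}},
}
conf = PostTrainingQuantConfig(calibration_sampling_size=[50, 100],
                               diagnosis=True,
                               op_name_dict=op_name_dict)
```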

The weight histograms can then be analyzed to find the reason for the accuracy loss.

# Analyzing weight histograms
## Open Neural Insights
```shell
neural_insights
```

You will get a web address for the Neural Insights GUI, where you can find histograms of weights and activations.
```
Neural Insights Server started.
Open address [...]
```

The weights of Ops are usually distributed in a single spike, as in the following graph:

 

When you click on an Op in the Op list, you get the weight and activation histograms at the bottom of the page.
One of the weight histograms looks different from the examples above.

 

As shown in the chart, when the accuracy loss of an Op is tolerable, the distribution of its weights is usually concentrated in a small min-max range. In this Op, however, the min-max range of the weights is significantly large (wider than [-20, 20]) because of some outliers. The majority of the values, which lie near the zero point, are therefore mapped to a very small range of int8 values, leading to a huge accuracy loss. Besides, since the min-max values vary across channels, the accuracy decreases without channel-wise quantization.
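
The effect can be illustrated with a small sketch (not Neural Insights code): with outliers stretching the range to about [-20, 20], the per-tensor int8 quantization step becomes larger than most of the weights near zero, so they lose almost all resolution.

```python
import numpy as np

rng = np.random.default_rng(0)
# Mostly small weights around zero, plus two outliers that stretch the min-max range.
weights = np.concatenate([rng.normal(0.0, 0.05, 1000), np.array([-20.0, 20.0])])

scale = (weights.max() - weights.min()) / 255           # per-tensor, asymmetric 8-bit scale
quantized = np.round((weights - weights.min()) / scale)
dequantized = quantized * scale + weights.min()

print("quantization step:", scale)                      # ~0.157, larger than most of the weights
print("MSE on the small weights:", np.mean((weights[:1000] - dequantized[:1000]) ** 2))
```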

Therefore, you can disable this Op:
```python
op_name_dict = {'v0/cg/conv0/conv2d/Conv2D': {
    'activation': {'dtype': ['fp32']}}}
conf = PostTrainingQuantConfig(calibration_sampling_size=[50, 100], op_name_dict=op_name_dict)
```

After running the quantization again, you can see that the accuracy has increased. The Op that caused the accuracy loss has been found.

 