diff --git a/README.md b/README.md index 1b9038f95eb..d6c430bf78b 100755 --- a/README.md +++ b/README.md @@ -37,9 +37,7 @@ Intel® Neural Compressor has been one of the critical AI software components in # install stable version from from conda conda install neural-compressor -c conda-forge -c intel ``` -More installation methods can be found at [Installation Guide](./docs/installation_guide.md). -> **Note:** -> Run into installation issues, please check [FAQ](./docs/faq.md). +More installation methods can be found at [Installation Guide](./docs/installation_guide.md). Please check out our [FAQ](./docs/faq.md) for more details. ## Getting Started * Quantization with Python API @@ -122,8 +120,8 @@ Intel® Neural Compressor supports systems based on [Intel 64 architecture or co -> Note: 1.Starting from official TensorFlow 2.6.0, oneDNN has been default in the binary. Please set the environment variable TF_ENABLE_ONEDNN_OPTS=1 to enable the oneDNN optimizations. -> 2.Starting from official TensorFlow 2.9.0, oneDNN optimizations are enabled by default on CPUs with neural-network-focused hardware features such as AVX512_VNNI, AVX512_BF16, AMX, etc. No need to set environment variable. +> **Note:** +> Please set the environment variable TF_ENABLE_ONEDNN_OPTS=1 to enable oneDNN optimizations if you are using TensorFlow from v2.6 to v2.8. oneDNN has been fully default from TensorFlow v2.9. ### Validated Models Intel® Neural Compressor validated 420+ [examples](./examples) with performance speedup geomean 2.2x and up to 4.2x on VNNI while minimizing the accuracy loss. @@ -143,7 +141,7 @@ More details for validated models are available [here](docs/validated_model_list - Infrastructure + Architecture Tutorial Examples GUI @@ -177,7 +175,7 @@ More details for validated models are available [here](docs/validated_model_list Quantization Pruning (Sparsity) Knowledge Distillation - Mixed precision + Mixed Precision Benchmarking @@ -207,7 +205,7 @@ More details for validated models are available [here](docs/validated_model_list * [Quantizing ONNX Models using Intel® Neural Compressor](https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/Quantizing-ONNX-Models-using-Intel-Neural-Compressor/post/1355237) (Feb 2022) * [Quantize AI Model by Intel® oneAPI AI Analytics Toolkit on Alibaba Cloud](https://www.intel.com/content/www/us/en/developer/articles/technical/quantize-ai-by-oneapi-analytics-on-alibaba-cloud.html) (Feb 2022) -> View the [full publication list](docs/publication_list.md). +> Please check out our [full publication list](docs/publication_list.md). ## Additional Content @@ -217,6 +215,6 @@ More details for validated models are available [here](docs/validated_model_list * [Security Policy](docs/security_policy.md) * [Intel® Neural Compressor Website](https://intel.github.io/neural-compressor) -## Hiring +## Hiring :star: -We are hiring. Please send your resume to inc.maintainers@intel.com if you have interests in model compression techniques. +We are actively hiring. Please send your resume to inc.maintainers@intel.com if you have interests in model compression techniques. diff --git a/docs/QAT.md b/docs/QAT.md index efa5d80341d..7bad1c0fcd0 100644 --- a/docs/QAT.md +++ b/docs/QAT.md @@ -1,75 +1,56 @@ -# QAT +# Quantization-aware Training ## Design -At its core, QAT simulates low-precision inference-time computation in the forward pass of the training process. 
With QAT, all weights and activations are "fake quantized" during both the forward and backward passes of training: that is, float values are rounded to mimic int8 values, but all computations are still done with floating point numbers. Thus, all the weight adjustments during training are made while "aware" of the fact that the model will ultimately be quantized; after quantizing, therefore, this method will usually yield higher accuracy than either dynamic quantization or post-training static quantization. +Quantization-aware training (QAT) simulates low-precision inference-time computation in the forward pass of the training process. With QAT, all weights and activations are "fake quantized" during both the forward and backward passes of training: that is, float values are rounded to mimic int8 values, but all computations are still done with floating point numbers. Thus, all the weight adjustments during training are made while "aware" of the fact that the model will ultimately be quantized; after quantizing, therefore, this method will usually yield higher accuracy than either dynamic quantization or post-training static quantization. -The overall workflow for actually performing QAT is very similar to Post-training static quantization (PTQ): - -* We can use the same model as PTQ; no additional preparation is needed for quantization-aware training. -* We need to use a qconfig specifying what kind of fake-quantization is to be inserted after weights and activations, instead of specifying observers. +fake quantize ## Usage -### MobileNetV2 Model Architecture - -Refer to the [PTQ Model Usage](PTQ.md#mobilenetv2-model-architecture). - -### Helper Functions - -Refer to [PTQ Helper Functions](PTQ.md#helper-functions). - -### QAT - -First, define a training function: +First, define a training function as below. 
+accuracy is in the ```python -def train_one_epoch(model, criterion, optimizer, data_loader, device, ntrain_batches): - model.train() - top1 = AverageMeter('Acc@1', ':6.2f') - top5 = AverageMeter('Acc@5', ':6.2f') - avgloss = AverageMeter('Loss', '1.5f') - - cnt = 0 - for image, target in data_loader: - start_time = time.time() - print('.', end = '') - cnt += 1 - image, target = image.to(device), target.to(device) - output = model(image) - loss = criterion(output, target) - optimizer.zero_grad() - loss.backward() - optimizer.step() - acc1, acc5 = accuracy(output, target, topk=(1, 5)) - top1.update(acc1[0], image.size(0)) - top5.update(acc5[0], image.size(0)) - avgloss.update(loss, image.size(0)) - if cnt >= ntrain_batches: - print('Loss', avgloss.avg) - - print('Training: * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}' - .format(top1=top1, top5=top5)) - return - - print('Full imagenet train set: * Acc@1 {top1.global_avg:.3f} Acc@5 {top5.global_avg:.3f}' - .format(top1=top1, top5=top5)) - return +def training_func_for_nc(model): + epochs = 8 + iters = 30 + optimizer = torch.optim.SGD(model.parameters(), lr=0.0001) + for nepoch in range(epochs): + model.train() + cnt = 0 + for image, target in train_loader: + print('.', end='') + cnt += 1 + output = model(image) + loss = criterion(output, target) + optimizer.zero_grad() + loss.backward() + optimizer.step() + if cnt >= iters: + break + if nepoch > 3: + # Freeze quantizer parameters + model.apply(torch.quantization.disable_observer) + if nepoch > 2: + # Freeze batch norm mean and variance estimates + model.apply(torch.nn.intrinsic.qat.freeze_bn_stats) + return model ``` -Fuse modules as PTQ: +Fuse modules: ```python model.fuse_model() optimizer = torch.optim.SGD(model.parameters(), lr = 0.0001) model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm') ``` -Finally, prepare_qat performs the "fake quantization", preparing the model for quantization-aware training: +Finally, prepare_qat performs the "fake quantization", preparing the model for quantization-aware training, this function already be implemented as a hook : ```python torch.quantization.prepare_qat(model, inplace=True) ``` -Training a quantized model with high accuracy requires accurate modeling of numerics at inference. For quantization-aware training, therefore, modify the training loop by doing the following: - +Training a quantized model with high accuracy requires accurate modeling of numerics at inference. INC does the training loop by following: * Switch batch norm to use running mean and variance towards the end of training to better match inference numerics. * Freeze the quantizer parameters (scale and zero-point) and fine tune the weights. + ```python num_train_batches = 20 # Train and check accuracy after each epoch @@ -88,6 +69,20 @@ for nepoch in range(8): print('Epoch %d :Evaluation accuracy on %d images, %2.2f'%(nepoch, num_eval_batches * eval_batch_size, top1.avg)) ``` +When using QAT in INC, you just need to use these APIs: +```python +from neural_compressor.experimental import Quantization, common +quantizer = Quantization("./conf.yaml") +quantizer.model = common.Model(model) +quantizer.q_func = training_func_for_nc +quantizer.eval_dataloader = val_loader +q_model = quantizer.fit() +``` + +The quantizer.fit() function will return a best quantized model during timeout constrain. +
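The returned model can then be saved and restored later for int8 inference. The snippet below is a minimal sketch of that flow for the PyTorch backend; the output directory name is a placeholder for illustration.

```python
# Minimal sketch (PyTorch backend assumed): persist the tuned int8 model and reload it later.
q_model.save('./saved_results')  # directory name is a placeholder

# Rebuild the quantized model from the original fp32 model definition.
from neural_compressor.utils.pytorch import load
int8_model = load('./saved_results', model)
```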
+The yaml define example: [The yaml example](/examples/pytorch/image_recognition/torchvision_models/quantization/qat/fx) + Here, we just perform quantization-aware training for a small number of epochs. Nevertheless, quantization-aware training yields an accuracy of over 71% on the entire imagenet dataset, which is close to the floating point accuracy of 71.9%. More on quantization-aware training: @@ -96,10 +91,6 @@ More on quantization-aware training: * We can simulate the accuracy of a quantized model in floating points since we are using fake-quantization to model the numerics of actual quantized arithmetic. * We can easily mimic post-training quantization. -Intel® Neural Compressor can support QAT calibration for -PyTorch models. Refer to the [QAT model](https://github.com/intel/neural-compressor/tree/master/examples/pytorch/eager/image_recognition/imagenet/cpu/qat/README.md) for step-by-step tuning. - -### Example -View a [QAT example of PyTorch resnet50](/examples/pytorch/image_recognition/torchvision_models/quantization/qat). - +### Examples +For related examples, please refer to the [QAT models](/examples/README.md). diff --git a/docs/Quantization.md b/docs/Quantization.md index 18619185e3f..dd3dbce3c7d 100644 --- a/docs/Quantization.md +++ b/docs/Quantization.md @@ -1,15 +1,77 @@ -Quantization -============ +# Quantization -Quantization refers to processes that enable lower precision inference and training by performing computations at fixed point integers that are lower than floating points. This often leads to smaller model sizes and faster inference time. Quantization is particularly useful in deep learning inference and training, where moving data more quickly and reducing bandwidth bottlenecks is optimal. Intel is actively working on techniques that use lower numerical precision by using training with 16-bit multipliers and inference with 8-bit or 16-bit multipliers. Refer to the Intel article on [lower numerical precision inference and training in deep learning](https://software.intel.com/content/www/us/en/develop/articles/lower-numerical-precision-deep-learning-inference-and-training.html). +Quantization is a widely-used model compression technique that can reduce model size while also improving inference and training latency.
+The full precision data converts to low-precision, there is little degradation in model accuracy, but the inference performance of quantized model can gain higher performance by saving the memory bandwidth and accelerating computations with low precision instructions. Intel provided several lower precision instructions (ex: 8-bit or 16-bit multipliers), both training and inference can get benefits from them. +Refer to the Intel article on [lower numerical precision inference and training in deep learning](https://software.intel.com/content/www/us/en/develop/articles/lower-numerical-precision-deep-learning-inference-and-training.html). -Quantization methods include the following three classes: +## Quantization Support Matrix -* [Post-Training Quantization (PTQ)](./PTQ.md) -* [Quantization-Aware Training (QAT)](./QAT.md) -* [Dynamic Quantization](./dynamic_quantization.md) +Quantization methods include the following three types: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
TypesQuantizationDataset RequirementsFrameworkBackend
Post-Training Static Quantization (PTQ)weights and activationscalibrationPyTorchPyTorch Eager/PyTorch FX/IPEX
TensorFlowTensorFlow/Intel TensorFlow
ONNX RuntimeQLinearops/QDQ
Post-Training Dynamic QuantizationweightsnonePyTorchPyTorch eager mode/PyTorch fx mode/IPEX
ONNX RuntimeQIntegerops
Quantization-aware Training (QAT)weights and activationsfine-tuningPyTorchPyTorch eager mode/PyTorch fx mode/IPEX
TensorFlowTensorFlow/Intel TensorFlow
+
+
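Conceptually, every entry in the matrix above relies on the same affine mapping between floating-point tensors and integers. The numpy sketch below is purely illustrative (it is not the Neural Compressor implementation) and assumes unsigned 8-bit, per-tensor quantization:

```python
import numpy as np

def affine_quantize(x: np.ndarray, num_bits: int = 8):
    # Derive scale and zero point from the observed value range.
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # The rounding error introduced here is the source of the small accuracy drop.
    return (q.astype(np.float32) - zero_point) * scale
```

The approaches below differ mainly in how and when the scale and zero point are determined: statically from calibration data, dynamically at runtime, or during training with fake quantization.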
-> **Note**
-> 
-> Dynamic Quantization currently only supports the onnxruntime backend. 
+### [Post-Training Static Quantization](./PTQ.md) performs quantization on an already-trained model. It requires an additional calibration pass over a dataset; only activations need calibration, because their ranges must be observed to derive the quantization parameters.
+PTQ
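As an illustration of this calibration-based flow, a post-training static quantization run with Neural Compressor looks roughly like the sketch below; the yaml path and dataloaders are placeholders, and the yaml is assumed to select the `post_training_static_quant` approach.

```python
from neural_compressor.experimental import Quantization, common

quantizer = Quantization("./ptq_conf.yaml")    # assumed yaml selecting post_training_static_quant
quantizer.model = common.Model(model)          # trained fp32 model
quantizer.calib_dataloader = calib_dataloader  # small representative set used only for calibration
quantizer.eval_dataloader = eval_dataloader    # used to check accuracy of candidate configs
q_model = quantizer.fit()
```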
+
+### [Post-Training Dynamic Quantization](./dynamic_quantization.md) multiplies input values by a scaling factor and rounds the result to the nearest integer, determining the scale factor for activations dynamically from the data range observed at runtime. Weights are quantized ahead of time, while activations are quantized on the fly during inference.
+Dynamic Quantization
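For PyTorch, the same idea is exposed directly through `torch.quantization.quantize_dynamic`; the snippet below is an illustrative, framework-level example (independent of Neural Compressor) that quantizes the Linear layers of an assumed fp32 `model`:

```python
import torch

# Weights of nn.Linear modules are converted to int8 ahead of time;
# activation scales are computed on the fly from the runtime data range.
dq_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```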
+ +### [Quantization-aware Training (QAT)](./QAT.md) quantizes models during training and typically provides higher accuracy comparing with post-training quantization, but QAT may require additional hyper-parameter tuning and it may take more time to deployment. +QAT + +## Examples of Quantization + +For Quantization related examples, please refer to [Quantization examples](/examples/README.md) diff --git a/docs/infrastructure.md b/docs/design.md similarity index 96% rename from docs/infrastructure.md rename to docs/design.md index 6b23be3c10c..bee2fa124b8 100644 --- a/docs/infrastructure.md +++ b/docs/design.md @@ -1,4 +1,4 @@ -Infrastructure +Design ===== Intel® Neural Compressor features an architecture and workflow that aids in increasing performance and faster deployments across infrastructures. diff --git a/docs/imgs/PTQ.png b/docs/imgs/PTQ.png new file mode 100644 index 00000000000..9d6d8183409 Binary files /dev/null and b/docs/imgs/PTQ.png differ diff --git a/docs/imgs/QAT.png b/docs/imgs/QAT.png new file mode 100644 index 00000000000..27c72efa583 Binary files /dev/null and b/docs/imgs/QAT.png differ diff --git a/docs/imgs/dynamic_quantization.png b/docs/imgs/dynamic_quantization.png new file mode 100644 index 00000000000..2a71c0ea1ea Binary files /dev/null and b/docs/imgs/dynamic_quantization.png differ diff --git a/docs/imgs/fake_quant.png b/docs/imgs/fake_quant.png new file mode 100644 index 00000000000..855297a685d Binary files /dev/null and b/docs/imgs/fake_quant.png differ diff --git a/docs/orchestration.md b/docs/orchestration.md new file mode 100755 index 00000000000..945c37e0d28 --- /dev/null +++ b/docs/orchestration.md @@ -0,0 +1,57 @@ +Optimization Orchestration +============ + +## Introduction + +Intel Neural Compressor supports arbitrary meaningful combinations of supported optimization methods under one-shot or multi-shot, such as pruning during quantization-aware training, or pruning and then post-training quantization, +pruning and then distillation and then quantization. + +## Validated Orchestration Types + +### One-shot + +- Pruning during quantization-aware training +- Distillation with pattern lock pruning +- Distillation with pattern lock pruning and quantization-aware training + +### Multi-shot + +- Pruning and then post-training quantization +- Distillation and then post-training quantization + +## Orchestration user facing API + +Neural Compressor defines `Scheduler` class to automatically pipeline execute model optimization with one shot or multiple shots way. + +User instantiates model optimization components, such as quantization, pruning, distillation, separately. After that, user could append +those separate optimization objects into scheduler's pipeline, the scheduler API executes them one by one. + +In following example it executes the pruning and then post-training quantization with two-shot way. + +```python +from neural_compressor.experimental import Quantization, Pruning, Scheduler +prune = Pruning(prune_conf) +quantizer = Quantization(post_training_quantization_conf) +scheduler = Scheduler() +scheduler.model = model +scheduler.append(prune) +scheduler.append(quantizer) +opt_model = scheduler.fit() +``` + +If user wants to execute the pruning and quantization-aware training with one-shot way, the code is like below. 
+ +```python +from neural_compressor.experimental import Quantization, Pruning, Scheduler +prune = Pruning(prune_conf) +quantizer = Quantization(quantization_aware_training_conf) +scheduler = Scheduler() +scheduler.model = model +combination = scheduler.combine(prune, quantizer) +scheduler.append(combination) +opt_model = scheduler.fit() +``` + +### Examples + +For orchestration related examples, please refer to [Orchestration examples](../examples/README.md). diff --git a/docs/platform_configuration.md b/docs/platform_configuration.md new file mode 100644 index 00000000000..764216ddc72 --- /dev/null +++ b/docs/platform_configuration.md @@ -0,0 +1,66 @@ +### SYSTEM CONFIGURATION + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
System ConfigurationIntel Xeon Platinum 8380 Scalable processor
ManufacturerIntel Corporation
Product NameM50CYP2SBSTD
BIOS VersionSE5C6200.86B.0022.D64.2105220049
OSUbuntu 20.04.1 LTS
Kernel5.4.0-42-generic
Microcode0xd0002b1
CPU ModelIntel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
Base Frequency2.3GHZ
Thread(s) per Core2
Core(s) per Socket40
Socket(s)2
TurboEnabled
Power & Perf PolicyBalanced
Installed256GB (16x16GB DDR4 3200MT/s [3200MT/s])
NIC Summary2x Ethernet Controller 10G X550T
Drive Summary1x INTEL_SSDSC2KW01 953.9G, +1x CT1000MX500SSD1 931.5G, +1x CT1000MX500SSD1 931.5G +
+ +Performance varies by use, configuration and other factors and may not reflect all publicly available ​updates. No product or component can be absolutely secure.. + +Intel technologies may require enabled hardware, software or service activation. + +Intel optimizations, for Intel compilers or other products, may not optimize to the same degree for non-Intel products. + + + +© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others​.​​ \ No newline at end of file diff --git a/docs/pruning.md b/docs/pruning.md index 93c31b7dcce..e421ed3445a 100755 --- a/docs/pruning.md +++ b/docs/pruning.md @@ -3,53 +3,74 @@ Pruning ## Introduction -Network pruning is one of popular approaches of network compression, which reduces the size of a network by removing parameters with minimal drop in accuracy. +Network pruning is one of popular approaches of network compression, which removes the least important parameters in the network to achieve compact architectures with minimal accuracy drop. -- Structured Pruning - -Structured pruning means pruning sparsity patterns, in which there is some structure, most often in the form of blocks. -Neural Compressor provided a NLP Structured pruning example: -[Bert example](../examples/pytorch/nlp/huggingface_models/question-answering/pruning/group_lasso/eager). -[README of Structured pruning example](../examples/pytorch/nlp/huggingface_models/question-answering/pruning/group_lasso/eager/README.md). +## Pruning Types - Unstructured Pruning -Unstructured pruning means pruning unstructured sparsity (aka random sparsity) patterns, where the nonzero patterns are irregular and could be anywhere in the matrix. - -- Filter/Channel Pruning - -Filter/Channel pruning means pruning a larger part of the network, such as filters or layers, according to some rules. - -## Pruning Algorithms supported by Neural Compressor - -| Pruning Type | Algorithm | PyTorch | Tensorflow | -|------------------------|---------------------------------------------|---------|------------| -| unstructured pruning | basic_magnitude | Yes | Yes | -| | pattern_lock | Yes | N/A | -| structured pruning | pattern_lock | Yes | N/A | -| filter/channel pruning | gradient_sensitivity | Yes | N/A | +Unstructured pruning means finding and removing the less salient connection in the model where the nonzero patterns are irregular and could be anywhere in the matrix. -Neural Compressor also supports the two-shot execution of unstructured pruning and post-training quantization. +- Structured Pruning -- basic_magnitude: +Structured pruning means finding parameters in groups, deleting entire blocks, filters, or channels according to some pruning criterions. + +## Pruning Algorithms + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Pruning TypePruning GranularityPruning AlgorithmFramework
Unstructured PruningElement-wiseMagnitudePyTorch, TensorFlow
Pattern LockPyTorch
Structured PruningFilter/Channel-wiseGradient SensitivityPyTorch
Block-wiseGroup LassoPyTorch
Element-wisePattern LockPyTorch
+ +- Magnitude - The algorithm prunes the weight by the lowest absolute value at each layer with given sparsity target. -- gradient_sensitivity: +- Gradient sensitivity - The algorithm prunes the head, intermediate layers, and hidden states in NLP model according to importance score calculated by following the paper [FastFormers](https://arxiv.org/abs/2010.13382). -- pattern_lock - - - The algorithm takes a sparsity model as input and starts to fine tune this sparsity model and locks the sparsity pattern by freezing those zero values in weight tensor after weight update during training. - -- pruning and then post-training quantization +- Group Lasso - - The algorithm executes unstructured pruning and then executes post-training quantization. + - The algorithm uses Group lasso regularization to prune entire rows, columns or blocks of parameters that result in a smaller dense network. -- pruning during quantization-aware training +- Pattern Lock - - The algorithm executes unstructured pruning during quantization-aware training. + - The algorithm locks the sparsity pattern in fine tune phase by freezing those zero values of weight tensor during weight update of training. ## Pruning API @@ -58,38 +79,29 @@ Neural Compressor also supports the two-shot execution of unstructured pruning a Neural Compressor pruning API is defined under `neural_compressor.experimental.Pruning`, which takes a user defined yaml file as input. The user defined yaml defines training, pruning and evaluation behaviors. [API Readme](../docs/pruning_api.md). -### Launcher code +### Usage 1: Launch pruning with user-defined yaml -Simplest launcher code if training behavior is defined in user-defined yaml. +#### Launcher code -``` -from neural_compressor.experimental import Pruning, common -prune = Pruning('/path/to/user/pruning/yaml') -prune.model = model -model = prune.fit() -``` - -Pruning class also support PruningConf class as it's argument. +Below is the launcher code if training behavior is defined in user-defined yaml. ``` -from lpot.experimental import Pruning, common -from lpot.conf.config import PruningConf -conf = PruningConf('/path/to/user/pruning/yaml') -prune = Pruning(conf) +from neural_compressor.experimental import Pruning +prune = Pruning('/path/to/user/pruning/yaml') prune.model = model model = prune.fit() ``` -### User-defined yaml +#### User-defined yaml The user-defined yaml follows below syntax, note `train` section is optional if user implements `pruning_func` and sets to `pruning_func` attribute of pruning instance. -[user-defined yaml](../docs/pruning.yaml). +User could refer to [the yaml template file](../docs/pruning.yaml) to know field meanings. -#### `train` +##### `train` The `train` section defines the training behavior, including what training hyper-parameter would be used and which dataloader is used during training. -#### `approach` +##### `approach` The `approach` section defines which pruning algorithm is used and how to apply it during training process. @@ -103,13 +115,13 @@ The `approach` section defines which pruning algorithm is used and how to apply - `Pruner`: - - `prune_type`: pruning algorithm, currently ``basic_magnitude`` and ``gradient_sensitivity`` are supported. + - `prune_type`: pruning algorithm, currently ``basic_magnitude``, ``gradient_sensitivity`` and ``group_lasso``are supported. - `names`: weight name to be pruned. If no weight is specified, all weights of the model will be pruned. 
- - `parameters`: Additional parameters is required ``gradient_sensitivity`` prune_type, which is defined in ``parameters`` field. Those parameters determined how a weight is pruned, including the pruning target and the calculation of weight's importance. it contains: + - `parameters`: Additional parameters is required ``gradient_sensitivity`` prune_type, which is defined in ``parameters`` field. Those parameters determined how a weight is pruned, including the pruning target and the calculation of weight's importance. It contains: - - `target`: the pruning target for weight. + - `target`: the pruning target for weight, will override global config `target_sparsity` if set. - `stride`: each stride of the pruned weight. - `transpose`: whether to transpose weight before prune. - `normalize`: whether to normalize the calculated importance. @@ -119,18 +131,32 @@ The `approach` section defines which pruning algorithm is used and how to apply Take above as an example, if we assume the 'bert.encoder.layer.0.attention.output.dense.weight' is the shape of [N, 12\*64]. The target 8 and stride 64 is used to control the pruned weight shape to be [N, 8\*64]. `Transpose` set to True indicates the weight is pruned at dim 1 and should be transposed to [12\*64, N] before pruning. `importance_input` and `importance_metric` specify the actual input and metric to calculate importance matrix. +### Usage 2: Launch pruning with user-defined pruning function -### Pruning with user-defined pruning_func() +#### Launcher code -User can pass the customized training/evaluation functions to `Pruning` for flexible scenarios. `Pruning` In this case, pruning process can be done by pre-defined hooks in Neural Compressor. User needs to put those hooks inside the training function. +In this case, the launcher code is like the following: -Neural Compressor defines several hooks for user pass +```python +from neural_compressor.experimental import Pruning, common +prune = Pruning(args.config) +prune.model = model +prune.pruning_func = pruning_func +model = prune.fit() +``` + +#### User-defined pruning function + +User can pass the customized training/evaluation functions to `Pruning` for flexible scenarios. In this case, pruning process can be done by pre-defined hooks in Neural Compressor. User needs to put those hooks inside the training function. + +Neural Compressor defines several hooks for user use: ``` on_epoch_begin(epoch) : Hook executed at each epoch beginning on_batch_begin(batch) : Hook executed at each batch beginning on_batch_end() : Hook executed at each batch end on_epoch_end() : Hook executed at each epoch end +on_post_grad() : Hook executed after gradients calculated and before backward ``` Following section shows how to use hooks in user pass-in training function which is part of example from BERT training: @@ -169,41 +195,6 @@ def pruning_func(model): ... ``` -In this case, the launcher code is like the following: - -```python -from neural_compressor.experimental import Pruning, common -prune = Pruning(args.config) -prune.model = model -prune.pruning_func = pruning_func -model = prune.fit() -``` - -### Scheduler for Pruning and Quantization - -Neural Compressor defined Scheduler to automatically pipeline execute prune and post-training quantization. After appending separate component into scheduler pipeline, scheduler executes them one by one. In following example it executes the pruning and then post-training quantization. 
- -```python -from neural_compressor.experimental import Quantization, common, Pruning, Scheduler -prune = Pruning(prune_conf) -quantizer = Quantization(post_training_quantization_conf) -scheduler = Scheduler() -scheduler.model = model -scheduler.append(prune) -scheduler.append(quantizer) -opt_model = scheduler.fit() -``` - ## Examples -### Examples in Neural Compressor -Following examples are supported in Neural Compressor: - -- CNN Examples: - - [resnet example](../examples/pytorch/image_recognition/torchvision_models/pruning/magnitude/eager/README.md): magnitude pruning on resnet. - - [pruning and post-training quantization](../examples/pytorch/image_recognition/torchvision_models/optimization_pipeline/prune_and_ptq/eager/README.md): magnitude pruning and then post-training quantization on resnet. - - [resnet_v2 example](../examples/tensorflow/image_recognition/resnet_v2/pruning/magnitude/README.md): magnitude pruning on resnet_v2 for tensorflow. -- NLP Examples: - - [BERT example](../examples/pytorch/nlp/huggingface_models/text-classification/pruning/magnitude/eager/README.md): magnitude pruning on DistilBERT. - - [BERT example](../examples/pytorch/nlp/huggingface_models/text-classification/pruning/pattern_lock/eager/README.md): Pattern-lock and head-pruning on BERT-base. - +For related examples, please refer to [Pruning examples](../examples/README.md). diff --git a/docs/pruning.yaml b/docs/pruning.yaml deleted file mode 100644 index a290fe285f8..00000000000 --- a/docs/pruning.yaml +++ /dev/null @@ -1,63 +0,0 @@ -``` -pruning: - train: # Section "train" is optional. If user implements `pruning_func` and pass to `pruning_func` attribute of pruning instance, skip this section. - start_epoch: 0 - end_epoch: 10 - iteration: 100 - frequency: 2 - - dataloader: - batch_size: 256 - dataset: - ImageFolder: - root: /path/to/imagenet/train - transform: - RandomResizedCrop: - size: 224 - RandomHorizontalFlip: - ToTensor: - Normalize: - mean: [0.485, 0.456, 0.406] - std: [0.229, 0.224, 0.225] - criterion: - CrossEntropyLoss: - reduction: None - optimizer: - SGD: - learning_rate: 0.1 - momentum: 0.9 - weight_decay: 0.0004 - nesterov: False - - approach: - weight_compression: - initial_sparsity: 0.0 - target_sparsity: 0.3 - pruners: - - !Pruner - initial_sparsity: 0.0 - target_sparsity: 0.97 - start_epoch: 0 - end_epoch: 2 - prune_type: basic_magnitude - update_frequency: 0.1 - names: ['layer1.0.conv1.weight'] - - !Pruner - start_epoch: 0 - end_epoch: 1 - prune_type: gradient_sensitivity - update_frequency: 1 - names: [ - 'bert.encoder.layer.0.attention.output.dense.weight', - ] - parameters: { - target: 8, - transpose: True, - stride: 64, - index: 0, - normalize: True, - importance_inputs: ['head_mask'], - importance_metric: abs_gradient - } - -``` diff --git a/docs/validated_model_list.md b/docs/validated_model_list.md index b0e4bbe6855..a0aee05287b 100644 --- a/docs/validated_model_list.md +++ b/docs/validated_model_list.md @@ -12,7 +12,7 @@ Validated Models - ResNet50 v1.5 + ResNet50 V1.5 TensorFlow Yes Link @@ -29,7 +29,7 @@ Validated Models Link - BERT-large + BERT large TensorFlow Yes Link @@ -40,7 +40,7 @@ Validated Models Link - SSD-ResNet34 + SSD ResNet34 TensorFlow Yes Link @@ -70,1641 +70,1446 @@ Validated Models -## Full Validated Models on Intel Xeon Platinum 8380 Scalable processor +## Validated Quantization Examples -The below tables are models enabled by the Intel® Neural Compressor. 
+Performance results test on ​​06/07/2022 with Intel Xeon Platinum 8380 Scalable processor, using 1 socket, 4 cores/instance, 10 instances and batch size 1. -Performance varies by use, configuration and other factors. See backup for configuration details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks +Performance varies by use, configuration and other factors. See [platform configuration](./platform_configuration.md) for configuration details. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks -Performance results are based on testing as of ​​04/08/2022 and may not reflect all publicly available ​updates. No product or component can be absolutely secure. - -Intel optimizations, for Intel compilers or other products, may not optimize to the same degree for non-Intel products. - -Your costs may vary. - -Intel technologies may require enabled hardware, software or service activation. - -© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others​.​​ - -### TensorFlow 2.x models +### TensorFlow models with Intel TensorFlow 2.9.1 - - - + - + + - + - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - + - - - + + + + - - - + - - - + + + + - - - + - - + + + + + + + + + + + + + - - - + + + + + + + + + + + - - - + + + + - - - + - - - + + + + - - - - - - - - - + + + + + + + + - - - - - - - - + + + + + + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + - - - - - - - - - + + + + + + + + - - - + - - - + + + + - - - - - - - - - - - - - - - - - - - - - - - - - + - - - + + + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + +
FrameworkversionmodelModel AccuracyPerformance
1s4c10ins1bs/throughput
(samples/sec)
Performance
throughput (samples/sec)
Example
INT8 FP32Acc Ratio[(INT8-FP32)/FP32]Accuracy Ratio[(INT8-FP32)/FP32] INT8 FP32 Performance Ratio[INT8/FP32]
BERT large SQuAD92.3992.99-0.64%25.3212.532.02xpb
DenseNet12173.57%72.89%0.93%370.52329.741.12xpb
DenseNet16176.24%76.29%-0.07%219.46180.751.21xpb
DenseNet16974.40%74.65%-0.33%301.33259.881.16xpb
Faster R-CNN Inception ResNet V237.98%38.33%-0.91%3.962.341.69xpb
Faster R-CNN Inception ResNet V2 37.84%38.33%-1.28%3.982.311.72xSavedModel
Faster R-CNN ResNet10130.28%30.39%-0.36%7019.983.50xpb
Faster R-CNN ResNet10130.37%30.39%-0.07%70.2616.984.14xSavedModel
intel-tensorflow2.7.0resnet50v1.576.82%76.46%0.47%1239.52433.072.86xInception ResNet V280.44%80.40%0.05%281.79137.912.04xpb
intel-tensorflow2.7.0resnet10177.50%76.45%1.37%874.41352.912.48xInception V170.48%69.74%1.06%2193.17975.62.25xpb
intel-tensorflow2.7.0inception_v2Inception V2 74.36% 73.97% 0.53%1840.78853.522.16x1835.35838.822.19xpb
intel-tensorflow2.7.0inception_v3Inception V3 77.28% 76.75% 0.69%954.63391.352.44x973.42376.32.59xpb
intel-tensorflow2.7.0inception_v4Inception V4 80.40% 80.27% 0.16%580.02202.14575.9200.55 2.87xpb
Mask R-CNN Inception V228.53%28.73%-0.70%132.5150.32.63xpb
intel-tensorflow2.7.0mobilenetv1Mask R-CNN Inception V2 28.53%28.73%-0.70%132.8950.972.61xckpt
MobileNet V1 71.79% 70.96% 1.17%3587.791343.072.67x3545.791191.942.97xpb
intel-tensorflow2.7.0mobilenetv2MobileNet V2 71.89% 71.76% 0.18%2469.921434.871.72x2431.661420.111.71xpb
intel-tensorflow2.7.0ssd_resnet50_v137.86%38.00%-0.37%70.3526.342.67xResNet10177.50%76.45%1.37%877.91355.492.47xpb
intel-tensorflow2.7.0ssd_mobilenet_v122.97%23.13%-0.69%852.80460.33ResNet50 Fashion77.80%78.12%-0.41%3977.52150.68 1.85xpb
intel-tensorflow2.7.0faster_rcnn_inception_resnet_v237.99%38.33%-0.89%4.062.331.74x
intel-tensorflow2.7.0faster_rcnn_resnet101_saved30.37%30.39%-0.07%69.6917.713.94x
intel-tensorflow2.7.0mask_rcnn_inception_v228.54%28.72%-0.63%123.9753.232.33xResNet50 V1.074.11%74.27%-0.22%1509.64472.663.19xpb
intel-tensorflow2.7.0wide_deep_large_ds77.62%77.67%-0.07%22704.1621249.521.07xResNet50 V1.576.82%76.46%0.47%1260.01415.833.03xpb
intel-tensorflow2.7.0vgg1672.66%70.89%2.50%669.62178.753.75xResNet V2 10172.67%71.87%1.11%436.52318.31.37xpb
intel-tensorflow2.7.0vgg1972.72%71.01%2.41%558.43148.193.77xResNet V2 15273.03%72.37%0.91%306.82221.41.39xpb
intel-tensorflow2.7.0resnetv2_50ResNet V2 50 70.33% 69.64% 0.99%765.73580.541.32x
intel-tensorflow2.7.0densenet12173.57%72.89%0.93%366.59296.631.24x
intel-tensorflow2.7.0densenet16176.24%76.29%-0.07%218.26164.481.33x
intel-tensorflow2.7.0densenet16974.40%74.65%-0.33%294.82253.351.16x749.85574.191.31xpb
intel-tensorflow2.7.0ssd_resnet50_v1_ckpt37.81%38.00%-0.50%70.4721.793.23xSSD MobileNet V122.97%23.13%-0.69%952.9582.871.63xpb
intel-tensorflow2.7.0ssd_mobilenet_v1_ckptSSD MobileNet V1 22.99% 23.13% -0.61%852.49386.902.20x954.92413.242.31xckpt
intel-tensorflow2.7.0mask_rcnn_inception_v2_ckpt28.54%28.72%-0.63%131.4351.092.57x
intel-tensorflow2.7.0resnet50v1.074.11%74.27%-0.22%1543.95501.613.08x
intel-tensorflow2.7.0ssd_resnet34SSD ResNet34 21.69% 22.09% -1.81%43.7111.783.71x44.4611.813.76xpb
intel-tensorflow2.7.0inception_v170.48%69.74%1.06%2227.691051.642.12x
intel-tensorflow2.7.0faster_rcnn_inception_resnet_v2_saved37.90%38.33%-1.12%4.052.331.74x
intel-tensorflow2.7.0faster_rcnn_resnet10130.28%30.39%-0.36%69.7419.903.50xSSD ResNet50 V137.86%38.00%-0.37%69.526.042.67xpb
intel-tensorflow2.7.0resnetv2_10172.67%71.87%1.11%444.06329.701.35xSSD ResNet50 V137.81%38.00%-0.50%69.2721.173.27xckpt
intel-tensorflow2.7.0inception_resnet_v280.44%80.40%0.05%284.40143.731.98xVGG1672.66%70.89%2.50%660.46177.853.71xpb
intel-tensorflow2.7.0resnetv2_15273.03%72.37%0.91%319.08223.371.43xVGG1972.72%71.01%2.41%562.04147.613.81xpb
intel-tensorflow2.7.0resnet50_fashion77.80%78.12%-0.41%3953.562170.491.82xWide & Deep77.62%77.67%-0.07%21332.4719714.081.08xpb
- -### Intel-tensorflow 1.x models +### PyTorch models with Torch 1.11.0+cpu in PTQ mode - - - + - + + - + - + - - - - - - - - - + + + + + + + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + - - - - - - - - + + + + + + + - -
FrameworkversionmodelModel AccuracyPerformance
1s4c10ins1bs/throughput
(samples/sec)
Performance
throughput (samples/sec)
Example
INT8 FP32Acc Ratio[(INT8-FP32)/FP32]Accuracy Ratio[(INT8-FP32)/FP32] INT8 FP32 Performance Ratio[INT8/FP32]
intel-tensorflow1.15.0-up3bert_large_squad92.4292.98-0.61%25.9912.552.07xALBERT base MRPC88.06%88.50%-0.50%34.2829.541.16xeager
intel-tensorflow1.15.0-up3bert_base_mrpc86.52%86.52%0.00%266.15145.021.84x
intel-tensorflow1.15.0-up3resnet_v1_50_slim76.38%75.18%1.60%1515.24409.443.70x
intel-tensorflow1.15.0-up3resnet_v1_101_slim77.52%76.40%1.47%837.49224.573.73x
intel-tensorflow1.15.0-up3resnet_v1_152_slim77.08%76.81%0.35%587.75152.393.86x
intel-tensorflow1.15.0-up3inception_v1_slim70.49%69.77%1.03%1968.87803.532.45x
intel-tensorflow1.15.0-up3inception_v2_slim74.35%73.98%0.50%1591.25658.542.42x
intel-tensorflow1.15.0-up3inception_v3_slim78.32%77.99%0.42%941.48285.173.30x
intel-tensorflow1.15.0-up3inception_v4_slim80.30%80.19%0.14%512.74143.423.58x
intel-tensorflow1.15.0-up3vgg16_slim72.78%70.89%2.67%609.29151.154.03xBarthez MRPC82.99%83.81%-0.97%166.8489.561.86xeager
intel-tensorflow1.15.0-up3vgg19_slim72.60%71.01%2.24%510.33122.874.15x
intel-tensorflow1.15.0-up3resnetv2_50_slim70.47%69.72%1.08%823.59470.801.75x
intel-tensorflow1.15.0-up3resnetv2_101_slim72.62%71.91%0.99%471.451247.6271.90xBERT base COLA58.80%58.84%-0.07%260126.472.06xfx
intel-tensorflow1.15.0-up3resnetv2_152_slim72.95%72.40%0.76%339.192170.545BERT base MRPC90.28%90.69%-0.45%251.79126.46 1.99xfx
- - -### PyTorch models - - - - - - - - + + + + + + + + - - - - - - + + + + + + + + - - - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - + - - - + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - + + + - - - + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - + + + + + + + - - - - - - - - - + + + + + + + + - - - + + + + + + + + + + + - - + + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + +
FrameworkversionmodelAccuracyPerformance
1s4c10ins1bs/throughput
(samples/sec)
BERT base RTE69.31%69.68%-0.52%252.14126.451.99xfx
INT8FP32Acc Ratio[(INT8-FP32)/FP32]INT8FP32Performance Ratio[INT8/FP32]BERT base SST291.97%91.86%0.12%258.98126.422.05xfx
pytorch1.10.0+cpuse_resnext50_32x4d79.04%79.08%-0.05%350.90171.322.05xBERT base STSB89.13%89.75%-0.68%249.57126.391.97xfx
pytorch1.10.0+cpumobilenet_v270.54%71.84%-1.81%707.15490.611.44xBERT large COLA62.88%62.57%0.49%88.7536.72.42xfx
pytorch1.10.0+cpurnnt92.4892.54-0.07%75.7420.443.71xBERT large MRPC89.93%90.38%-0.49%89.4336.622.44xfx
pytorch1.10.0+cpubarthez_mrpc82.99%83.81%-0.97%155.8089.411.74xBERT large QNLI90.96%91.82%-0.94%91.27372.47xfx
pytorch1.10.0+cpulongformer_mrpc90.59%91.46%-0.95%21.2917.151.24xBERT large RTE71.84%72.56%-1.00%77.6236.012.16xfx
pytorch1.10.0+cpuresnet1869.57%69.76%-0.27%749.77377.161.99xCamemBERT base MRPC86.56%86.82%-0.30%241.39124.771.93xeager
pytorch1.10.0+cpuresnet5075.98%76.15%-0.21%487.25199.642.44xDeberta MRPC91.17%90.91%0.28%152.0985.131.79xeager
pytorch1.10.0+cpuresnext101_32x8d79.03%79.31%-0.35%198.9473.882.69xDistilBERT base MRPC88.66%89.16%-0.56%415.09246.91.68xeager
pytorch1.10.0+cpuresnet18_qat69.74%69.76%-0.03%750.71379.571.98xDistilBERT base MRPC88.74%89.16%-0.47%459.93245.331.87xfx
pytorch1.10.0+cpuresnet50_qat76.04%76.15%-0.14%478.44197.692.42xFlauBERT MRPC81.01%80.19%1.01%644.05457.321.41xeager
pytorch1.10.0+cpuinception_v3Inception V3 69.43% 69.52% -0.13%433.36216.312.00x454.3213.72.13xeager
pytorch1.10.0+cpupeleenet71.64%72.10%-0.64%479.00377.541.27xLongformer MRPC90.59%91.46%-0.95%21.5117.451.23xeager
pytorch1.10.0+cpuyolo_v324.60%24.54%0.21%105.8439.802.66xMask R-CNN37.70%37.80%-0.26%17.615.763.06xeager
pytorch1.10.0+cpublendcnn68.40%68.40%mBart WNLI56.34%56.34% 0.00%4997.744621.031.08x65.0531.262.08xeager
pytorch1.10.0+cpuroberta_base_mrpc87.88%88.18%-0.34%246.27125.031.97xMobileNet V270.54%71.84%-1.81%740.97535.541.38xeager
pytorch1.10.0+cpucamembert_base_mrpc86.56%86.82%-0.30%236.17124.681.89xlvwerra/pegasus-samsum42.2142.67-1.09%3.891.143.41xeager
pytorch1.10.0+cpudistilbert_base_mrpc88.66%89.16%-0.56%422.29246.371.71xPeleeNet71.64%72.10%-0.64%502.01391.311.28xeager
pytorch1.10.0+cpualbert_base_mrpc88.06%88.50%-0.50%34.4428.851.19xResNet18 69.57%69.76%-0.27%800.43381.272.10xeager
pytorch1.10.0+cpupegasus_samsum42.2042.67-1.09%3.801.143.33xResNet18 69.57%69.76%-0.28%811.09389.362.08xfx
pytorch1.10.0+cpuflaubert_mrpc81.01%80.19%1.01%672.25457.051.47xResNet5075.98%76.15%-0.21%507.55200.522.53xeager
pytorch1.10.0+cpudeberta_mrpc91.17%90.91%0.28%131.0979.851.64xResNeXt101_32x8d79.08%79.31%-0.29%203.5473.852.76xeager
pytorch1.10.0+cpusqueezebert_mrpc87.77%87.65%0.14%239.56209.011.15xRNN-T92.4592.55-0.10%79.2120.473.87xeager
pytorch1.10.0+cpuresnet18_fx69.57%69.76%-0.28%761.15379.99Roberta Base MRPC87.88%88.18%-0.34%250.21124.92 2.00xeager
pytorch1.10.0+cpuresnet18_qat_fx69.73%69.76%-0.04%765.09377.012.03xSe_ResNeXt50_32x4d78.98%79.08%-0.13%358.63173.032.07xeager
pytorch1.10.0+cputransfo_xl_mrpcSqueezeBERT MRPC87.77%87.65%0.14%249.89207.431.20xeager
Transfo-xl MRPC 81.97% 81.20% 0.94%11.108.2211.258.34 1.35xeager
pytorch1.10.0+cpubert_base_mrpc90.28%90.69%-0.45%241.46125.091.93x
pytorch1.10.0+cpubert_base_cola58.80%58.84%-0.07%253.12125.172.02x
pytorch1.10.0+cpubert_base_sts-b89.13%89.75%-0.68%243.50124.541.96x
pytorch1.10.0+cpubert_base_sst-291.97%91.86%0.12%252.00121.142.08xYOLOv324.60%24.54%0.21%108.0940.022.70xeager
+ +### PyTorch models with Torch 1.11.0+cpu in QAT mode + + - - - - - - - - - + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + +
pytorch1.10.0+cpubert_large_cola62.88%62.57%0.49%87.8836.932.38xModelAccuracyPerformance
throughput (samples/sec)
Example
pytorch1.10.0+cpubert_base_rte69.31%69.68%-0.52%244.20125.711.94xINT8FP32Accuracy Ratio[(INT8-FP32)/FP32]INT8FP32Performance Ratio[INT8/FP32]
pytorch1.10.0+cpubert_large_mrpc89.93%90.38%-0.49%87.4436.712.38xResNet1869.74%69.76%-0.03%804.76388.672.07xeager
pytorch1.10.0+cpubert_large_qnli90.96%91.82%-0.94%89.1836.872.42xResNet1869.73%69.76%-0.04%806.44386.592.09xfx
pytorch1.10.0+cpubert_large_rte71.84%72.56%-1.00%75.9136.722.07xBERT base MRPC QAT89.60%89.50%0.11%258.89125.792.06xfx
pytorch1.10.0+cpumbart_wnli56.34%56.34%0.00%65.2431.062.10xResNet5076.04%76.15%-0.14%490.64203.492.41xeager
-### PyTorch models along with ipex +### PyTorch models with IPEX 1.11.0 - - - + - + + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + - - - + + + + + + + + + + + + + + - - - - + + - - - - + + + + +
FrameworkversionmodelModel AccuracyPerformance
1s4c10ins1bs/throughput
(samples/sec)
Performance
throughput (samples/sec)
Example
INT8 FP32Acc Ratio[(INT8-FP32)/FP32]Accuracy Ratio[(INT8-FP32)/FP32] INT8 FP32 Performance Ratio[INT8/FP32]
pytorch1.10.0+cpuresnet50_ipex76.14%76.15%0.00%654.50202.313.24x
pytorch1.10.0+cpubert_large_ipex92.7793.16-0.41%29.7413.612.18x
pytorch1.10.0+cpuresnext101_32x16d_wsl_ipex
bert-large-uncased-whole-word-masking-finetuned-squad 92.993.16-0.28%37.1311.453.24xipex
ResNeXt101_32x16d_wsl 84.02% 84.17% -0.18%157.7828.545.53x163.4528.95.66xipex
ResNet5076.00%76.15%-0.20%707.86202.023.51xipex
pytorch1.10.0+cpussd_resnet34_ipex19.95%SSD ResNet3419.97% 20.00%-0.25%30.508.503.59x-0.15%30.848.553.61xipex
- -### MXNet models +### ONNX Models with ONNX Runtime 1.11.0 - - - + - + + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - -
FrameworkversionmodelModel AccuracyPerformance
1s4c10ins1bs/throughput
(samples/sec)
Performance
throughput (samples/sec)
Example
INT8 FP32Acc Ratio[(INT8-FP32)/FP32]Accuracy Ratio[(INT8-FP32)/FP32] INT8 FP32 Performance Ratio[INT8/FP32]
mxnet1.7.0inceptionv377.80%77.65%0.20%918.73238.903.85x
mxnet1.7.0squeezenet1.056.80%56.97%-0.28%4693.551272.503.69x
mxnet1.7.0ssd-mobilenet1.074.94%75.54%-0.79%771.65189.814.07x
mxnet1.7.0resnet152_v178.28%78.54%-0.33%574.23126.784.53xAlexNet54.74%54.79%-0.09%1518.97676.742.24xqlinearops
AlexNet 54.74%54.79%-0.09%1411.3652.62.16xqdq
BERT base MRPC DYNAMIC85.54%86.03%-0.57%379.71156.162.43xqlinearops
BERT base MRPC STATIC85.29%86.03%-0.86%756.33316.362.39xqlinearops
BERT SQuAD80.4480.67-0.29%115.5864.711.79xqlinearops
BERT SQuAD80.4480.67-0.29%115.464.681.78xqdq
CaffeNet56.19%56.30%-0.20%2786.79802.73.47xqlinearops
CaffeNet56.19%56.30%-0.20%2726.86819.413.33xqdq
DenseNet60.20%60.96%-1.25%404.83340.631.19xqlinearops
DistilBERT base MRPC84.56%84.56%0.00%1630.41596.682.73xqlinearops
EfficientNet77.58%77.70%-0.15%1985.351097.331.81xqlinearops
Faster R-CNN33.99%34.37%-1.11%10.024.322.32xqlinearops
Faster R-CNN33.94%34.37%-1.25%10.414.282.43xqdq
- - -### ONNX Models - - - - - - - - + + + + + + + + - - - - - - + + + + + + + + - - - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - + - - - - + + + + + + - - - - - - - - - + + + + + + + + - - - + - - + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - + + + + + + + + + + + + + + + + + + + + + - - - - - - - - - - - - - - + + + + - - - - - - - - - + + + + + + + + - - - + - - - + + + + - - - - - - - - - + + + + + + + + - - - + - - + + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - + - - - - - - - - - - - - - - + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - + - - - - - + + + + + + - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - + + + - - - + + + + - - - - - - - - - + + + + + + + + + +
FrameworkversionmodelAccuracyPerformance
1s4c10ins1bs/throughput
(samples/sec)
FCN64.66%64.98%-0.49%44.3114.23.12xqlinearops
INT8FP32Acc Ratio[(INT8-FP32)/FP32]INT8FP32Performance Ratio[INT8/FP32]FCN64.66%64.98%-0.49%18.1114.191.28xqdq
onnxrt-runtime1.10.0alexnet54.74%54.79%-0.09%1505.75656.812.29xGoogleNet67.61%67.79%-0.27%1165.84810.651.44xqlinearops
onnxrt-runtime1.10.0zfnet55.89%55.96%-0.13%661.16353.201.87xGoogleNet67.61%67.79%-0.27%1165.73809.981.44xqdq
onnxrt-runtime1.10.0efficientnet77.58%77.70%-0.15%2065.721094.771.89xInception V167.23%67.24%-0.01%1205.89838.711.44xqlinearops
onnxrt-runtime1.10.0squeezenet_qdq56.55%56.87%-0.56%5965.784300.121.39xInception V167.23%67.24%-0.01%1204.93843.161.43xqdq
onnxrt-runtime1.10.0ssd-12_qdq18.38%18.98%-3.16%42.2411.123.80xMask R-CNN33.40%33.72%-0.95%8.563.762.27xqlinearops
onnxrt-runtime1.10.0resnet50_v1_572.28%72.29%-0.01%1166.31554.342.10xMask R-CNN33.33%33.72%-1.16%8.43.812.20xqdq
onnxrt-runtime1.10.0bert_base_mrpc_static85.29%Mobile bert MRPC 86.03%-0.86%766.46315.222.43x86.27%-0.28%790.11686.351.15xqlinearops
onnxrt-runtime1.10.0bert_base_mrpc_dynamic85.54%86.03%-0.57%381.30155.902.45xMobileBERT SQuAD MLPerf89.8490.03-0.20%102.9295.191.08xqlinearops
onnxrt-runtime1.10.0mobilenet_v2MobileNet V2 65.47% 66.89% -2.12%5128.933390.195133.843394.73 1.51xqlinearops
onnxrt-runtime1.10.0ssd_mobilenet_v122.20%23.10%-3.90%914.92703.741.30xMobileNet V265.47%66.89%-2.12%5066.313386.31.50xqdq
onnxrt-runtime1.10.0ssd_mobilenet_v223.83%24.68%-3.44%718.28501.311.43xMobileNet V3 MLPerf75.59%75.74%-0.20%4133.222132.921.94xqlinearops
onnxrt-runtime1.10.0distilbert_base_mrpc84.56%84.56%0.00%1675.94594.272.82xMobileNetV2 (ONNX Model Zoo)68.30%69.48%-1.70%5349.423373.291.59xqlinearops
onnxrt-runtime1.10.0mobilebert_mrpc85.54%86.27%-0.85%766.00684.301.12xResNet50 V1.5 MLPerf76.13%76.46%-0.43%1139.56549.882.07xqlinearops
onnxrt-runtime1.10.0resnet50-v1-12ResNet50 V1.572.28%72.29%-0.01%1165.35556.022.10xqlinearops
ResNet50 V1.572.28%72.29%-0.01%1319.32543.442.43xqdq
ResNet50 V1.5 (ONNX Model Zoo) 74.76% 74.99% -0.31%1380.38581.362.37x
onnxrt-runtime1.10.0resnet_v1_5_mlperf76.13%76.46%-0.43%1143.13550.772.08x1363.39573.12.38xqlinearops
onnxrt-runtime1.10.0mobilenet_v3_mlperf75.59%75.74%-0.20%4121.332135.311.93xRoberta Base MRPC90.44%89.95%0.54%811.05312.712.59xqlinearops
onnxrt-runtime1.10.0shufflenet-v2-12ShuffleNet V2 66.13% 66.36% -0.35%4901.742853.371.72x4948.772847.661.74xqlinearops
onnxrt-runtime1.10.0googlenet-1267.61%67.79%-0.27%1030.75805.761.28xSqueezeNet56.55%56.87%-0.56%6296.794340.511.45xqlinearops
onnxrt-runtime1.10.0squeezenetSqueezeNet 56.55% 56.87% -0.56%6119.014321.716227.764383.8 1.42xqdq
onnxrt-runtime1.10.0caffenet56.19%56.30%-0.20%2644.16810.133.26x
onnxrt-runtime1.10.0inception_v167.23%67.24%-0.01%1059.31848.191.25x
onnxrt-runtime1.10.0fcn64.66%64.98%-0.49%44.4814.233.13xSSD MobileNet V122.20%23.10%-3.90%917.64709.481.29xqlinearops
onnxrt-runtime1.10.0ssd-1218.84%18.98%-0.74%41.9811.113.78xSSD MobileNet V122.20%23.10%-3.90%840.99655.991.28xqdq
onnxrt-runtime1.10.0ssd_mobilenet_v1-2SSD MobileNet V1 (ONNX Model Zoo) 22.88% 23.03% -0.65%836.01652.271.28x
onnxrt-runtime1.10.0faster_rcnn33.99%34.37%-1.11%9.234.282.16x845.17666.251.27xqlinearops
onnxrt-runtime1.10.0mobilenetv2-1268.30%69.48%-1.70%5314.593369.521.58xSSD MobileNet V1 (ONNX Model Zoo)22.88%23.03%-0.65%790.06624.21.27xqdq
onnxrt-runtime1.10.0mask_rcnn33.40%33.72%-0.95%7.883.942.00xSSD MobileNet V223.83%24.68%-3.44%703.55506.61.39xqlinearops
onnxrt-runtime1.10.0yolov326.88%28.74%-6.47%157.8564.932.43xSSD18.68%18.98%-1.58%41.9911.123.78xqdq
onnxrt-runtime1.10.0densenet60.20%60.96%-1.25%408.55340.821.20xTiny YOLOv312.08%12.43%-2.82%836.21659.691.27xqlinearops
onnxrt-runtime1.10.0yolov430.95%32.78%-5.58%53.5128.661.87xVGG1666.60%66.69%-0.13%312.48128.982.42xqlinearops
onnxrt-runtime1.10.0resnet50_v1_5_qdqVGG16 (ONNX Model Zoo) 72.28%72.29%-0.01%1271.61543.582.34x72.40%-0.17%446.13131.043.40xqlinearops
onnxrt-runtime1.10.0mobilenet_v2_qdq65.47%66.89%-2.12%5069.543404.881.49xYOLOv326.88%28.74%-6.47%157.3966.722.36xqlinearops
onnxrt-runtime1.10.0ssd_mobilenet_v1_qdq22.25%23.10%-3.68%803.63644.181.25xYOLOv433.18%33.71%-1.57%58.5538.091.54xqlinearops
onnxrt-runtime1.10.0vgg1666.60%66.69%ZFNet55.89%55.96% -0.13%310.23128.812.41x664.37358.621.85xqlinearops
onnxrt-runtime1.10.0roberta_base_mrpc89.22%89.95%-0.81%766.66316.242.42xZFNet55.89%55.96%-0.13%666.99354.381.88xqdq
+ +### MXNet models with MXNet 1.7.0 + + + - - - - - - - - - + + + - - - - - - - - - + + + + + + - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
onnxrt-runtime1.10.0bert_squad_model_zoo80.4380.67-0.29%115.7864.691.79xModelAccuracyPerformance
throughput (samples/sec)
onnxrt-runtime1.10.0mobilebert_squad_mlperf89.8490.02-0.20%102.8295.171.08xINT8FP32Accuracy Ratio[(INT8-FP32)/FP32]INT8FP32Performance Ratio[INT8/FP32]
onnxrt-runtime1.10.0vgg16_model_zoo72.28%72.40%-0.17%447.28129.593.45x
Inception V377.80%77.65%0.20%920.74276.733.33x
MobileNet V171.60%72.23%-0.86%6585.192529.212.60x
MobileNet V270.80%70.87%-0.10%5230.321996.472.62x
ResNet V1 15278.28%78.54%-0.33%574.85156.23.68x
ResNet50 V1.075.91%76.33%-0.55%1567.9427.993.66x
SqueezeNet56.80%56.97%-0.28%4704.511332.293.53x
SSD MobileNet V174.94%75.54%-0.79%769.26193.033.99x
-### BACKUP - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
System ConfigurationIntel Xeon Platinum 8380 Scalable processor
Test DateSat 30 Apr 2022 UTC
ManufacturerIntel Corporation
Product NameM50CYP2SBSTD
BIOS VersionSE5C6200.86B.0022.D64.2105220049
OSUbuntu 20.04.1 LTS
Kernel5.4.0-42-generic
Microcode0xd0002b1
CPU ModelIntel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
Base Frequency2.3GHZ
Thread(s) per Core2
Core(s) per Socket40
Socket(s)2
TurboEnabled
Power & Perf PolicyBalanced
Installed256GB (16x16GB DDR4 3200MT/s [3200MT/s])
NIC Summary2x Ethernet Controller 10G X550T
Drive Summary1x INTEL_SSDSC2KW01 953.9G, -1x CT1000MX500SSD1 931.5G, -1x CT1000MX500SSD1 931.5G -
- -## Validated Pruning Models +## Validated Pruning Examples - + - - - + + + - - - - - - + + + + + + - - + + @@ -1715,8 +1520,8 @@ Intel technologies may require enabled hardware, software or service activation. - - + + @@ -1732,24 +1537,24 @@ Intel technologies may require enabled hardware, software or service activation. - + - + - - - - + + + + - - + + @@ -1758,8 +1563,8 @@ Intel technologies may require enabled hardware, software or service activation. - - + + @@ -1768,8 +1573,8 @@ Intel technologies may require enabled hardware, software or service activation. - - + + @@ -1778,8 +1583,8 @@ Intel technologies may require enabled hardware, software or service activation. - - + + @@ -1788,8 +1593,8 @@ Intel technologies may require enabled hardware, software or service activation. - - + + @@ -1804,50 +1609,50 @@ Intel technologies may require enabled hardware, software or service activation. - + - - + + - - + + - + - - + + - + - - + + - + - - + + - + - - + + - + @@ -1878,9 +1683,9 @@ Intel technologies may require enabled hardware, software or service activation. - + - + @@ -1897,3 +1702,5 @@ Intel technologies may require enabled hardware, software or service activation.
TasksFWKFramework Modelfp32 baselinegradient sensitivity with 20% sparsity+onnx dynamic quantization on pruned modelFP32 BaselineGradient Sensitivity with 20% Sparsity+ONNX Dynamic Quantization on Pruned Model
accuracy% drop%perf gain (sample/s)accuracy% drop%perf gain (sample/s)Accuracy%DropPerf Gain (sample/s)Accuracy%DropPerf Gain (sample/s)
SST-2pytorchbert-basePyTorchBERT base accuracy = 92.32 accuracy = 91.97 -0.38
QQPpytorchbert-basePyTorchBERT base [accuracy, f1] = [91.10, 88.05] [accuracy, f1] = [89.97, 86.54] [-1.24, -1.71]
| Tasks | Framework | Model | FP32 Baseline | Pattern Lock on 70% Unstructured Sparsity: Accuracy% | Drop | Pattern Lock on 50% 1:2 Structured Sparsity: Accuracy% | Drop |
|---|---|---|---|---|---|---|---|
| MNLI | PyTorch | BERT base | [m, mm] = [84.57, 84.79] | [m, mm] = [82.45, 83.27] | [-2.51, -1.80] | | |
| SST-2 | PyTorch | BERT base | accuracy = 92.32 | accuracy = 91.51 | -0.88 | | |
| QQP | PyTorch | BERT base | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.48, 87.06] | [-0.68, -1.12] | | |
| QNLI | PyTorch | BERT base | accuracy = 91.54 | accuracy = 90.39 | -1.26 | | |
| QnA | PyTorch | BERT base | [em, f1] = [79.34, 87.10] | [em, f1] = [77.27, 85.75] | [-2.61, -1.54] | | |
| Framework | Model | FP32 Baseline | Compression | Dataset | Accuracy% (Drop) |
|---|---|---|---|---|---|
| PyTorch | ResNet18 | 69.76 | 30% Sparsity on Magnitude | ImageNet | 69.47 (-0.42) |
| PyTorch | ResNet18 | 69.76 | 30% Sparsity on Gradient Sensitivity | ImageNet | 68.85 (-1.30) |
| PyTorch | ResNet50 | 76.13 | 30% Sparsity on Magnitude | ImageNet | 76.11 (-0.03) |
| PyTorch | ResNet50 | 76.13 | 30% Sparsity on Magnitude and Post Training Quantization | ImageNet | 76.01 (-0.16) |
| PyTorch | ResNet50 | 76.13 | 30% Sparsity on Magnitude and Quantization Aware Training | ImageNet | 75.90 (-0.30) |
| Example | Dataset | Student Model (Accuracy) | Teacher Model (Accuracy) | Student With Distillation (Accuracy Change) |
|---|---|---|---|---|
| BlendCNN example | MRPC | BlendCNN (0.7034) | BERT-Base (0.8382) | 0.7034 (0) |
diff --git a/examples/README.md b/examples/README.md
new file mode 100644
index 00000000000..1f39b9731a4
--- /dev/null
+++ b/examples/README.md
@@ -0,0 +1,883 @@
Examples
===
Intel® Neural Compressor ships validated examples for multiple compression techniques, including quantization, pruning, knowledge distillation, and orchestration. A subset of the validated cases is listed in the example tables below; the release data is available [here](../docs/validated_model_list.md).

# TensorFlow Examples
## Quantization
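Most of the post-training quantization examples below are driven through the same small Python API surface. The following is a minimal sketch of that flow, not taken from any specific example: the yaml path, model file, and dummy dataset are placeholders, and the attribute names follow the 1.x `experimental` API.

```python
# Illustrative sketch only; conf.yaml, the model path, and the dummy dataset are placeholders.
from neural_compressor.experimental import Quantization, common

quantizer = Quantization("./conf.yaml")            # yaml: calibration, accuracy criterion, tuning strategy
quantizer.model = common.Model("./model_fp32.pb")  # FP32 frozen graph; SavedModel/ckpt/keras also accepted
dataset = quantizer.dataset("dummy", shape=(1, 224, 224, 3))  # replace with a real calibration dataset
quantizer.calib_dataloader = common.DataLoader(dataset)
q_model = quantizer.fit()                          # runs calibration plus accuracy-driven tuning
q_model.save("./model_int8")
```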
| Model | Domain | Approach | Examples |
|---|---|---|---|
| ResNet50 V1.0 | Image Recognition | Post-Training Static Quantization | pb |
| ResNet50 V1.5 | Image Recognition | Post-Training Static Quantization | pb |
| ResNet101 | Image Recognition | Post-Training Static Quantization | pb |
| MobileNet V1 | Image Recognition | Post-Training Static Quantization | pb / SavedModel |
| MobileNet V2 | Image Recognition | Post-Training Static Quantization | pb / SavedModel |
| MobileNet V3 | Image Recognition | Post-Training Static Quantization | pb |
| Inception V1 | Image Recognition | Post-Training Static Quantization | pb |
| Inception V2 | Image Recognition | Post-Training Static Quantization | pb |
| Inception V3 | Image Recognition | Post-Training Static Quantization | pb |
| Inception V4 | Image Recognition | Post-Training Static Quantization | pb |
| Inception ResNet V2 | Image Recognition | Post-Training Static Quantization | pb |
| VGG16 | Image Recognition | Post-Training Static Quantization | pb / keras |
| VGG19 | Image Recognition | Post-Training Static Quantization | pb / keras |
| ResNet V2 50 | Image Recognition | Post-Training Static Quantization | pb |
| ResNet V2 101 | Image Recognition | Post-Training Static Quantization | pb |
| ResNet V2 152 | Image Recognition | Post-Training Static Quantization | pb |
| DenseNet121 | Image Recognition | Post-Training Static Quantization | pb |
| DenseNet161 | Image Recognition | Post-Training Static Quantization | pb |
| DenseNet169 | Image Recognition | Post-Training Static Quantization | pb |
| EfficientNet B0 | Image Recognition | Post-Training Static Quantization | ckpt |
| MNIST | Image Recognition | Quantization-Aware Training | keras |
| ResNet50 | Image Recognition | Post-Training Static Quantization | keras |
| ResNet50 Fashion | Image Recognition | Post-Training Static Quantization | keras |
| ResNet V2 | Image Recognition | Quantization-Aware Training | keras |
| EfficientNet V2 B0 | Image Recognition | Post-Training Static Quantization | SavedModel |
| BERT base MRPC | Natural Language Processing | Post-Training Static Quantization | ckpt |
| BERT large SQuAD | Natural Language Processing | Post-Training Static Quantization | pb |
| Transformer LT | Natural Language Processing | Post-Training Static Quantization | pb |
| SSD ResNet50 V1 | Object Detection | Post-Training Static Quantization | pb / ckpt |
| SSD MobileNet V1 | Object Detection | Post-Training Static Quantization | pb / ckpt |
| Faster R-CNN Inception ResNet V2 | Object Detection | Post-Training Static Quantization | pb / SavedModel |
| Faster R-CNN ResNet101 | Object Detection | Post-Training Static Quantization | pb / SavedModel |
| Mask R-CNN Inception V2 | Object Detection | Post-Training Static Quantization | pb / ckpt |
| SSD ResNet34 | Object Detection | Post-Training Static Quantization | pb |
| YOLOv3 | Object Detection | Post-Training Static Quantization | pb |
| Wide & Deep | Recommendation | Post-Training Static Quantization | pb |
| Arbitrary Style Transfer | Style Transfer | Post-Training Static Quantization | ckpt |
## Pruning
| Model | Domain | Pruning Type | Approach | Examples |
|---|---|---|---|---|
| Inception V3 | Image Recognition | Unstructured | Magnitude | pb |
| ResNet V2 | Image Recognition | Unstructured | Magnitude | pb |
| ViT | Image Recognition | Unstructured | Magnitude | ckpt |
## Distillation
| Student Model | Teacher Model | Domain | Examples |
|---|---|---|---|
| MobileNet | DenseNet201 | Image Recognition | pb |
# PyTorch Examples
## Quantization
| Model | Domain | Approach | Examples |
|---|---|---|---|
| ResNet18 | Image Recognition | Post-Training Static Quantization | eager / fx |
| ResNet18 | Image Recognition | Quantization-Aware Training | eager / fx |
| ResNet50 | Image Recognition | Post-Training Static Quantization | eager / ipex |
| ResNet50 | Image Recognition | Quantization-Aware Training | eager |
| ResNeXt101_32x16d_wsl | Image Recognition | Post-Training Static Quantization | ipex |
| ResNeXt101_32x8d | Image Recognition | Post-Training Static Quantization | eager |
| Se_ResNeXt50_32x4d | Image Recognition | Post-Training Static Quantization | eager |
| Inception V3 | Image Recognition | Post-Training Static Quantization | eager |
| MobileNet V2 | Image Recognition | Post-Training Static Quantization | eager |
| PeleeNet | Image Recognition | Post-Training Static Quantization | eager |
| ResNeSt50 | Image Recognition | Post-Training Static Quantization | eager |
| 3D-UNet | Image Recognition | Post-Training Static Quantization | eager |
| SSD ResNet34 | Object Detection | Post-Training Static Quantization | fx / ipex |
| Mask R-CNN | Object Detection | Post-Training Static Quantization | fx |
| YOLOv3 | Object Detection | Post-Training Static Quantization | eager |
| DLRM | Recommendation | Post-Training Static Quantization | eager / ipex / fx |
| RNN-T | Speech Recognition | Post-Training Dynamic / Static Quantization | eager / ipex |
| Wav2Vec2 | Speech Recognition | Post-Training Dynamic Quantization | eager |
| HuBERT | Speech Recognition | Post-Training Dynamic Quantization | eager |
| BlendCNN | Natural Language Processing | Post-Training Static Quantization | eager |
| bert-large-uncased-whole-word-masking-finetuned-squad | Natural Language Processing | Post-Training Static Quantization | fx / ipex |
| t5-small | Natural Language Processing | Post-Training Dynamic Quantization | eager |
| Helsinki-NLP/opus-mt-en-ro | Natural Language Processing | Post-Training Dynamic Quantization | eager |
| lvwerra/pegasus-samsum | Natural Language Processing | Post-Training Dynamic Quantization | eager |
## Pruning
| Model | Domain | Pruning Type | Approach | Examples |
|---|---|---|---|---|
| ResNet18 | Image Recognition | Unstructured | Magnitude | eager |
| ResNet34 | Image Recognition | Unstructured | Magnitude | eager |
| ResNet50 | Image Recognition | Unstructured | Magnitude | eager |
| ResNet101 | Image Recognition | Unstructured | Magnitude | eager |
| BERT large | Natural Language Processing | Structured | Group Lasso | eager |
| Intel/bert-base-uncased-sparse-70-unstructured | Natural Language Processing (question-answering) | Unstructured | Pattern Lock | eager |
| bert-base-uncased | Natural Language Processing | Structured | Gradient Sensitivity | eager |
| DistilBERT | Natural Language Processing | Unstructured | Magnitude | eager |
| Intel/bert-base-uncased-sparse-70-unstructured | Natural Language Processing (text-classification) | Unstructured | Pattern Lock | eager |
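The pruning examples above share a similar driver pattern: a yaml file declares the target sparsity, schedule, and pruning algorithm, and the user's training loop is handed to the pruning component. Below is a rough sketch under those assumptions; the config path, `model`, `train_func`, and `eval_func` are placeholders, and the attribute names follow the 1.x `experimental` API, which can vary between releases.

```python
# Rough sketch of a magnitude-pruning driver; everything named here is a placeholder.
from neural_compressor.experimental import Pruning, common

prune = Pruning("./prune_conf.yaml")   # yaml: target sparsity, start/end epochs, pruning algorithm
prune.model = common.Model(model)      # FP32 model to be sparsified
prune.train_func = train_func          # user training loop; pruning hooks run inside it
prune.eval_func = eval_func            # reports accuracy of the pruned model
sparse_model = prune.fit()
```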
## Distillation
| Student Model | Teacher Model | Domain | Examples |
|---|---|---|---|
| CNN-2 | CNN-10 | Image Recognition | eager |
| MobileNet V2-0.35 | WideResNet40-2 | Image Recognition | eager |
| ResNet18\|ResNet34\|ResNet50\|ResNet101 | ResNet18\|ResNet34\|ResNet50\|ResNet101 | Image Recognition | eager |
| VGG-8 | VGG-13 | Image Recognition | eager |
| BlendCNN | BERT base | Natural Language Processing | eager |
| distilbert-base-uncased | csarron/bert-base-uncased-squad-v1 | Natural Language Processing | eager |
| BiLSTM | textattack/roberta-base-SST-2 | Natural Language Processing | eager |
| huawei-noah/TinyBERT_General_4L_312D | blackbird/bert-base-uncased-MNLI-v1 | Natural Language Processing | eager |
| nreimers | textattack/bert-base-uncased-QQP | Natural Language Processing | eager |
| distilroberta-base | howey/roberta-large-cola | Natural Language Processing | eager |
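In the distillation examples above, the student is trained against a frozen teacher using a distillation loss configured in yaml. A hedged sketch of that pattern follows; the config, models, and training function are placeholders, and the attribute names (in particular `teacher_model` and `train_func`) are assumptions based on the 1.x `experimental` API rather than code from any specific example.

```python
# Rough sketch of a knowledge-distillation driver; all names below are placeholders.
from neural_compressor.experimental import Distillation, common

distiller = Distillation("./distillation_conf.yaml")   # yaml: distillation loss, temperature, epochs
distiller.model = common.Model(student_model)          # student to be trained
distiller.teacher_model = common.Model(teacher_model)  # frozen teacher providing soft labels
distiller.train_func = train_func                      # user training loop wired with the distillation hooks
distilled_model = distiller.fit()
```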
## Orchestration
| Model | Domain | Approach | Examples |
|---|---|---|---|
| ResNet50 | Image Recognition | Multi-shot: Pruning and PTQ | link |
| ResNet50 | Image Recognition | One-shot: QAT during Pruning | link |
| Intel/bert-base-uncased-sparse-90-unstructured-pruneofa | Natural Language Processing (question-answering) | One-shot: Pruning, Distillation and QAT | link |
| Intel/bert-base-uncased-sparse-90-unstructured-pruneofa | Natural Language Processing (text-classification) | One-shot: Pruning, Distillation and QAT | link |
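The one-shot rows above fuse several components into a single fine-tuning pass, while multi-shot runs them one after another. A minimal sketch of assembling such a pipeline is shown below, assuming the 1.x `experimental` Scheduler API; the yaml files and `model` are placeholders, not taken from these examples.

```python
# Rough sketch of orchestration; configs and model are placeholders.
from neural_compressor.experimental import Pruning, Quantization, Scheduler, common

prune = Pruning("./prune_conf.yaml")
quantizer = Quantization("./qat_conf.yaml")

scheduler = Scheduler()
scheduler.model = common.Model(model)

# One-shot: fuse pruning and quantization-aware training into a single pass.
combination = scheduler.combine(prune, quantizer)
scheduler.append(combination)
# Multi-shot would instead append the components separately:
#   scheduler.append(prune); scheduler.append(quantizer)
opt_model = scheduler.fit()
```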
# ONNX Runtime Examples
## Quantization
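In the table below, qlinearops, qdq, and integerops refer to the quantization format an example produces: QLinear or QDQ operators for static quantization, integer operators for dynamic quantization. The format is chosen through the example's config (for instance an onnxrt backend such as `onnxrt_qlinearops` or `onnxrt_integerops`). A hedged sketch of quantizing an ONNX model follows; the yaml, model path, and calibration dataloader are placeholders.

```python
# Illustrative only: conf.yaml is assumed to select the desired onnxrt backend/format.
import onnx
from neural_compressor.experimental import Quantization, common

quantizer = Quantization("./conf.yaml")
quantizer.model = common.Model(onnx.load("./model_fp32.onnx"))
quantizer.calib_dataloader = common.DataLoader(calib_dataset)  # needed for static quantization only
q_model = quantizer.fit()
q_model.save("./model_int8.onnx")
```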
| Model | Domain | Approach | Examples |
|---|---|---|---|
| ResNet50 V1.5 | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| ResNet50 V1.5 MLPerf | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| VGG16 | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| MobileNet V2 | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| MobileNet V3 MLPerf | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| AlexNet | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| CaffeNet | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| DenseNet | Image Recognition | Post-Training Static Quantization | qlinearops |
| EfficientNet | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| FCN | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| GoogleNet | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| Inception V1 | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| MNIST | Image Recognition | Post-Training Static Quantization | qlinearops |
| MobileNet V2 (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| ResNet50 V1.5 (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| ShuffleNet V2 | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| SqueezeNet | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| VGG16 (ONNX Model Zoo) | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| ZFNet | Image Recognition | Post-Training Static Quantization | qlinearops / qdq |
| BERT base MRPC | Natural Language Processing | Post-Training Static Quantization | integerops / qdq |
| BERT base MRPC | Natural Language Processing | Post-Training Dynamic Quantization | integerops |
| DistilBERT base MRPC | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qdq |
| MobileBERT MRPC | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qdq |
| Roberta base MRPC | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qdq |
| BERT SQuAD | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qdq |
| GPT2 lm head WikiText | Natural Language Processing | Post-Training Dynamic Quantization | integerops |
| MobileBERT SQuAD MLPerf | Natural Language Processing | Post-Training Dynamic / Static Quantization | integerops / qdq |
| SSD MobileNet V1 | Object Detection | Post-Training Static Quantization | qlinearops / qdq |
| SSD MobileNet V2 | Object Detection | Post-Training Static Quantization | qlinearops / qdq |
| SSD MobileNet V1 (ONNX Model Zoo) | Object Detection | Post-Training Static Quantization | qlinearops / qdq |
| DUC | Object Detection | Post-Training Static Quantization | qlinearops |
| Faster R-CNN | Object Detection | Post-Training Static Quantization | qlinearops / qdq |
| Mask R-CNN | Object Detection | Post-Training Static Quantization | qlinearops / qdq |
| SSD | Object Detection | Post-Training Static Quantization | qlinearops / qdq |
| Tiny YOLOv3 | Object Detection | Post-Training Static Quantization | qlinearops |
| YOLOv3 | Object Detection | Post-Training Static Quantization | qlinearops |
| YOLOv4 | Object Detection | Post-Training Static Quantization | qlinearops |
\ No newline at end of file