Intel® Neural Compressor v2.0 Release

kevinintel released this 30 Dec 05:27 · 1379 commits to master since this release · commit 4eb3af0
  • Highlights
  • Features
  • Bug Fixes
  • Examples
  • Documentation
  • Validated Configurations

Highlights

  • Support quantization for Intel® Xeon® Scalable Processors (e.g., Sapphire Rapids), Intel® Data Center GPU Flex Series, and Intel® Max Series CPUs and GPUs
  • Provide new unified APIs for post-training optimizations (static/dynamic quantization) and during-training optimizations (quantization-aware training, pruning/sparsity, distillation, etc.); see the post-training quantization sketch after this list
  • Support advanced fine-grained auto mixed precision (AMP) across all supported precisions (e.g., INT8, BF16, and FP32); see the mixed-precision sketch after this list
  • Improve model conversion from PyTorch INT8 models to ONNX INT8 models; see the ONNX export sketch after this list
  • Support the zero-code quantization in Visual Studio Code and JupyterLab with Neural Coder plugins
  • Support the quantization for 10K+ transformer-based models including large language models (e.g., T5, GPT, Stable Diffusion, etc.)
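
The post-training quantization sketch referenced above is a minimal illustration of the new unified API; model, calib_dataloader, and eval_func are user-supplied placeholders, and defaults may vary across framework backends:

    # Post-training static quantization with the v2.0 unified API (sketch).
    # `model`, `calib_dataloader`, and `eval_func` are user-supplied placeholders.
    from neural_compressor.config import PostTrainingQuantConfig
    from neural_compressor.quantization import fit

    conf = PostTrainingQuantConfig(approach="static")   # "dynamic" is also supported
    q_model = fit(
        model=model,                        # FP32 model (TensorFlow, PyTorch, ONNX, or MXNet)
        conf=conf,
        calib_dataloader=calib_dataloader,  # calibration data for static quantization
        eval_func=eval_func,                # returns a scalar metric for accuracy-aware tuning
    )
    q_model.save("./saved_model")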
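
The mixed-precision sketch referenced above shows the fine-grained AMP path (BF16 by default where the hardware and framework support it); model is again a placeholder:

    # BF16/FP32 mixed-precision conversion with the v2.0 API (sketch).
    from neural_compressor import mix_precision
    from neural_compressor.config import MixedPrecisionConfig

    conf = MixedPrecisionConfig()                           # defaults to BF16 where supported
    converted_model = mix_precision.fit(model, conf=conf)   # ops kept in FP32 where BF16 is unsafe
    converted_model.save("./bf16_model")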
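
The ONNX export sketch referenced above converts a quantized PyTorch model to an ONNX INT8 (QDQ) model; q_model is the object returned by fit, and the input shape and tensor names are illustrative only:

    # Export a quantized PyTorch model to ONNX INT8 in QDQ format (sketch).
    import torch
    from neural_compressor.config import Torch2ONNXConfig

    int8_onnx_config = Torch2ONNXConfig(
        dtype="int8",
        opset_version=14,
        quant_format="QDQ",
        example_inputs=torch.randn(1, 3, 224, 224),  # illustrative input shape
        input_names=["input"],
        output_names=["output"],
    )
    q_model.export("int8_model.onnx", int8_onnx_config)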

Features

  • [Quantization] Experimental Keras model in, quantized Keras model out (commit 4fa753)
  • [Quantization] Support quantization for ITEX v1.0 on Intel CPU and Intel GPU (commit a2fcb2)
  • [Quantization] Support hardware-neutral quantized ONNX QDQ models and validate on multiple devices (Intel CPU, NVIDIA GPU, AMD CPU, and ARM CPU) through ONNX Runtime
  • [Quantization] Enhance TensorFlow QAT: remove TFMOT dependency (commit 1deb7d)
  • [Quantization] Distinguish frameworks, backends and output formats for the ONNX Runtime backend (commit 2483a8)
  • [Quantization] Support PyTorch/IPEX 1.13 and TensorFlow 2.11 (commit b7a2ef)
  • [AMP] Support more TensorFlow bf16 ops (commit 98d3c8)
  • [AMP] Add torch.amp bf16 support for IPEX backend (commit 2a361b)
  • [Strategy] Add accuracy-first tuning strategies: MSE_v2 (commit 80311f) and HAWQ (commit 83018e) to solve the accuracy problem of specific models
  • [Strategy] Refine the tuning strategy: add more data types and more op attributes (e.g., per-tensor/per-channel, dynamic/static)
  • [Pruning] Add progressive pruning and pattern lock pruning_type (commit f46bb1); see the pruning sketch after this list
  • [Pruning] Add per_channel sparse pattern (commit f46bb1)
  • [Distillation] Support self-distillation towards efficient and compact neural networks (commit acdd4c)
  • [Distillation] Enhance API of intermediate layers knowledge distillation (commit 3183f6); see the distillation sketch after this list
  • [Neural Coder] Detect devices and ISA to adjust the optimization (commit 691d0b)
  • [Neural Coder] Automatically quantize with ONNX Runtime backend (commit f711b4)
  • [Neural Coder] Add Neural Coder Python Launcher (commit 7bb92d)
  • [Neural Coder] Add Visual Studio Plugin (commit dd39ca)
  • [Productivity] Support Pruning in GUI (commit d24fea)
  • [Productivity] Use config-driven APIs to replace YAML configuration files; see the tuning-config sketch after this list
  • [Productivity] Export ONNX QLinear to QDQ format (commit e996a9)
  • [Productivity] Validate 10K+ transformer-based models including large language models (e.g., T5, GPT, Stable Diffusion, etc.)
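
The tuning-config sketch referenced above illustrates how the config-driven API replaces the former YAML files; the criterion values are illustrative only:

    # Config-driven replacement for the former YAML tuning configuration (sketch).
    from neural_compressor.config import (
        AccuracyCriterion,
        PostTrainingQuantConfig,
        TuningCriterion,
    )

    conf = PostTrainingQuantConfig(
        approach="static",
        tuning_criterion=TuningCriterion(max_trials=100, timeout=0),  # stop after 100 trials
        accuracy_criterion=AccuracyCriterion(tolerable_loss=0.01),    # allow 1% relative accuracy loss
    )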
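
The pruning sketch referenced above uses the during-training API; the pruning type, sparsity target, and training-loop placeholders are assumptions to be checked against the documentation:

    # During-training pruning sketch; `model`, `train_dataloader`, and the loop body are placeholders.
    from neural_compressor.config import WeightPruningConfig
    from neural_compressor.training import prepare_compression

    p_conf = WeightPruningConfig(
        pruning_type="snip_momentum",   # other pruning types (e.g., the new progressive / pattern lock) are selected here
        target_sparsity=0.9,
        pattern="4x1",
    )
    compression_manager = prepare_compression(model, p_conf)
    compression_manager.callbacks.on_train_begin()
    for epoch in range(num_epochs):
        for step, batch in enumerate(train_dataloader):
            compression_manager.callbacks.on_step_begin(step)
            # ... forward, backward, and optimizer step go here ...
            compression_manager.callbacks.on_step_end()
    compression_manager.callbacks.on_train_end()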
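
The distillation sketch referenced above uses the same during-training API; teacher_model, student_model, and the loss settings are placeholders:

    # Knowledge distillation sketch; teacher/student models and the training loop are placeholders.
    from neural_compressor.config import DistillationConfig, KnowledgeDistillationLossConfig
    from neural_compressor.training import prepare_compression

    criterion_conf = KnowledgeDistillationLossConfig(temperature=2.0, loss_types=["CE", "KL"])
    d_conf = DistillationConfig(teacher_model=teacher_model, criterion=criterion_conf)
    compression_manager = prepare_compression(student_model, d_conf)
    compression_manager.callbacks.on_train_begin()
    # Inside the training loop, the distillation loss is combined with the task loss, e.g.:
    # loss = compression_manager.callbacks.on_after_compute_loss(inputs, student_outputs, loss)
    compression_manager.callbacks.on_train_end()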

Bug Fixes

  • Fix quantization failure of ONNX models larger than 2 GB (commit 8d83cc)
  • Fix bf16 being disabled by default (commit 83825a)
  • Fix PyTorch DLRM quantization out of memory (commit ff1725)
  • Fix ITEX resnetv2_50 tuning accuracy (commit ae1e05)
  • Fix bf16 ops error in QAT when torch version < 1.11 (commit eda8cb)
  • Fix the key comparison in the Bayesian strategy (commit 1e9c12)
  • Fix PyTorch T5 failing to do static quantization (commit ee3ef0)

Examples

  • Add quantization examples of HuggingFace models with ONNX Runtime backend (commit f4aeb5)
  • Add a large language model quantization example: GPT-J (commit 01899d)
  • Add Distributed Distillation examples: MobileNetV2 (commit d33ebe) and CNN-2 (commit ebe9e2)
  • Update examples with the new INC v2.0 API
  • Add Stable Diffusion example

Documentation

  • Update the accuracy results on broad hardware (commit 71b056)
  • Refine API helpers and documentation

Validated Configurations

  • CentOS 8.4 & Ubuntu 20.04
  • Python 3.7, 3.8, 3.9, 3.10
  • TensorFlow 2.9.3, 2.10.1, 2.11.0, ITEX 1.0
  • PyTorch/IPEX 1.11.0+cpu, 1.12.1+cpu, 1.13.0+cpu
  • ONNX Runtime 1.11.0, 1.12.1, 1.13.1
  • MXNet 1.7.0, 1.8.0, 1.9.1