Intel® Neural Compressor v2.0 Release
- Highlights
- Features
- Bug Fixes
- Examples
- Documentation
- Validated Configurations
Highlights
- Support quantization on Intel® Xeon® Scalable Processors (e.g., Sapphire Rapids), Intel® Data Center GPU Flex Series, and Intel® Max Series CPUs & GPUs
- Provide new unified APIs for post-training optimizations (static/dynamic quantization) and during-training optimizations (quantization-aware training, pruning/sparsity, distillation, etc.)
- Support advanced fine-grained automatic mixed precision (AMP) across all supported precisions (e.g., INT8, BF16, and FP32)
- Improve model conversion from PyTorch INT8 models to ONNX INT8 models
- Support zero-code quantization in Visual Studio Code and JupyterLab with Neural Coder plugins
- Support quantization of 10K+ transformer-based models, including large language models (e.g., T5, GPT, and Stable Diffusion)
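The difference between the static and dynamic post-training quantization modes named above can be sketched in plain Python. This is a minimal illustration, not Intel® Neural Compressor's implementation or API: the helper names (`symmetric_scale`, `static_quantize`, `dynamic_quantize`) and the sample values are hypothetical, and symmetric INT8 quantization is assumed for simplicity.

```python
from typing import List, Sequence, Tuple

QMAX = 127  # largest magnitude used for symmetric INT8 (assumption)


def symmetric_scale(values: Sequence[float]) -> float:
    """Scale that maps the largest absolute value onto QMAX."""
    max_abs = max((abs(v) for v in values), default=0.0)
    return max_abs / QMAX if max_abs > 0 else 1.0


def quantize(values: Sequence[float], scale: float) -> List[int]:
    """Round to the nearest integer level and clamp to [-QMAX, QMAX]."""
    return [max(-QMAX, min(QMAX, round(v / scale))) for v in values]


def dequantize(levels: Sequence[int], scale: float) -> List[float]:
    """Map integer levels back to approximate real values."""
    return [q * scale for q in levels]


def static_quantize(
    values: Sequence[float], calibration_batches: Sequence[Sequence[float]]
) -> Tuple[List[int], float]:
    """Static: the scale is fixed ahead of time from calibration data."""
    seen = [v for batch in calibration_batches for v in batch]
    scale = symmetric_scale(seen)
    return quantize(values, scale), scale


def dynamic_quantize(values: Sequence[float]) -> Tuple[List[int], float]:
    """Dynamic: the scale is computed from the tensor seen at run time."""
    scale = symmetric_scale(values)
    return quantize(values, scale), scale


# Hypothetical data: the calibration set only covers the range [-1.0, 1.0].
calibration = [[-1.0, 0.5, 0.25], [0.75, -0.5, 1.0]]
activations = [0.1, -0.2, 2.0]  # 2.0 lies outside the calibrated range

static_q, static_scale = static_quantize(activations, calibration)
dynamic_q, dynamic_scale = dynamic_quantize(activations)

print("static :", static_q, dequantize(static_q, static_scale))
print("dynamic:", dynamic_q, dequantize(dynamic_q, dynamic_scale))
```

In this sketch the static path clips the out-of-range value to the calibrated maximum but keeps finer resolution for small values, while the dynamic path covers the full range at the cost of computing the scale for every input.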
Features
- [Quantization] Experimental Keras model in, quantized Keras model out (commit 4fa753)
- [Quantization] Support quantization for ITEX v1.0 on Intel CPU and Intel GPU (commit a2fcb2)
- [Quantization] Support hardware-neutral quantized ONNX QDQ models and validate on multiple devices (Intel CPU, NVIDIA GPU, AMD CPU, and ARM CPU) through ONNX Runtime
- [Quantization] Enhance TensorFlow QAT: remove TFMOT dependency (commit 1deb7d)
- [Quantization] Distinguish frameworks, backends, and output formats for the ONNX Runtime backend (commit 2483a8)
- [Quantization] Support PyTorch/IPEX 1.13 and TensorFlow 2.11 (commit b7a2ef)
- [AMP] Support more TensorFlow bf16 ops (commit 98d3c8)
- [AMP] Add torch.amp bf16 support for IPEX backend (commit 2a361b)
- [Strategy] Add accuracy-first tuning strategies: MSE_v2 (commit 80311f) and HAWQ (commit 83018e) to address accuracy problems in specific models
- [Strategy] Refine the tuning strategy: add more data types and more op attributes such as per-tensor/per-channel, dynamic/static, etc.
- [Pruning] Add progressive pruning and pattern lock pruning_type (commit f46bb1)
- [Pruning] Add per_channel sparse pattern (commit f46bb1)
- [Distillation] Support self-distillation towards efficient and compact neural networks (commit acdd4c)
- [Distillation] Enhance API of intermediate layers knowledge distillation (commit 3183f6)
- [Neural Coder] Detect devices and ISA to adjust the optimization (commit 691d0b)
- [Neural Coder] Automatically quantize with ONNX Runtime backend (commit f711b4)
- [Neural Coder] Add Neural Coder Python Launcher (commit 7bb92d)
- [Neural Coder] Add Visual Studio Code plugin (commit dd39ca)
- [Productivity] Support Pruning in GUI (commit d24fea)
- [Productivity] Replace YAML configuration with a config-driven API
- [Productivity] Export ONNX QLinear to QDQ format (commit e996a9)
- [Productivity] Validate 10K+ transformer-based models, including large language models (e.g., T5, GPT, and Stable Diffusion)
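The per-tensor/per-channel attribute mentioned in the [Strategy] item can be illustrated with a short, self-contained sketch. It is not taken from the Intel® Neural Compressor code base: the function names and the sample weight matrix are hypothetical, and symmetric INT8 quantization with one row per output channel is assumed. It compares the round-trip error of one shared scale against one scale per channel.

```python
from typing import List, Sequence

QMAX = 127  # symmetric INT8 range [-127, 127] (assumption)


def scale_for(values: Sequence[float]) -> float:
    """Scale that maps the largest absolute value onto QMAX."""
    max_abs = max(abs(v) for v in values)
    return max_abs / QMAX if max_abs > 0 else 1.0


def fake_quant(values: Sequence[float], scale: float) -> List[float]:
    """Quantize to INT8 levels and immediately dequantize."""
    return [max(-QMAX, min(QMAX, round(v / scale))) * scale for v in values]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def per_tensor_error(weight: Sequence[Sequence[float]]) -> float:
    """One scale shared by every element of the weight tensor."""
    flat = [v for row in weight for v in row]
    return mse(flat, fake_quant(flat, scale_for(flat)))


def per_channel_error(weight: Sequence[Sequence[float]]) -> float:
    """One scale per row, where each row is treated as an output channel."""
    flat: List[float] = []
    restored: List[float] = []
    for row in weight:
        flat.extend(row)
        restored.extend(fake_quant(row, scale_for(row)))
    return mse(flat, restored)


# Hypothetical weights: two channels with very different magnitudes.
weight = [
    [0.010, -0.020, 0.015],  # small-magnitude channel
    [8.000, -6.500, 7.250],  # large-magnitude channel
]

print("per-tensor  MSE:", per_tensor_error(weight))
print("per-channel MSE:", per_channel_error(weight))
```

With a shared scale, the small-magnitude channel rounds to zero in this example; per-channel scales preserve it, which is why a tuning strategy may treat this attribute as a knob when accuracy drops.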
Bug Fixes
- Fix quantization failure for ONNX models larger than 2 GB (commit 8d83cc)
- Fix bf16 disabled by default (commit 83825a)
- Fix out-of-memory error in PyTorch DLRM quantization (commit ff1725)
- Fix ITEX resnetv2_50 tuning accuracy (commit ae1e05)
- Fix bf16 ops error in QAT when torch version < 1.11 (commit eda8cb)
- Fix the key comparison in the Bayesian strategy (commit 1e9c12)
- Fix static quantization failure for PyTorch T5 (commit ee3ef0)
Examples
- Add quantization examples for Hugging Face models with the ONNX Runtime backend (commit f4aeb5)
- Add large language model quantization example: GPT-J (commit 01899d)
- Add distributed distillation examples: MobileNetV2 (commit d33ebe) and CNN-2 (commit ebe9e2)
- Update examples to use the new INC v2.0 API
- Add Stable Diffusion example
Documentation
- Update accuracy data for a broad range of hardware (commit 71b056)
- Refine the API helper and documentation
Validated Configurations
- CentOS 8.4 & Ubuntu 20.04
- Python 3.7, 3.8, 3.9, 3.10
- TensorFlow 2.9.3, 2.10.1, 2.11.0, ITEX 1.0
- PyTorch/IPEX 1.11.0+cpu, 1.12.1+cpu, 1.13.0+cpu
- ONNX Runtime 1.11.0, 1.12.1, 1.13.1
- MXNet 1.7.0, 1.8.0, 1.9.1