Intel® Neural Compressor v2.0 Release

kevinintel released this 30 Dec 05:27 · 1379 commits to master since this release · commit 4eb3af0
  • Highlights
  • Features
  • Bug Fixes
  • Examples
  • Documentation
  • Validated Configurations

Highlights

  • Support quantization for Intel® Xeon® Scalable Processors (e.g., Sapphire Rapids), Intel® Data Center GPU Flex Series, and Intel® Max Series CPUs and GPUs
  • Provide new unified APIs for post-training optimizations (static/dynamic quantization) and during-training optimizations (quantization-aware training, pruning/sparsity, distillation, etc.); see the post-training quantization sketch after this list
  • Support advanced fine-grained auto mixed precision (AMP) across all supported precisions (e.g., INT8, BF16, and FP32); see the mixed-precision sketch after this list
  • Improve model conversion from PyTorch INT8 models to ONNX INT8 models; see the ONNX export sketch after this list
  • Support the zero-code quantization in Visual Studio Code and JupyterLab with Neural Coder plugins
  • Support the quantization for 10K+ transformer-based models including large language models (e.g., T5, GPT, Stable Diffusion, etc.)
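
The post-training quantization sketch referenced above is a minimal illustration of the new unified API; model, calib_dataloader, and eval_func are user-supplied placeholders, and defaults may vary across framework backends:

    # Post-training static quantization with the v2.0 unified API (sketch).
    # `model`, `calib_dataloader`, and `eval_func` are user-supplied placeholders.
    from neural_compressor.config import PostTrainingQuantConfig
    from neural_compressor.quantization import fit

    conf = PostTrainingQuantConfig(approach="static")   # "dynamic" is also supported
    q_model = fit(
        model=model,                        # FP32 model (TensorFlow, PyTorch, ONNX, or MXNet)
        conf=conf,
        calib_dataloader=calib_dataloader,  # calibration data for static quantization
        eval_func=eval_func,                # returns a scalar metric for accuracy-aware tuning
    )
    q_model.save("./saved_model")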
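
The mixed-precision sketch referenced above shows the fine-grained AMP path (BF16 by default where the hardware and framework support it); model is again a placeholder:

    # BF16/FP32 mixed-precision conversion with the v2.0 API (sketch).
    from neural_compressor import mix_precision
    from neural_compressor.config import MixedPrecisionConfig

    conf = MixedPrecisionConfig()                           # defaults to BF16 where supported
    converted_model = mix_precision.fit(model, conf=conf)   # ops kept in FP32 where BF16 is unsafe
    converted_model.save("./bf16_model")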
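
The ONNX export sketch referenced above converts a quantized PyTorch model to an ONNX INT8 (QDQ) model; q_model is the object returned by fit, and the input shape and tensor names are illustrative only:

    # Export a quantized PyTorch model to ONNX INT8 in QDQ format (sketch).
    import torch
    from neural_compressor.config import Torch2ONNXConfig

    int8_onnx_config = Torch2ONNXConfig(
        dtype="int8",
        opset_version=14,
        quant_format="QDQ",
        example_inputs=torch.randn(1, 3, 224, 224),  # illustrative input shape
        input_names=["input"],
        output_names=["output"],
    )
    q_model.export("int8_model.onnx", int8_onnx_config)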

Features

  • [Quantization] Experimental Keras model in, quantized Keras model out (commit 4fa753)
  • [Quantization] Support quantization for ITEX v1.0 on Intel CPU and Intel GPU (commit a2fcb2)
  • [Quantization] Support hardware-neutral quantized ONNX QDQ models and validate on multiple devices (Intel CPU, NVIDIA GPU, AMD CPU, and ARM CPU) through ONNX Runtime
  • [Quantization] Enhance TensorFlow QAT: remove TFMOT dependency (commit 1deb7d)
  • [Quantization] Distinguish frameworks, backends and output formats for the ONNX Runtime backend (commit 2483a8)
  • [Quantization] Support PyTorch/IPEX 1.13 and TensorFlow 2.11 (commit b7a2ef)
  • [AMP] Support more TensorFlow bf16 ops (commit 98d3c8)
  • [AMP] Add torch.amp bf16 support for IPEX backend (commit 2a361b)
  • [Strategy] Add accuracy-first tuning strategies: MSE_v2 (commit 80311f) and HAWQ (commit 83018e) to solve the accuracy problem of specific models
  • [Strategy] Refine the tuning strategy: add more data types and more op attributes (e.g., per-tensor/per-channel, dynamic/static)
  • [Pruning] Add progressive pruning and pattern lock pruning_type (commit f46bb1); see the pruning sketch after this list
  • [Pruning] Add per_channel sparse pattern (commit f46bb1)
  • [Distillation] Support self-distillation towards efficient and compact neural networks (commit acdd4c)
  • [Distillation] Enhance API of intermediate layers knowledge distillation (commit 3183f6); see the distillation sketch after this list
  • [Neural Coder] Detect devices and ISA to adjust the optimization (commit 691d0b)
  • [Neural Coder] Automatically quantize with ONNX Runtime backend (commit f711b4)
  • [Neural Coder] Add Neural Coder Python Launcher (commit 7bb92d)
  • [Neural Coder] Add Visual Studio Plugin (commit dd39ca)
  • [Productivity] Support Pruning in GUI (commit d24fea)
  • [Productivity] Use config-driven APIs to replace YAML configuration files; see the tuning-config sketch after this list
  • [Productivity] Export ONNX QLinear to QDQ format (commit e996a9)
  • [Productivity] Validate 10K+ transformer-based models including large language models (e.g., T5, GPT, Stable Diffusion, etc.)
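
The tuning-config sketch referenced above illustrates how the config-driven API replaces the former YAML files; the criterion values are illustrative only:

    # Config-driven replacement for the former YAML tuning configuration (sketch).
    from neural_compressor.config import (
        AccuracyCriterion,
        PostTrainingQuantConfig,
        TuningCriterion,
    )

    conf = PostTrainingQuantConfig(
        approach="static",
        tuning_criterion=TuningCriterion(max_trials=100, timeout=0),  # stop after 100 trials
        accuracy_criterion=AccuracyCriterion(tolerable_loss=0.01),    # allow 1% relative accuracy loss
    )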
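
The pruning sketch referenced above uses the during-training API; the pruning type, sparsity target, and training-loop placeholders are assumptions to be checked against the documentation:

    # During-training pruning sketch; `model`, `train_dataloader`, and the loop body are placeholders.
    from neural_compressor.config import WeightPruningConfig
    from neural_compressor.training import prepare_compression

    p_conf = WeightPruningConfig(
        pruning_type="snip_momentum",   # other pruning types (e.g., the new progressive / pattern lock) are selected here
        target_sparsity=0.9,
        pattern="4x1",
    )
    compression_manager = prepare_compression(model, p_conf)
    compression_manager.callbacks.on_train_begin()
    for epoch in range(num_epochs):
        for step, batch in enumerate(train_dataloader):
            compression_manager.callbacks.on_step_begin(step)
            # ... forward, backward, and optimizer step go here ...
            compression_manager.callbacks.on_step_end()
    compression_manager.callbacks.on_train_end()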
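
The distillation sketch referenced above uses the same during-training API; teacher_model, student_model, and the loss settings are placeholders:

    # Knowledge distillation sketch; teacher/student models and the training loop are placeholders.
    from neural_compressor.config import DistillationConfig, KnowledgeDistillationLossConfig
    from neural_compressor.training import prepare_compression

    criterion_conf = KnowledgeDistillationLossConfig(temperature=2.0, loss_types=["CE", "KL"])
    d_conf = DistillationConfig(teacher_model=teacher_model, criterion=criterion_conf)
    compression_manager = prepare_compression(student_model, d_conf)
    compression_manager.callbacks.on_train_begin()
    # Inside the training loop, the distillation loss is combined with the task loss, e.g.:
    # loss = compression_manager.callbacks.on_after_compute_loss(inputs, student_outputs, loss)
    compression_manager.callbacks.on_train_end()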

Bug Fixes

  • Fix quantization failure of ONNX models larger than 2 GB (commit 8d83cc)
  • Fix bf16 being disabled by default (commit 83825a)
  • Fix PyTorch DLRM quantization out of memory (commit ff1725)
  • Fix ITEX resnetv2_50 tuning accuracy (commit ae1e05)
  • Fix bf16 ops error in QAT when torch version < 1.11 (commit eda8cb)
  • Fix the key comparison in the Bayesian strategy (commit 1e9c12)
  • Fix PyTorch T5 failing to do static quantization (commit ee3ef0)

Examples

  • Add quantization examples of HuggingFace models with ONNX Runtime backend (commit f4aeb5)
  • Add a large language model quantization example: GPT-J (commit 01899d)
  • Add Distributed Distillation examples: MobileNetV2 (commit d33ebe) and CNN-2 (commit ebe9e2)
  • Update examples with the new INC v2.0 API
  • Add Stable Diffusion example

Documentation

  • Update the accuracy results on broad hardware (commit 71b056)
  • Refine API helpers and documentation

Validated Configurations

  • CentOS 8.4 & Ubuntu 20.04
  • Python 3.7, 3.8, 3.9, 3.10
  • TensorFlow 2.9.3, 2.10.1, 2.11.0, ITEX 1.0
  • PyTorch/IPEX 1.11.0+cpu, 1.12.1+cpu, 1.13.0+cpu
  • ONNX Runtime 1.11.0, 1.12.1, 1.13.1
  • MXNet 1.7.0, 1.8.0, 1.9.1