
Intel® Neural Compressor v2.2 Release

Released by @chensuyue on 21 Jun 13:30 · commit da99d28
  • Highlights
  • Features
  • Improvement
  • Productivity
  • Bug Fixes
  • Examples
  • External Contributions
  • Validated Configurations

Highlights

  • Expanded SmoothQuant support to mainstream frameworks including PyTorch/IPEX, TensorFlow/ITEX, and ONNX Runtime, and validated it on popular large language models (LLMs) such as GPT-J, LLaMA, OPT, BLOOM, Dolly, MPT, LaMini-LM, and RedPajama-INCITE.
  • Introduced two new productivity components: Neural Solution for distributed quantization and Neural Insights for quantization accuracy debugging.
  • Successfully integrated Intel Neural Compressor into MSFT Olive (#157) and DeepSpeed (#3300).

Features

  • [Quantization] Support TensorFlow SmoothQuant (1f4127); see the usage sketch after this list
  • [Quantization] Support ITEX SmoothQuant (1f4127)
  • [Quantization] Support PyTorch FX SmoothQuant (6a39f6, 603811)
  • [Quantization] Support ONNX Runtime SmoothQuant (3df647, 1e1d70)
  • [Quantization] Support dictionary inputs for IPEX quantization (4ba233)
  • [Quantization] Enable Entropy/KL & Percentile calibration algorithms for ONNX Runtime (dae494)
  • [MixedPrecision] Support mixed precision op name/type dict option (a9c2cb); see the sketch after the Improvement list
  • [Strategy] Support block-wise tuning (9c26ed)
  • [Strategy] Enable mse_v2 for ONNX Runtime (62122d)
  • [Pruning] Support retrain-free sparse pruning (d29aa0)
  • [Pruning] Support TensorFlow pruning with 2.x API (072c13)
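
The SmoothQuant entries above share one post-training flow in the 2.x API. A minimal sketch for PyTorch, assuming user-supplied `model` and `calib_dataloader` placeholders:

```python
from neural_compressor import PostTrainingQuantConfig, quantization

# Enable SmoothQuant via the recipes dict; alpha controls how much of the
# activation outlier scale is migrated into the weights.
conf = PostTrainingQuantConfig(
    recipes={"smooth_quant": True, "smooth_quant_args": {"alpha": 0.5}}
)

# model and calib_dataloader are placeholders for the user's FP32 model
# and calibration data.
q_model = quantization.fit(model, conf, calib_dataloader=calib_dataloader)
q_model.save("./saved_results")
```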

Improvement

  • [Quantization] Enhance Keras functional model quantization with Keras model in, quantized Keras model out (699751)
  • [Quantization] Enhance MatMul and Gather quantization for ONNX Runtime (1f9c4f)
  • [Quantization] Add new recipe for ONNX Runtime NLP models (10d82c)
  • [MixedPrecision] Add more FP16 OPs support for ONNX Runtime (15d551)
  • [MixedPrecision] Add more BF16 OPs support for TensorFlow (369b9d)
  • [Pruning] Enhance multi-head attention slimming (f3de50)
  • [Pruning] Enable progressive pruning with the N:M pattern (483e80)
  • [Model Export] Refine PT2ONNX export (877adb); a sketch appears under External Contributions
  • Remove redundant classes for quantization, benchmark and mixed precision (c51096)
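
The op name/type dict option from the Features list lets users pin specific ops to a precision during BF16/FP16 conversion. A minimal sketch with the 2.x mixed-precision API; the layer name "fc" and the dict format are illustrative assumptions:

```python
from neural_compressor import mix_precision
from neural_compressor.config import MixedPrecisionConfig

# Convert the model to BF16 while keeping the hypothetical "fc" layer in
# FP32; the dict format here mirrors the quantization op_name_dict style.
conf = MixedPrecisionConfig(
    precisions="bf16",
    op_name_dict={
        "fc": {
            "weight": {"dtype": ["fp32"]},
            "activation": {"dtype": ["fp32"]},
        }
    },
)
bf16_model = mix_precision.fit(model, conf)  # model is a placeholder
```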

Productivity

  • [Neural Solution] Support multi-node distributed tuning with model-level parallelism (ee049c)
  • [Neural Insights] Support quantization and benchmark diagnosis with GUI (5dc9ea, 3bde2e, 898344); see the sketch after this list
  • [Neural Coder] Migrate Neural Coder support into 2.x API (113ca1, e74a8a)
  • [Ecosystem] MSFT Olive integration (#157)
  • [Ecosystem] MSFT DeepSpeed integration (#3300)
  • Support ITEX 1.2 (5519e2)
  • Support Python 3.11 (6fa053)
  • Enhance documentation for mixed precision, diagnosis, dataloader, metric, etc.
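
Neural Insights consumes diagnosis data dumped during quantization or benchmarking. A minimal sketch, assuming the `diagnosis` flag on the 2.x post-training config and user-supplied `model`, `calib_dataloader`, and `eval_func` placeholders:

```python
from neural_compressor import PostTrainingQuantConfig, quantization

# diagnosis=True dumps per-op statistics (weights/activations) that the
# Neural Insights GUI can visualize for accuracy debugging.
conf = PostTrainingQuantConfig(diagnosis=True)
q_model = quantization.fit(
    model, conf, calib_dataloader=calib_dataloader, eval_func=eval_func
)
```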

Bug Fixes

Examples

External Contributions

  • Add a mathematical check for SmoothQuant transform (5c04ac)
  • Fix mismatched absorb layers caused by tracing and named modules for SmoothQuant (bccc89)
  • Fix a tracing issue when the model input is a dictionary for SmoothQuant (6a3c64)
  • Allow dictionary model inputs for ONNX export (17b642); see the sketch below
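
Both the PT2ONNX refinement under Improvement and the dictionary-input contribution above flow through the 2.x export API. A minimal sketch for an INT8 PyTorch model, assuming a quantized `q_model` placeholder:

```python
import torch
from neural_compressor.config import Torch2ONNXConfig

# Export an INT8 PyTorch model to ONNX in QDQ format. example_inputs can
# be a tensor or, per the contribution above, a dictionary of named inputs.
int8_onnx_config = Torch2ONNXConfig(
    dtype="int8",
    opset_version=14,
    quant_format="QDQ",
    example_inputs=torch.randn(1, 3, 224, 224),
    input_names=["input"],
    output_names=["output"],
)
q_model.export("int8_model.onnx", int8_onnx_config)
```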

Validated Configurations

  • CentOS 8.4 & Ubuntu 22.04
  • Python 3.7, 3.8, 3.9, 3.10, 3.11
  • TensorFlow 2.10.0, 2.11.0, 2.12.0
  • ITEX 1.1.0, 1.2.0
  • PyTorch/IPEX 1.12.1+cpu, 1.13.0+cpu, 2.0.1+cpu
  • ONNX Runtime 1.13.1, 1.14.1, 1.15.0
  • MXNet 1.9.1