
Intel® Neural Compressor v2.4 Release

Released by @chensuyue on 17 Dec (commit 111b3ce)
  • Highlights
  • Features
  • Improvement
  • Productivity
  • Bug Fixes
  • Examples
  • Validated Configurations

Highlights

  • Supported layer-wise quantization for PyTorch RTN/GPTQ Weight-Only Quantization and ONNX Runtime W8A8 quantization; a PyTorch sketch follows this list.
  • Supported Weight-Only Quantization tuning for ONNX Runtime backend.
  • Supported GGML double quant on RTN/GPTQ Weight-Only Quantization with the FW extension API.
  • Supported SmoothQuant of Big Saved Model for TensorFlow Backend.
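
A rough illustration of the first highlight: a minimal sketch of layer-wise RTN Weight-Only Quantization for a PyTorch model using the 2.x `PostTrainingQuantConfig` API. The model name and the `op_type_dict` values are placeholder assumptions; check the layer-wise quantization documentation for the exact recipe keys in your version.

```python
# Minimal sketch (assumptions flagged below), not a verified end-to-end recipe.
from neural_compressor import PostTrainingQuantConfig, quantization
from neural_compressor.adaptor.torch_utils.layer_wise_quant import load_empty_model

# Load only the model structure; weights are streamed in layer by layer
# during quantization, which bounds peak memory for large models.
fp32_model = load_empty_model("facebook/opt-125m", torchscript=True)  # placeholder model

conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {  # match all quantizable ops
            "weight": {
                "bits": 4,           # 4-bit weights
                "group_size": 32,    # one scale per 32-weight group
                "scheme": "sym",
                "algorithm": "RTN",  # RTN is data-free; "GPTQ" also supports layer-wise
            },
        },
    },
    recipes={"layer_wise_quant": True},  # quantize one layer at a time
)

q_model = quantization.fit(fp32_model, conf)
q_model.save("./saved_model")
```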

Features

  • [Quantization] Support GGML double quant in Weight-Only Quantization for RTN and GPTQ (05c15a)
  • [Quantization] Support Weight-Only Quantization tuning for ONNX Runtime backend (6d4ea5, 934ba0, 4fcfdf)
  • [Quantization] Support SmoothQuant block-wise alpha-tuning (ee6bc2); a sketch follows this list
  • [Quantization] Support SmoothQuant of Big Saved Model for TensorFlow Backend (3b2925, 4f2c35)
  • [Quantization] Support PyTorch layer-wise quantization for GPTQ (ee5450)
  • [Quantization] Support PyTorch layer-wise quantization for RTN (ebd1e2)
  • [Quantization] Support ONNX Runtime layer-wise W8A8 quantization (6142e4, 5d33a5)
  • [Common] [Experimental] Implement FW extension API (76b8b3, 8447d7, 258236)
  • [Quantization] [Experimental] Support Weight-Only Quantization through the FW extension API for the PT backend (915018, dc9328)
  • [Quantization] [Experimental] Support Keras quantization through the FW extension API for the TF backend (2627d3)
  • [Quantization] Support IPEX 2.1 XPU (CPU+GPU) (af0b50, cf847c)
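
The SmoothQuant block-wise alpha-tuning above is driven through the `recipes` argument. A minimal sketch follows, assuming the documented `smooth_quant_args`/`auto_alpha_args` recipe keys; the search bounds and the `do_blockwise` flag shown are illustrative, and `fp32_model`/`calib_dataloader` are placeholders you must supply.

```python
# Minimal sketch: SmoothQuant with auto alpha tuning performed block by block.
from neural_compressor import PostTrainingQuantConfig, quantization

conf = PostTrainingQuantConfig(
    recipes={
        "smooth_quant": True,
        "smooth_quant_args": {
            "alpha": "auto",           # search for alpha instead of fixing it
            "auto_alpha_args": {
                "alpha_min": 0.0,      # assumed search-space bounds
                "alpha_max": 1.0,
                "alpha_step": 0.1,
                "do_blockwise": True,  # tune alpha per transformer block
            },
        },
    },
)
# fp32_model and calib_dataloader are placeholders for your model and data.
q_model = quantization.fit(fp32_model, conf, calib_dataloader=calib_dataloader)
```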

Improvement

  • [Quantization] Add use_optimum_format for export_compressed_model in Weight-Only Quantization (5179da, 0a0644); an export sketch follows this list
  • [Quantization] Enhance ONNX Runtime quantization with DirectML EP (db0fef, d13183, 098401, 6cad50)
  • [Quantization] Support restoring an IPEX model from JSON (c3214c)
  • [Quantization] Add attributes to MatMulNBits for ONNX Runtime (7057e3)
  • [Quantization] Increase SmoothQuant auto alpha running speed (173c18)
  • [Quantization] Add SmoothQuant alpha search space as a config argument (f9663d)
  • [Quantization] Add SmoothQuant weight_clipping as a default_on option (1f4aec)
  • [Quantization] Support SmoothQuant with MinMaxObserver (45b496)
  • [Quantization] Support Weight-Only Quantization with fp16 for PyTorch backend (d5cb56)
  • [Quantization] Support trace with dictionary type example_inputs (afe315)
  • [Quantization] Support Falcon Weight-Only Quantization (595d3a)
  • [Common] Add deprecation decorator in the experimental folder (aeb3ed)
  • [Common] Remove 1.x API dependency (ee617a)
  • [Mixed Precision] Support PyTorch eager mode BF16 MixedPrecision (3bfb76)
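
For the `use_optimum_format` item above, a hedged sketch of exporting a Weight-Only quantized model in the optimum-compatible weight layout; it assumes `q_model` is the object returned by a `quantization.fit(..., approach="weight_only")` call such as the one after the Highlights list.

```python
# Sketch: export with the optimum/Hugging Face compatible weight layout,
# then persist the compressed state dict. q_model is assumed to come from
# a prior weight-only quantization.fit call.
import torch

compressed_model = q_model.export_compressed_model(use_optimum_format=True)
torch.save(compressed_model.state_dict(), "compressed_model.pt")
```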

Productivity

  • Support quantization and benchmark on macOS (16d6a0); a benchmark sketch follows this list
  • Support ONNX Runtime 1.16.0 (d81732, 299af9, 753783)
  • Support the new TensorFlow API for gnr-base (8160c7)
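
Since benchmarking now also runs on macOS, a minimal benchmark sketch with the 2.x config API; the model path and `eval_dataloader` are placeholders.

```python
# Minimal sketch: performance benchmark via the 2.x benchmark API.
from neural_compressor.benchmark import fit
from neural_compressor.config import BenchmarkConfig

conf = BenchmarkConfig(
    warmup=10,             # iterations discarded before timing
    iteration=100,         # timed iterations
    cores_per_instance=4,  # pin each instance to 4 cores
    num_of_instance=1,
)
fit(model="./saved_model", conf=conf, b_dataloader=eval_dataloader)  # placeholders
```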

Bug Fixes

  • Fix the "GraphModule object has no attribute bias" error (7f53d1)
  • Fix ONNX model export issue (af0aea, eaa57f)
  • Add clip for ONNX Runtime SmoothQuant (cbb69b)
  • Fix SmoothQuant minmax observer init (b1db1c)
  • Fix SmoothQuant issue in get/set_module (dffcfe)
  • Align sparsity with block-wise masks in progressive pruning (fcdc29)

Examples

  • Support PEFT model with SmoothQuant (5e21b7)
  • Enable two ONNX Runtime examples: table-transformer-detection (550cee) and BEiT (7265df)

Validated Configurations

  • CentOS 8.4 & Ubuntu 22.04 & Windows 10 & macOS Ventura 13.5
  • Python 3.8, 3.9, 3.10, 3.11
  • TensorFlow 2.13, 2.14, 2.15
  • ITEX 1.2.0, 2.13.0.0, 2.14.0.1
  • PyTorch/IPEX 1.13.0+cpu, 2.0.1+cpu, 2.1.0
  • ONNX Runtime 1.14.1, 1.15.1, 1.16.3
  • MXNet 1.9.1