
Intel® Neural Compressor v3.0 Release

@chensuyue released this 12 Aug 04:09 · 145 commits to master since this release · 7056720
  • Highlights
  • Features
  • Improvements
  • Examples
  • Bug Fixes
  • Documentation
  • Validated Configurations

Highlights

  • FP8 quantization and INT4 model loading support on Intel® Gaudi® AI accelerator
  • Framework extension API for quantization, mixed precision, and benchmarking (see the sketch after this list)
  • Accuracy-aware FP16 mixed precision support on Intel® Xeon® 6 Processors
  • Performance optimizations and usability improvements for client-side quantization
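
As a hedged illustration of the framework extension API named above, the snippet below sketches weight-only RTN quantization of a toy PyTorch model. It assumes the documented 3.x entry points RTNConfig, prepare, and convert; the toy model and the bits/group_size values are illustrative choices, not release defaults.

```python
# Minimal sketch of the 3.x framework extension (PyTorch) quantization flow.
# Assumes the documented RTNConfig / prepare / convert entry points; the toy
# model and the bits/group_size values are illustrative, not release defaults.
import torch
from neural_compressor.torch.quantization import RTNConfig, prepare, convert

model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 8),
)

quant_config = RTNConfig(bits=4, group_size=32)  # weight-only RTN settings (assumed values)
model = prepare(model, quant_config)             # attach the quantization config to the model
model = convert(model)                           # replace Linear layers with quantized modules
```

RTN is data-free, so no calibration step is needed between prepare and convert.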

Features

Improvements

  • [Quantization] Integrate AutoRound v0.3 (bfa27e, fd9685)
  • [Quantization] Support auto_host2device on RTN and GPTQ (f75ff4)
  • [Quantization] Support transformers.Conv1D WOQ quantization (b6237c)
  • [Quantization] Support quant_lm_head argument in all WOQ configs (see the config sketch after this list) (4ae2e8)
  • [Quantization] Update fp4_e2m1 mapping list to fit neural_speed and qbits inference (5fde50)
  • [Quantization] Enhance load_empty_model import (29471d)
  • [Common] Add common logger to the quantization process (1cb844, 482f87, 83bc77, f50baf)
  • [Common] Enhance the set_local for operator type (a58638)
  • [Common] Port more helper classes from 2.x (3b150d)
  • [Common] Refine base config for 3.x API (be42d0, efea08)
  • [Export] Migrate the export feature from deprecated code into 2.x and 3.x (794b27)
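
To make the quant_lm_head and set_local items above concrete, here is a small, hedged config sketch. The quant_lm_head argument and the set_local method come from the notes above; the Conv1D override and the bit widths are assumptions chosen for illustration.

```python
# Hedged sketch of a WOQ config using the quant_lm_head argument and a
# per-operator-type override via set_local; all values are illustrative only.
from neural_compressor.torch.quantization import RTNConfig

quant_config = RTNConfig(bits=4, quant_lm_head=True)  # also quantize the lm_head layer
# Keep transformers.Conv1D modules at a higher bit width (assumed override).
quant_config.set_local("Conv1D", RTNConfig(bits=8))
```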

Examples

  • Add save/load for PT2E example (0e724a)
  • Add IPEX XPU example for framework extension API (6e1b1d)
  • Enable TensorFlow yolov5 example for framework extension API (19024b)
  • Update example for framework extension IPEX SmoothQuant (b35ff8)
  • Add SDXL model example for framework extension API (000946)
  • Add PyTorch mixed precision example (e106de, 9077b3)
  • Add CV and LLM examples for PT2E quantization path (b401b0)
  • Add Recommendation System examples for IPEX path (e470f6)
  • Add TensorFlow examples for framework extension API (fb8577, 922b24)
  • Add PyTorch Microscaling (MX) Quant examples (6733da)
  • Add PyTorch SmoothQuant LLM examples for new framework extension API (137fa3)
  • Add PyTorch GPTQ/RTN example for framework extension API (see the calibration-flow sketch after this list) (813d93)
  • Add double quant example (ccd0c9)
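
For the GPTQ/RTN framework extension API example referenced above, the calibration-based flow looks roughly like the sketch below. It assumes the 3.x prepare/convert entry points; the facebook/opt-125m model, the GPTQConfig values, and the two-sentence calibration set are illustrative assumptions, not the example's actual settings.

```python
# Hedged sketch of the calibration-based GPTQ flow on the 3.x extension API.
# Model choice, config values, and calibration data are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer
from neural_compressor.torch.quantization import GPTQConfig, prepare, convert

model_name = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

quant_config = GPTQConfig(bits=4, group_size=128)  # assumed values, not release defaults
model = prepare(model, quant_config)

# Calibration: run a few representative samples through the prepared model.
for text in [
    "Intel Neural Compressor supports weight-only quantization.",
    "GPTQ uses calibration data to minimize quantization error.",
]:
    inputs = tokenizer(text, return_tensors="pt")
    model(**inputs)

model = convert(model)  # finalize the GPTQ-quantized weights
```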

Bug Fixes

  • Fix ITREX qbits nf4/int8 training core dump issue (190e6b)
  • Fix unused pkgs import (437c8e)
  • Remove Gelu Fusion for TensorFlow New API (5592ac)
  • Fix GPTQ layer match issue (90fb43)
  • Fix static quant regression issue on IPEX path (70a1d5)
  • Fix config expansion with empty options (6b2738)
  • Fix act_observer for IPEX SmoothQuant and static quantization (263450)
  • Automatically set return_dict=False for GraphTrace (53e7df)
  • Fix slow WOQ Linear packing issue (da1ada, daa143)
  • Fix dtype of unpacked tensor (29fdec)
  • Fix WeightOnlyLinear bits type when dtype="intx" (19ff13)
  • Fix several issues for SmoothQuant and static quantization (7120dd)
  • Fix IPEX examples failing with evaluate (e82674)
  • Fix HQQ issue for group size of -1 (8dac9f)
  • Fix bug in GPTQ g_idx (4f893c)
  • Fix tune_cfg issue for static quant (ba1650)
  • Add non-str op_name match workaround for IPEX (911ccd)
  • Fix opt GPTQ double quant example config (62aa85)
  • Fix GPTQ accuracy issue in framework extension API example (c701ea)
  • Fix bf16 symbolic_trace bug (3fe2fd)
  • Fix opt_125m_woq_gptq_int4_dq_ggml issue (b99aba)

Documentation

Validated Configurations

  • CentOS 8.4 & Ubuntu 22.04 & Windows 11 & macOS Ventura 13.5
  • Python 3.8, 3.9, 3.10, 3.11
  • PyTorch/IPEX 2.1, 2.2, 2.3
  • TensorFlow 2.14, 2.15, 2.16
  • ONNX Runtime 1.16, 1.17, 1.18