Intel® Neural Compressor v2.5 Release
- Highlights
- Features
- Improvement
- Productivity
- Bug Fixes
- External Contributions
- Validated Configurations
Highlights
- Integrated the Weight-Only Quantization algorithm AutoRound and verified it on Gaudi2, Intel CPU, and NVIDIA GPU
- Applied SmoothQuant and Weight-Only Quantization algorithms to 15+ popular LLMs for INT8 and INT4 quantization, and published the recipes
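For readers new to the term, the following is a minimal sketch of what group-wise INT4 weight-only quantization does to a weight vector. It uses plain round-to-nearest with one scale per group, which is a common baseline and not the AutoRound algorithm itself. It does not use the Neural Compressor API; the function names, the group size, and the sample weights are all illustrative assumptions.

```python
from typing import List, Tuple

QMAX = 7  # assumed symmetric signed 4-bit range: [-7, 7]


def quantize_int4_groupwise(
    weights: List[float], group_size: int = 4
) -> Tuple[List[int], List[float]]:
    """Round-to-nearest symmetric INT4 quantization, one scale per group."""
    q_values: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Avoid division by zero for an all-zero group.
        scale = max_abs / QMAX if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            q = round(w / scale)
            q_values.append(max(-QMAX, min(QMAX, q)))
    return q_values, scales


def dequantize(
    q_values: List[int], scales: List[float], group_size: int = 4
) -> List[float]:
    """Recover approximate float weights from INT4 codes and group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(q_values)]


if __name__ == "__main__":
    # Hypothetical weights, chosen only for illustration.
    w = [0.7, -0.35, 0.1, 0.0, 1.4, -1.4, 0.2, 0.7]
    q, s = quantize_int4_groupwise(w, group_size=4)
    print("codes:", q)
    print("scales:", s)
    print("restored:", dequantize(q, s, group_size=4))
```

Only the weights are quantized here; activations stay in floating point, which is what "weight-only" refers to. Smaller groups give each scale fewer values to cover and usually reduce rounding error, at the cost of storing more scales.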
Features
- [Quantization] Integrate Weight-Only Quantization algorithm AutoRound (5c7f33, dfd083, 9a7ddd, cf1de7)
- [Quantization] Quantize weight with in-place mode in Weight-Only Quantization (deb1ed)
- [Pruning] Enable SNIP on multiple cards using DeepSpeed ZeRO-3 (49ab28)
- [Pruning] Support new pruning approaches Wanda and DSNOT for PyTorch LLM (7a3671)
Improvement
- [Quantization] SmoothQuant code structure refactor (a8d81c)
- [Quantization] Optimize the workflow of parsing Keras model (b816d7)
- [Quantization] Support static_groups options in GPTQ API (1c426a)
- [Quantization] Update TEQ train dataloader (d1e994)
- [Quantization] WeightOnlyLinear keeps self.weight after recover (2835bd)
- [Quantization] Add version condition for IPEX prepare init (d96e14)
- [Quantization] Enhance the ORT node name checking (f1597a)
- [Pruning] Stop the tuning process early when enabling smooth quant (844a03)
Productivity
- ORT LLM examples support latest optimum version (26b260)
- Add coding style docs and recommended VS Code setting (c1f23c)
- Adapt transformers 4.37 loading (6133f4)
- Upgrade pre-commit checker for black/blacken-docs/ruff (7763ed)
- Support CI summary in PR comments (d4bcdd)
- Notebook example update to install latest INC & TF, add metric in fit (4239d3)
Bug Fixes
- Fix QA IPEX example fp32 input issue (c4de19)
- Update conditions for getting min-max during TF MatMul requantize (d07175)
- Fix TF saved_model issues (d8e60b)
- Fix comparison of module_type and MulLinear (ba3aba)
- Fix ORT calibration issue (cd6d24)
- Fix ORT example bart export failure (b0dc0d)
- Fix TF example accuracy diff during benchmark and quantization (5943ea)
- Fix bugs for GPTQ exporting with static_groups (b4e37b)
- Fix ORT quant issue caused by tensors having same name (0a20f3)
- Fix Neural Solution SQL/CMD injection (14b7b0)
- Fix the best qmodel recovery issue (f2d9b7)
- Fix logger issue (83bc77)
- Store token in protected file (c6f9cc)
- Define the default SSL context (b08725)
- Fix IPEX stats bug (5af383)
- Fix ORT calibration for Dml EP (c58aea)
- Fix wrong socket number retrieval for non-English system (5b2a88)
- Fix trust remote for LLM examples (2f2c9a)
External Contributions
Validated Configurations
- CentOS 8.4 & Ubuntu 22.04 & Win 11 & macOS Ventura 13.5
- Python 3.8, 3.9, 3.10, 3.11
- TensorFlow 2.13, 2.14, 2.15
- ITEX 2.13.0, 2.14.0
- PyTorch/IPEX 2.0, 2.1, 2.2
- ONNX Runtime 1.15, 1.16, 1.17