
Intel® auto-round v0.1 Release

wenhuach21 released this 08 Mar 08:11
514aa49

Overview

AutoRound introduces a weight-only quantization algorithm designed for low-bit LLM inference. At W4G128 it achieves near-lossless compression on a range of popular models, including gemma-7B, Mistral-7b, Mixtral-8x7B-v0.1, Mixtral-8x7B-Instruct-v0.1, Phi2, LLAMA2 and more. AutoRound outperforms established methods in the majority of scenarios at W4G128, W4G-1, W3G128, and W2G128.
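
To make the configuration names above concrete: W4G128 is read here as 4-bit weights with one scale and zero point shared per group of 128 weights, and G-1 as one group per output channel. The sketch below illustrates that storage scheme with plain round-to-nearest quantization. It is only a baseline for illustration, not AutoRound's algorithm; the function names (`quantize_group`, `dequantize_group`, `quantize_row`) are hypothetical and are not part of the auto-round package.

```python
from typing import List, Tuple


def quantize_group(weights: List[float], bits: int) -> Tuple[List[int], float, int]:
    """Asymmetric round-to-nearest quantization of one group of weights.

    Baseline illustration only; AutoRound's tuned rounding is not reproduced.
    """
    qmax = (1 << bits) - 1  # e.g. 15 for 4 bits
    # Extend the range to include zero so that 0.0 stays representable.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    # Fall back to a scale of 1.0 when the group is all zeros.
    scale = (w_max - w_min) / qmax or 1.0
    zero_point = round(-w_min / scale)
    q = [min(qmax, max(0, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize_group(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [(v - zero_point) * scale for v in q]


def quantize_row(row: List[float], bits: int = 4, group_size: int = 128):
    """Quantize one weight row group by group.

    group_size=-1 treats the whole row as a single group (the "G-1" case).
    """
    if group_size == -1:
        group_size = len(row)
    return [
        quantize_group(row[i:i + group_size], bits)
        for i in range(0, len(row), group_size)
    ]
```

Under this reading, a row of 256 weights at W4G128 yields two groups, each carrying its own scale and zero point, while W4G-1 yields a single group for the whole row. Smaller groups track local weight ranges more closely at the cost of storing more scales.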

Key Features

  • Wide Model Support: AutoRound caters to a diverse range of model families. About 20 model families have been verified.
  • Export Flexibility: Effortlessly export quantized models to ITREX[1] and AutoGPTQ[2] formats for seamless deployment on Intel CPU and Nvidia GPU platforms respectively.
  • Device Compatibility: Compatible with tuning devices including Intel CPUs, Intel Gaudi2, and Nvidia GPUs.
  • Dataset Flexibility: AutoRound supports calibration with Pile10k and MBPP datasets, with easy extensibility to incorporate additional datasets.

Examples

  • Explore language modeling and code generation examples to unlock the full potential of AutoRound.

Additional Benefits

  • Pre-quantized Models: Access a variety of pre-quantized models on Hugging Face for immediate integration into your projects, with more models under review and coming soon.
  • Comprehensive Accuracy Data: Extensive accuracy data is provided to simplify user deployment.

Known issues:

  • baichuan-inc/Baichuan2-13B-Chat has some issues; we will support it soon.

References:

[1] https://github.com/intel/intel-extension-for-transformers

[2] https://github.com/AutoGPTQ/AutoGPTQ