update torch ao integration information #287

Merged: 1 commit, Oct 22, 2024

README.md: 18 changes (7 additions, 11 deletions)

```diff
@@ -26,20 +26,17 @@ more accuracy data and recipes across various models.
<div align="left">

## What's New

+* [2024/10] AutoRound has been integrated to [torch/ao](https://github.com/pytorch/ao), check out their [release note](https://github.com/pytorch/ao/releases/tag/v0.6.1)
* [2024/10] Important update: We now support full-range symmetric quantization and have made it the default
-configuration. This approach is typically better or comparable to asymmetric quantization and significantly
-outperforms other symmetric variants, especially at low bit-widths like 2-bit. And,no need to compile from source to
-run
-AutoRound format anymore.
+configuration. This configuration is typically better or comparable to asymmetric quantization and significantly
+outperforms other symmetric variants, especially at low bit-widths like 2-bit.
* [2024/09] AutoRound format supports several LVM models, check out the
examples [Qwen2-Vl](./examples/multimodal-modeling/Qwen-VL),[Phi-3-vision](./examples/multimodal-modeling/Phi-3-vision), [Llava](./examples/multimodal-modeling/Llava)
* [2024/08] AutoRound format supports Intel Gaudi2 devices. Please refer
to [Intel/Qwen2-7B-int4-inc](https://huggingface.co/Intel/Qwen2-7B-int4-inc).
* [2024/08] AutoRound introduces several experimental features, including fast tuning of norm/bias parameters (for 2-bit
and W4A4), activation quantization, and the mx_fp data type.
* [2024/07] Important change: the default value of nsamples has been changed from 512 to 128 to reduce the memory
usages, which may cause a slight accuracy drop in some scenarios


## Installation
```
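
To put the items above in context, here is a minimal sketch of how the new defaults surface in the AutoRound Python API. The model name is a placeholder, and the keyword arguments (`sym`, `nsamples`, and the commented experimental knobs) are assumptions drawn from the news items rather than a verified signature:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # placeholder model, purely for illustration
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# sym=True: full-range symmetric quantization, the new default (2024/10 note).
# nsamples=128: the reduced default calibration sample count (2024/07 note);
# raising it back to 512 trades memory for the accuracy noted above.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True, nsamples=128)
autoround.quantize()

# Experimental 2024/08 features; argument names assumed, so left commented out:
# autoround = AutoRound(model, tokenizer, bits=4, act_bits=4, data_type="mx_fp")

autoround.save_quantized("./tmp_autoround", format="auto_round")
```
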
```diff
@@ -105,12 +102,11 @@ We provide two recipes for best accuracy and fast running speed with low memory.
**AutoRound Format**: This format is well-suited for CPU, HPU devices, 2 bits, as well as mixed-precision
inference. [2,4]
bits are supported. It also benefits
-from the Marlin kernel, which can boost inference performance notably.However, it has not yet gained widespread
-community adoption. For CUDA support, you will need to
-install from the source.
+from the Marlin kernel, which can boost inference performance notably. However, it has not yet gained widespread
+community adoption.

**AutoGPTQ Format**: This format is well-suited for symmetric quantization on CUDA devices and is widely adopted by the
-community, [2,3,4,8] bits are supported, for 3 bits, pip install auto-gptq first before quantization. It also benefits
+community, [2,3,4,8] bits are supported. It also benefits
from the Marlin kernel, which can boost inference performance notably. However, **the
asymmetric kernel has issues** that can cause considerable accuracy drops, particularly at 2-bit quantization and small
models.
```
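
A matching sketch for the AutoGPTQ path, reusing the hypothetical `autoround` object from the earlier sketch; `format="auto_gptq"` mirrors the export convention used for the AutoRound format, and keeping the symmetric default sidesteps the asymmetric-kernel caveat above:

```python
# Export to the widely adopted AutoGPTQ format for CUDA inference. Staying
# with sym=True avoids the asymmetric-kernel accuracy issues called out
# above, which bite hardest at 2-bit quantization and on small models.
autoround.save_quantized("./tmp_autogptq", format="auto_gptq")

# The exported checkpoint can then be loaded through transformers, assuming
# the GPTQ runtime dependencies (e.g. auto-gptq/optimum) are installed:
from transformers import AutoModelForCausalLM, AutoTokenizer

qmodel = AutoModelForCausalLM.from_pretrained("./tmp_autogptq", device_map="auto")
qtokenizer = AutoTokenizer.from_pretrained("./tmp_autogptq")
```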