
Question: Error when substituting the quantized matrix multiplication operator. #670

Open
grysgreat opened this issue Dec 4, 2024 · 2 comments

Comments

@grysgreat

grysgreat commented Dec 4, 2024

In AWQ inference, the quantized weight matrix is dequantized to fp16 and then multiplied by the input matrix x in the linear layer.

But if I directly replace the dequantized fp16 matrix with the original weight matrix (llama2-7b-hf), the inference error becomes particularly large. (According to the calculation formula, W X ≈ (DQ(Q(W·s)) · s^-1) X, where DQ fuses the scale s^-1. In this case, the dequantized matrix should be approximately equivalent to the original matrix W.)

From:

            out = dequantize_gemm(qweight, qzeros, scales, w_bit, group_size)
            out = torch.matmul(x, out)

to:

            out = weight.T
            out = torch.matmul(x, out)

where weight is the original fp16 weight matrix (llama2-7b-hf) and weight.T is its transpose.

The perplexity on wikitext2 goes from 5.619 to 1324.6.
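
For reference, here is a minimal self-contained sketch of the round trip the formula above describes (this is not AutoAWQ's code; quantize_per_group, dequantize_per_group, the group size and the tensor shapes are all made up for illustration), comparing W X against (DQ(Q(W·s)) · s^-1) X:

    import torch

    def quantize_per_group(w, n_bit=4, group_size=128):
        # Asymmetric per-group quantization: returns integer codes, scales, zero points.
        out_feat, in_feat = w.shape
        w = w.reshape(out_feat, in_feat // group_size, group_size)
        w_min = w.amin(dim=-1, keepdim=True)
        w_max = w.amax(dim=-1, keepdim=True)
        q_max = 2 ** n_bit - 1
        scales = (w_max - w_min).clamp(min=1e-5) / q_max
        zeros = (-w_min / scales).round()
        q = (w / scales + zeros).round().clamp(0, q_max)
        return q, scales, zeros

    def dequantize_per_group(q, scales, zeros):
        # DQ: map integer codes back to floating point, group by group.
        w = (q - zeros) * scales
        return w.reshape(w.shape[0], -1)

    torch.manual_seed(0)
    out_feat, in_feat = 256, 512
    W = torch.randn(out_feat, in_feat)
    x = torch.randn(8, in_feat)
    s = torch.rand(in_feat) + 0.5            # AWQ-style per-input-channel scales

    # DQ(Q(W·s)) · s^-1: quantize the scaled weight, dequantize, fold the scale back out.
    q, scales, zeros = quantize_per_group(W * s)
    W_rec = dequantize_per_group(q, scales, zeros) / s

    ref = x @ W.T                            # original weights
    approx = x @ W_rec.T                     # reconstructed (dequantized) weights
    print((ref - approx).abs().max())        # small but nonzero difference

The two outputs are close but not identical, which is the approximation the formula expresses.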

@grysgreat grysgreat changed the title Error when substituting the quantized matrix multiplication operator. Question: Error when substituting the quantized matrix multiplication operator. Dec 4, 2024
@casper-hansen
Owner

Hi @grysgreat, this seems to be expected. You cannot recover the original fp16 with a transpose since you have lost a bunch of information when you quantize -> dequantize -> transpose.
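
A quick way to see that loss (a made-up symmetric 4-bit round trip for illustration, not AutoAWQ's actual kernels):

    import torch

    torch.manual_seed(0)
    w = torch.randn(4096, dtype=torch.float16).float()

    # Illustrative symmetric 4-bit quantize -> dequantize round trip.
    n_levels = 2 ** 4 - 1
    scale = w.abs().max() * 2 / n_levels
    q = (w / scale).round().clamp(-8, 7)     # only 16 representable integer values
    w_rec = q * scale

    print((w - w_rec).abs().mean())          # nonzero: w cannot be recovered exactly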

@grysgreat
Author

Thanks for your answer, but I'm still curious which part of the AutoAWQ algorithm makes it impossible to replace the weight in the linear layer directly with the original weight (downloaded from HF).
(In AutoGPTQ, such a replacement gives the correct result, i.e. the same accuracy as fp16.)
