Support quantization for non-multiples of 32. #328
The yayi2-30b K and V projection layers have (input_dims=7168, out_dims=112), so they fail to quantize with the error `all dimensions should be divisible by 32 for now`. FYI, here is the implementation of the yayi2-30b model's K/V layers: https://huggingface.co/wenge-research/yayi2-30b/blob/main/modeling_yayi.py#L180
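A minimal way to see the shape problem (a sketch that calls `mx.quantize` directly rather than going through the model; the signature here is assumed from current mlx):

```python
import mlx.core as mx

# Same shape as the yayi2-30b K/V projection weight: (out_dims=112, in_dims=7168).
# out_dims is not a multiple of 32, which the quantizer rejects.
w = mx.random.normal((112, 7168))
w_q, scales, biases = mx.quantize(w, group_size=32, bits=4)
# -> all dimensions should be divisible by 32 for now
```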
Yes, that's a known limitation of our quantization at the moment. I'll mark this as an enhancement. In the meantime, one workaround so you aren't blocked by this is to concatenate the K and V projections and do them as a single matmul (112 * 2 / 32 = 7).
Hi Awni, thanks for the reply. Would you be able to give a bit more detail about the workaround? By the way, happy new year! :)
Yes, so there are two projections, one for the keys and one for the values.
Instead, you could do something like this:
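Roughly (a sketch, not verbatim code — the `kv_proj` name and the yayi2 shapes are illustrative, and `nn.Linear` stores its weight as `(output_dims, input_dims)`):

```python
import mlx.core as mx
import mlx.nn as nn

hidden_size = 7168  # yayi2-30b hidden size
kv_dim = 112        # per-projection output dim, not divisible by 32

# The model's separate K and V projections:
k_proj = nn.Linear(hidden_size, kv_dim, bias=False)
v_proj = nn.Linear(hidden_size, kv_dim, bias=False)

# Fuse them into one projection with 2 * 112 = 224 output dims,
# which is divisible by 32 and can therefore be quantized.
kv_proj = nn.Linear(hidden_size, 2 * kv_dim, bias=False)
kv_proj.weight = mx.concatenate([k_proj.weight, v_proj.weight], axis=0)

# One matmul, then split the result back into keys and values:
x = mx.random.normal((1, 16, hidden_size))
keys, values = mx.split(kv_proj(x), 2, axis=-1)
```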
Now you can precompute the concatenated matrix and quantize it, since its dimensions (2 * 112 = 224) are divisible by 32.
Thanks @awni, this workaround works. I have added it to the yayi2 example. :)
Nice. I think (at least for the non-group axis) fixing this is mostly a matter of bounds checking in the matmul, but @angeloskath can say more about if and when we could support more flexible quantization.
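Concretely, the idea (a toy Python picture of tiling along the output axis, not the actual kernel — all names here are made up) is to clamp the last tile instead of assuming `out_dims` is a multiple of the tile size:

```python
import mlx.core as mx

TILE = 32

# Toy tiled matrix-vector product: each "tile" covers up to 32 output rows.
# The min() is the bounds check that lets out_dims (e.g. 112) be a
# non-multiple of the tile size; real kernels do this per-thread on the GPU.
def tiled_matvec(w, x):
    out_dims = w.shape[0]
    chunks = []
    for start in range(0, out_dims, TILE):
        end = min(start + TILE, out_dims)  # clamp the last tile
        chunks.append(w[start:end] @ x)
    return mx.concatenate(chunks)

# e.g. tiled_matvec(mx.random.normal((112, 7168)), mx.random.normal((7168,)))
```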
@mzbac not sure if you saw, but in 0.0.10 this should be fixed. We can add yayi2 to mlx-lm now 😃
Nice, I will take a look. Hopefully, it just needs to be mapped to the llama arch and it will work. |
Oh that would be amazing. |
Please close this now: MLX supports quantization for non-multiple-of-32 dims, and I have tested it on the yayi2 model, where it works as expected.
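For anyone who wants to double-check, something like this now runs (a sketch against the current mlx API; the `mx.quantize` and `mx.quantized_matmul` signatures are assumed):

```python
import mlx.core as mx

# out_dims=112 is not a multiple of 32, but quantization now succeeds.
x = mx.random.normal((1, 7168))
w = mx.random.normal((112, 7168))
w_q, scales, biases = mx.quantize(w, group_size=32, bits=4)
y = mx.quantized_matmul(x, w_q, scales, biases,
                        transpose=True, group_size=32, bits=4)
print(y.shape)  # (1, 112)
```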