Support quantization for non-multiples of 32. #328
The yayi2-30b K and V projection layers have (input_dims=7168, out_dims=112), so they fail to quantize with the error `all dimensions should be divisible by 32 for now`. FYI, here is the implementation of the yayi2-30b model's K/V layers: https://huggingface.co/wenge-research/yayi2-30b/blob/main/modeling_yayi.py#L180
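A minimal way to see the shape problem (a sketch that calls `mx.quantize` directly rather than going through the model; the signature here is assumed from current mlx):

```python
import mlx.core as mx

# Same shape as the yayi2-30b K/V projection weight: (out_dims=112, in_dims=7168).
# out_dims is not a multiple of 32, which the quantizer rejects.
w = mx.random.normal((112, 7168))
w_q, scales, biases = mx.quantize(w, group_size=32, bits=4)
# -> all dimensions should be divisible by 32 for now
```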
Yes, that's a known limitation of our quantization at the moment. I'll mark this as an enhancement. In the meantime, one workaround so you aren't blocked by this is to concatenate the K and V projections and do them as a single matmul (112 * 2 / 32 = 7).
Hi Awni, thanks for the reply. Would you be able to give a bit more detail about the workaround? By the way, happy new year! :)
Yes, so there are two projections, one for the keys and one for the values.
Instead, you could do something like this:
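Roughly (a sketch, not verbatim code — the `kv_proj` name and the yayi2 shapes are illustrative, and `nn.Linear` stores its weight as `(output_dims, input_dims)`):

```python
import mlx.core as mx
import mlx.nn as nn

hidden_size = 7168  # yayi2-30b hidden size
kv_dim = 112        # per-projection output dim, not divisible by 32

# The model's separate K and V projections:
k_proj = nn.Linear(hidden_size, kv_dim, bias=False)
v_proj = nn.Linear(hidden_size, kv_dim, bias=False)

# Fuse them into one projection with 2 * 112 = 224 output dims,
# which is divisible by 32 and can therefore be quantized.
kv_proj = nn.Linear(hidden_size, 2 * kv_dim, bias=False)
kv_proj.weight = mx.concatenate([k_proj.weight, v_proj.weight], axis=0)

# One matmul, then split the result back into keys and values:
x = mx.random.normal((1, 16, hidden_size))
keys, values = mx.split(kv_proj(x), 2, axis=-1)
```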
Now you can precompute the concatenated matrix and quantize it, since its dimensions (2 * 112 = 224) are divisible by 32.
Thanks @awni, this workaround works. I have added it to the yayi2 example. :)
Nice. I think (at least for the non-group axis) fixing this is mostly a matter of bounds checking in the matmul, but @angeloskath can say more about if and when we could support more flexible quantization.
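Concretely, the idea (a toy Python picture of tiling along the output axis, not the actual kernel — all names here are made up) is to clamp the last tile instead of assuming `out_dims` is a multiple of the tile size:

```python
import mlx.core as mx

TILE = 32

# Toy tiled matrix-vector product: each "tile" covers up to 32 output rows.
# The min() is the bounds check that lets out_dims (e.g. 112) be a
# non-multiple of the tile size; real kernels do this per-thread on the GPU.
def tiled_matvec(w, x):
    out_dims = w.shape[0]
    chunks = []
    for start in range(0, out_dims, TILE):
        end = min(start + TILE, out_dims)  # clamp the last tile
        chunks.append(w[start:end] @ x)
    return mx.concatenate(chunks)

# e.g. tiled_matvec(mx.random.normal((112, 7168)), mx.random.normal((7168,)))
```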
@mzbac not sure if you saw, but in 0.0.10 this should be fixed. We can add yayi2 to mlx-lm now 😃
Nice, I will take a look. Hopefully, it just needs to be mapped to the llama arch and it will work. |
Oh that would be amazing. |
Please close this now: MLX supports quantization for non-multiple-of-32 dims, and I have tested it on the yayi2 model, where it works as expected.
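For anyone who wants to double-check, something like this now runs (a sketch against the current mlx API; the `mx.quantize` and `mx.quantized_matmul` signatures are assumed):

```python
import mlx.core as mx

# out_dims=112 is not a multiple of 32, but quantization now succeeds.
x = mx.random.normal((1, 7168))
w = mx.random.normal((112, 7168))
w_q, scales, biases = mx.quantize(w, group_size=32, bits=4)
y = mx.quantized_matmul(x, w_q, scales, biases,
                        transpose=True, group_size=32, bits=4)
print(y.shape)  # (1, 112)
```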