
State of quantization #154

Open
RomeoV opened this issue Sep 27, 2023 · 3 comments

RomeoV commented Sep 27, 2023

Hello, first of all thanks for this awesome package - this is very impressive!

We often run into resource constraints when running larger models from Hugging Face, even just for inference.
A common strategy has been to apply quantization to model weights before running them.

Do you know whether quantization has been successfully applied anywhere in the Julia ML ecosystem? Is this something that might become part of this package in the future?
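
For concreteness, here is a minimal sketch of what symmetric int8 weight quantization looks like in plain Julia (illustrative only; these names are not part of this package):

```julia
# Illustrative sketch, not this package's API: symmetric int8 quantization
# of a single weight matrix, with dequantization back to Float32.
function quantize_int8(W::AbstractMatrix{Float32})
    scale = maximum(abs, W) / typemax(Int8)  # one scale for the whole tensor
    Wq = round.(Int8, W ./ scale)            # values land in [-127, 127]
    return Wq, scale
end

dequantize(Wq, scale) = Float32.(Wq) .* scale

W = randn(Float32, 64, 64)
Wq, s = quantize_int8(W)
maximum(abs, dequantize(Wq, s) .- W)  # per-element quantization error
```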

chengchingwen (Owner) commented

Quantization is widely used in some transformer implementations, such as llama.cpp, but supporting it in this package is currently a low priority on my list.

RomeoV commented Sep 28, 2023

Small update: it seems that CUDA.jl supports Float16 natively, but Float8 and Float4 are not even data types in Base Julia. There is an issue in CUDA.jl about supporting Float8, but it doesn't seem to be active.

However, without having looked into the exact specifics, it seems that supporting Float16 should be possible by loading the weights from a Float16-quantized model.
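
A rough sketch of that conversion, assuming the model is a Functors.jl-compatible tree of layers (as Flux and Transformers.jl layers are):

```julia
using Functors

# Sketch under the assumption that the model is a Functors-compatible
# struct: walk the layer tree and convert every Float32 array to Float16,
# leaving non-float fields (index arrays, configs, ...) untouched.
to_f16(model) = fmap(x -> x isa AbstractArray{Float32} ? Float16.(x) : x, model)
```

CUDA.jl handles Float16 arrays natively, so weights converted this way should be movable to the GPU as-is.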

RomeoV commented Sep 28, 2023

Grepping for Float32 in this repo suggests one would need to change

  • layers/embed.jl
  • layers/base.jl
  • huggingface/models/load.jl

and then the respective load.jl files for each model; a rough sketch of the kind of change is below.
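
To make the scope concrete, a hypothetical sketch of the direction (the function name and keyword here are illustrative, not the package's current API): replace hardcoded Float32 conversions with an element-type parameter threaded through weight loading.

```julia
# Hypothetical sketch; `load_weights_as` is an illustrative name, not the
# package's actual API. The idea: given an already-loaded state dict,
# convert every floating-point array to the requested element type T.
function load_weights_as(state::AbstractDict; T::Type{<:AbstractFloat} = Float16)
    return Dict(k => (v isa AbstractArray{<:AbstractFloat} ? T.(v) : v)
                for (k, v) in state)
end
```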
