Quantization in glm4-9b failed #489
Comments
Yeah, the architecture isn't supported. There are a bunch of little things that would have to be updated, like how the EOS token is a list all of a sudden, scaled attention layers and such. It's not high on the list of priorities at the moment. Not sure if the model is any good, or if it's any good without the multimodal capabilities, which wouldn't be supported anyway.
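To illustrate the EOS point: GLM4-style configs store `eos_token_id` as a list of ids, where older loaders expect a single int. Below is a minimal sketch of the kind of normalization that would be needed, assuming a plain `config.json` read (this is illustrative only, not exllamav2's actual config code; the function name and layout are assumptions):

```python
# Illustrative sketch (not exllamav2's actual loader): GLM4-style configs may carry
# eos_token_id as a list of ids, while older loaders expect a single int.
import json

def read_eos_token_ids(config_path: str) -> list[int]:
    """Return eos_token_id(s) as a list, whether the config stores an int or a list."""
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    eos = config.get("eos_token_id")
    if isinstance(eos, int):
        return [eos]            # legacy single-id form
    if isinstance(eos, list) and all(isinstance(t, int) for t in eos):
        return eos              # GLM4-style list of stop ids
    raise TypeError(f"Unexpected eos_token_id value: {eos!r}")
```

A converter could then treat the first id as the primary EOS and register the rest as extra stop tokens, though the other GLM4 differences (scaled attention layers and so on) would still need separate handling.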
As THUDM claims, glm4-9b could outperform llama3-8b, so it might be worth a try. BTW, would it be possible to add multimodal support to exllamav2 in the future? I see that multimodal LLMs (VLMs) could be the next trend.
Multimodal is possible, of course, as is GLM4 in general, along with diffusion models, TTS, you name it. I just have to prioritize. But contributions are always welcome.
It's getting old, so there will probably be a better GLM model soon, maybe easier to integrate into exl2, but it is supposed to be SOTA on a hallucination(-less) benchmark. It would be cool if the devs of exotic architectures contributed to their integration into inference engines, but I'm not blaming them either. For one thing, I'm not contributing much myself; plus, things move so fast, and it's good that people do what they are best at. If there were very high demand for GLM in exl2, someone would eventually do it. edit: Seems like it's getting hfized: huggingface/transformers#33823
When I run `convert.py` with the command `CUDA_VISIBLE_DEVICES=1 python convert.py -i /home/orion/ai/Models/glm4-9b -o ./tmp-file -cf /home/orion/ai/Models/glm4-9b-4-exl2 -r 256`, it fails with an error saying `TypeError: Value for eos_token_id is not of expected type <class 'int'>`. It seems that the glm4 architecture isn't supported yet.
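A quick way to confirm the type mismatch before conversion is to inspect the model's `config.json` directly (an illustrative sketch, not part of exllamav2; the path is the one from the command above):

```python
# Illustrative check: print the type of eos_token_id in the model's config.
# If it is a list rather than an int, that matches the TypeError raised during conversion.
import json

with open("/home/orion/ai/Models/glm4-9b/config.json", encoding="utf-8") as f:
    cfg = json.load(f)

print(type(cfg["eos_token_id"]), cfg["eos_token_id"])
```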
Steps to reproduce: just download the glm4-9b model and run `convert.py` as the README says.

Full console log: