Quantization in glm4-9b failed #489

Open
Orion-zhen opened this issue Jun 5, 2024 · 4 comments

@Orion-zhen
Contributor

When I run convert.py with the command CUDA_VISIBLE_DEVICES=1 python convert.py -i /home/orion/ai/Models/glm4-9b -o ./tmp-file -cf /home/orion/ai/Models/glm4-9b-4-exl2 -r 256, it fails with the error TypeError: Value for eos_token_id is not of expected type <class 'int'>.

It seems the glm4 architecture isn't supported yet.

Steps to reproduce: Just download the glm4-9b model and run convert.py as the README describes.

Full console log:

 !! Warning, unknown architecture: ChatGLMModel
 !! Loading as LlamaForCausalLM
Traceback (most recent call last):
  File "/home/orion/repo/exllamav2/convert.py", line 71, in <module>
    config.prepare()
  File "/home/orion/repo/exllamav2/exllamav2/config.py", line 187, in prepare
    self.eos_token_id = read(read_config, int, "eos_token_id", None)  # 2
  File "/home/orion/repo/exllamav2/exllamav2/config.py", line 40, in read
    raise TypeError(f"Value for {key} is not of expected type {expected_type}")
TypeError: Value for eos_token_id is not of expected type <class 'int'>
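
For anyone who wants to experiment before official support lands: the immediate failure is just that GLM4's config.json stores eos_token_id as a list of token IDs, while exllamav2's config reader insists on a single int. Below is a minimal sketch of a more tolerant read, assuming a list-valued eos_token_id in config.json; the helper name read_lenient and the first-element coercion are illustrative assumptions, not exllamav2's actual API.

```python
import json

def read_lenient(cfg: dict, key: str, default=None):
    """Return cfg[key]; if the value is a list of ints, fall back to the first one."""
    value = cfg.get(key, default)
    if isinstance(value, list) and value and all(isinstance(v, int) for v in value):
        # GLM4 ships several stop tokens; keeping only the first loses the
        # alternates but is enough to let a conversion script proceed.
        return value[0]
    return value

# Path taken from the command in the issue.
with open("/home/orion/ai/Models/glm4-9b/config.json") as f:
    cfg = json.load(f)

# Mirrors the failing read of eos_token_id in config.py's prepare().
eos_token_id = read_lenient(cfg, "eos_token_id", None)
```

A proper fix would keep the whole list and stop generation on any of the IDs, which is part of what turboderp describes below.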
@turboderp
Member

Yeah, the architecture isn't supported. There are a bunch of little things that would have to be updated, like the EOS token suddenly being a list, scaled attention layers and such. It's not high on the list of priorities at the moment. Not sure if the model is any good, or if it's any good without the multimodal capabilities, which wouldn't be supported anyway.

@Orion-zhen
Contributor Author

According to THUDM, glm4-9b can outperform llama3-8b, so it might be worth a try. BTW, would it be possible to add multimodal support to exllamav2 in the future? It seems multimodal LLMs (VLMs) could be the next trend.

@turboderp
Member

Multimodal is possible, of course, as is GLM4 in general, along with diffusion models, TTS, you name it. I just have to prioritize. But contributions are always welcome.

@Trapper4888

Trapper4888 commented Oct 2, 2024

It's getting old, so there will probably be a better GLM model soon, maybe easier to integrate into exl2, but it is supposed to be SOTA on a (low-)hallucination benchmark:
https://huggingface.co/spaces/vectara/leaderboard

It would be cool for the devs of exotic architectures to contribute to their integration into inference engines, but I'm not blaming them either. First, I'm not contributing much myself; plus, things move so fast, and it's good that people do what they are best at. If there were very high pressure to have a GLM exl2, someone would eventually do it.

edit: Seems like it's getting HF-ized: huggingface/transformers#33823
