Add Qwen2Moe GGUF loading support #33264
Conversation
Hello @SunMarc @LysandreJik @ArthurZucker! I would like to ask for a code review.
Thanks for this clean PR @VladOS95-cyber! LGTM!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
My bad, edited ;) Usually the first name GitHub suggests to me is the author of the PR.
No worries at all :)
Thanks @VladOS95-cyber!
Anytime, guys 🤪. Nice work @VladOS95-cyber!
* update gguf doc, config and tensor mapping
* add qwen2moe architecture support, GGUFQwen2MoeConverter and q4 unit tests
* apply code style fixes
* reformat files
* assign GGUFQwen2Converter to qwen2_moe
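For context, here is a minimal sketch of what this PR enables, mirroring the new integration test; the repo id and GGUF filename below are placeholders for illustration, not the exact ones used in the tests:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Qwen2MoE GGUF repo and quantized file name, for illustration only.
model_id = "Qwen/Qwen1.5-MoE-A2.7B-Chat-GGUF"
gguf_file = "qwen1_5-moe-a2_7b-chat-q4_0.gguf"

# Both the tokenizer and the model are built from the GGUF checkpoint:
# the quantized weights are dequantized into the regular Qwen2MoE architecture.
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    gguf_file=gguf_file,
    torch_dtype=torch.float16,
    device_map="auto",
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```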
I'm hitting an error on our CI and locally for this test:

self = <ggml.test_ggml.GgufIntegrationTests testMethod=test_qwen2_moe_q4_0>
def test_qwen2_moe_q4_0(self):
tokenizer = AutoTokenizer.from_pretrained(self.qwen2_moe_model_id, gguf_file=self.q4_0_qwen2_moe_model_id)
model = AutoModelForCausalLM.from_pretrained(
self.qwen2_moe_model_id,
gguf_file=self.q4_0_qwen2_moe_model_id,
device_map="auto",
torch_dtype=torch.float16,
)
text = tokenizer(self.example_text, return_tensors="pt").to(torch_device)
out = model.generate(**text, max_new_tokens=10)
EXPECTED_TEXT = "Hello everyone, I'm a newbie here and would like"
> self.assertEqual(tokenizer.decode(out[0], skip_special_tokens=True), EXPECTED_TEXT)
E AssertionError: 'Hello部分齐值得关注erc区域堪称 btnCancel跳舞�ASC' != "Hello everyone, I'm a newbie here and would like"
E - Hello部分齐值得关注erc区域堪称 btnCancel跳舞�ASC
E + Hello everyone, I'm a newbie here and would like
tests/quantization/ggml/test_ggml.py:359: AssertionError

Also, here are some interesting logs:
Looks like we didn't manage to load the weights correctly. The GgufIntegrationTests.test_bloom_q8_0 test is also failing, but that one is easier to fix.
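One way to narrow this down is to dump the tensor names, shapes and quantization types straight from the GGUF file with the gguf package and compare them against the qwen2_moe entries in the tensor mapping, to spot tensors that are silently skipped or misrouted. A rough debugging sketch, assuming a locally downloaded file (the path is a placeholder):

```python
from gguf import GGUFReader

# Placeholder path to the locally downloaded Qwen2MoE GGUF checkpoint.
reader = GGUFReader("qwen1_5-moe-a2_7b-chat-q4_0.gguf")

# Print every tensor name, shape and quantization type stored in the file,
# so they can be checked against the qwen2_moe tensor mapping on the
# transformers side.
for tensor in reader.tensors:
    print(tensor.name, list(tensor.shape), tensor.tensor_type)
```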
@SunMarc sure, I'll take a look
What does this PR do?
Add Qwen2Moe GGUF loading support
Before submitting
* Did you read the contributor guideline, Pull Request section?
* Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. Link: Community contribution: Adding GGUF support for more architectures #33260
* Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Regarding this task: @SunMarc @LysandreJik @ArthurZucker.