Add Qwen2Moe GGUF loading support #33264
Conversation
Hello @SunMarc @LysandreJik @ArthurZucker! I would like to ask for a code review.
Thanks for this clean PR @VladOS95-cyber! LGTM!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
My bad, edited ;) Usually the first name GitHub suggests to me is the author of the PR.
No worries at all :)
Thanks @VladOS95-cyber!
Anytime, guys 🤪. Nice work @VladOS95-cyber!
* update gguf doc, config and tensor mapping
* add qwen2moe architecture support, GGUFQwen2MoeConverter and q4 unit tests
* apply code style fixes
* reformat files
* assign GGUFQwen2Converter to qwen2_moe
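For context, here is a minimal sketch of what this PR enables, mirroring the new integration test; the repo id and GGUF filename below are placeholders for illustration, not the exact ones used in the tests:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Qwen2MoE GGUF repo and quantized file name, for illustration only.
model_id = "Qwen/Qwen1.5-MoE-A2.7B-Chat-GGUF"
gguf_file = "qwen1_5-moe-a2_7b-chat-q4_0.gguf"

# Both the tokenizer and the model are built from the GGUF checkpoint:
# the quantized weights are dequantized into the regular Qwen2MoE architecture.
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    gguf_file=gguf_file,
    torch_dtype=torch.float16,
    device_map="auto",
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```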
I'm hitting an error on our CI and locally for this test:

self = <ggml.test_ggml.GgufIntegrationTests testMethod=test_qwen2_moe_q4_0>
def test_qwen2_moe_q4_0(self):
tokenizer = AutoTokenizer.from_pretrained(self.qwen2_moe_model_id, gguf_file=self.q4_0_qwen2_moe_model_id)
model = AutoModelForCausalLM.from_pretrained(
self.qwen2_moe_model_id,
gguf_file=self.q4_0_qwen2_moe_model_id,
device_map="auto",
torch_dtype=torch.float16,
)
text = tokenizer(self.example_text, return_tensors="pt").to(torch_device)
out = model.generate(**text, max_new_tokens=10)
EXPECTED_TEXT = "Hello everyone, I'm a newbie here and would like"
> self.assertEqual(tokenizer.decode(out[0], skip_special_tokens=True), EXPECTED_TEXT)
E AssertionError: 'Hello部分齐值得关注erc区域堪称 btnCancel跳舞�ASC' != "Hello everyone, I'm a newbie here and would like"
E - Hello部分齐值得关注erc区域堪称 btnCancel跳舞�ASC
E + Hello everyone, I'm a newbie here and would like
tests/quantization/ggml/test_ggml.py:359: AssertionError

Also, here are some interesting logs:
Looks like we didn't manage to load the weights correctly. The GgufIntegrationTests.test_bloom_q8_0 test is also failing, but that one is easier to fix.
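One way to narrow this down is to dump the tensor names, shapes and quantization types straight from the GGUF file with the gguf package and compare them against the qwen2_moe entries in the tensor mapping, to spot tensors that are silently skipped or misrouted. A rough debugging sketch, assuming a locally downloaded file (the path is a placeholder):

```python
from gguf import GGUFReader

# Placeholder path to the locally downloaded Qwen2MoE GGUF checkpoint.
reader = GGUFReader("qwen1_5-moe-a2_7b-chat-q4_0.gguf")

# Print every tensor name, shape and quantization type stored in the file,
# so they can be checked against the qwen2_moe tensor mapping on the
# transformers side.
for tensor in reader.tensors:
    print(tensor.name, list(tensor.shape), tensor.tensor_type)
```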
@SunMarc sure, I'll take a look
What does this PR do?
Add Qwen2Moe GGUF loading support
Before submitting
* Did you read the contributor guideline, Pull Request section?
* Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. Link: Community contribution: Adding GGUF support for more architectures #33260
* Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Regarding this task: @SunMarc @LysandreJik @ArthurZucker.