
Granitemoe #33207
Merged — 100 commits merged into huggingface:main from the granitemoe branch, Sep 20, 2024

Conversation

@mayank31398 (Contributor) commented on Aug 29, 2024:

This PR adds support for IBM's PowerMoE model (3B).
It will also form the basis for IBM's upcoming MoE models, expected by the end of this month.

text models: @ArthurZucker and @younesbelkada
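For context, here is a minimal, hedged usage sketch of loading and generating with a model of this type through the standard auto classes; the checkpoint id `ibm/PowerMoE-3b` and the prompt are illustrative assumptions, not values taken from this PR.

```python
# Hedged usage sketch: load an assumed PowerMoE checkpoint and run greedy generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm/PowerMoE-3b"  # assumed checkpoint name, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```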

@mayank31398 marked this pull request as ready for review on September 3, 2024 at 19:22
@mayank31398 (Contributor, Author) commented:

Hi @ArthurZucker, any updates on this?

@ArthurZucker (Collaborator) left a review comment:

Let's just add an integration test with generation!
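As a rough sketch of what such a test could look like, below is a slow integration test with generation in the usual transformers style; the checkpoint id and the asserted output are placeholders, not the values that were actually added in this PR.

```python
# Hedged sketch of an integration test with generation (checkpoint and expected
# text are placeholders; the real test pins exact values).
import unittest

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.testing_utils import require_torch, slow, torch_device


@require_torch
class GraniteMoeIntegrationTest(unittest.TestCase):
    @slow
    def test_model_generation(self):
        model_id = "ibm/PowerMoE-3b"  # assumed checkpoint name
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(model_id).to(torch_device)

        prompt = "Simply put, the theory of relativity states that"
        inputs = tokenizer(prompt, return_tensors="pt").to(torch_device)
        output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)
        text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

        # Placeholder assertion: a real integration test compares against a pinned string.
        self.assertTrue(text.startswith(prompt))
```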

@ArthurZucker (Collaborator) commented:

Thanks, let's merge!

@ArthurZucker merged commit e472e07 into huggingface:main on Sep 20, 2024
18 of 20 checks passed
@mayank31398 (Contributor, Author) commented:

Thanks

@mayank31398 deleted the granitemoe branch on September 21, 2024 at 20:16
@mayank31398 restored the granitemoe branch on September 24, 2024 at 21:28
@mayank31398 deleted the granitemoe branch on September 24, 2024 at 21:45
amyeroberts pushed a commit to amyeroberts/transformers that referenced this pull request Oct 2, 2024
* first commit
* drop tokenizer
* drop tokenizer
* drop tokenizer
* drop convert
* granite
* drop tokenization test
* mup
* fix
* reformat
* reformat
* reformat
* fix docs
* stop checking for checkpoint
* update support
* attention multiplier
* update model
* tiny drop
* saibo drop
* skip test
* fix test
* fix test
* drop
* drop useless imports
* update docs
* drop flash function
* copied from
* drop pretraining tp
* drop pretraining tp
* drop pretraining tp
* drop unused import
* drop code path
* change name
* softmax scale
* head dim
* drop legacy cache
* rename params
* cleanup
* fix copies
* comments
* add back legacy cache
* multipliers
* multipliers
* multipliers
* text fix
* fix copies
* merge
* multipliers
* attention multiplier
* drop unused imports
* add granitemoe
* add decoration
* remove moe from sequenceclassification
* fix test
* fix
* fix
* fix
* move rope?
* merge
* drop bias
* drop bias
* Update src/transformers/models/granite/configuration_granite.py

Co-authored-by: Arthur <[email protected]>

* fix
* Update src/transformers/models/granite/modeling_granite.py

Co-authored-by: Arthur <[email protected]>

* fix
* fix
* fix
* fix
* drop
* drop
* fix
* fix
* cleanup
* cleanup
* fix
* fix granite tests
* fp32 test
* fix
* drop jitter
* fix
* rename
* rename
* fix config
* add gen test

---------

Co-authored-by: Yikang Shen <[email protected]>
Co-authored-by: Arthur <[email protected]>