Add mixtral-MoE format adaptors #145

Open · wants to merge 1 commit into base: main
Conversation

@llylly commented Jan 19, 2024

In this PR, I provide full support for format conversion for mixtral-MoE architectures.

  • Converting mixtral-MoE checkpoints (from either a magnet source or a Hugging Face source) to the internal split and split-sparse consolidated checkpoint formats: accessory/tools/mixtral_moe_split_from_hf.py. Usage: python mixtral_moe_split_from_hf.py in-ckpt-dir output-ckpt-dir [--in_ckpt_source hf_or_magnet (default: hf)] [--convert_sparse (whether to convert to the sparse format)]. This script is a refactoring of https://huggingface.co/Alpha-VLLM/MoE-Mixtral-7B-8Expert/blob/main/converted/split.py and https://huggingface.co/Alpha-VLLM/MoE-Mixtral-7B-8Expert/blob/main/converted_sparse/split_sparse.py; it unifies the two scripts and additionally supports the Hugging Face checkpoint format.

  • Converting a mixtral-MoE consolidated checkpoint to the Hugging Face format: accessory/tools/convert_weights_to_hf.py. The usage for mixtral-MoE is the same as for llama, except that --mixtral must be passed to indicate the mixtral-MoE architecture. Note: only consolidated checkpoints in the split format are supported for now; split_sparse support is left as future work. Example invocations for both scripts are sketched after this list.
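
A minimal sketch of the two conversions, using hypothetical directory names (mixtral-8x7b-hf/, mixtral-8x7b-split/, mixtral-8x7b-hf-export/). The first invocation follows the usage stated above; for convert_weights_to_hf.py the in-dir/out-dir argument order is assumed to mirror the llama usage and should be checked against that script's --help.

```sh
# Run from the repository root (assumption; adjust paths as needed).

# 1) Hugging Face checkpoint -> internal split consolidated checkpoint.
#    Add --convert_sparse to produce the split-sparse format instead,
#    or pass --in_ckpt_source magnet for a magnet-sourced checkpoint.
python accessory/tools/mixtral_moe_split_from_hf.py \
    mixtral-8x7b-hf/ mixtral-8x7b-split/ \
    --in_ckpt_source hf

# 2) Internal split consolidated checkpoint -> Hugging Face format.
#    --mixtral tells the script this is a mixtral-MoE architecture;
#    the positional in/out directories here are an assumed calling convention.
python accessory/tools/convert_weights_to_hf.py \
    mixtral-8x7b-split/ mixtral-8x7b-hf-export/ \
    --mixtral
```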

@ChrisLiu6 self-assigned this Jan 20, 2024

@ChrisLiu6 (Collaborator) left a comment
I would suggest refactoring the accessory/tools directory as follows:

accessory/tools
--checkpoint_conversion
----mixtral
------convert_from_hf_or_magnet.py (the current mixtral_moe_split_from_hf.py)
------convert_to_hf.py
----llama
------convert_to_hf.py (the original convert_weights_to_hf.py)
--...

Everything else looks good to me.

A collaborator commented:

I think it would be better to use separate scripts to handle the conversion for llama and mixtral respectively, as we may support many more new models in the future, and a single conversion script could hardly support them all.
