In this PR, I provide full support for format conversion of Mixtral-MoE architectures.
Converting a Mixtral-MoE checkpoint (from either the magnet source or the Hugging Face source) to the internal split and split-sparse consolidated checkpoint formats: `accessory/tools/mixtral_moe_split_from_hf.py`. Usage: `python mixtral_moe_split_from_hf.py in-ckpt-dir output-ckpt-dir [--in_ckpt_source hf_or_magnet (default: hf)] [--convert_sparse (whether to convert to sparse format)]`. This functionality is a refactoring of https://huggingface.co/Alpha-VLLM/MoE-Mixtral-7B-8Expert/blob/main/converted/split.py and https://huggingface.co/Alpha-VLLM/MoE-Mixtral-7B-8Expert/blob/main/converted_sparse/split_sparse.py, but it now unifies the two scripts and additionally supports the Hugging Face checkpoint format.
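For reference, the core of this direction is slicing each expert's feed-forward weights across model-parallel ranks. The sketch below is illustrative only: the input key names follow the standard Hugging Face `MixtralForCausalLM` layout, while the output file names, sharding dimensions, and the choice to keep non-expert weights replicated are placeholders rather than the exact convention used by `mixtral_moe_split_from_hf.py` (which also remaps keys to the internal naming scheme).

```python
# Illustrative sketch only -- output names and sharding choices are placeholders,
# not the exact convention used by mixtral_moe_split_from_hf.py.
import os
import torch
from safetensors.torch import load_file  # HF Mixtral ships *.safetensors shards

def split_expert_weights(hf_dir: str, out_dir: str, num_shards: int = 8):
    """Load a Hugging Face Mixtral state dict and write one consolidated
    checkpoint per model-parallel rank, slicing expert FFN weights."""
    state = {}
    for name in sorted(os.listdir(hf_dir)):
        if name.endswith(".safetensors"):
            state.update(load_file(os.path.join(hf_dir, name)))

    shards = [dict() for _ in range(num_shards)]
    for key, tensor in state.items():
        if ".block_sparse_moe.experts." in key and key.endswith("w2.weight"):
            # w2 projects back to the model dim -> slice along the input dim
            pieces = tensor.chunk(num_shards, dim=1)
        elif ".block_sparse_moe.experts." in key:
            # w1 / w3 expand to the FFN dim -> slice along the output dim
            pieces = tensor.chunk(num_shards, dim=0)
        else:
            # attention, norm, and router weights are simply replicated here;
            # a real conversion would also rename keys to the internal scheme
            pieces = [tensor] * num_shards
        for rank, piece in enumerate(pieces):
            shards[rank][key] = piece.clone()

    os.makedirs(out_dir, exist_ok=True)
    for rank, shard in enumerate(shards):
        torch.save(shard, os.path.join(out_dir, f"consolidated.{rank:02d}.pth"))
```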
Converting a Mixtral-MoE consolidated checkpoint to the Hugging Face format: `accessory/tools/convert_weights_to_hf.py`. The usage for Mixtral-MoE is the same as for LLaMA, except that `--mixtral` must be passed to indicate that the checkpoint uses the Mixtral-MoE architecture. Note: only consolidated checkpoints in the split format are supported so far; support for the split_sparse format is future work.
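A converted Hugging Face checkpoint can be sanity-checked by loading it with `transformers` (4.36 or later, which includes the Mixtral architecture). The snippet below is a minimal check, assuming the tokenizer files are also present in (or copied into) the output directory; the directory path is a placeholder.

```python
# Quick sanity check that the converted HF checkpoint loads and generates.
# "path/to/converted_hf_ckpt" is a placeholder for the conversion output dir.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt_dir = "path/to/converted_hf_ckpt"
tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)
model = AutoModelForCausalLM.from_pretrained(
    ckpt_dir,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires `accelerate`; drop to load on CPU
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```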