Example of saving experts into one model when using distributed training #178

Luodian · 2022-08-07T06:07:57Z

Hi, thanks for providing such a wonderful codebase.

I have seen and used save & load for MoE on multiple GPUs, and now I can save the experts on each rank. But is there a way to convert them into one model?

Say I trained an 8-expert MoE on 8 GPUs, and now I want to run next-stage inference on 1 GPU.

Would you consider providing an example of doing so? Or could you share some ideas on how to implement it myself?
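One possible approach, sketched here under assumptions that the codebase does not confirm: each rank's checkpoint holds the replicated (non-expert) parameters plus its own slice of experts stacked along the first dimension, global expert ids follow rank order (rank 0 holds the first experts, and so on), and expert parameters can be recognized by a name marker (the hypothetical `"experts."`). Merging then amounts to concatenating each expert parameter across ranks and keeping rank 0's copy of everything else. Plain Python lists stand in for tensors so the sketch runs without GPUs; `merge_expert_shards` is a hypothetical helper, not an existing API.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical convention: parameter names containing this marker hold a
# per-rank slice of experts, stacked along the first (expert) dimension.
EXPERT_MARKER = "experts."


def merge_expert_shards(
    shards: Sequence[Dict[str, Any]],
    expert_marker: str = EXPERT_MARKER,
) -> Dict[str, Any]:
    """Merge per-rank state dicts into one single-device state dict.

    Assumes shards[r] is rank r's checkpoint, that expert parameters have a
    leading local-expert dimension, and that global expert ids are assigned
    in rank order. Non-expert (replicated) parameters are taken from rank 0.
    """
    if not shards:
        raise ValueError("need at least one shard")
    keys = shards[0].keys()
    for r, shard in enumerate(shards[1:], start=1):
        if shard.keys() != keys:
            raise KeyError(f"rank {r} has different parameter names than rank 0")

    merged: Dict[str, Any] = {}
    for name in keys:
        if expert_marker in name:
            # Concatenate each rank's local experts along the expert dimension,
            # in rank order, so expert i ends up at index i.
            stacked: List[Any] = []
            for shard in shards:
                stacked.extend(shard[name])
            merged[name] = stacked
        else:
            # Replicated parameter: assumed identical on every rank.
            merged[name] = shards[0][name]
    return merged


if __name__ == "__main__":
    # Toy setup mirroring the question: 8 ranks, one expert per rank.
    shards = [
        {"gate.weight": [0.5, 0.5], "moe.experts.fc1": [[float(r)]]}
        for r in range(8)
    ]
    merged = merge_expert_shards(shards)
    print(len(merged["moe.experts.fc1"]))  # 8
```

With real PyTorch checkpoints, each shard would come from `torch.load(path, map_location="cpu")`, and the list concatenation would become `torch.cat([s[name] for s in shards], dim=0)`. The merged dict would then be loaded into a model constructed with all 8 experts on a single device. Whether the codebase's actual parameter names and expert layout match these assumptions needs to be checked against its own save & load code.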

ghostplant · 2022-08-08T01:38:11Z

A dup request of #177. We are going to add some utility functions to help with this conversion.

Luodian · 2022-08-08T05:31:17Z

Thanks! I think it's worth doing.

ghostplant added the "duplicate" label on Aug 8, 2022
