Support MQA in checkpoint-merging tools #40

Merged: 3 commits, May 8, 2023

Conversation

@RaymondLi0 (Collaborator) commented Mar 22, 2023

  • Add support for MQA models in checkpoint_utils; this appears to work fine (see the sketch after this list for the general merging idea).
  • Support checkpoints saved with the distributed optimizer in the checkpoint util.
    TODO: test that the resulting model produces the same output.

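For background (not code from this PR), a minimal sketch of why MQA checkpoints need special handling when merging tensor-parallel shards: with multi-query attention, each rank holds its own slice of the query heads but a full copy of the single shared key/value projection, so query weights are concatenated while the KV weights can be taken from any one rank. Tensor names and shapes below are illustrative, not the fork's actual parameter layout.

```python
import torch

def merge_mqa_attention_shards(shards):
    """Merge per-tensor-parallel-rank attention weights for one MQA layer.

    `shards` is a list of state-dict fragments, one per TP rank, with
    illustrative keys (not the fork's real parameter names):
      - "query.weight":     [heads_per_rank * head_dim, hidden]  (split across ranks)
      - "key_value.weight": [2 * head_dim, hidden]               (replicated on every rank)
    """
    # Query heads are sharded across ranks: concatenate along the head dimension.
    merged_query = torch.cat([s["query.weight"] for s in shards], dim=0)

    # The single shared key/value head is replicated, so every rank holds the
    # same tensor; take rank 0's copy and sanity-check the others.
    merged_kv = shards[0]["key_value.weight"]
    for s in shards[1:]:
        assert torch.equal(s["key_value.weight"], merged_kv), \
            "replicated KV weights should be identical across TP ranks"

    return {"query.weight": merged_query, "key_value.weight": merged_kv}
```
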
@RaymondLi0 (Collaborator, Author)

I had to add mp.set_start_method('spawn') here: https://github.com/bigcode-project/Megatron-LM/blob/mqa-checkpoint-utils/tools/checkpoint_util.py#L106
But I'm not sure whether it's just an issue with my environment.

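For reference, a minimal, hypothetical sketch of where such a call typically goes; this is not the fork's actual checkpoint_util.py, and the queue/worker setup below is only indicative.

```python
import multiprocessing as mp

def main():
    # Subprocesses that initialize CUDA (or other fork-unsafe state) generally
    # need the 'spawn' start method; the default 'fork' on Linux can hang or
    # crash them. force=True avoids a RuntimeError if a method was already set.
    mp.set_start_method('spawn', force=True)

    queue = mp.Queue()
    # ... start loader/saver processes that exchange tensors through the queue ...

if __name__ == '__main__':
    main()
```
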
@RaymondLi0 (Collaborator, Author) commented Apr 3, 2023

Added a --use-distributed-optimizer flag to the checkpoint_util script, so that it can load checkpoints saved under the distributed-optimizer naming scheme.

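For illustration only, a hedged sketch of what the flag has to account for; the file and directory names below follow the general Megatron-style convention and are not copied from this fork.

```python
import os

def get_checkpoint_paths(ckpt_dir, tp_rank, pp_rank, use_distributed_optimizer, dp_rank=0):
    """Illustrative Megatron-style checkpoint naming (hypothetical, simplified).

    Without the distributed optimizer, model and optimizer state share one file
    per model-parallel rank. With it, the optimizer state is sharded across
    data-parallel ranks, so the optimizer file name also depends on dp_rank,
    which is exactly what a conversion script without an initialized
    data-parallel group cannot provide.
    """
    common = os.path.join(ckpt_dir, f"mp_rank_{tp_rank:02d}_{pp_rank:03d}")
    if use_distributed_optimizer:
        model_path = os.path.join(common, "model_rng.pt")
        optim_path = os.path.join(f"{common}_{dp_rank:03d}", "optim.pt")
    else:
        model_path = optim_path = os.path.join(common, "model_optim_rng.pt")
    return model_path, optim_path
```
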
The checkpoint loader uses the data-parallel rank to build the name of the optimizer file to load. The problem is that the data-parallel rank is not initialized in this script (and is not required). Since only the model file is used, we circumvent the issue by loading the 0-th optimizer shard, which won't be used anyway. a8e64f6#diff-122925dfa160fba3c00803abba3577ef0d5aa5ab48989032a63d41c91f2a8002R122

EDIT: changed this to not load any optimizer state at all, instead of arbitrarily loading the 0-th shard.

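As a rough illustration of that final approach (the file layout and dict keys are the usual Megatron-style ones, not taken verbatim from this PR), the conversion path only ever needs the model weights:

```python
import torch

def load_model_state_only(model_checkpoint_path):
    """Load only the model weights from a Megatron-style checkpoint file.

    With the distributed optimizer, optimizer state lives in separate
    per-data-parallel-rank shards; a conversion tool that only rewrites model
    weights never needs them, so those files are simply never opened.
    """
    # map_location='cpu' avoids requiring a GPU or the original device layout.
    state = torch.load(model_checkpoint_path, map_location='cpu')

    # Megatron-style checkpoints usually keep the weights under a 'model' key;
    # fall back to the whole dict if the layout differs.
    return state.get('model', state)
```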