Support MQA in checkpoint-merging tools #40

Merged: 3 commits, May 8, 2023

Conversation

@RaymondLi0 (Collaborator) commented Mar 22, 2023

  • Add support for MQA models in checkpoint_utils; this appears to work fine (see the sketch after this list for the general merging idea).
  • Support checkpoints saved with the distributed optimizer in the checkpoint util.
    TODO: test that the resulting model produces the same output.

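For background (not code from this PR), a minimal sketch of why MQA checkpoints need special handling when merging tensor-parallel shards: with multi-query attention, each rank holds its own slice of the query heads but a full copy of the single shared key/value projection, so query weights are concatenated while the KV weights can be taken from any one rank. Tensor names and shapes below are illustrative, not the fork's actual parameter layout.

```python
import torch

def merge_mqa_attention_shards(shards):
    """Merge per-tensor-parallel-rank attention weights for one MQA layer.

    `shards` is a list of state-dict fragments, one per TP rank, with
    illustrative keys (not the fork's real parameter names):
      - "query.weight":     [heads_per_rank * head_dim, hidden]  (split across ranks)
      - "key_value.weight": [2 * head_dim, hidden]               (replicated on every rank)
    """
    # Query heads are sharded across ranks: concatenate along the head dimension.
    merged_query = torch.cat([s["query.weight"] for s in shards], dim=0)

    # The single shared key/value head is replicated, so every rank holds the
    # same tensor; take rank 0's copy and sanity-check the others.
    merged_kv = shards[0]["key_value.weight"]
    for s in shards[1:]:
        assert torch.equal(s["key_value.weight"], merged_kv), \
            "replicated KV weights should be identical across TP ranks"

    return {"query.weight": merged_query, "key_value.weight": merged_kv}
```
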
@RaymondLi0 (Collaborator, Author)

I had to add mp.set_start_method('spawn') here: https://github.com/bigcode-project/Megatron-LM/blob/mqa-checkpoint-utils/tools/checkpoint_util.py#L106
But I'm not sure whether it's just an issue with my environment.

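For reference, a minimal, hypothetical sketch of where such a call typically goes; this is not the fork's actual checkpoint_util.py, and the queue/worker setup below is only indicative.

```python
import multiprocessing as mp

def main():
    # Subprocesses that initialize CUDA (or other fork-unsafe state) generally
    # need the 'spawn' start method; the default 'fork' on Linux can hang or
    # crash them. force=True avoids a RuntimeError if a method was already set.
    mp.set_start_method('spawn', force=True)

    queue = mp.Queue()
    # ... start loader/saver processes that exchange tensors through the queue ...

if __name__ == '__main__':
    main()
```
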
@RaymondLi0 (Collaborator, Author) commented Apr 3, 2023

Added a --use-distributed-optimizer flag to the checkpoint_util script, so that it can load checkpoints saved under the distributed-optimizer naming scheme.

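For illustration only, a hedged sketch of what the flag has to account for; the file and directory names below follow the general Megatron-style convention and are not copied from this fork.

```python
import os

def get_checkpoint_paths(ckpt_dir, tp_rank, pp_rank, use_distributed_optimizer, dp_rank=0):
    """Illustrative Megatron-style checkpoint naming (hypothetical, simplified).

    Without the distributed optimizer, model and optimizer state share one file
    per model-parallel rank. With it, the optimizer state is sharded across
    data-parallel ranks, so the optimizer file name also depends on dp_rank,
    which is exactly what a conversion script without an initialized
    data-parallel group cannot provide.
    """
    common = os.path.join(ckpt_dir, f"mp_rank_{tp_rank:02d}_{pp_rank:03d}")
    if use_distributed_optimizer:
        model_path = os.path.join(common, "model_rng.pt")
        optim_path = os.path.join(f"{common}_{dp_rank:03d}", "optim.pt")
    else:
        model_path = optim_path = os.path.join(common, "model_optim_rng.pt")
    return model_path, optim_path
```
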
The checkpoint loader uses the data-parallel rank to build the name of the optimizer file to load. The problem is that the data-parallel rank is not initialized in this script (and is not required). Since only the model file is used, we circumvent the issue by loading the 0-th optimizer shard, which won't be used anyway. a8e64f6#diff-122925dfa160fba3c00803abba3577ef0d5aa5ab48989032a63d41c91f2a8002R122

EDIT: changed this to not load any optimizer state at all, instead of arbitrarily loading the 0-th shard.

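As a rough illustration of that final approach (the file layout and dict keys are the usual Megatron-style ones, not taken verbatim from this PR), the conversion path only ever needs the model weights:

```python
import torch

def load_model_state_only(model_checkpoint_path):
    """Load only the model weights from a Megatron-style checkpoint file.

    With the distributed optimizer, optimizer state lives in separate
    per-data-parallel-rank shards; a conversion tool that only rewrites model
    weights never needs them, so those files are simply never opened.
    """
    # map_location='cpu' avoids requiring a GPU or the original device layout.
    state = torch.load(model_checkpoint_path, map_location='cpu')

    # Megatron-style checkpoints usually keep the weights under a 'model' key;
    # fall back to the whole dict if the layout differs.
    return state.get('model', state)
```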