fix MegatronLayerPolicy to be compatible with the newest ParallelTransformerLayer #4236

dc3671 · 2023-08-30T09:29:01Z

Problem

Currently, DeepSpeed tries to access client_module.attention, but the actual ParallelTransformerLayer has no attention attribute. It has self_attention instead, according to this file: https://github.com/microsoft/Megatron-DeepSpeed/blob/main/megatron/model/transformer.py#L927

The whole structure is:

Solution

I followed the existing approach: I modified the version attribute to make the policy compatible with the newest ParallelTransformerLayer, and changed attention to self_attention.

But I'm not sure whether ParallelTransformerLayer was changed in place or is a new module, so I'm not sure whether changing MegatronLayerPolicy.version is the correct fix.
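
For illustration, here is a minimal sketch of the version-based lookup described above. It is not the actual DeepSpeed code: `get_attention` is a hypothetical helper, the layers are plain stand-in objects, and the mapping of version 0 to `attention` and any other value to `self_attention` is an assumption made for the example.

```python
from types import SimpleNamespace


def get_attention(layer, version=0):
    """Return the attention submodule of a Megatron-style layer.

    Hypothetical helper: `version` mimics a policy-level switch, where 0
    selects the older attribute name (`attention`) and any other value
    selects the newer one (`self_attention`). This mapping is assumed.
    """
    name = "attention" if version == 0 else "self_attention"
    try:
        return getattr(layer, name)
    except AttributeError as exc:
        # Surface a clearer message when the version does not match the layer.
        raise AttributeError(
            f"layer has no '{name}' submodule; check the version setting"
        ) from exc


# Stand-ins for the two layer layouts (not real Megatron classes).
old_layer = SimpleNamespace(attention="old-attn")
new_layer = SimpleNamespace(self_attention="new-attn")

old_attn = get_attention(old_layer, version=0)
new_attn = get_attention(new_layer, version=1)
```

With the version left at 0, looking up the attention submodule on `new_layer` raises `AttributeError`, which mirrors the failure reported in the Problem section.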

RezaYazdaniAminabadi

LGTM. Thanks @dc3671

fix MegatronLayerPolicy to be compatible with the newest ParallelTran…

0a329ac

…sformerLayer

dc3671 requested review from RezaYazdaniAminabadi, jeffra, mrwyattii, awan-10, cmikeh2 and arashb as code owners August 30, 2023 09:29

RezaYazdaniAminabadi approved these changes Aug 30, 2023

Merge branch 'master' into fix-megatron-gpt

2d97cf0

tjruwase added this pull request to the merge queue Aug 30, 2023

Merged via the queue into deepspeedai:master with commit 6cbf666 Aug 31, 2023
