Support (Bias)SkipLayerNormalization fusion in GPT2 #13988
Conversation
…crosoft/onnxruntime into hari/skip_layer_normalization
@@ -537,7 +537,14 @@ def fuse_gpt2(self, layernorm, add_before_layernorm, input_name_to_nodes, output
        if two_gather is None:
            return False

        # If the add_before_layernorm node is an Add node, then the add_output output is the first index
nit: It is better to also update the comment at the beginning of this function. That comment only covers the first case, not the second.
I don't see any other comment at the beginning of this function. Are you referring to the args description of is_embedding_sum_needed() before this, by any chance?
Streamlined some logic around optional_embedding_sum_output and add_output, which both need the comment. It should be easier to understand the code now.
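For context, here is a hedged sketch of the branch this thread is about. Only the names add_before_layernorm, add_output, and optional_embedding_sum_output come from the diff and discussion; the surrounding control flow is an assumption, not the actual fusion code:

```python
# Illustrative sketch only -- the real logic lives in the fusion script under review.
if add_before_layernorm.op_type == "Add":
    # If add_before_layernorm is a plain Add node, its first (index 0) output is the
    # residual sum that downstream nodes consume.
    add_output = add_before_layernorm.output[0]
    optional_embedding_sum_output = True
else:
    # Otherwise the node is already a fused SkipLayerNormalization, and the sum is
    # exposed through its optional output instead (assumed here to be the last output).
    add_output = add_before_layernorm.output[-1]
    optional_embedding_sum_output = False
```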
### Description
Add support for ONNX conversion of GPT-2 in two stages:
* Stage 1 is the initial stage that has an empty past state.
* Stage 2 has a non-empty past state and sequence_length is 1.

Add a parameter --stage to specify the stage. For stage 1, we enable mask_index for Attention so that fused attention can be used in CUDA.

Other changes:
1. Use int32 inputs as the default (otherwise there is an error in inference).
2. Update gpt2_parity to include SkipLayerNormalization (see #13988) and EmbedLayerNormalization.
3. Collect all environment variables that might impact GPT-2 latency in benchmark_gpt2.

### Motivation and Context
To test fused attention for the GPT-2 model for #13953.
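For illustration, a minimal sketch of what the two stages imply for the model inputs. The input names, shapes, and helper function below are assumptions for illustration only, not the converter's actual code; the converter exposes the choice through the --stage parameter described above:

```python
import numpy as np

def make_dummy_gpt2_inputs(stage, batch_size=1, num_layers=12, num_heads=12,
                           head_size=64, past_sequence_length=8, sequence_length=4):
    """Illustrative only: build int32/float32 dummy inputs for the two stages."""
    if stage == 1:
        # Stage 1: the first decoding step has an empty past state.
        past_shape = (2, batch_size, num_heads, 0, head_size)
        input_ids = np.zeros((batch_size, sequence_length), dtype=np.int32)
    else:
        # Stage 2: later steps carry a non-empty past state and sequence_length == 1.
        past_shape = (2, batch_size, num_heads, past_sequence_length, head_size)
        input_ids = np.zeros((batch_size, 1), dtype=np.int32)

    past = {f"past_{i}": np.zeros(past_shape, dtype=np.float32) for i in range(num_layers)}
    return {"input_ids": input_ids, **past}
```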
### Description
1. SkipLayerNormalization has a new output (#13988), and the symbolic shape inference script needs corresponding updates.
2. The greedy sampling op (#13426) shouldn't reuse the logits buffer, as its corresponding kernel doesn't seem to support that yet.

### Motivation and Context
Fix some transformer issues.
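A hedged sketch of the kind of shape-inference update item 1 above calls for: the new optional SkipLayerNormalization output (the residual sum) has the same shape and type as the first input. The helper names below are illustrative, not the actual symbolic_shape_infer.py internals:

```python
def infer_skip_layer_normalization(node, get_shape_and_type, set_shape_and_type):
    """Illustrative only: propagate shapes for SkipLayerNormalization outputs."""
    input_shape, input_dtype = get_shape_and_type(node.input[0])
    # Output 0: the normalized output, same shape as the input.
    set_shape_and_type(node.output[0], input_shape, input_dtype)
    # Output 3 (optional, added in #13988): the input + skip (+ bias) sum fed to the
    # next layer; it also has the same shape and type as the input.
    if len(node.output) > 3 and node.output[3]:
        set_shape_and_type(node.output[3], input_shape, input_dtype)
```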
Description
The GPT2 model has a slightly different SkipLayerNormalization pattern from BERT: the residual passed to the next layer is not the LayerNorm output, as in BERT, but the sum of the residual and the input that feeds the LayerNorm node. Because that intermediate Add output has another consumer, the Add and LayerNorm could not be fused. This also means that any Add after a MatMul feeding into SkipLayerNormalization cannot be fused either, since the prerequisite for that fusion is that SkipLayerNormalization be fused first. This change adds support for this variant of SkipLayerNormalization (SLN) by adjusting the SLN schema to include an optional output that exposes the sum of the residual and the input.
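A minimal sketch of the pattern described above, built with onnx.helper; the tensor and node names are illustrative assumptions, not the exact names produced by the fusion script:

```python
from onnx import helper

# Unfused GPT-2 pattern: the Add output ("add_out") feeds both the LayerNormalization
# node and the next layer's residual path, so Add + LayerNorm could not be fused before.
add = helper.make_node("Add", inputs=["matmul_out", "residual"], outputs=["add_out"])
layernorm = helper.make_node(
    "LayerNormalization", inputs=["add_out", "gamma", "beta"], outputs=["ln_out"]
)

# Fused form enabled by this change: SkipLayerNormalization exposes the residual sum
# through a new optional output, so the extra consumer of "add_out" no longer blocks fusion.
fused = helper.make_node(
    "SkipLayerNormalization",
    inputs=["matmul_out", "residual", "gamma", "beta"],
    outputs=["ln_out", "", "", "add_out"],  # 4th (optional) output = input + skip
    domain="com.microsoft",
)
```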
TODO: Add kernel test
Motivation and Context
Improve fusion coverage for GPT2, helping improve its performance and that of any language model using this pattern.