Add linear layer and ffn config to enable TransformerEngine layers (with FP8) #432
Conversation
This PR adds a config for linear layers and FFN modules which allows the use of TransformerEngine's te.Linear and te.LayerNormMLP modules (which support FP8 training with amp.fp8).
+ I did a little cleanup
This PR is built on top of #271
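For context, here is a minimal sketch (not code from this PR) of the TransformerEngine modules these configs expose; it assumes `transformer_engine` is installed and a GPU with FP8 support (e.g. H100) is available:

```python
import torch
import transformer_engine.pytorch as te

hidden_size, ffn_hidden_size = 768, 3072

# te.Linear is a drop-in replacement for torch.nn.Linear.
linear = te.Linear(hidden_size, hidden_size, bias=True,
                   params_dtype=torch.bfloat16).cuda()

# te.LayerNormMLP fuses the pre-FFN LayerNorm with the two-layer MLP.
ln_mlp = te.LayerNormMLP(hidden_size, ffn_hidden_size,
                         params_dtype=torch.bfloat16).cuda()

x = torch.randn(128, 16, hidden_size, device='cuda', dtype=torch.bfloat16)

# FP8 execution is enabled by running the forward pass under fp8_autocast;
# outside this context the same modules run in BF16.
with te.fp8_autocast(enabled=True):
    y = ln_mlp(linear(x))
```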
In the future, this will also allow us to add and prototype other linear layers and FFN blocks. It also enables us to configure TP/SP for the MLP block in the `build_ffn` util fn (see the sketch below).

AMP FP8 training gets results that are nearly identical to AMP BF16:

[loss curves comparing AMP FP8 and AMP BF16 training runs]

but has a faster runtime.
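As a rough illustration of that extensibility point, a `build_ffn`-style factory can dispatch on `ffn_type`; the function below is an illustrative sketch, not the exact implementation in this PR, and its argument names are assumptions:

```python
import torch.nn as nn

def build_ffn(ffn_type: str, d_model: int, expansion_ratio: int = 4, **kwargs) -> nn.Module:
    if ffn_type == 'mptmlp':
        # Plain two-layer MLP (stand-in for the existing FFN block).
        return nn.Sequential(
            nn.Linear(d_model, expansion_ratio * d_model),
            nn.GELU(),
            nn.Linear(expansion_ratio * d_model, d_model),
        )
    elif ffn_type == 'te_ln_mlp':
        import transformer_engine.pytorch as te
        # TransformerEngine's fused LayerNorm + MLP; extra kwargs can carry
        # TP/SP settings straight through to the module.
        return te.LayerNormMLP(d_model, expansion_ratio * d_model, **kwargs)
    raise ValueError(f'Unknown ffn_type: {ffn_type}')
```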
Furthermore, setting `ffn_config_defaults: ffn_type: te_ln_mlp` allows us to use TransformerEngine's LayerNormMLP layer, which has SP and TP support if configured correctly.
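For reference, the same setting written out as YAML (only the `ffn_config_defaults` / `ffn_type` keys come from this PR; the nesting shown is illustrative):

```yaml
ffn_config_defaults:
  ffn_type: te_ln_mlp  # use TransformerEngine's fused LayerNorm + MLP
```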