
Add new transformers model type: Bart #8698

Merged · 9 commits · Aug 25, 2021

Conversation

@wangyems wangyems (Contributor) commented Aug 12, 2021

Description (addresses #8637 and #7796):

- Enable a new model type, "bart", in the transformers optimizer (see the usage sketch below).
- Enhance Reshape fusion for the Hugging Face (4.9) BART model.
- Fuse self-attention in the encoder.
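
For context, a minimal sketch of how the new model type might be invoked through the `onnxruntime.transformers` optimizer entry point; the file names are placeholders, and the head count and hidden size are taken from the Hugging Face bart-base config:

```python
# Sketch: optimize a Hugging Face BART ONNX export with the new "bart"
# model type (file names are placeholders).
from onnxruntime.transformers import optimizer

optimized_model = optimizer.optimize_model(
    "bart-base.onnx",      # placeholder path to the exported model
    model_type="bart",     # the model type added by this PR
    num_heads=12,          # bart-base: 12 attention heads
    hidden_size=768,       # bart-base: hidden size 768
)
optimized_model.save_model_to_file("bart-base-opt.onnx")
```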

Benchmark setup: Linux, V100, intra_num_threads=24.

The fusions mainly benefit CPU at smaller sizes; for larger sizes, one could disable attention fusion (see the sketch below).
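
A sketch of disabling attention fusion for a larger model, assuming the `FusionOptions` helper exposed by `onnxruntime.transformers`; file names are placeholders, and the head count and hidden size come from the bart-large config:

```python
# Sketch: skip self-attention fusion for bart-large (placeholder paths).
from onnxruntime.transformers import optimizer
from onnxruntime.transformers.fusion_options import FusionOptions

options = FusionOptions("bart")
options.enable_attention = False   # turn off attention fusion

optimized_model = optimizer.optimize_model(
    "bart-large.onnx",
    model_type="bart",
    num_heads=16,          # bart-large: 16 attention heads
    hidden_size=1024,      # bart-large: hidden size 1024
    optimization_options=options,
)
optimized_model.save_model_to_file("bart-large-opt.onnx")
```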

Before (columns bN_sM = batch size N, sequence length M):

| model | EP | precision | b1_s128 | b1_s512 | b2_s128 | b2_s512 |
| --- | --- | --- | --- | --- | --- | --- |
| bart-base | cpu | fp32 | 71.94 | 339.22 | 131.94 | 432.31 |
| bart-large | cpu | fp32 | 213.95 | 656.69 | 377.22 | 1287.50 |
| bart-base | cuda | fp32 | 5.30 | 14.67 | 10.89 | 25.53 |
| bart-large | cuda | fp32 | 13.82 | 43.64 | 21.25 | 81.58 |

After:

| model | EP | precision | b1_s128 | b1_s512 | b2_s128 | b2_s512 |
| --- | --- | --- | --- | --- | --- | --- |
| bart-base | cpu | fp32 | 59.73 | 219.49 | 102.28 | 414.53 |
| bart-large | cpu | fp32 | 189.58 | 699.35 | 328.35 | 1323.91 |
| bart-base | cuda | fp32 | 5.12 | 14.61 | 10.00 | 25.43 |
| bart-large | cuda | fp32 | 13.22 | 43.29 | 20.89 | 81.12 |

TODO (to be included in separate PRs):

- Accept attention masks.
- Implement cross-attention in the decoder.
- Fuse the decoder attention.


@wangyems wangyems requested review from tianleiwu and viboga August 13, 2021 21:41
@wangyems wangyems marked this pull request as ready for review August 13, 2021 21:42
@wangyems wangyems requested a review from a team as a code owner August 13, 2021 21:42
@wangyems wangyems changed the title from "[WIP]Add new transformers model type: Bart" to "Add new transformers model type: Bart" Aug 13, 2021
@yufenglee (Member) commented:

Do you have perf numbers for fp16 on GPU? And why is there no improvement for fp32 on GPU?

@wangyems wangyems requested a review from tianleiwu August 20, 2021 19:15