
Add new transformers model type: Bart #8698

Merged · 9 commits · Aug 25, 2021

Conversation

@wangyems wangyems (Contributor) commented Aug 12, 2021

Description (addresses #8637 and #7796):

- Enable a new model type, "bart", in the transformers optimizer (see the usage sketch below).
- Enhance Reshape fusion for the Hugging Face (4.9) BART model.
- Fuse self-attention in the encoder.
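
For context, a minimal sketch of how the new model type might be invoked through the `onnxruntime.transformers` optimizer entry point; the file names are placeholders, and the head count and hidden size are taken from the Hugging Face bart-base config:

```python
# Sketch: optimize a Hugging Face BART ONNX export with the new "bart"
# model type (file names are placeholders).
from onnxruntime.transformers import optimizer

optimized_model = optimizer.optimize_model(
    "bart-base.onnx",      # placeholder path to the exported model
    model_type="bart",     # the model type added by this PR
    num_heads=12,          # bart-base: 12 attention heads
    hidden_size=768,       # bart-base: hidden size 768
)
optimized_model.save_model_to_file("bart-base-opt.onnx")
```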

Benchmark setup: Linux, V100, intra_num_threads=24.

The fusions mainly benefit CPU at smaller sizes; for larger sizes, one could disable attention fusion (see the sketch below).
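
A sketch of disabling attention fusion for a larger model, assuming the `FusionOptions` helper exposed by `onnxruntime.transformers`; file names are placeholders, and the head count and hidden size come from the bart-large config:

```python
# Sketch: skip self-attention fusion for bart-large (placeholder paths).
from onnxruntime.transformers import optimizer
from onnxruntime.transformers.fusion_options import FusionOptions

options = FusionOptions("bart")
options.enable_attention = False   # turn off attention fusion

optimized_model = optimizer.optimize_model(
    "bart-large.onnx",
    model_type="bart",
    num_heads=16,          # bart-large: 16 attention heads
    hidden_size=1024,      # bart-large: hidden size 1024
    optimization_options=options,
)
optimized_model.save_model_to_file("bart-large-opt.onnx")
```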

Before (columns bN_sM = batch size N, sequence length M):

| model | EP | precision | b1_s128 | b1_s512 | b2_s128 | b2_s512 |
| --- | --- | --- | --- | --- | --- | --- |
| bart-base | cpu | fp32 | 71.94 | 339.22 | 131.94 | 432.31 |
| bart-large | cpu | fp32 | 213.95 | 656.69 | 377.22 | 1287.50 |
| bart-base | cuda | fp32 | 5.30 | 14.67 | 10.89 | 25.53 |
| bart-large | cuda | fp32 | 13.82 | 43.64 | 21.25 | 81.58 |

After:

| model | EP | precision | b1_s128 | b1_s512 | b2_s128 | b2_s512 |
| --- | --- | --- | --- | --- | --- | --- |
| bart-base | cpu | fp32 | 59.73 | 219.49 | 102.28 | 414.53 |
| bart-large | cpu | fp32 | 189.58 | 699.35 | 328.35 | 1323.91 |
| bart-base | cuda | fp32 | 5.12 | 14.61 | 10.00 | 25.43 |
| bart-large | cuda | fp32 | 13.22 | 43.29 | 20.89 | 81.12 |

TODO (to be included in separate PRs):

- Accept attention masks.
- Implement cross-attention in the decoder.
- Fuse the decoder attention.


@wangyems wangyems requested review from tianleiwu and viboga August 13, 2021 21:41
@wangyems wangyems marked this pull request as ready for review August 13, 2021 21:42
@wangyems wangyems requested a review from a team as a code owner August 13, 2021 21:42
@wangyems wangyems changed the title from "[WIP]Add new transformers model type: Bart" to "Add new transformers model type: Bart" Aug 13, 2021
@yufenglee (Member) commented:

Do you have perf numbers for fp16 on GPU? And why is there no improvement for fp32 on GPU?

@wangyems wangyems requested a review from tianleiwu August 20, 2021 19:15