Add LLaMA GQA ragged batching #18337

Merged


@kunal-vaishnavi (Contributor) commented on Nov 8, 2023

### Description

This PR updates the logic for replacing MHA with GQA and updates the LLaMA scripts for the modified GQA op. It is related to the changes in [this PR](#18283).
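
For context, a minimal sketch of what an MHA-to-GQA node swap can look like at the ONNX graph level, using the `onnx` Python API. This is an illustration, not the code from this PR: the real GQA contrib op takes additional inputs (e.g. past KV caches and per-sequence length tensors), and the input wiring here is a placeholder.

```python
# Hedged sketch: replace com.microsoft MultiHeadAttention nodes with
# GroupQueryAttention nodes. Input/output wiring is illustrative only.
import onnx
from onnx import helper

def replace_mha_with_gqa(model: onnx.ModelProto, num_heads: int, kv_num_heads: int) -> onnx.ModelProto:
    new_nodes = []
    for node in model.graph.node:
        if node.op_type == "MultiHeadAttention" and node.domain == "com.microsoft":
            gqa = helper.make_node(
                "GroupQueryAttention",
                inputs=list(node.input),    # placeholder: the real op needs extra inputs
                outputs=list(node.output),
                name=node.name.replace("MultiHeadAttention", "GroupQueryAttention"),
                domain="com.microsoft",
                num_heads=num_heads,
                kv_num_heads=kv_num_heads,  # GQA shares KV heads across query-head groups
            )
            new_nodes.append(gqa)
        else:
            new_nodes.append(node)
    # Rebuild the node list in place, preserving topological order.
    del model.graph.node[:]
    model.graph.node.extend(new_nodes)
    return model
```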

### Motivation and Context

This PR allows us to run LLaMA with the GQA op end-to-end using ragged batching (i.e. batched inputs of different lengths).
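
Ragged batching here just means the batch packs prompts of different lengths. A hedged NumPy sketch of the input shapes involved, assuming right-padding; the function name, `pad_token_id`, and padding side are placeholders rather than the actual script's conventions:

```python
# Hedged sketch: pack variable-length prompts into one padded batch, plus the
# per-row lengths an attention kernel can use to ignore padded positions.
import numpy as np

def pad_ragged_batch(prompts: list[list[int]], pad_token_id: int):
    max_len = max(len(p) for p in prompts)
    input_ids = np.full((len(prompts), max_len), pad_token_id, dtype=np.int64)
    for i, p in enumerate(prompts):
        input_ids[i, : len(p)] = p
    # Per-sequence lengths let attention skip padding row by row.
    seq_lens = np.array([len(p) for p in prompts], dtype=np.int32)
    return input_ids, seq_lens

# e.g. prompts of lengths 5, 2, and 7 become a (3, 7) batch
ids, lens = pad_ragged_batch([[1, 2, 3, 4, 5], [1, 2], [1, 2, 3, 4, 5, 6, 7]], pad_token_id=0)
```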

@tianleiwu merged commit c8def0c into microsoft:main on Nov 8, 2023
tianleiwu pushed a commit that referenced this pull request Nov 8, 2023
kleiti pushed a commit to kleiti/onnxruntime that referenced this pull request Mar 22, 2024