fix gqa cpu nan bug #20521

Merged
aciddelgado merged 15 commits into main from aciddelgado/gqa_cpu_fix
May 7, 2024


aciddelgado
Contributor

Description

GQA on CPU had a bug: in the token-generation case, with batch_size > 1 and past_present_share_buffer off, the output would occasionally contain NaNs. This PR fixes that. It also updates the documentation and fixes position ID generation for rotary embedding on CUDA in the prompt case.
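For anyone trying to reproduce the symptom, a check along the following lines could flag which batch entries contain NaNs. This is a minimal sketch in plain Python, not the test added in this PR: `rows_with_nan` is a hypothetical helper name, and the sample values are made up for illustration. Because the NaNs appeared only occasionally, a real check would need to run the operator repeatedly.

```python
import math


def rows_with_nan(output):
    """Return the batch indices whose (flattened) output row contains a NaN."""
    return [b for b, row in enumerate(output) if any(math.isnan(x) for x in row)]


# Hypothetical token-case output for batch_size = 3, one flattened row per batch entry.
output = [
    [0.12, -0.40, 0.88],
    [0.05, float("nan"), 0.31],
    [0.77, 0.10, -0.25],
]

bad_rows = rows_with_nan(output)
print(bad_rows)
```

Reporting the affected batch indices, rather than a single pass/fail flag, makes it easier to see whether a failure is tied to specific batch entries.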

Motivation and Context

This PR fixes the GQA CPU bug, updates the documentation, and makes seqlens_k irrelevant in the prompt case, which helps prevent user error.
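One way to read "seqlens_k irrelevant in the prompt case" is sketched below for position IDs. This is an illustration only, not the kernel code from this PR: `rotary_position_ids` and `is_prompt` are hypothetical names, and the sketch assumes that `seqlens_k[b]` holds the total sequence length minus one for batch entry `b`.

```python
def rotary_position_ids(batch_size, sequence_length, seqlens_k, is_prompt):
    """Sketch of per-token position IDs for rotary embedding.

    Assumption: seqlens_k[b] is (total sequence length - 1) for batch entry b.
    """
    if is_prompt:
        # Prompt case: positions depend only on the input shape; seqlens_k is ignored.
        return [list(range(sequence_length)) for _ in range(batch_size)]
    # Token case: each batch entry has one new token, placed at its last position.
    return [[seqlens_k[b]] for b in range(batch_size)]


# Prompt case: the (possibly wrong) seqlens_k values have no effect on the result.
prompt_ids = rotary_position_ids(2, 4, [99, 99], is_prompt=True)

# Token case: positions come from seqlens_k.
token_ids = rotary_position_ids(2, 1, [4, 7], is_prompt=False)
```

Under these assumptions, a caller who passes incorrect `seqlens_k` values with a prompt still gets positions 0 through `sequence_length - 1`, which is the kind of user error the change is meant to tolerate.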

@yufenglee
Member

Could you please enable the unit test to run in CI?

Review thread on docs/ContribOperators.md (outdated, resolved)
@sophies927 sophies927 added the triage:approved Approved for cherrypicks for release label May 3, 2024
@aciddelgado aciddelgado requested a review from a team as a code owner May 6, 2024 17:39
@yihonglyu yihonglyu requested a review from tianleiwu May 7, 2024 03:53
tianleiwu previously approved these changes May 7, 2024
@tianleiwu
Contributor

tianleiwu commented May 7, 2024

@aciddelgado aciddelgado merged commit 4e27841 into main May 7, 2024
95 checks passed
@aciddelgado aciddelgado deleted the aciddelgado/gqa_cpu_fix branch May 7, 2024 22:19
@yihonglyu yihonglyu added the cherry-picked Cherry-picked for a cherrypicks branch label May 9, 2024
yihonglyu pushed a commit that referenced this pull request May 9, 2024
@yihonglyu yihonglyu added the rel-merged Cherrypicks merged into release label May 10, 2024
poweiw pushed a commit to poweiw/onnxruntime that referenced this pull request Jun 25, 2024
Labels
cherry-picked (Cherry-picked for a cherrypicks branch)
rel-merged (Cherrypicks merged into release)
release:1.18.0
triage:approved (Approved for cherrypicks for release)
6 participants