fix gqa cpu nan bug #20521

aciddelgado · 2024-04-30T18:06:58Z

Description

There was a bug in GQA on CPU where, in the token case with batch_size > 1 and past_present_share_buffer off, the output would occasionally contain NaNs. This PR fixes that. It also updates the documentation and fixes position ID generation for rotary embeddings in CUDA in the prompt case.
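For reviewers who want to see which batch entries are affected, here is a minimal sketch of a NaN check. It is not the code from test_gqa_cpu.py: the helper name `batches_with_nan` and the flattened one-row-per-batch-entry layout are assumptions made for illustration only.

```python
import math
from typing import List, Sequence


def batches_with_nan(output: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of batch entries that contain at least one NaN.

    Assumption: `output` holds one flattened row of values per batch entry.
    """
    return [b for b, row in enumerate(output) if any(math.isnan(v) for v in row)]


# Hypothetical attention output for batch_size = 3; entry 1 contains a NaN.
out = [
    [0.1, 0.2, 0.3],
    [0.4, float("nan"), 0.6],
    [0.7, 0.8, 0.9],
]
print(batches_with_nan(out))  # [1]
```

Reporting the batch index rather than a single boolean makes it easier to tell whether the failure is tied to batch_size > 1.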

Motivation and Context

This PR fixes the GQA CPU bug, updates the documentation, and makes seqlens_k irrelevant in the prompt case, which helps prevent user error.
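To illustrate what "seqlens_k irrelevant in the prompt case" could mean for rotary position IDs, here is a rough sketch. It is not the CUDA kernel from this PR. The function name `rotary_position_ids` is hypothetical, and the sketch assumes that seqlens_k holds total_sequence_length - 1 for each batch entry, so that in the token case it is the position of the single new token.

```python
from typing import List, Sequence


def rotary_position_ids(
    seqlens_k: Sequence[int], sequence_length: int, is_prompt: bool
) -> List[List[int]]:
    """Sketch of per-batch rotary position IDs (hypothetical helper).

    Prompt case: positions are 0..sequence_length-1 for every batch entry,
    so the values in seqlens_k are ignored.
    Token case: the one new token sits at position seqlens_k[b]
    (assumption: seqlens_k[b] == total_sequence_length - 1).
    """
    if is_prompt:
        return [list(range(sequence_length)) for _ in seqlens_k]
    return [[int(k)] for k in seqlens_k]


# Prompt case: the result does not depend on the values in seqlens_k.
print(rotary_position_ids([5, 9], sequence_length=4, is_prompt=True))
# Token case: one position per batch entry, taken from seqlens_k.
print(rotary_position_ids([5, 9], sequence_length=1, is_prompt=False))
```

Under these assumptions, a caller who passes a wrong seqlens_k during the prompt still gets correct positions, which is the user-error protection mentioned above.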

onnxruntime/test/python/transformers/test_gqa_cpu.py

onnxruntime/contrib_ops/cuda/bert/group_query_attention_impl.cu

yufenglee · 2024-05-02T15:19:22Z

Could you please enable the unit test to run in CI?

docs/ContribOperators.md

onnxruntime/test/python/transformers/test_flash_attn_cuda.py

tianleiwu · 2024-05-07T04:13:27Z

Could you please enable the unit test to run in CI?

The test is skipped in CI:
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1372239&view=logs&j=95838853-166c-5f4e-049e-ad4aefc5dff3&t=2000d094-9251-5d29-452e-e2ad87fa05ed&l=96288

onnxruntime/contrib_ops/cuda/bert/group_query_attention.cc

fix cpu nan bug, update documentation, fix promptposid gen for cuda

d0d7980

github-advanced-security bot found potential problems Apr 30, 2024

onnxruntime/test/python/transformers/test_gqa_cpu.py (fixed)

aciddelgado added 4 commits April 30, 2024 14:29

pipeline fix

780bd5c

docs

8f35c6d

lint

958c975

Merge branch 'main' into aciddelgado/gqa_cpu_fix

9c29446

yufenglee added the release:1.18.0 label May 1, 2024

tianleiwu reviewed May 2, 2024

onnxruntime/contrib_ops/cuda/bert/group_query_attention_impl.cu (outdated, resolved)

tianleiwu reviewed May 2, 2024

docs/ContribOperators.md (outdated, resolved)

tianleiwu reviewed May 2, 2024

onnxruntime/test/python/transformers/test_flash_attn_cuda.py (outdated, resolved)

sophies927 added the triage:approved Approved for cherrypicks for release label May 3, 2024

aciddelgado added 3 commits May 3, 2024 15:40

ci pipeline

1b8a928

lint

786ea74

address comments and activate cuda test

e52361f

aciddelgado requested a review from a team as a code owner May 6, 2024 17:39

aciddelgado added 4 commits May 6, 2024 12:44

docs

3d8dacb

lint and rerun

b4846e6

disable broken tests

00216d8

lint

9c895f7

yihonglyu requested a review from tianleiwu May 7, 2024 03:53

tianleiwu previously approved these changes May 7, 2024

tianleiwu reviewed May 7, 2024

onnxruntime/test/python/transformers/test_gqa_cpu.py (outdated, resolved)

unmark slow

9b2d5ed

aciddelgado dismissed tianleiwu’s stale review via 9b2d5ed May 7, 2024 15:52

github-advanced-security bot found potential problems May 7, 2024

onnxruntime/test/python/transformers/test_gqa_cpu.py (fixed)

onnxruntime/test/python/transformers/test_flash_attn_cuda.py (fixed)

lint

7cca082

test bounds

c121cca

tianleiwu approved these changes May 7, 2024

yufenglee reviewed May 7, 2024

onnxruntime/contrib_ops/cuda/bert/group_query_attention.cc (resolved)

yufenglee approved these changes May 7, 2024

snnn approved these changes May 7, 2024

aciddelgado merged commit 4e27841 into main May 7, 2024
95 checks passed

aciddelgado deleted the aciddelgado/gqa_cpu_fix branch May 7, 2024 22:19

yihonglyu added the cherry-picked Cherry-picked for a cherrypicks branch label May 9, 2024

yihonglyu added the rel-merged Cherrypicks merged into release label May 10, 2024
