[js/webgpu] Support GroupQueryAttention #20237

Merged 11 commits on May 13, 2024

@axinging (Contributor) commented Apr 9, 2024

TODOs:

  1. Handle the case where H * params.kvNumHeads exceeds the workgroup size limit.
  2. Support BNSH kv cache.
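
For readers unfamiliar with the two TODOs, here is a minimal TypeScript sketch of what they refer to. It is not code from this PR: every function name is hypothetical, `H` is assumed to mean the per-head size, and the 256 figure is the default WebGPU `maxComputeInvocationsPerWorkgroup` limit (a device may report a higher value). BSNH and BNSH are taken to mean (batch, sequence, heads, head size) and (batch, heads, sequence, head size) row-major layouts.

```typescript
// Illustrative sketch only; these helpers are hypothetical and not part of the PR.

/** Default WebGPU value of maxComputeInvocationsPerWorkgroup. */
const DEFAULT_MAX_INVOCATIONS_PER_WORKGROUP = 256;

/** Grouped-query attention: several query heads share one KV head. */
function kvHeadForQueryHead(qHead: number, numHeads: number, kvNumHeads: number): number {
  if (numHeads % kvNumHeads !== 0) {
    throw new Error('numHeads must be a multiple of kvNumHeads');
  }
  const groupSize = numHeads / kvNumHeads; // query heads per KV head
  return Math.floor(qHead / groupSize);
}

/** TODO 1: check whether H * kvNumHeads threads fit in a single workgroup. */
function fitsInOneWorkgroup(
  headSize: number, // assumed meaning of H
  kvNumHeads: number,
  limit: number = DEFAULT_MAX_INVOCATIONS_PER_WORKGROUP,
): boolean {
  return headSize * kvNumHeads <= limit;
}

/** TODO 2: flat element offset in a row-major BSNH tensor of shape [B, S, N, H]. */
function offsetBSNH(b: number, s: number, n: number, h: number, S: number, N: number, H: number): number {
  return ((b * S + s) * N + n) * H + h;
}

/** TODO 2: flat element offset in a row-major BNSH tensor of shape [B, N, S, H]. */
function offsetBNSH(b: number, n: number, s: number, h: number, N: number, S: number, H: number): number {
  return ((b * N + n) * S + s) * H + h;
}
```

With 32 query heads and 8 KV heads, for example, query heads 4 through 7 all map to KV head 1. A head size of 128 with 8 KV heads needs 1024 invocations, which exceeds the default limit, so a kernel would have to split the work or loop within each thread.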

@axinging force-pushed the groupqueryattn branch 2 times, most recently from 46f14e6 to cc13e08 on April 9, 2024 00:43
@axinging changed the title from "[Nijs/webgpu] Support GroupQueryAttention" to "[js/webgpu] Support GroupQueryAttention" on Apr 9, 2024
@axinging force-pushed the groupqueryattn branch 8 times, most recently from e6e2a93 to 94aa95e on April 9, 2024 07:53
@axinging force-pushed the groupqueryattn branch 5 times, most recently from 2c1c062 to 54ff1c4 on April 16, 2024 07:47
@axinging marked this pull request as ready for review on April 16, 2024 07:48
@axinging marked this pull request as draft on April 16, 2024 08:21
@axinging marked this pull request as ready for review on April 16, 2024 13:37
@axinging marked this pull request as draft on April 17, 2024 07:05
@guschmue added the ep:WebGPU (ort-web webgpu provider) label on Apr 17, 2024
@axinging changed the title from "[js/webgpu] Support GroupQueryAttention" to "[js/webgpu] Support GroupQueryAttention(NotForReview)" on Apr 25, 2024
@axinging force-pushed the groupqueryattn branch 7 times, most recently from 790376f to 20fe303 on April 26, 2024 06:10
@axinging changed the title from "[js/webgpu] Support GroupQueryAttention(NotForReview)" to "[js/webgpu] Support GroupQueryAttention" on Apr 26, 2024

Azure Pipelines successfully started running 2 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

@guschmue (Contributor) commented

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline

@guschmue (Contributor) commented

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@guschmue (Contributor) commented

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models

Azure Pipelines successfully started running 2 pipeline(s).

Azure Pipelines successfully started running 7 pipeline(s).

Azure Pipelines successfully started running 9 pipeline(s).

@fs-eire (Contributor) commented May 11, 2024

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

@fs-eire (Contributor) commented May 11, 2024

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline

@fs-eire (Contributor) commented May 11, 2024

/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

Azure Pipelines successfully started running 2 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

@guschmue merged commit 8c59cd4 into microsoft:main on May 13, 2024
82 checks passed
poweiw pushed a commit to poweiw/onnxruntime that referenced this pull request Jun 25, 2024
Labels: ep:WebGPU (ort-web webgpu provider)
4 participants