Refactoring of attention cuda kernel: move prepare qkv and concat_past_to_present #17559

Merged: tianleiwu merged 3 commits into main from tlwu/prepare_qkv_refactor on Sep 15, 2023


tianleiwu

Description

  • Move PrepareQKV to separate cu file (attention_prepare_qkv.cu)
  • Move ConcatPastToPresent to attention_concat.cu
  • Add default value for AttentionData
  • Add a data structure, QkvData, that tracks the Q, K and V pointers and the QKV format.
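
To make the last bullet concrete, here is a minimal host-side sketch of what a structure that bundles the Q, K and V pointers with a layout tag might look like. It is not the code from this PR: the enum `QkvFormat`, its members, and the helpers `MakeSeparate` and `QOffset` are hypothetical names chosen for illustration, and the three layouts shown are assumptions about typical attention buffer layouts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout tags (illustrative only, not the enum used in onnxruntime).
// B = batch, N = num_heads, S = sequence_length, H = head_size.
enum class QkvFormat {
  kSeparateBNSH,  // Q, K, V in separate buffers, each laid out as (B, N, S, H)
  kSeparateBSNH,  // Q, K, V in separate buffers, each laid out as (B, S, N, H)
  kPackedBSN3H,   // one buffer holding Q, K, V interleaved as (B, S, N, 3, H)
};

// Hypothetical sketch of a QkvData-like struct: the three pointers travel
// together with the format, and every member has a default value.
template <typename T>
struct QkvData {
  T* q = nullptr;
  T* k = nullptr;
  T* v = nullptr;
  QkvFormat format = QkvFormat::kSeparateBNSH;
};

// Carve Q, K and V out of one contiguous workspace (separate layouts only).
template <typename T>
QkvData<T> MakeSeparate(T* workspace, std::size_t elements_per_tensor, QkvFormat format) {
  assert(workspace != nullptr);
  assert(format != QkvFormat::kPackedBSN3H);
  QkvData<T> data;
  data.q = workspace;
  data.k = workspace + elements_per_tensor;
  data.v = workspace + 2 * elements_per_tensor;
  data.format = format;
  return data;
}

// Element offset of Q[b, n, s, h] under each layout.
inline std::size_t QOffset(QkvFormat format, std::size_t b, std::size_t n, std::size_t s,
                           std::size_t h, std::size_t num_heads, std::size_t seq_len,
                           std::size_t head_size) {
  switch (format) {
    case QkvFormat::kSeparateBNSH:
      return ((b * num_heads + n) * seq_len + s) * head_size + h;
    case QkvFormat::kSeparateBSNH:
      return ((b * seq_len + s) * num_heads + n) * head_size + h;
    case QkvFormat::kPackedBSN3H:
      // Q occupies slot 0 of the innermost "3" dimension.
      return (((b * seq_len + s) * num_heads + n) * 3 + 0) * head_size + h;
  }
  return 0;  // unreachable for valid enum values
}
```

Keeping the format next to the pointers means a downstream kernel launcher can branch on `data.format` rather than inferring the layout from which pointers happen to be null.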

Motivation and Context

This avoids one huge .cu file and makes the code more readable.

@tianleiwu tianleiwu marked this pull request as draft September 15, 2023 01:24
@tianleiwu tianleiwu marked this pull request as ready for review September 15, 2023 15:40
@tianleiwu tianleiwu merged commit adb0be4 into main Sep 15, 2023
@tianleiwu tianleiwu deleted the tlwu/prepare_qkv_refactor branch September 15, 2023 17:57
@faxu faxu added triage:approved Approved for cherrypicks for release sdxl_llama labels Oct 25, 2023
tianleiwu added a commit that referenced this pull request Oct 31, 2023
…t_to_present (#17559)

@tianleiwu tianleiwu removed triage:approved Approved for cherrypicks for release release:1.16.2 labels Nov 1, 2023
kleiti pushed a commit to kleiti/onnxruntime that referenced this pull request Mar 22, 2024
…t_to_present (microsoft#17559)
