feat: non-contiguous query with paged kv cache #553

Merged: 2 commits merged on Oct 25, 2024


LinHeLurking (Contributor)

Motivation

Previously, only the ragged version of the prefill kernel supported a non-contiguous query tensor (#404). With a paged KV cache, the query tensor had to be contiguous, so libraries like vLLM and SGLang must make the query tensor contiguous before calling FlashInfer kernels (vLLM call of flashinfer, SGLang call of flashinfer). This PR removes that requirement: the prefill and decode kernels that use a paged KV cache now accept a non-contiguous query tensor.
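As an illustration (not taken from the PR), one common way a query tensor ends up non-contiguous is slicing Q out of a fused QKV projection output. The plain-Python sketch below assumes a row-major buffer of shape `[num_tokens, (H_qo + 2 * H_kv) * head_dim]` holding Q, K and V back to back in each row; the helper names and sizes are hypothetical.

```python
def contiguous_strides(shape):
    """Row-major strides, in elements, of a dense tensor with this shape."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def fused_qkv_query_strides(num_qo_heads, num_kv_heads, head_dim):
    """Strides of Q viewed as [num_tokens, num_qo_heads, head_dim] when it is
    sliced out of a fused QKV buffer whose rows hold Q, K and V back to back."""
    row_width = (num_qo_heads + 2 * num_kv_heads) * head_dim
    return (row_width, head_dim, 1)


# Hypothetical sizes: 4 tokens, 32 query heads, 8 KV heads (GQA), head_dim 128.
dense = contiguous_strides((4, 32, 128))
view = fused_qkv_query_strides(32, 8, 128)
print(dense, view, dense == view)  # (4096, 128, 1) (6144, 128, 1) False
```

Only the token stride differs, but that is enough: a kernel that assumes the dense layout would step into K/V data when moving to the next token, which is why callers previously paid for an extra copy (for example `.contiguous()` in PyTorch).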

Main Changes

  1. Add query tensor strides to BatchPrefillPagedParams and BatchDecodeParams.
  2. Set the stride parameters before launching those kernels.
  3. Modify the JIT compilation templates to support the new kernel parameters.
  4. Add tests.
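A minimal sketch of what passing query strides buys, assuming the kernel addresses `q[n, h, d]` as `n * q_stride_n + h * q_stride_h + d` with a unit innermost stride. The names `q_stride_n` and `q_stride_h` are used here for illustration; this is plain Python modelling the indexing, not FlashInfer's CUDA code.

```python
def q_offset(n, h, d, q_stride_n, q_stride_h):
    """Flat element offset of q[n, h, d]; the innermost (head_dim) stride is 1."""
    return n * q_stride_n + h * q_stride_h + d


def load_query(buf, num_tokens, num_heads, head_dim, q_stride_n, q_stride_h):
    """Gather q[n][h][d] from a flat buffer using explicit strides."""
    return [
        [
            [buf[q_offset(n, h, d, q_stride_n, q_stride_h)] for d in range(head_dim)]
            for h in range(num_heads)
        ]
        for n in range(num_tokens)
    ]


# Toy fused QKV buffer: 2 tokens, 2 query heads, 1 KV head, head_dim 3.
# Each row is (2 + 2 * 1) * 3 = 12 elements laid out as q0 q1 k v.
num_tokens, num_qo_heads, num_kv_heads, head_dim = 2, 2, 1, 3
row_width = (num_qo_heads + 2 * num_kv_heads) * head_dim
buf = list(range(num_tokens * row_width))

# Correct: pass the strides of the Q view inside the fused buffer.
q = load_query(buf, num_tokens, num_qo_heads, head_dim, row_width, head_dim)
# Wrong: assume Q is dense, i.e. q_stride_n = num_qo_heads * head_dim.
q_dense = load_query(buf, num_tokens, num_qo_heads, head_dim,
                     num_qo_heads * head_dim, head_dim)

print(q[1])        # [[12, 13, 14], [15, 16, 17]]
print(q_dense[1])  # [[6, 7, 8], [9, 10, 11]], the K and V entries of token 0
```

Carrying the strides in the kernel parameters lets the kernel read the query in place, so the caller no longer needs to materialize a contiguous copy.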

The Python interfaces remain the same; the only change is that they now accept non-contiguous query tensors.

Signed-off-by: LinHeLurking <[email protected]>
yzh119 (Collaborator) left a comment:


Thanks for your contribution @LinHeLurking, and thanks to @reyoung for the review!

Review thread on include/flashinfer/attention/decode.cuh (resolved)
yzh119 merged commit 89f2c4a into flashinfer-ai:main on Oct 25, 2024
tsu-bin added a commit to tsu-bin/flashinfer_dev that referenced this pull request Oct 30, 2024
yzh119 pushed a commit that referenced this pull request Oct 30, 2024
Hi, while rebasing my current work I found that the C++ integration (benchmarks and tests) failed to build; this was introduced by the feature in #553.
Tests have passed.

Co-authored-by: tsu-bin <[email protected]>