[Misc]: How to access the KV cache directly? #4156
Comments
Curious about this topic too. I want to implement a simple request transfer (including the KV cache) between nodes. #2809 seems to have done it, but it only supports InfiniBand and depends on MSCCL++.
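For an experiment without InfiniBand, one possible starting point is plain TCP with length-prefixed framing. The sketch below is only an illustration under stated assumptions: it assumes the KV blocks of a request have already been copied to host memory as raw bytes, and the names `send_request`, `recv_request`, and the metadata fields are hypothetical, not part of vLLM or of #2809. A local `socket.socketpair()` stands in for a real connection between two nodes.

```python
import json
import socket
import struct

# Fixed-size header: metadata length and KV payload length, network byte order.
HEADER = struct.Struct("!II")


def send_request(sock, meta, kv_bytes):
    """Send one request: header, JSON metadata, then the raw KV bytes."""
    meta_bytes = json.dumps(meta).encode("utf-8")
    sock.sendall(HEADER.pack(len(meta_bytes), len(kv_bytes)))
    sock.sendall(meta_bytes)
    sock.sendall(kv_bytes)


def recv_exact(sock, n):
    """Read exactly n bytes, since recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def recv_request(sock):
    """Receive one request framed by send_request()."""
    meta_len, kv_len = HEADER.unpack(recv_exact(sock, HEADER.size))
    meta = json.loads(recv_exact(sock, meta_len).decode("utf-8"))
    kv_bytes = recv_exact(sock, kv_len)
    return meta, kv_bytes


# Demo: a connected socket pair plays the role of two nodes.
sender, receiver = socket.socketpair()
meta = {"request_id": "req-0", "prompt_token_ids": [1, 2, 3], "num_blocks": 2}
kv = bytes(range(64))  # stand-in for KV cache contents (hypothetical data)
send_request(sender, meta, kv)
got_meta, got_kv = recv_request(receiver)
sender.close()
receiver.close()
```

The metadata travels as JSON so the receiver knows which request the bytes belong to before it allocates space for them. For real KV cache sizes, the payload would need to be sent from a separate thread or in chunks rather than in one blocking call.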
Any updates on this?
Interested in this as well. Can anyone suggest a few first steps?
Just use CUDA IPC handles (`cudaIpcGetMemHandle` / `cudaIpcOpenMemHandle`) and `cudaMemcpy`.
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
Any updates on this?
Anything you want to discuss about vllm.
I'm looking to run an experiment that involves copying the contents of the KV cache between nodes. I'm not very familiar with the codebase. Is there any way to access the page table/KV cache directly? Where do I start? Any suggestions would be helpful!
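To make the question concrete, here is a toy model of what "page table plus KV cache" means for such a copy. It is a minimal sketch, not vLLM code: `ToyPagedKVCache`, `export_sequence`, `import_sequence`, and the block sizes are all hypothetical, and plain `bytearray` objects stand in for GPU memory. The point it illustrates is that a sequence's blocks are scattered in a physical pool, so a transfer has to gather them through the block table on the source and scatter them into newly allocated blocks on the destination.

```python
BLOCK_SIZE = 4        # tokens per block (assumed value)
BYTES_PER_TOKEN = 8   # stand-in for one token's K and V bytes (assumed value)
BLOCK_BYTES = BLOCK_SIZE * BYTES_PER_TOKEN


class ToyPagedKVCache:
    """Toy paged cache: a pool of fixed-size blocks plus a per-sequence table."""

    def __init__(self, num_blocks):
        self.blocks = [bytearray(BLOCK_BYTES) for _ in range(num_blocks)]
        self.free = list(range(num_blocks))   # free physical block ids
        self.block_table = {}                 # seq_id -> list of physical ids

    def allocate(self, seq_id, num_tokens):
        needed = -(-num_tokens // BLOCK_SIZE)  # ceiling division
        if needed > len(self.free):
            raise MemoryError("not enough free blocks")
        ids = [self.free.pop() for _ in range(needed)]
        self.block_table[seq_id] = ids
        return ids

    def export_sequence(self, seq_id):
        """Gather a sequence's scattered blocks into one contiguous buffer."""
        return b"".join(bytes(self.blocks[i]) for i in self.block_table[seq_id])

    def import_sequence(self, seq_id, payload):
        """Scatter a contiguous buffer into freshly allocated local blocks."""
        if len(payload) % BLOCK_BYTES:
            raise ValueError("payload is not a whole number of blocks")
        ids = self.allocate(seq_id, len(payload) // BYTES_PER_TOKEN)
        for n, block_id in enumerate(ids):
            start = n * BLOCK_BYTES
            self.blocks[block_id][:] = payload[start:start + BLOCK_BYTES]
        return ids


# Source node: a 6-token sequence occupies two blocks, filled with marker bytes.
src = ToyPagedKVCache(num_blocks=8)
for n, block_id in enumerate(src.allocate("req-0", num_tokens=6)):
    src.blocks[block_id][:] = bytes([n + 1]) * BLOCK_BYTES
payload = src.export_sequence("req-0")

# Destination node: other work already holds a block, so physical ids differ.
dst = ToyPagedKVCache(num_blocks=4)
dst.allocate("other", num_tokens=4)
dst.import_sequence("req-0", payload)
```

Only the logical order of blocks has to survive the transfer. The physical block ids are local to each node, which is why the destination builds its own block table entry instead of reusing the source's.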