Have any plans to optimize the decode kernel for NV-Hopper #576
ref #507 (comment)
Hi @JamesLim-sy, if I understand correctly, you mean using some SMs for decode and other SMs within the same cluster for merging states, so that they can share data through distributed shared memory. Is that correct?
@yzh119 Yes, after my profiling, time cost by
I noticed that the Hopper cluster setting may offer a chance to improve batch_decode performance by merging VariableLengthMergeStates with BatchDecodeWithPagedKVCacheKernel. Is there any plan to use SM90 features for it?
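For readers unfamiliar with the merge step, here is a minimal sketch of what merging partial attention states computes, assuming each KV chunk yields an output vector plus the log-sum-exp of its scores. The names partial_attention and merge_states are hypothetical and written for illustration in plain Python; this is not the FlashInfer kernel code, and it says nothing about how a fused SM90 kernel would be laid out.

```python
import math

def partial_attention(q, keys, values):
    """Attention over one KV chunk: returns (output vector, log-sum-exp of scores)."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom for d in range(dim)]
    return out, m + math.log(denom)

def merge_states(states):
    """Combine per-chunk (output, lse) pairs into the full-sequence result.

    Hypothetical helper: each chunk output is reweighted by exp(lse),
    i.e. by that chunk's share of the global softmax denominator.
    """
    m = max(lse for _, lse in states)
    scales = [math.exp(lse - m) for _, lse in states]
    total = sum(scales)
    dim = len(states[0][0])
    merged = [sum(s * o[d] for s, (o, _) in zip(scales, states)) / total
              for d in range(dim)]
    return merged, m + math.log(total)

# Toy data (made up for the example): one query, four KV entries split in two chunks.
q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

full_out, full_lse = partial_attention(q, keys, values)
chunks = [partial_attention(q, keys[:2], values[:2]),
          partial_attention(q, keys[2:], values[2:])]
merged_out, merged_lse = merge_states(chunks)
```

In this sketch the merged result matches attention over the whole sequence up to floating-point error, which is why the merge can in principle run as a separate pass or be folded into the kernel that produces the partial states.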