Add static cache support for Whisper #30707
Comments
Let me try, I think I can make it. I just need to patch https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/modeling_whisper.py#L313 like the llama model and pass …
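(For illustration, a rough sketch of the Llama-style pattern the comment above refers to, assuming the same `Cache.update` / `cache_position` plumbing used by decoder-only models; the helper name and call site here are assumptions, not the actual patch.)

```python
import torch
from transformers.cache_utils import Cache

def update_decoder_kv_cache(
    key_states: torch.Tensor,
    value_states: torch.Tensor,
    past_key_value: Cache,
    layer_idx: int,
    cache_position: torch.Tensor,
):
    # StaticCache pre-allocates fixed-size key/value buffers; cache_position tells it
    # which slots the new tokens should be written into.
    cache_kwargs = {"cache_position": cache_position}
    key_states, value_states = past_key_value.update(
        key_states, value_states, layer_idx, cache_kwargs
    )
    return key_states, value_states
```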
@huseinzol05 great, thanks! I think you also need to make sure the model supports initializing the static cache via:

```python
from transformers import StaticCache

model._setup_cache(StaticCache, batch_size, max_cache_len=max_cache_length)
```
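(A hedged end-to-end sketch of how that setup could combine with `torch.compile`, assuming Whisper gains the `_setup_cache` support discussed above; the checkpoint, `batch_size`, and `max_cache_length` values are placeholders.)

```python
import torch
from transformers import StaticCache, WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny").to("cuda")

# Pre-allocate the decoder key/value buffers so their shapes stay fixed across steps.
batch_size = 1
max_cache_length = model.config.max_target_positions
model._setup_cache(StaticCache, batch_size, max_cache_len=max_cache_length)

# Fixed cache shapes are what make the compiled decode step fast.
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)
```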
I got hit by pytorch/pytorch#123592 at https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/modeling_whisper.py#L230, but the static cache is already working without torch.compile on my local setup; using arange should solve the problem.
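(A minimal sketch of the kind of arange-based workaround being suggested, assuming the offending code is a dynamic Python slice over the positional-embedding weights; the function name and arguments are illustrative, not the actual fix.)

```python
import torch

def compile_friendly_positions(weight: torch.Tensor, past_length: int, seq_len: int) -> torch.Tensor:
    # A plain slice like weight[past_length : past_length + seq_len] can break under
    # torch.compile when past_length varies between calls; indexing with an explicit
    # arange of positions keeps the operation traceable.
    positions = torch.arange(past_length, past_length + seq_len, device=weight.device)
    return weight[positions]
```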
Maybe you can use …
Anything dynamic is not possible; feed …
Great 👍! But that …
The compiled static cache is able to achieve 186.26 it/s while the non-compiled version got 150.20 it/s.
Feature request
Would be great to have static cache support for Whisper to make it faster with `torch.compile`. Currently, the `generate()` function doesn't support `cache_implementation="static"` for Whisper.
Motivation
Static cache with `torch.compile` can make generation much faster.
Your contribution
Static cache is already supported for LLMs and we see great speed-ups there.
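(For reference, a sketch of the static-cache usage that already works for decoder-only LLMs and that this request asks to mirror for Whisper; the checkpoint and generation settings are placeholders.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "meta-llama/Llama-2-7b-hf"  # placeholder LLM checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.float16).to("cuda")

inputs = tokenizer("Hello", return_tensors="pt").to("cuda")
# cache_implementation="static" pre-allocates the KV cache so torch.compile can
# specialize the decoding step; this is the flag the issue asks to support for Whisper.
outputs = model.generate(**inputs, max_new_tokens=32, cache_implementation="static")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```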