
Add static cache support for Whisper #30707

Closed
mobicham opened this issue May 8, 2024 · 9 comments
Labels: Audio, Feature request

Comments

@mobicham
Contributor

mobicham commented May 8, 2024

Feature request

It would be great to have static cache support for Whisper to make it faster with torch.compile. Currently, the generate() function doesn't support cache_implementation="static" for Whisper.
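
For concreteness, a rough sketch of the usage being requested (the checkpoint name and the dummy audio below are placeholders for illustration; this call is exactly what does not work for Whisper at the time of this issue):

import numpy as np
from transformers import AutoProcessor, WhisperForConditionalGeneration

# Dummy one-second clip at 16 kHz; a real waveform would come from a dataset or file.
audio = np.zeros(16_000, dtype=np.float32)

processor = AutoProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")

# The request: let generate() accept a static KV cache so the decoder can be
# compiled with fixed shapes. Not supported for Whisper when this issue was opened.
generated_ids = model.generate(
    inputs.input_features,
    cache_implementation="static",
    max_new_tokens=64,
)
print(processor.batch_decode(generated_ids, skip_special_tokens=True))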

Motivation

Static cache with torch.compile can make generation much faster.

Your contribution

Static cache is already supported for LLMs, and we see a great speed-up there.

@amyeroberts added the Feature request and Audio labels May 8, 2024
@amyeroberts
Collaborator

cc @sanchit-gandhi

@mobicham mobicham changed the title Add support for static cache with Whisper Add support for static cache for Whisper May 9, 2024
@mobicham mobicham changed the title Add support for static cache for Whisper Add static cache support for Whisper May 9, 2024
@huseinzol05
Contributor

Let me try; I think I can make it work. We just need to patch https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/modeling_whisper.py#L313 like the Llama model and pass cache_position, and it should be OK.
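
For anyone following along, a minimal sketch of the idea (simplified shapes and a hypothetical helper name, not the actual WhisperAttention code): with a pre-allocated static cache, the new key/value states are written in place at the indices given by cache_position instead of being concatenated onto a growing tensor.

import torch

def update_static_cache(key_cache, value_cache, key_states, value_states, cache_position):
    # key_cache / value_cache: pre-allocated [batch, max_cache_len, hidden_dim]
    # key_states / value_states: new states for the current step(s), [batch, seq_len, hidden_dim]
    # cache_position: 1-D tensor of the absolute positions for those steps
    key_cache.index_copy_(1, cache_position, key_states)
    value_cache.index_copy_(1, cache_position, value_states)
    return key_cache, value_cache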

@mobicham
Contributor Author

@huseinzol05 great, thanks! I think you also need to make sure the model supports initializing the static cache via _setup_cache:

from transformers import StaticCache

# batch_size and max_cache_len must be fixed up front for the static cache
model._setup_cache(StaticCache, batch_size, max_cache_len=max_cache_length)

@huseinzol05
Contributor

huseinzol05 commented May 11, 2024

I got hit by pytorch/pytorch#123592 at https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/modeling_whisper.py#L230, but the static cache is already working without torch.compile locally; using arange should solve the problem.
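
A rough illustration of the arange workaround (a hypothetical helper, not the actual line in modeling_whisper.py): instead of slicing the positional-embedding table with Python integers derived from the past length, index it with a position tensor, which torch.compile can trace with a static cache.

import torch

def positional_embedding(weight, cache_position):
    # cache_position is an arange of absolute decoding positions, built once per step, e.g.
    # cache_position = torch.arange(past_length, past_length + seq_len, device=weight.device)
    # Indexing with it avoids the data-dependent slicing
    # weight[past_length : past_length + seq_len] that breaks under torch.compile.
    return weight[cache_position]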


@huseinzol05
Contributor

huseinzol05 commented May 11, 2024

Anything dynamic is not possible; feeding position_ids solved the problem, just like cache_position. I will push the initial version later so you can verify; the speed is good.
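
Roughly, at the call site that would look like the following (a hypothetical decode step; the cache_position and position_ids arguments follow this comment's proposal and may not match the final PR's signature):

import torch

def decode_step(model, input_features, decoder_input_ids, static_cache, past_length):
    # Precompute the position tensors outside the model, so nothing dynamic
    # has to be derived inside the compiled decoder.
    seq_len = decoder_input_ids.shape[1]
    cache_position = torch.arange(
        past_length, past_length + seq_len, device=decoder_input_ids.device
    )
    return model(
        input_features=input_features,
        decoder_input_ids=decoder_input_ids,
        cache_position=cache_position,
        position_ids=cache_position.unsqueeze(0),  # as proposed here; may differ in the PR
        past_key_values=static_cache,
        use_cache=True,
    )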

@mobicham
Contributor Author

mobicham commented May 11, 2024

Great 👍! But that arange works well in Llama with fullgraph torch.compile.

@huseinzol05
Contributor

huseinzol05 commented May 11, 2024

#30760

The compiled static cache is able to achieve 186.26 it/s, while non-compiled got 150.20 it/s.

@ArthurZucker
Collaborator

Closing as this is fixed by #31166 and #31772.
