add flax whisper implementation #20479

andyehrenberg · 2022-11-28T22:15:54Z

Adds Flax whisper implementations, and adjusts flax generation utils to support it.

See discussion in #19512

ydshieh · 2022-11-29T09:55:49Z

Thank you for the PR. However, a pull request should focus on a single objective/goal, rather than changing multiple things at the same time which are not absolutely coupled.

Please

follow the pytorch implementation regarding the past_key_values
revert the changes on the flax generation utils
(You may want to have a backup branch to save these changes for future pull requests.)

The goal of this PR is to add Flax implementation of Whisper. For other changes, it's better to open issue tickets, and if we all agree with the proposals, a PR could proceed :-)

Thank you!

andyehrenberg · 2022-11-29T16:24:42Z

I see a few other instances in this repo where the pytorch implementation computes past_key_values_length while the flax implementation uses position_ids (BART, OPT, etc) - to me, keeping consistency among the APIs of the flax models is something we should strive for. What do you think @ydshieh @patrickvonplaten ?

Happy to remove the changes to the generation stuff and open a separate PR for that - will definitely do this to make flax Whisper generation work!

ydshieh · 2022-11-29T17:16:44Z

I wasn't aware of that inconsistency, thank you for pointing out. This is a good question! But I don't think that's a very serious problem so far - the most important thing is the different frameworks produce the same outputs when feeding the same (supported) inputs + the API on the top model levels being consistent.

(The internal computation could be somehow different - if there is good reason)

In any case, this could be discussed in an issue and we can proceed with a PR once decided :-)

ydshieh · 2022-11-29T17:20:11Z

BTW, there is some issue for triggering CircleCI. The message is

Could not find a usable config.yml, you may have revoked the CircleCI OAuth app.
Please sign out of CircleCI and log back in with your VCS before triggering a new pipeline.

Do you use some IDE to push the commits? Could you try to push the commit with a commandline tool or some git GUI tools instead?

HuggingFaceDocBuilderDev · 2022-11-29T17:52:46Z

The documentation is not available anymore as the PR was closed or merged.

patrickvonplaten · 2022-11-30T11:28:16Z

Also cc @sanchit-gandhi

…into flax_whisper

ArthurZucker · 2022-12-02T09:01:51Z

Hey! Thanks for opening the follow PR 🤗

I don't think I agree with @ydshieh here, adding the flax_generation_utils along with whisper totally makes sense as it was done for pytorch and tf, and is required to add the generation tests which are currently missing!
Regarding the past_key_values, we don't really strive to match transformers with other APIs, rather I think we prefer consistency within our own library, and code clarity.
However you can still open an issue and we can discuss whether we should refactor the design of past_key_values for our flax model!

Will have a look at the PR 😉

ydshieh · 2022-12-02T11:06:44Z

You are right! I am not aware of those generation features are introduced when you added Whisper @ArthurZucker . Sorry about that, @andyehrenberg !

sanchit-gandhi · 2022-12-02T16:34:59Z

Super excited by this PR! 🚀 Feel free to tag me with questions / review requests as well @andyehrenberg 🤗

ArthurZucker

Nice work there!
Not really think we are gonna push for the scan methods, but it is debatable. @sgugger correct me if I am wrong

src/transformers/generation/flax_logits_process.py

src/transformers/models/whisper/modeling_flax_whisper.py

ArthurZucker · 2022-12-02T16:19:15Z

src/transformers/models/whisper/modeling_flax_whisper.py

+        if attention_mask is not None:
+            if position_ids is None:
+                position_ids = attention_mask.cumsum(-1) - 1
+        if position_ids is None:
+            batch_size, sequence_length = input_ids.shape
+            position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))


Would be great if we could follow the simple logic that we have in the pytorch version where we use the input_ids with self.embed_positions(input_ids, past_key_values_length=past_key_values_length.

I think we should stick with computing position_ids to keep a similar api to the other flax models, and because this better handles the scenario where we have a batch to run generation for with different decoder prompt lengths. The pytorch version ends up just using past_key_values_length to compute something akin to position_ids, but we can just use the attention_mask to figure them out. I'd actually argue we should change the pytorch whisper implementation to use position_ids, because as it currently stands it'll fail to decode batches of varying decoder prompt lengths - it should take more inspiration from the decoder-only models that compute position_ids as opposed to the encoder-decoder models that don't assume decoder prefixes.

I agree with @andyehrenberg that we should use the Flax implementation here. However, it would be better still in terms of Flax compatibility if this logic went under the decode's __call__ method, rather than under FlaxWhisperDecoder (as we do in Flax MBart for example)

src/transformers/models/whisper/modeling_flax_whisper.py

tests/models/whisper/test_modeling_flax_whisper.py

ArthurZucker · 2023-01-26T14:01:30Z

Also sorry! We just modified Whisper quit a bit 😅

andyehrenberg · 2023-01-26T14:18:09Z

Also sorry! We just modified Whisper quit a bit 😅

@ArthurZucker - Doesn't actually look too bad to catch up with those changes! Can do that soon-ish. I already have a jax timestamp processor that's compilable.

sanchit-gandhi · 2023-01-27T15:46:24Z

Oh no - sorry you have to iterate again here @andyehrenberg! Feel free to ping me with any questions / discussions - more than happy to help with the final sprint of the integration! Otherwise super excited to review a final time before merge! 🚀

…into flax_whisper

andyehrenberg · 2023-02-04T15:22:48Z

@sanchit-gandhi - I think this is ready for another look - the recent commits (I think) get us to feature parity with the torch version.

andyehrenberg · 2023-02-13T15:17:52Z

@sanchit-gandhi Bump

ArthurZucker

Wow! Very clean, thanks a lot for the long work! I just left 1 comment on testing the timestamp generation but should be good to merge otherwise! cc @sanchit-gandhi

src/transformers/generation/flax_utils.py

src/transformers/generation/flax_logits_process.py

ArthurZucker · 2023-02-14T13:11:26Z

tests/models/whisper/test_modeling_flax_whisper.py

+        # fmt: on
+
+        transcript = processor.batch_decode(generated_ids, skip_special_tokens=True)
+        self.assertListEqual(transcript, EXPECTED_TRANSCRIPT)


Can you add the test_tiny_timestamp_generation where you can test if jit compile produces the correct timestamps?

This is just to make sure that the logit processor correctly predicts them. I speak from TF experience, my code worked but when compiling it started failing 😓

Just added - some local sanity checks were working for me under jit compilation at least!

Co-authored-by: Arthur <[email protected]>

andyehrenberg · 2023-02-15T21:25:39Z

@sanchit-gandhi @ArthurZucker - Addressed Arthur's comments and cleaned up the timestamp logits processor a bit. Hopefully we're close to getting this merged!

sanchit-gandhi

Very nice @andyehrenberg! Thanks for iterating here - reviewed the new changes and the PR is looking super clean. Last request from me is if we can avoid defining the if_true() functions if possible and just add the code explicitly! Good for merge otherwise :)

src/transformers/generation/flax_logits_process.py

andyehrenberg · 2023-02-17T22:12:16Z

Very nice @andyehrenberg! Thanks for iterating here - reviewed the new changes and the PR is looking super clean. Last request from me is if we can avoid defining the if_true() functions if possible and just add the code explicitly! Good for merge otherwise :)

For sure, made those changes :)

sgugger

Thanks again for your contribution!

* add flax whisper implementation * rever change to setup * remove unused imports * revert generation changes * flax whisper docs * docs * import order * import sorting * isort * add dummy objects * doc formatting * formatting * remove trailing whitespaces * fix flax whisper docs * add generation logic to unlock flax whisper * remove scans * give credits to Flax Bart implementation * remove unused imports * add license * remove assert * more credits to Bart * fix style * formatting * support left padding * add flax whisper generation test * remove copied from comments whenever not a full copy * fix docstrings for logits processors * revert change to FlaxForceTokensLogitsProcessor * revert doc changes * improve generation docs * reorganize * formatting * cleanup docs * add tests * handle empty list case * fix forced decoder ids in flax tests * add flax whisper to inits * upate dummy objects * docs for FlaxAutoModelForSpeechSeq2Seq * fix decoder_position_ids computation in pretrained model decode/__call__ fns * add Copied from statements as necessary * compute position_ids only in __call__ and decode methods of pretrained model subclasses * improve readabilityof compute positional embeddings * check dimensionality of input_features instead of hidden_states * copied from statement for init_cache * formatting * fix copies * fix copies * pass attention mask to encoder layers * fix decoder module outputs * set dtype Co-authored-by: Sanchit Gandhi <[email protected]> * smaller flax model for whisper test * Update src/transformers/generation/flax_utils.py Co-authored-by: Sylvain Gugger <[email protected]> * Update src/transformers/models/whisper/modeling_flax_whisper.py Co-authored-by: Sylvain Gugger <[email protected]> * Update tests/models/whisper/test_modeling_flax_whisper.py Co-authored-by: Sylvain Gugger <[email protected]> * cleanup Co-authored-by: Sylvain Gugger <[email protected]> * Update src/transformers/models/whisper/modeling_flax_whisper.py Co-authored-by: Sylvain Gugger <[email protected]> * bias cleanup * doc fix * align style for force tokens processor * readability * fix input shape in tests * revert FlaxGenerationMixin docstring * formatting * fix tests * fix imports * consistent encoder hidden states * consistent hidden states * input shapes * typo * partial class trick * partial class for input shape * base_class with correct input shape * partial base classes * match by name * set main_input_name * compare on names * formatting * remove unused import * safer position ids computation * safer position id computation * Update src/transformers/models/whisper/modeling_flax_whisper.py Co-authored-by: Sanchit Gandhi <[email protected]> * Update src/transformers/models/whisper/modeling_flax_whisper.py Co-authored-by: Sanchit Gandhi <[email protected]> * remove identical inherited tests * fix prompt ids in tests * use generation config * use jnp array * better var names * more explicit bias use * import transformers * formatting * test formatting * remove unused imports * remove unused imports * formatting * isort * docs * fix ln orders for encoder hidden states * whisper unique generation stuff * flake * use finfo for attention bias * docs * Update src/transformers/generation/flax_utils.py Co-authored-by: Arthur <[email protected]> * docs * add timestamp flax test * jit for timestamps * formatting * clean up timestamps processor * formatting * remove if_true * cleanup --------- Co-authored-by: Sanchit Gandhi <[email protected]> Co-authored-by: Sylvain Gugger <[email protected]> Co-authored-by: Arthur <[email protected]>

Tungbillee · 2024-02-06T10:22:21Z

Is there any instructions to open the google cloud TPU port, admin?

andyehrenberg added 3 commits November 28, 2022 14:13

add flax whisper implementation

7d3b6ef

rever change to setup

a9bed4c

remove unused imports

0312993

revert generation changes

c71fe4f

flax whisper docs

828d800

andyehrenberg and others added 12 commits December 1, 2022 08:05

docs

baafb1c

Merge branch 'huggingface:main' into flax_whisper

7dba8b5

import order

2da5a58

Merge branch 'flax_whisper' of github.com:andyehrenberg/transformers …

5ee9c1f

…into flax_whisper

import sorting

00f695f

isort

0ecc03b

add dummy objects

f66a005

doc formatting

175f344

formatting

3329e6c

remove trailing whitespaces

c05089b

fix flax whisper docs

7551181

Merge branch 'huggingface:main' into flax_whisper

153f2cb

andyehrenberg marked this pull request as ready for review December 1, 2022 19:10

andyehrenberg mentioned this pull request Dec 1, 2022

Support token suppression, forced tokens (besides eos and bos), and decoder prompting for flax generation #20539

Closed

ArthurZucker self-assigned this Dec 2, 2022

ArthurZucker reviewed Dec 2, 2022

View reviewed changes

fix ln orders for encoder hidden states

1daee2b

andyehrenberg and others added 6 commits February 3, 2023 09:19

Merge branch 'main' into flax_whisper

fdb0a61

whisper unique generation stuff

632c4be

Merge branch 'flax_whisper' of github.com:andyehrenberg/transformers …

95403d6

…into flax_whisper

flake

c5c3ac1

use finfo for attention bias

907905f

docs

9dbcda8

ArthurZucker approved these changes Feb 14, 2023

View reviewed changes

andyehrenberg and others added 7 commits February 14, 2023 14:12

Update src/transformers/generation/flax_utils.py

d36cd2c

Co-authored-by: Arthur <[email protected]>

docs

ab01cfc

add timestamp flax test

62d172a

jit for timestamps

455b8bf

formatting

89658d0

clean up timestamps processor

a75fd03

formatting

758d56c

sanchit-gandhi approved these changes Feb 17, 2023

View reviewed changes

src/transformers/generation/flax_logits_process.py Outdated Show resolved Hide resolved

src/transformers/generation/flax_logits_process.py Outdated Show resolved Hide resolved

src/transformers/generation/flax_logits_process.py Outdated Show resolved Hide resolved

andyehrenberg added 2 commits February 17, 2023 13:02

remove if_true

f9ac652

cleanup

94a526e

sgugger approved these changes Feb 20, 2023

View reviewed changes

sgugger merged commit 2840272 into huggingface:main Feb 20, 2023

andyehrenberg deleted the flax_whisper branch February 28, 2023 20:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add flax whisper implementation #20479

add flax whisper implementation #20479

andyehrenberg commented Nov 28, 2022 •

edited

Loading

ydshieh commented Nov 29, 2022

andyehrenberg commented Nov 29, 2022 •

edited

Loading

ydshieh commented Nov 29, 2022 •

edited

Loading

ydshieh commented Nov 29, 2022

HuggingFaceDocBuilderDev commented Nov 29, 2022 •

edited

Loading

patrickvonplaten commented Nov 30, 2022

ArthurZucker commented Dec 2, 2022 •

edited

Loading

ydshieh commented Dec 2, 2022

sanchit-gandhi commented Dec 2, 2022

ArthurZucker left a comment

ArthurZucker Dec 2, 2022

andyehrenberg Dec 2, 2022 •

edited

Loading

sanchit-gandhi Dec 15, 2022

ArthurZucker commented Jan 26, 2023

andyehrenberg commented Jan 26, 2023 •

edited

Loading

sanchit-gandhi commented Jan 27, 2023

andyehrenberg commented Feb 4, 2023

andyehrenberg commented Feb 13, 2023

ArthurZucker left a comment

ArthurZucker Feb 14, 2023

ArthurZucker Feb 14, 2023

andyehrenberg Feb 14, 2023

andyehrenberg commented Feb 15, 2023

sanchit-gandhi left a comment •

edited

Loading

andyehrenberg commented Feb 17, 2023

sgugger left a comment

Tungbillee commented Feb 6, 2024

add flax whisper implementation #20479

add flax whisper implementation #20479

Conversation

andyehrenberg commented Nov 28, 2022 • edited Loading

ydshieh commented Nov 29, 2022

andyehrenberg commented Nov 29, 2022 • edited Loading

ydshieh commented Nov 29, 2022 • edited Loading

ydshieh commented Nov 29, 2022

HuggingFaceDocBuilderDev commented Nov 29, 2022 • edited Loading

patrickvonplaten commented Nov 30, 2022

ArthurZucker commented Dec 2, 2022 • edited Loading

ydshieh commented Dec 2, 2022

sanchit-gandhi commented Dec 2, 2022

ArthurZucker left a comment

Choose a reason for hiding this comment

ArthurZucker Dec 2, 2022

Choose a reason for hiding this comment

andyehrenberg Dec 2, 2022 • edited Loading

Choose a reason for hiding this comment

sanchit-gandhi Dec 15, 2022

Choose a reason for hiding this comment

ArthurZucker commented Jan 26, 2023

andyehrenberg commented Jan 26, 2023 • edited Loading

sanchit-gandhi commented Jan 27, 2023

andyehrenberg commented Feb 4, 2023

andyehrenberg commented Feb 13, 2023

ArthurZucker left a comment

Choose a reason for hiding this comment

ArthurZucker Feb 14, 2023

Choose a reason for hiding this comment

ArthurZucker Feb 14, 2023

Choose a reason for hiding this comment

andyehrenberg Feb 14, 2023

Choose a reason for hiding this comment

andyehrenberg commented Feb 15, 2023

sanchit-gandhi left a comment • edited Loading

Choose a reason for hiding this comment

andyehrenberg commented Feb 17, 2023

sgugger left a comment

Choose a reason for hiding this comment

Tungbillee commented Feb 6, 2024

andyehrenberg commented Nov 28, 2022 •

edited

Loading

andyehrenberg commented Nov 29, 2022 •

edited

Loading

ydshieh commented Nov 29, 2022 •

edited

Loading

HuggingFaceDocBuilderDev commented Nov 29, 2022 •

edited

Loading

ArthurZucker commented Dec 2, 2022 •

edited

Loading

andyehrenberg Dec 2, 2022 •

edited

Loading

andyehrenberg commented Jan 26, 2023 •

edited

Loading

sanchit-gandhi left a comment •

edited

Loading