
[Flax Examples] Seq2Seq ASR Fine-Tuning Script #21764

Merged
19 commits merged into huggingface:main on Sep 29, 2023

Conversation

@sanchit-gandhi (Contributor) commented on Feb 23, 2023

What does this PR do?

Adds a Flax example script that can be used to fine-tune Whisper for speech recognition.

Tested and verified as working with the following (dummy) config:

run_flax_speech_recognition_seq2seq.py \
            --model_name_or_path openai/whisper-tiny.en \
            --dataset_name hf-internal-testing/librispeech_asr_dummy \
            --dataset_config clean \
            --train_split_name validation \
            --eval_split_name validation \
            --output_dir whisper-tiny-ft-dummy \
            --overwrite_output_dir \
            --num_train_epochs=2 \
            --max_train_samples 10 \
            --max_eval_samples 10 \
            --warmup_steps=8 \
            --do_train \
            --do_eval \
            --learning_rate=2e-4 \
            --per_device_train_batch_size=2 \
            --per_device_eval_batch_size=1 \
            --predict_with_generate
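
For reference, here is a minimal sketch of how the resulting checkpoint could be loaded for inference. The output directory and dataset mirror the dummy config above; exactly what gets saved to the output dir depends on the script, so treat this as illustrative rather than the script's own evaluation path:

```python
# Minimal sketch: load the fine-tuned Flax Whisper checkpoint and transcribe one sample.
# Paths and dataset mirror the dummy config above; purely illustrative.
from datasets import load_dataset
from transformers import FlaxWhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("whisper-tiny-ft-dummy")
model = FlaxWhisperForConditionalGeneration.from_pretrained("whisper-tiny-ft-dummy")

# one audio sample from the same dummy dataset used for fine-tuning
sample = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")[0]
input_features = processor(
    sample["audio"]["array"], sampling_rate=16000, return_tensors="np"
).input_features

# greedy decoding; .sequences holds the predicted token ids
pred_ids = model.generate(input_features).sequences
print(processor.batch_decode(pred_ids, skip_special_tokens=True)[0])
```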

Will add a README with preliminary training configs / results later this week after doing a full fine-tuning run.

cc @peregilk @andyehrenberg for interest

@HuggingFaceDocBuilderDev commented on Feb 23, 2023

The documentation is not available anymore as the PR was closed or merged.

@peregilk (Contributor) commented on Apr 3, 2023

@sanchit-gandhi @andyehrenberg

We have made a version of this script that supports streaming and training on TPU pods.

The current version of the script is available here:
https://github.com/NbAiLab/nb-whisper/blob/main/run_flax_speech_recognition_seq2seq_streaming.py

We are, however, struggling with a bug at the moment. The script works for training the Tiny model on multiple pod sizes, both for scaling for speed and for increasing the batch size. All the other model sizes (small, base, medium, large) also work on a single TPU v4-8. However, training the non-Tiny model sizes on pods runs for a few steps and then freezes.

If anyone has any idea about why this could be happening, I would really appreciate it.

@huggingface huggingface deleted a comment from github-actions bot May 15, 2023
@sanchit-gandhi sanchit-gandhi mentioned this pull request Jun 12, 2023
@huggingface huggingface deleted a comment from github-actions bot Jun 12, 2023
@huggingface huggingface deleted a comment from github-actions bot Jun 12, 2023
@github-actions github-actions bot closed this Jul 15, 2023
@huggingface huggingface deleted a comment from github-actions bot Jul 28, 2023
@sanchit-gandhi sanchit-gandhi marked this pull request as ready for review August 11, 2023 11:08
@github-actions bot commented on Sep 5, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Sep 13, 2023
@sanchit-gandhi (Contributor, Author) commented:

Given the popularity of the PyTorch fine-tuning script and Whisper JAX, a Whisper fine-tuning script in JAX/Flax is a pretty easy addition.

Note: this is largely based off the distil-whisper training script (https://github.com/huggingface/distil-whisper#training), but simplified to run offline, with just one training dataset and the cross-entropy objective.
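
For context, the objective itself is just token-level cross-entropy over the decoder outputs, with padded label positions masked out of the loss. A minimal sketch (assuming padded labels are marked with -100; this label-padding convention is illustrative rather than necessarily the script's exact one):

```python
# Minimal sketch of the cross-entropy objective with label padding masked out.
# Assumes padded label positions are filled with -100 (illustrative convention).
import jax.numpy as jnp
import optax

def cross_entropy_loss(logits, labels, label_pad_id=-100):
    # mask of real (non-padded) label positions
    padding_mask = labels != label_pad_id
    # replace pad ids with a valid class index so the per-token loss is well-defined
    safe_labels = jnp.where(padding_mask, labels, 0)
    per_token_loss = optax.softmax_cross_entropy_with_integer_labels(logits, safe_labels)
    # average only over real (non-padded) tokens
    return (per_token_loss * padding_mask).sum() / padding_mask.sum()
```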

@ArthurZucker (Collaborator) left a comment

Thanks a lot! Looks great 😉

@sanchit-gandhi sanchit-gandhi merged commit 68e85fc into huggingface:main Sep 29, 2023
3 checks passed
blbadger pushed a commit to blbadger/transformers that referenced this pull request Nov 8, 2023
* from seq2seq speech

* [Flax] Example script for speech seq2seq

* tests and fixes

* make style

* fix: label padding tokens

* fix: label padding tokens over list

* update ln names for Whisper

* try datasets iter loader

* create readme and append results

* style

* make style

* adjust lr

* use pt dataloader

* make fast

* pin gen max len

* finish

* add pt to requirements for test

* fix pt -> torch

* add accelerate
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 18, 2023 (same commits as above)