-
Notifications
You must be signed in to change notification settings - Fork 27.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Whisper] Fix word-level timestamps for audio < 30 seconds #25607
[Whisper] Fix word-level timestamps for audio < 30 seconds #25607
Conversation
The documentation is not available anymore as the PR was closed or merged. |
Looks like there were some failing unit tests, but they were actually wrong 😅 Original unit test:{
"text": " Conquered returned to its place amidst the tents.",
"chunks": [
{"text": " Conquered", "timestamp": (29.78, 29.9)},
{"text": " returned", "timestamp": (29.9, 29.9)},
{"text": " to", "timestamp": (29.9, 29.9)},
{"text": " its", "timestamp": (29.9, 29.9)},
{"text": " place", "timestamp": (29.9, 29.9)},
{"text": " amidst", "timestamp": (29.9, 29.9)},
{"text": " the", "timestamp": (29.9, 29.9)},
{"text": " tents.", "timestamp": (29.9, 29.9)}
]
} New (fixed) unit test:{
"text": " Conquered returned to its place amidst the tents.",
"chunks": [
{"text": " Conquered", "timestamp": (0.5, 1.2)},
{"text": " returned", "timestamp": (1.2, 1.64)},
{"text": " to", "timestamp": (1.64, 1.84)},
{"text": " its", "timestamp": (1.84, 2.02)},
{"text": " place", "timestamp": (2.02, 2.28)},
{"text": " amidst", "timestamp": (2.28, 2.78)},
{"text": " the", "timestamp": (2.78, 2.96)},
{"text": " tents.", "timestamp": (2.96, 3.48)},
],
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice catch! Will leave @sanchit-gandhi have a look!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really nice already @xenova! Thanks for the fix! Echo'ing @ArthurZucker's suggestion about using the generation config, but otherwise looks top 👌
Co-authored-by: Arthur <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lovely thanks @xenova! LGTM! Feel free to merge @ArthurZucker if you're happy with the PR
* Fix language detection * Remove debug statement * Fix punctuation regex for whisper decoding (Closes #223) * Fix word-level timestamps for audio < 30 seconds Issue in python library: huggingface/transformers#25605 PR for above: huggingface/transformers#25607 * Add multilingual transcription w/ word-level timestamps unit test * Fix unit tests
Gently pinging @ArthurZucker for the final 👍 before merge - thank you again for the PR @xenova! |
Sorry for the miss reviewing now! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the wait! Looks super good to me!
Left a very small nit 😉
Thanks for fixing
Thank you for your contribution @xenova! Thanks! |
Hi there - It's not yet merged, but will hopefully be soon! |
Do you happen to know when the next version will be out? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quickly pushed the last requested changes! Will merge when the CI is green!
Hi @xenova, I just upgraded to the latest version of Transformers (4.33.1) and tried the following. It seems that the word timestamps are still incorrect. Thanks! ###########
vx0['chunks']: |
This is because a full release hasn't come out yet. To fix it, you can install from source (see docs):
|
…ce#25607) * Fix word-level timestamps for audio < 30 seconds * Fix code quality * fix unit tests * Fix unit tests * Fix unit test * temp: print out result * temp: set max diff to None * fix unit tests * fix typo * Fix typo Co-authored-by: Arthur <[email protected]> * Use generation config for `num_frames` * fix docs * Move `num_frames` to kwargs * compute stride/attn_mask once * mark test as slow --------- Co-authored-by: Arthur <[email protected]> Co-authored-by: sanchit-gandhi <[email protected]>
…ce#25607) * Fix word-level timestamps for audio < 30 seconds * Fix code quality * fix unit tests * Fix unit tests * Fix unit test * temp: print out result * temp: set max diff to None * fix unit tests * fix typo * Fix typo Co-authored-by: Arthur <[email protected]> * Use generation config for `num_frames` * fix docs * Move `num_frames` to kwargs * compute stride/attn_mask once * mark test as slow --------- Co-authored-by: Arthur <[email protected]> Co-authored-by: sanchit-gandhi <[email protected]>
…ce#25607) * Fix word-level timestamps for audio < 30 seconds * Fix code quality * fix unit tests * Fix unit tests * Fix unit test * temp: print out result * temp: set max diff to None * fix unit tests * fix typo * Fix typo Co-authored-by: Arthur <[email protected]> * Use generation config for `num_frames` * fix docs * Move `num_frames` to kwargs * compute stride/attn_mask once * mark test as slow --------- Co-authored-by: Arthur <[email protected]> Co-authored-by: sanchit-gandhi <[email protected]>
Hi guys, I just tried whisper v3 and find that your updated code is gone in the current main branch. |
What does this PR do?
In OpenAI's original implementation for word-level timestamps, they crop the cross attentions before perform dynamic time warping (to only run the algorithm on valid audio; this prevents getting stuck when backtracking). The current transformers implementation misses this, so this PR fixes that.
Testing code:
Fixed:
Previous (broken):
Fixes #25605 (issue)
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@sanchit-gandhi @ArthurZucker