Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

whisper-large-v3 compatibility #1530

Merged
merged 6 commits into from
Nov 8, 2023
Merged

Conversation

funboarder13920
Copy link
Contributor

@funboarder13920 funboarder13920 commented Nov 7, 2023

openai whisper large-v3 introduces change from 80 to 128 in mel input feature.
exposing n_mels is required to propagate the input size to the audio feature extractor

we also need to add the large-v3 alignment heads
a fix is required in the computation of _is_multilingual

Valentin Berkes added 2 commits November 7, 2023 13:08
openai whisper large-v3 introduces change from 80 to 128 in mel input feature.
exposing n_mels is required to propagate the input size to the audio feature extractor
@chiiyeh
Copy link
Contributor

chiiyeh commented Nov 8, 2023

Hi could you also add to here: https://github.com/OpenNMT/CTranslate2/blob/master/python/ctranslate2/converters/transformers.py#L1929-L2042

    "openai/whisper-large-v3": [
        (7, 0),
        (10, 17),
        (12, 18),
        (13, 12),
        (16, 1),
        (17, 14),
        (19, 11),
        (21, 4),
        (24, 1),
        (25, 6),
    ],

obtained from here: https://github.com/openai/whisper/blob/fcfeaf1b61994c071bba62da47d7846933576ac9/whisper/__init__.py#L45

funboarder13920 pushed a commit to funboarder13920/CTranslate2 that referenced this pull request Nov 8, 2023
@funboarder13920 funboarder13920 changed the title expose n_mels whisper-large-v3 compatibility Nov 8, 2023
funboarder13920 pushed a commit to funboarder13920/CTranslate2 that referenced this pull request Nov 8, 2023
@RafaRed
Copy link

RafaRed commented Nov 8, 2023

The current check for multilingual support seems to be hardcoded with a specific vocabulary size:
_is_multilingual = vocabulary.size() == 51865;
Link to code

For instance, I believe the whisper-latest-v3 model has a vocabulary size of 51866, which is one more than the hardcoded value. This discrepancy could lead to the multilingual feature being incorrectly disabled for this model.

Probably a more dynamic check need to be implemented to ensure compatibility with future models.

edit: oh sorry, did not notice its already fixed on PR.

@funboarder13920
Copy link
Contributor Author

funboarder13920 commented Nov 8, 2023

The current check for multilingual support seems to be hardcoded with a specific vocabulary size: _is_multilingual = vocabulary.size() == 51865; Link to code

For instance, I believe the whisper-latest-v3 model has a vocabulary size of 51866, which is one more than the hardcoded value. This discrepancy could lead to the multilingual feature being incorrectly disabled for this model.

Probably a more dynamic check need to be implemented to ensure compatibility with future models.

A fix is already in this PR :

_is_multilingual = vocabulary.size() >= 51865;

@Purfview
Copy link

Purfview commented Nov 8, 2023

@vince62s Can you merge this?

@@ -2039,4 +2039,16 @@ def main():
(26, 12),
(27, 15),
],
"openai/whisper-large-v3": [
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible worth adding a comment on the source of these since its different from the source on L1928.

@vince62s vince62s merged commit 23f744f into OpenNMT:master Nov 8, 2023
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants