whisper-large-v3 compatibility #1530

funboarder13920 · 2023-11-07T12:16:05Z

OpenAI Whisper large-v3 changes the number of mel bins in the input features from 80 to 128. Exposing n_mels is required to propagate the input size to the audio feature extractor.

We also need to add the large-v3 alignment heads, and a fix is required in the computation of _is_multilingual.
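As a rough illustration of why n_mels has to be exposed instead of hardcoded: in OpenAI's Whisper the encoder starts with a Conv1d whose input channels equal n_mels, so the expected input shape can be read from the model rather than assumed to be 80. This is a minimal sketch assuming the OpenAI layout (conv1 weight of shape (d_model, n_mels, 3)) and the standard 30 s / 16 kHz / hop 160 framing; infer_n_mels and FeatureExtractorConfig are hypothetical names, not CTranslate2 or faster-whisper APIs.

```python
from dataclasses import dataclass
from typing import Tuple


def infer_n_mels(conv1_weight_shape: Tuple[int, int, int]) -> int:
    # Hypothetical helper: Whisper's first encoder layer is
    # Conv1d(n_mels, d_model, kernel_size=3), so its weight has shape
    # (d_model, n_mels, 3). Reading n_mels from it avoids hardcoding 80.
    _d_model, n_mels, kernel_size = conv1_weight_shape
    if kernel_size != 3:
        raise ValueError(f"unexpected conv1 kernel size: {kernel_size}")
    return n_mels


@dataclass
class FeatureExtractorConfig:
    """Hypothetical feature-extractor settings using Whisper's defaults."""

    n_mels: int
    sampling_rate: int = 16000  # Hz
    hop_length: int = 160  # samples between STFT frames
    chunk_length: int = 30  # seconds per input window

    @property
    def n_frames(self) -> int:
        # 30 s * 16000 Hz / 160 samples per hop = 3000 frames
        return self.chunk_length * self.sampling_rate // self.hop_length

    def input_shape(self) -> Tuple[int, int]:
        # Shape of one log-mel spectrogram fed to the encoder.
        return (self.n_mels, self.n_frames)


if __name__ == "__main__":
    for name, shape in [("large-v2", (1280, 80, 3)), ("large-v3", (1280, 128, 3))]:
        cfg = FeatureExtractorConfig(n_mels=infer_n_mels(shape))
        print(name, cfg.input_shape())
```

The feature extractor then only needs the single integer from the model, which is the propagation the PR describes.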


chiiyeh · 2023-11-08T07:36:40Z

Hi, could you also add this here: https://github.com/OpenNMT/CTranslate2/blob/master/python/ctranslate2/converters/transformers.py#L1929-L2042

    "openai/whisper-large-v3": [
        (7, 0),
        (10, 17),
        (12, 18),
        (13, 12),
        (16, 1),
        (17, 14),
        (19, 11),
        (21, 4),
        (24, 1),
        (25, 6),
    ],

Obtained from here: https://github.com/openai/whisper/blob/fcfeaf1b61994c071bba62da47d7846933576ac9/whisper/__init__.py#L45
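In OpenAI's whisper/__init__.py, each _ALIGNMENT_HEADS entry is a base85-encoded, gzip-compressed boolean array of shape (n_text_layer, n_text_head), and the pairs above are its nonzero positions. A minimal stdlib sketch of that decoding, with hypothetical helper names; the round trip uses the large-v3 pairs quoted above and assumes the large model's 32 decoder layers and 20 heads. It does not reproduce the actual dump string.

```python
import base64
import gzip
from typing import List, Sequence, Tuple


def decode_alignment_heads(
    dump: bytes, n_text_layer: int, n_text_head: int
) -> List[Tuple[int, int]]:
    """Turn a base85 + gzip boolean mask into (layer, head) pairs."""
    # Each decompressed byte is one numpy bool (0 or 1), laid out
    # row-major as (n_text_layer, n_text_head).
    flags = gzip.decompress(base64.b85decode(dump))
    if len(flags) != n_text_layer * n_text_head:
        raise ValueError("mask size does not match the decoder dimensions")
    # Flat index i maps to (i // n_text_head, i % n_text_head).
    return [divmod(i, n_text_head) for i, flag in enumerate(flags) if flag]


def encode_alignment_heads(
    pairs: Sequence[Tuple[int, int]], n_text_layer: int, n_text_head: int
) -> bytes:
    """Inverse of decode_alignment_heads (used here only for the round trip)."""
    flags = bytearray(n_text_layer * n_text_head)
    for layer, head in pairs:
        flags[layer * n_text_head + head] = 1
    return base64.b85encode(gzip.compress(bytes(flags)))


LARGE_V3_HEADS = [(7, 0), (10, 17), (12, 18), (13, 12), (16, 1),
                  (17, 14), (19, 11), (21, 4), (24, 1), (25, 6)]

if __name__ == "__main__":
    dump = encode_alignment_heads(LARGE_V3_HEADS, n_text_layer=32, n_text_head=20)
    print(decode_alignment_heads(dump, 32, 20) == LARGE_V3_HEADS)
```

Because the flat mask is scanned in index order, the decoded pairs come out sorted by layer and then head, matching the list in the comment.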


RafaRed · 2023-11-08T10:09:09Z

The current check for multilingual support seems to be hardcoded with a specific vocabulary size:
_is_multilingual = vocabulary.size() == 51865;

For instance, I believe the whisper-large-v3 model has a vocabulary size of 51866, which is one more than the hardcoded value. This discrepancy could lead to the multilingual feature being incorrectly disabled for this model.

A more dynamic check probably needs to be implemented to ensure compatibility with future models.

Edit: oh, sorry, I didn't notice it's already fixed in the PR.

funboarder13920 · 2023-11-08T10:18:28Z

> The current check for multilingual support seems to be hardcoded with a specific vocabulary size: _is_multilingual = vocabulary.size() == 51865;
>
> For instance, I believe the whisper-large-v3 model has a vocabulary size of 51866, which is one more than the hardcoded value. This discrepancy could lead to the multilingual feature being incorrectly disabled for this model.
>
> A more dynamic check probably needs to be implemented to ensure compatibility with future models.

A fix is already in this PR, in CTranslate2/src/models/whisper.cc (line 73 in 7615e41):

    _is_multilingual = vocabulary.size() >= 51865;
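To see why >= fixes large-v3: English-only Whisper checkpoints have 51864 tokens, the multilingual ones up to large-v2 have 51865, and large-v3 adds one language token (Cantonese) for 51866. A small Python mirror of the old and new C++ conditions; the function names are hypothetical and only illustrate the comparison.

```python
# Vocabulary sizes of the Whisper checkpoint families.
ENGLISH_ONLY_VOCAB = 51864  # *.en models
MULTILINGUAL_VOCAB = 51865  # tiny ... large-v2
LARGE_V3_VOCAB = 51866  # large-v3: one extra language token


def is_multilingual_old(vocab_size: int) -> bool:
    # Before the fix: exact match, so large-v3 is treated as English-only.
    return vocab_size == MULTILINGUAL_VOCAB


def is_multilingual_new(vocab_size: int) -> bool:
    # After the fix: any vocabulary at least this large counts as multilingual.
    return vocab_size >= MULTILINGUAL_VOCAB


if __name__ == "__main__":
    for size in (ENGLISH_ONLY_VOCAB, MULTILINGUAL_VOCAB, LARGE_V3_VOCAB):
        print(size, is_multilingual_old(size), is_multilingual_new(size))
```

The threshold form keeps English-only models excluded while accepting any future multilingual vocabulary that only grows.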

Purfview · 2023-11-08T15:00:04Z

@vince62s Can you merge this?

ostegm · 2023-11-08T15:45:47Z

python/ctranslate2/converters/transformers.py

@@ -2039,4 +2039,16 @@ def main():
        (26, 12),
        (27, 15),
    ],
+    "openai/whisper-large-v3": [


Possibly worth adding a comment on the source of these, since it's different from the source on L1928.

Valentin Berkes added 2 commits November 7, 2023 13:08

expose n_mels

261687c

openai whisper large-v3 introduces change from 80 to 128 in mel input feature. exposing n_mels is required to propagate the input size to the audio feature extractor

Merge branch 'master' into n_mels_param

3836555

Purfview mentioned this pull request Nov 7, 2023

feat: code for whisper-large-v3 SYSTRAN/faster-whisper#548

Closed

funboarder13920 force-pushed the n_mels_param branch from dabcfa0 to 3836555 Compare November 7, 2023 15:53

fix guessing is_multilingual for large-v3

bd3e7df

funboarder13920 pushed a commit to funboarder13920/CTranslate2 that referenced this pull request Nov 8, 2023

alignement heads for large-v3

2ffb482

see OpenNMT#1530 (comment)

funboarder13920 changed the title from "expose n_mels" to "whisper-large-v3 compatibility" Nov 8, 2023

funboarder13920 pushed a commit to funboarder13920/CTranslate2 that referenced this pull request Nov 8, 2023

alignement heads for large-v3

17b96e4

see OpenNMT#1530 (comment)

funboarder13920 force-pushed the n_mels_param branch from 2ffb482 to 17b96e4 Compare November 8, 2023 09:41

alignement heads for large-v3

7615e41

see OpenNMT#1530 (comment)

funboarder13920 force-pushed the n_mels_param branch from 17b96e4 to 7615e41 Compare November 8, 2023 09:41

fireattack mentioned this pull request Nov 8, 2023

Will it be possible to use the large-v3 model? SYSTRAN/faster-whisper#544

Closed

add num_languages property to whisper models

f43afed
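A num_languages property matters because large-v3 supports 100 languages instead of 99. OpenAI's whisper/model.py derives the count from the vocabulary size as n_vocab - 51765 - int(is_multilingual). The sketch below reproduces that arithmetic in Python; it assumes, without having verified it here, that the CTranslate2 property uses the same formula.

```python
def is_multilingual(n_vocab: int) -> bool:
    # Same threshold as the fixed check in whisper.cc.
    return n_vocab >= 51865


def num_languages(n_vocab: int) -> int:
    # Same arithmetic as OpenAI's Whisper.num_languages property
    # (assumed to match the CTranslate2 implementation).
    return n_vocab - 51765 - int(is_multilingual(n_vocab))


if __name__ == "__main__":
    for n_vocab in (51865, 51866):
        print(n_vocab, num_languages(n_vocab))
```

With these inputs, large-v2 (51865 tokens) yields 99 languages and large-v3 (51866 tokens) yields 100, so callers no longer need a hardcoded language count.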

ostegm reviewed Nov 8, 2023


update comment documentation

161894c

vince62s merged commit 23f744f into OpenNMT:master Nov 8, 2023
17 checks passed

ostegm mentioned this pull request Nov 9, 2023

Whisper large-v3 SYSTRAN/faster-whisper#549

Closed
