Can whisper-tiny speech-to-text translate to English as well as transcribe foreign language? #87

patrickinminneapolis · 2023-04-14T16:23:14Z

I know there is a separate translation engine (t5-small), but I'm wondering if speech-to-text with whisper-tiny (not whisper-tiny.en) can return English translation alongside the foreign-language transcription? -- I read Whisper.ai can do this. It seems like it would just be a parameter, but I don't know where to look.

xenova · 2023-04-14T16:37:00Z

Hi 👋 I haven't added that yet, but it should be quite simple. Seems like all I need to do is set a prefix token. Something like:

It doesn't look like their pipeline function supports this (unless I'm wrong). Perhaps there's a way to pass tokenizer keyword arguments to the pipeline. I'll check it out. 👍
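For readers following along, the prefix-token idea can be sketched roughly as below. Multilingual Whisper checkpoints condition the decoder on a short run of special tokens: a start token, a language token, a task token, and optionally a no-timestamps token. The helper name `whisperPrefix` is hypothetical and not part of transformers.js; it only shows the token order, assuming the standard Whisper special-token strings.

```javascript
// Hypothetical helper (not part of transformers.js): builds the special-token
// prefix that a multilingual Whisper decoder is conditioned on.
function whisperPrefix(languageCode, task = "transcribe", timestamps = false) {
  if (task !== "transcribe" && task !== "translate") {
    throw new Error(`Unknown task: ${task}`);
  }
  const prefix = ["<|startoftranscript|>", `<|${languageCode}|>`, `<|${task}|>`];
  // Without timestamps, Whisper expects an explicit no-timestamps marker.
  if (!timestamps) prefix.push("<|notimestamps|>");
  return prefix;
}

console.log(whisperPrefix("no", "translate").join(""));
// prints <|startoftranscript|><|no|><|translate|><|notimestamps|>
```

Swapping `<|transcribe|>` for `<|translate|>` is what would switch the model from same-language transcription to English translation.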

kungfooman · 2023-05-12T15:59:24Z

It already works, but there is one issue left for me right now: #107

Then you can use this code:

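// Assumes the library is loaded from the npm package; adjust the import to your setup.
import { pipeline, env } from "@xenova/transformers";

// Load the model files from a local web server.
env.localModelPath = "http://127.0.0.1/transformer/models/";
const pipe = await pipeline("automatic-speech-recognition", "whisper-tiny");
// Whisper expects 16 kHz audio, so let the AudioContext resample while decoding.
const audioCTX = new AudioContext({
  sampleRate: 16000
});
// SPEECH2TEXT_AUDIO is presumably an <audio> element on the page.
const arrayBuffer = await (await fetch(SPEECH2TEXT_AUDIO.currentSrc)).arrayBuffer();
const decoded = await audioCTX.decodeAudioData(arrayBuffer);
const audio = decoded.getChannelData(0); // first channel as a Float32Array
const result = await pipe(audio, {
  return_timestamps: true,
  //chunk_length_s: 30,
  // Log the decoded tokens each time the callback fires.
  chunk_callback: (obj) => {
    const decodedTokens = pipe.tokenizer.decode(obj.tokens);
    console.log("progress tokens:", decodedTokens);
  }
});
console.log("result", result);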

The output will look somewhat like:

The interesting part is basically just the [1] index: pipe.tokenizer.decode([50288]);

(`no` is the language code for Norwegian)

EDIT: I realized this is about actually translating lol


xenova · 2023-06-05T00:28:04Z

Opened PR for this 👍 Expect it to be merged soon.

Add multilingual transcription + translation for whisper models (#87, #95) (#133)

* Align `.generate()` return type with python library
* Add multilingual transcription + translation for whisper models (#87, #95)
* Include `return_timestamps` in calculation of `forced_decoder_ids`
* Only return non-null `forced_decoder_ids`
* Allow user to specify task in any case
* Only set `forced_decoder_ids` when non-empty
* Implement `SuppressTokensAtBeginLogitsProcessor`
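As a rough illustration of the `forced_decoder_ids` items in this commit message: in the Python library these are `[position, token_id]` pairs that pin the language, task, and (when timestamps are off) no-timestamps tokens right after the start token. The sketch below is not the code from the PR. `buildForcedDecoderIds` and the `ids` table are hypothetical, and the token IDs are assumed to match the multilingual Whisper vocabulary, so verify them against the tokenizer.

```javascript
// Illustrative token IDs, assumed to match the multilingual Whisper vocabulary.
// Verify against your tokenizer before relying on them.
const ids = {
  "<|no|>": 50288,
  "<|translate|>": 50358,
  "<|transcribe|>": 50359,
  "<|notimestamps|>": 50363,
};

// Hypothetical helper: returns [position, token_id] pairs for the decoder.
// Position 0 is the start-of-transcript token, so forced positions start at 1.
function buildForcedDecoderIds({ language = null, task = null, return_timestamps = false } = {}) {
  const lookup = (name) => {
    const id = ids[`<|${name.toLowerCase()}|>`]; // accept any letter case
    if (id === undefined) throw new Error(`Unknown token: ${name}`);
    return id;
  };
  const tokens = [];
  if (language) tokens.push(lookup(language));
  if (task) tokens.push(lookup(task));
  // The no-timestamps token is forced only when timestamps are not requested.
  if (!return_timestamps) tokens.push(ids["<|notimestamps|>"]);
  return tokens.map((id, i) => [i + 1, id]);
}

console.log(buildForcedDecoderIds({ language: "no", task: "Translate" }));
// pairs: [1, 50288], [2, 50358], [3, 50363]
```

A caller would attach the result to the generation config only when the returned list is non-empty, which is the case the last few bullets appear to cover.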

xenova · 2023-06-23T19:07:31Z

This was added in v2.2.0 🎉 Check the release notes (https://github.com/xenova/transformers.js/releases/tag/2.2.0) for example code.
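A minimal sketch of how the released feature can be called, assuming transformers.js v2.2.0 or later is installed as `@xenova/transformers`. The `language` and `task` options are the ones the pipeline accepts for multilingual Whisper models; the model id `Xenova/whisper-tiny`, the helper names `whisperOptions` and `translateToEnglish`, and whatever audio URL you pass in are assumptions for illustration.

```javascript
// Hypothetical helper: options object passed to the ASR pipeline call.
// `language` is the spoken language of the audio; `task` selects
// "transcribe" (same language) or "translate" (to English).
function whisperOptions(language, task = "translate") {
  return { language, task };
}

// Hypothetical wrapper: loads the pipeline and returns the English text.
async function translateToEnglish(audioUrl, language) {
  // Dynamic import so the library is only loaded when the function runs.
  const { pipeline } = await import("@xenova/transformers");
  const transcriber = await pipeline(
    "automatic-speech-recognition",
    "Xenova/whisper-tiny" // assumed model id; must be a multilingual checkpoint, not *.en
  );
  const output = await transcriber(audioUrl, whisperOptions(language, "translate"));
  return output.text;
}
```

Calling `await translateToEnglish(url, "french")` should then return English text for French audio, while `task: "transcribe"` keeps the output in the source language.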

patrickinminneapolis added the question Further information is requested label Apr 14, 2023

xenova added the enhancement New feature or request label Apr 14, 2023

xenova mentioned this issue Apr 24, 2023

[Feature request] Whisper with language support #95

Closed

xenova added a commit that referenced this issue Jun 5, 2023

Add multilingual transcription + translation for whisper models (#87, #95)

29bbfe5

xenova closed this as completed Jun 23, 2023
