Can whisper-tiny speech-to-text translate to English as well as transcribe foreign language? #87

Closed
patrickinminneapolis opened this issue Apr 14, 2023 · 4 comments
Labels
enhancement (New feature or request), question (Further information is requested)

Comments

@patrickinminneapolis

I know there is a separate translation engine (t5-small), but I'm wondering if speech-to-text with whisper-tiny (not whisper-tiny.en) can return English translation alongside the foreign-language transcription? -- I read Whisper.ai can do this. It seems like it would just be a parameter, but I don't know where to look.

patrickinminneapolis added the question label Apr 14, 2023
@xenova
Collaborator

xenova commented Apr 14, 2023

Hi 👋 I haven't added that yet, but it should be quite simple. Seems like all I need to do is set a prefix token. Something like:

It doesn't look like their pipeline function supports this (unless I'm wrong). Perhaps there's a way to pass tokenizer keyword arguments to the pipeline. I'll check it out. 👍

@kungfooman
Contributor

kungfooman commented May 12, 2023

It already works, but there is one issue left for me right now: #107

Then you can use this code:

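// Assumes the @xenova/transformers package and an environment with top-level await.
import { pipeline, env } from "@xenova/transformers";

// Load models from a local server instead of the Hugging Face Hub.
env.localModelPath = "http://127.0.0.1/transformer/models/";
const pipe = await pipeline("automatic-speech-recognition", "whisper-tiny");

// Whisper expects 16 kHz mono audio, so decode at that sample rate.
const audioCTX = new AudioContext({ sampleRate: 16000 });
// SPEECH2TEXT_AUDIO is assumed to be an <audio> element already on the page.
const response = await fetch(SPEECH2TEXT_AUDIO.currentSrc);
const arrayBuffer = await response.arrayBuffer();
const decoded = await audioCTX.decodeAudioData(arrayBuffer);
const audio = decoded.getChannelData(0); // first channel as a Float32Array

const result = await pipe(audio, {
  return_timestamps: true,
  //chunk_length_s: 30,
  // Called for each processed chunk; decode its tokens to show progress.
  chunk_callback: (obj) => {
    const decodedTokens = pipe.tokenizer.decode(obj.tokens);
    console.log("progress tokens:", decodedTokens);
  }
});
console.log("result", result);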

The output will look somewhat like:

[image: console output]

The interesting part is basically just index [1] of the tokens, which is the language token. For example, pipe.tokenizer.decode([50288]) returns "<|no|>" ("no" meaning Norwegian).
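
To make the token layout a bit more concrete, here is a minimal sketch of reading the prefix of a generated sequence. The ids are assumed to be those of the original multilingual Whisper vocabulary (50288 matches the id decoded above); `SPECIAL_TOKENS` and `describePrefix` are hypothetical names for illustration, not part of the library. In real code, prefer `pipe.tokenizer.decode` over a hard-coded table.

```javascript
// Special-token ids assumed from the original multilingual Whisper vocabulary.
// Only a small excerpt; check your model's tokenizer before relying on them.
const SPECIAL_TOKENS = {
  50258: "<|startoftranscript|>",
  50259: "<|en|>",
  50288: "<|no|>", // Norwegian, the id decoded in the comment above
  50358: "<|translate|>",
  50359: "<|transcribe|>",
  50363: "<|notimestamps|>",
};

// Hypothetical helper: name the first three tokens of a generated sequence.
function describePrefix(tokens) {
  const name = (id) => SPECIAL_TOKENS[id] ?? `<unknown ${id}>`;
  return {
    start: name(tokens[0]),
    language: name(tokens[1]), // index [1] holds the language token
    task: name(tokens[2]),
  };
}

console.log(describePrefix([50258, 50288, 50359]));
```

A `<|transcribe|>` task token at index [2] means the model writes out the detected language as spoken; translating to English requires `<|translate|>` in that position instead.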

EDIT: I realized this is about actually translating lol

@xenova
Collaborator

xenova commented Jun 5, 2023

Opened PR for this 👍 Expect it to be merged soon.

xenova added a commit that referenced this issue Jun 9, 2023
…95) (#133)

* Align `.generate()` return type with python library

* Add multilingual transcription + translation for whisper models (#87, #95)

* Include `return_timestamps` in calculation of `forced_decoder_ids`

* Only return non-null `forced_decoder_ids`

* Allow user to specify task in any case

* Only set `forced_decoder_ids` when non-empty

* Implement `SuppressTokensAtBeginLogitsProcessor`
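
As a rough illustration of what the `forced_decoder_ids` bullets above refer to, the sketch below builds a list of `[position, tokenId]` pairs from a language, a task, and a timestamps flag. This is not the code from the commit: `buildForcedDecoderIds` and the lookup tables are hypothetical, the token ids are assumed from the original multilingual Whisper vocabulary, and the fixed positions 1 to 3 are a simplification.

```javascript
// Token ids assumed from the original multilingual Whisper vocabulary (excerpt).
const LANGUAGE_TOKEN_IDS = { en: 50259, fr: 50265, no: 50288 };
const TASK_TOKEN_IDS = { translate: 50358, transcribe: 50359 };
const NO_TIMESTAMPS_ID = 50363;

// Hypothetical helper: returns [position, tokenId] pairs the decoder is forced to emit.
function buildForcedDecoderIds({ language, task, returnTimestamps = false } = {}) {
  const ids = [];
  if (language !== undefined) {
    const langId = LANGUAGE_TOKEN_IDS[language.toLowerCase()];
    if (langId === undefined) throw new Error(`Unsupported language: ${language}`);
    ids.push([1, langId]);
  }
  if (task !== undefined) {
    // Accept the task in any letter case, e.g. "Translate" or "TRANSLATE".
    const taskId = TASK_TOKEN_IDS[task.toLowerCase()];
    if (taskId === undefined) throw new Error(`Unsupported task: ${task}`);
    ids.push([2, taskId]);
  }
  // Without timestamps, force <|notimestamps|> after the task token.
  if (!returnTimestamps) ids.push([3, NO_TIMESTAMPS_ID]);
  return ids; // may be empty, in which case nothing needs to be forced
}

console.log(buildForcedDecoderIds({ language: "fr", task: "Translate" }));
```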
@xenova
Collaborator

xenova commented Jun 23, 2023

This was added in v2.2.0 🎉 Check the release notes (https://github.com/xenova/transformers.js/releases/tag/2.2.0) for example code.
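
For readers who do not want to click through, here is a minimal sketch of how the feature can be called, assuming the `language` and `task` pipeline options introduced with that release. `translateToEnglish` is a hypothetical wrapper, and the model name in the usage comment is an assumption; the release notes remain the authoritative example.

```javascript
// `transcriber` is any function shaped like a transformers.js
// automatic-speech-recognition pipeline: (audio, options) => Promise<{ text }>.
// Hypothetical wrapper: asks Whisper to translate speech in `language` to English.
async function translateToEnglish(transcriber, audio, language) {
  const output = await transcriber(audio, { language, task: "translate" });
  return output.text;
}

// Usage sketch (needs the real library and a multilingual model, not whisper-tiny.en):
//   import { pipeline } from "@xenova/transformers";
//   const transcriber = await pipeline("automatic-speech-recognition", "Xenova/whisper-tiny");
//   console.log(await translateToEnglish(transcriber, audio, "french"));
```

Passing `task: "transcribe"` instead would return the text in the spoken language, so getting both transcription and translation means calling the pipeline once per task.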

xenova closed this as completed Jun 23, 2023