WhisperTranslator is an application based on N46Whisper, aimed at improving the efficiency of transcription, translation, and summarization for various foreign language videos.
This application utilizes the optimized deployment of the AI speech recognition model Whisper, known as faster-whisper.
The output files are in ass or srt format, preformatted for a specific subtitle group, and can be directly imported into Aegisub for further translation and timing adjustments. You have the option to enable full text extraction and summarization.
-
Converting videos and audio into corresponding language text.
-
Translating transcribed text into any language (using local large models).
-
Outputting full text and timeline captions after transcription.
-
Summarizing the full content of videos (using local large models).
-
2024.2.24:
- 🤗Added support for local large models; now using InternLM2 7B to automatically translate ass and srt, entire texts, and summarize them. All operations can be run with just a 12GB GPU by executing
WhisperTranslator_local.py
.
- 🤗Added support for local large models; now using InternLM2 7B to automatically translate ass and srt, entire texts, and summarize them. All operations can be run with just a 12GB GPU by executing
-
2024.2.20:
- Initial release, providing transcription and segmented article output.
- If running locally, execute
pip install -r requirements.txt
to install dependencies. To run local large models for translation and summarization, you need to install additional dependencies withpip install -r requirements_localllm.txt
.
-
Transcription alone requires only 6GB of VRAM; for full functionality, which includes translation and summarization, you'll need a 12GB VRAM GPU (ampere architecture, e.g., similar to a 3060 series). Modify the configuration file local_whisper_config.toml, then simply run WhisperTranslator_local.py using Python.
-
After completion, you'll get: 1. Subtitles and full text, 2. Translated subtitles and full text, and 3. A full-text summary, which is currently embedded within the translated full-text file by default.
-
You can choose to enable or disable translation and summarization features according to your needs; refer to the configuration for details.
- Click here to open the application in Google Colab.
- Upload the file to transcribe and run the application.
- The ass file will automatically download after successful transcription.
-
If you choose to use AI tools (AI translation, AI summarization) that rely on online APIs, you need to write API tokens into the
.env
file in the current folder using the following variable names:OPENAI_API_KEY= OPENAI_API_BASE= ZHIPUAI_API_KEY= BAIDU_API_KEY=
-
If you're using local AI tools, simply wait for the models to download and then run the application.
- This project provides standalone translation and summarization capabilities without requiring transcription. Simply modify the input and output file addresses in
summay_everything.py
and run it.
The application can now perform line-by-line translation of transcribed texts using AI translation tools.
Users can also upload individual srt or ass files to use the translation module.
Currently supports translation with InternLM2.
Translated texts are merged with the original on the same line separated by /N
, creating bilingual subtitles.
Example images:
Users require their OpenAI API Key to use the translation feature. To generate a free Key, visit your account settings at https://platform.openai.com/account/api-keys.
When there are multiple sentences in one line, users can choose to split them into separate lines by spaces. The temporary timestamps for these new lines are the same as the original line, marked with 'adjust_required' to indicate the need for adjusting timestamps to avoid overlapping.
Regular splitting occurs only when single-word or single-sentence characters exceed 5 in length: Before Splitting:
After Splitting:
As seen, particularly in line 7, short phrases and interjections are preserved while long sentences are split. The character length limit of 5 is default, generally filtering out most short phrases and interjections in Japanese.
Comprehensive splitting creates a new line for every space, resulting in:
In both cases, English single words are not split.