This project extracts audio from video files (`.mp4`, `.avi`, `.mkv`), applies Voice Activity Detection (VAD) to filter out non-speech segments, and generates subtitles using the Whisper model.
You can either clone the source code and run the tool from there, or use the pre-built Docker image published here: https://hub.docker.com/r/hungdoan/video-transciption
- Extract audio from video files
- Apply VAD to filter out non-speech segments
- Transcribe audio to text using Whisper
- Generate subtitles in SRT format
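The following is a minimal sketch of how those stages could fit together, using `ffmpeg`, the `webrtcvad` package, and `openai-whisper`; it is not the project's actual code (the project uses PyAV for audio extraction and its own segment handling), and the file paths are placeholders.

```python
# Illustrative sketch only: extract audio, flag non-speech frames with WebRTC VAD,
# then transcribe with Whisper. Paths and parameters are placeholders.
import subprocess
import wave

import webrtcvad
import whisper

VIDEO = "input/example.mp4"   # placeholder input video
AUDIO = "example.wav"         # intermediate 16 kHz mono PCM audio

# 1. Extract 16 kHz mono 16-bit PCM audio (the only format WebRTC VAD accepts).
subprocess.run(
    ["ffmpeg", "-y", "-i", VIDEO, "-ac", "1", "-ar", "16000", "-c:a", "pcm_s16le", AUDIO],
    check=True,
)

# 2. Flag speech frames with WebRTC VAD (30 ms frames, aggressiveness 0-3).
vad = webrtcvad.Vad(2)
with wave.open(AUDIO, "rb") as wf:
    sample_rate = wf.getframerate()
    frame_bytes = int(sample_rate * 0.030) * 2  # 30 ms of 16-bit samples
    pcm = wf.readframes(wf.getnframes())

speech_flags = [
    vad.is_speech(pcm[i : i + frame_bytes], sample_rate)
    for i in range(0, len(pcm) - frame_bytes, frame_bytes)
]
print(f"{sum(speech_flags)} of {len(speech_flags)} frames contain speech")

# 3. Transcribe with Whisper (here on the whole file; the project transcribes
#    only the speech segments kept after the VAD step).
model = whisper.load_model("base")
result = model.transcribe(AUDIO)
for seg in result["segments"]:
    print(f"[{seg['start']:.2f} -> {seg['end']:.2f}] {seg['text'].strip()}")
```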
- Python 3.7+
- Docker
- Docker Compose
- A CUDA-capable machine (for GPU inference)
- Clone the repository:

  ```
  git clone <repository-url>
  cd <repository-directory>
  ```

- Ensure Docker and Docker Compose are installed and running on your machine.
There are five model sizes, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model; actual speed may vary depending on many factors including the available hardware.
| Size   | Multilingual model | Required VRAM | Relative speed |
|--------|--------------------|---------------|----------------|
| tiny   | tiny               | ~1 GB         | ~32x           |
| base   | base               | ~1 GB         | ~16x           |
| small  | small              | ~2 GB         | ~6x            |
| medium | medium             | ~5 GB         | ~2x            |
| large  | large-v3           | ~10 GB        | 1x             |
(Source: Whisper)
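The names in the table are the standard Whisper model identifiers. As a quick sanity check (assuming the `openai-whisper` Python package is installed), you can list the identifiers your installed version knows about:

```python
# Print the model identifiers bundled with the installed openai-whisper package;
# the list should include the names from the table above (plus English-only variants).
import whisper

print(whisper.available_models())
```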
- Place your video files in the `input` directory.
- Run the script using Docker Compose:

  ```
  docker compose -p "video-transcript" --env-file .env --env-file .env.base up --build --remove-orphans
  ```
  All configurable variables are defined in the `.env` files, where:

  - `MODEL_NAME`: the model to use (see the Available models section for the list). The default is `base`.
  - `DEVICE`: either `cuda` or `cpu`; `cuda` is recommended if you have one or more GPUs with CUDA cores. The default is `cuda`.

  A sample `.env` is sketched below.
- The output SRT files will be saved in the `output` directory.
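For reference, a minimal `.env` could look like the following. This is only a sketch covering the two variables documented above; the repository's own `.env` and `.env.base` files remain the authoritative source.

```
MODEL_NAME=base
DEVICE=cuda
```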
Example:
To transcribe a video file named `example.mp4`, in the directory that contains the source code:

- Clone the source code:

  ```
  git clone <repository-url>
  ```

- Change the working directory to the source code:

  ```
  cd <repository-directory>
  ```

- Place `example.mp4` in the `input` directory:

  ```
  cp example.mp4 <repository-directory>/input/
  ```

- Run the script using Docker Compose:

  ```
  docker compose -p "video-transcript" --env-file .env --env-file .env.base up --build --remove-orphans
  ```

- The generated subtitle file `example.srt` will be saved in the `output` directory.
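For orientation, the generated `.srt` file follows the standard SubRip format: numbered cues with `start --> end` timestamps followed by the transcribed text. The lines below are made-up placeholders, not actual output:

```
1
00:00:01,000 --> 00:00:03,500
First transcribed sentence of the video.

2
00:00:04,200 --> 00:00:06,900
Second transcribed sentence.
```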
Execute this script in PowerShell:

```powershell
$input_dir = "D:/input"
$output_dir = "D:/output"
$model_cache_dir = "D:/model_caches"
$model_name = "base"
$device = "cuda"

docker pull hungdoan/video-transciption:latest
docker run --gpus=all --rm -it --env MODEL_NAME=${model_name} --env DEVICE=${device} -v ${input_dir}:/input -v ${output_dir}:/output -v ${model_cache_dir}:/root/.cache/whisper hungdoan/video-transciption:latest
```
Where:

- `MODEL_NAME`: the model to use (see the Available models section for the list). The default is `base`.
- `DEVICE`: either `cuda` or `cpu`; `cuda` is recommended if you have one or more GPUs with CUDA cores. The default is `cuda`.
- `-v ${input_dir}:/input`: mounts the input folder so the tool can scan it for input videos.
- `-v ${output_dir}:/output`: mounts the output folder where the generated files are saved.
- `-v ${model_cache_dir}:/root/.cache/whisper`: optionally mounts a model cache folder so downloaded models are kept and reused across runs.
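If no CUDA-capable GPU is available, the same command can presumably be run on the CPU by dropping `--gpus=all` and setting `DEVICE=cpu` (untested sketch; expect noticeably slower inference):

```powershell
$device = "cpu"
docker run --rm -it --env MODEL_NAME=${model_name} --env DEVICE=${device} -v ${input_dir}:/input -v ${output_dir}:/output -v ${model_cache_dir}:/root/.cache/whisper hungdoan/video-transciption:latest
```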
This project is licensed under the MIT License. See the LICENSE file for details.
- Whisper - A general-purpose speech recognition model by OpenAI
- WebRTC VAD - Python interface to the WebRTC Voice Activity Detector
- PyAV - Pythonic bindings for FFmpeg's libraries