Extract hardcoded (burned-in) subtitles from videos with Python, using the PaddleOCR engine. A Colab notebook for installing and running this library is also available for convenience.
# example.py
from videocr import save_subtitles_to_file

if __name__ == '__main__':
    save_subtitles_to_file('example_cropped.mp4', 'example.srt', lang='ch', time_start='7:10', time_end='7:34',
                           sim_threshold=80, conf_threshold=75, use_fullframe=True,
                           brightness_threshold=210, similar_image_threshold=1000, frames_to_skip=1)
$ python3 example.py
example.srt:
0
00:07:10,000 --> 00:07:10,083
商城......现在没什么东西

1
00:07:10,416 --> 00:07:12,000
这边是战斗辅助系统

2
00:07:13,083 --> 00:07:14,500
要进去才能了解了

3
00:07:15,083 --> 00:07:15,916
没问题了吧

4
00:07:16,333 --> 00:07:17,166
我们准备登录

5
00:07:18,416 --> 00:07:21,083
啊对了, 登录没有服务器的选择么

6
00:07:21,333 --> 00:07:25,000
没有本游戏所有玩家, 都在个服务器内

7
00:07:25,833 --> 00:07:28,833
刺激了, 这么多玩家居然都不分流的么

8
00:07:29,500 --> 00:07:31,083
那......现在登录吗?

9
00:07:31,166 --> 00:07:32,416
好,登录吧!
Requirements:
- Python 3.7 - 3.10
- paddlepaddle or paddlepaddle-gpu (see https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/en/install/pip/linux-pip_en.html for installation instructions)

Install the package with:
pip install git+https://github.com/oliverfei/videocr-PaddleOCR.git
Alternatively, for development:
- Clone this repo
- From the root directory of this repository, run:
  python -m pip install .
The OCR process can be very slow on CPU. If you have a CUDA-capable GPU, installing paddlepaddle-gpu and setting use_gpu=True is recommended.
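To decide whether to pass use_gpu=True, you could check how the installed Paddle package was built. This is a minimal sketch assuming Paddle 2.x, where paddle.device.is_compiled_with_cuda() exists; the gpu_available helper is hypothetical and not part of videocr. It only reports whether Paddle was compiled with CUDA support, not whether a GPU is actually visible at runtime.

```python
def gpu_available() -> bool:
    """Return True if the installed paddle package was built with CUDA.

    Hypothetical helper (not part of videocr); assumes Paddle 2.x.
    """
    try:
        import paddle  # provided by paddlepaddle or paddlepaddle-gpu
    except ImportError:
        return False  # Paddle not installed at all
    return bool(paddle.device.is_compiled_with_cuda())

# Usage (requires videocr and a video file):
# from videocr import save_subtitles_to_file
# save_subtitles_to_file('example_cropped.mp4', 'example.srt', lang='ch',
#                        use_gpu=gpu_available())
```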
To shorten the time it takes to perform OCR on each frame, you can use the crop_x, crop_y, crop_width, and crop_height parameters to crop out only the area of the video where the subtitles appear. When cropping, leave a bit of buffer space above and below the text to ensure accurate readings.
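The buffering advice can be turned into a small helper that builds the crop keyword arguments from the pixel rows where the subtitle text sits. This is a sketch under stated assumptions: crop_params is a hypothetical helper, (crop_x, crop_y) is assumed to be the top-left corner of the crop measured from the top-left of the frame, and the 20-pixel default buffer is an arbitrary choice.

```python
def crop_params(text_top: int, text_bottom: int, frame_width: int,
                frame_height: int, buffer: int = 20) -> dict:
    """Build crop_* keyword arguments for a subtitle band.

    The band spans rows [text_top, text_bottom), padded by `buffer` pixels
    above and below, clamped to the frame, and covering the full width.
    Hypothetical helper; assumes y grows downward from the top edge.
    """
    top = max(0, text_top - buffer)
    bottom = min(frame_height, text_bottom + buffer)
    return {
        'crop_x': 0,
        'crop_y': top,
        'crop_width': frame_width,
        'crop_height': bottom - top,
    }

# Usage for a 1920x1080 video whose subtitles occupy rows 940-1010
# (requires videocr and a video file):
# crop = crop_params(940, 1010, 1920, 1080)
# save_subtitles_to_file('video.mp4', 'video.srt', lang='ch', **crop)
```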
| | More Speed | More Accuracy | Notes |
|---|---|---|---|
| Input Video Quality | Use lower quality | Use higher quality | Performance impact of using higher resolution video can be reduced with cropping |
| frames_to_skip | Higher number | Lower number | |
| brightness_threshold | Higher threshold | N/A | A brightness threshold can help speed up the OCR process by filtering out dark frames. In certain circumstances such as when subtitles are white and against a bright background, it may also help with accuracy. |
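The speed gained from a higher frames_to_skip comes at the cost of coarser timing: fewer frames are sampled, so short subtitle lines can be missed. The arithmetic below is a sketch assuming that after each sampled frame the next frames_to_skip frames are skipped; both helpers are hypothetical. For example, with frames_to_skip=1 on a 24 fps video, a frame would be sampled every 1/12 s (about 83 ms).

```python
def sampling_interval(fps: float, frames_to_skip: int) -> float:
    """Seconds between two frames sent to OCR (assumes skip-then-sample)."""
    return (frames_to_skip + 1) / fps

def max_frames_to_skip(fps: float, min_duration: float) -> int:
    """Largest frames_to_skip whose sampling interval still fits inside
    the shortest subtitle duration you expect (in seconds)."""
    return max(0, int(min_duration * fps) - 1)

# At 24 fps, skipping 5 frames samples every 0.25 s; to catch subtitles
# lasting at least 0.5 s, frames_to_skip should stay at 11 or below.
```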
- Return subtitle string in SRT format:
  get_subtitles(video_path: str, lang='ch', time_start='0:00', time_end='', conf_threshold=75, sim_threshold=80, use_fullframe=False, det_model_dir=None, rec_model_dir=None, use_gpu=False, brightness_threshold=None, similar_image_threshold=100, similar_pixel_threshold=25, frames_to_skip=1, crop_x=None, crop_y=None, crop_width=None, crop_height=None)
- Write subtitles to file_path:
  save_subtitles_to_file(video_path: str, file_path='subtitle.srt', lang='ch', time_start='0:00', time_end='', conf_threshold=75, sim_threshold=80, use_fullframe=False, det_model_dir=None, rec_model_dir=None, use_gpu=False, brightness_threshold=None, similar_image_threshold=100, similar_pixel_threshold=25, frames_to_skip=1, crop_x=None, crop_y=None, crop_width=None, crop_height=None)
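Because get_subtitles returns the SRT text as a string instead of writing a file, you can post-process it directly in Python. A minimal sketch, assuming the returned string follows the standard SRT layout (index line, timing line, text lines, blank line between entries); parse_srt is a hypothetical helper, not part of videocr.

```python
import re
from typing import List, Tuple

# Matches a timing line such as "00:07:10,416 --> 00:07:12,000".
_TIMING = re.compile(
    r'(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})')

def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def parse_srt(srt: str) -> List[Tuple[float, float, str]]:
    """Split an SRT string into (start_seconds, end_seconds, text) tuples."""
    entries = []
    for block in re.split(r'\n\s*\n', srt.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue  # not a complete entry
        match = _TIMING.match(lines[1].strip())
        if not match:
            continue  # second line is not a timing line
        g = match.groups()
        text = '\n'.join(lines[2:])
        entries.append((_to_seconds(*g[:4]), _to_seconds(*g[4:]), text))
    return entries

# Usage (requires videocr and a video file):
# from videocr import get_subtitles
# for start, end, text in parse_srt(get_subtitles('example_cropped.mp4', lang='ch')):
#     print(f'{start:.3f}-{end:.3f}: {text}')
```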
- lang: The language of the subtitles. See the PaddleOCR docs for the list of supported languages and their abbreviations.
- conf_threshold: Confidence threshold for word predictions. Words with lower confidence than this value will be discarded. The default value of 75 is fine for most cases. Make it closer to 0 if you get too few words in each line, or closer to 100 if there are too many excess words in each line.
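The effect of conf_threshold can be illustrated with a small filter. This is a sketch, not videocr's code: filter_words is hypothetical, and it assumes the OCR engine reports each word's confidence on a 0-1 scale (as PaddleOCR's recognition scores are) that is compared against the 0-100 threshold as a percentage.

```python
from typing import List, Tuple

def filter_words(words: List[Tuple[str, float]],
                 conf_threshold: float = 75) -> str:
    """Keep words whose confidence (0-1 scale) is at least conf_threshold
    percent, then join the survivors into one subtitle line."""
    kept = [text for text, score in words if score * 100 >= conf_threshold]
    return ' '.join(kept)

# A low-confidence fragment such as 'xq' (42%) is dropped at the default
# threshold of 75 but kept once the threshold is lowered to 40.
```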
- sim_threshold: Similarity threshold for subtitle lines. Subtitle lines with a Levenshtein ratio larger than this threshold will be merged together. The default value of 80 is fine for most cases. Make it closer to 0 if you get too many duplicated subtitle lines, or closer to 100 if you get too few subtitle lines.
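The merge step can be sketched as follows. This is a minimal illustration, not videocr's actual implementation: levenshtein_ratio and merge_similar are hypothetical helpers. The ratio uses the common normalized definition 100 * (len(a) + len(b) - d) / (len(a) + len(b)), where d is an edit distance in which a substitution costs 2 (a deletion plus an insertion). The sketch also assumes that only consecutive lines are compared and that the earlier line's text is kept while its end time is extended.

```python
from typing import List, Tuple

Line = Tuple[float, float, str]  # (start_seconds, end_seconds, text)

def levenshtein_ratio(a: str, b: str) -> float:
    """Similarity of two strings on a 0-100 scale."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0
    prev = list(range(len(b) + 1))  # distances from '' to prefixes of b
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 2  # substitution = delete + insert
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return (total - prev[-1]) / total * 100

def merge_similar(lines: List[Line], sim_threshold: float = 80) -> List[Line]:
    """Merge consecutive lines whose ratio exceeds sim_threshold."""
    merged: List[Line] = []
    for start, end, text in lines:
        if merged and levenshtein_ratio(merged[-1][2], text) > sim_threshold:
            first_start, _, first_text = merged[-1]
            merged[-1] = (first_start, end, first_text)  # extend end time
        else:
            merged.append((start, end, text))
    return merged

# '没问题了吧' vs '没问题了' differ by one deletion: ratio 8/9 * 100 ≈ 88.9,
# which is above the default of 80, so the two readings are merged.
```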
- time_start and time_end: Extract subtitles from only a clip of the video. The subtitle timestamps are still calculated relative to the start of the full video.
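To see why timestamps stay relative to the full video, note that a frame's position is counted from the first frame of the file, not from time_start. The sketch below is an illustration under assumptions: parse_time and srt_timestamp are hypothetical helpers, time strings are assumed to be 'M:SS' or 'H:MM:SS' (as in '7:10' and the '0:00' default), an empty string is taken to mean "until the end", and the 24 fps figure in the usage note is an example value.

```python
from typing import Optional

def parse_time(t: str) -> Optional[float]:
    """Convert 'M:SS' or 'H:MM:SS' to seconds; '' means "no limit"."""
    if not t:
        return None
    seconds = 0.0
    for part in t.split(':'):
        seconds = seconds * 60 + float(part)
    return seconds

def srt_timestamp(frame_index: int, fps: float) -> str:
    """Format a frame's position in the full video as HH:MM:SS,mmm."""
    total_ms = round(frame_index * 1000 / fps)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, ms = divmod(rem, 1000)
    return f'{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}'

# With time_start='7:10' on a 24 fps video, the first frame of the clip is
# frame 430 * 24 = 10320 of the file, so its timestamp is 00:07:10,000
# rather than 00:00:00,000.
```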
- use_fullframe: By default, the specified crop area is used for OCR, or, if no crop is specified, the bottom third of the frame is used. Set this value to True to use the entire frame.
- crop_x, crop_y, crop_width, crop_height: Specify the bounding area, in pixels, of the portion of the frame that will be used for OCR. See the image below for an example.
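The region-selection rules above can be summarized in a few lines. This is a sketch, not videocr's code: ocr_region is hypothetical, boxes are (x, y, width, height) in pixels from the top-left corner, and the assumption that use_fullframe=True takes precedence over an explicit crop is ours, since the description does not say what happens when both are given.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels

def ocr_region(frame_width: int, frame_height: int,
               crop: Optional[Box] = None,
               use_fullframe: bool = False) -> Box:
    """Return the part of the frame that would be sent to OCR."""
    if use_fullframe:
        return (0, 0, frame_width, frame_height)  # assumed to win over crop
    if crop is not None:
        return crop
    top = frame_height * 2 // 3  # default: bottom third of the frame
    return (0, top, frame_width, frame_height - top)

# With a NumPy frame of shape (height, width, channels), a box (x, y, w, h)
# corresponds to the slice frame[y:y + h, x:x + w].
```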
- det_model_dir: The text detection inference model folder. It can be set in two ways: (1) None, which automatically downloads the built-in model to ~/.paddleocr/det; or (2) the path to a specific inference model, which must contain the model and params files.
  See the PaddleOCR repo for a list of prebuilt models: https://github.com/PaddlePaddle/PaddleOCR/
- rec_model_dir: The text recognition inference model folder. It can be set in two ways: (1) None, which automatically downloads the built-in model to ~/.paddleocr/rec; or (2) the path to a specific inference model, which must contain the model and params files.
  See the PaddleOCR repo for a list of prebuilt models: https://github.com/PaddlePaddle/PaddleOCR/
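Before passing a custom det_model_dir or rec_model_dir, a quick sanity check can catch a wrong path. This sketch assumes the Paddle 2.x inference format, where the model structure and the weights are stored as *.pdmodel and *.pdiparams files (for example inference.pdmodel and inference.pdiparams); both helpers are hypothetical and newer model formats may use different file names.

```python
import os
from typing import Iterable

def has_model_files(filenames: Iterable[str]) -> bool:
    """True if the names include a *.pdmodel and a *.pdiparams file."""
    names = list(filenames)
    return (any(n.endswith('.pdmodel') for n in names)
            and any(n.endswith('.pdiparams') for n in names))

def looks_like_inference_model(model_dir: str) -> bool:
    """Rough check that model_dir holds a Paddle 2.x inference model."""
    return os.path.isdir(model_dir) and has_model_files(os.listdir(model_dir))

# Usage (the path is a placeholder):
# assert looks_like_inference_model('/path/to/det_model'), 'model/params missing'
```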
- use_gpu: Set to True to perform OCR on a GPU (requires the paddlepaddle-gpu Python package to be installed).
- brightness_threshold: If set, pixels whose brightness is less than the threshold will be blackened out. Valid brightness values range from 0 (black) to 255 (white). This can help improve accuracy when performing OCR on videos with white subtitles.
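The blackening step can be illustrated on a small grayscale frame. This is a sketch under assumptions: frames are modeled as plain lists of 0-255 brightness values rather than real video frames, both helpers are hypothetical, and is_blank shows one plausible way a frame with nothing left after thresholding could be skipped, which is how the speed-up mentioned in the tuning table might arise.

```python
from typing import List

Frame = List[List[int]]  # grayscale pixel rows, 0 (black) to 255 (white)

def apply_brightness_threshold(frame: Frame, threshold: int) -> Frame:
    """Blacken out every pixel darker than `threshold`."""
    return [[p if p >= threshold else 0 for p in row] for row in frame]

def is_blank(frame: Frame) -> bool:
    """True if nothing survived the threshold (OCR could be skipped)."""
    return all(p == 0 for row in frame for p in row)

# With threshold 210, white subtitle pixels (about 220-255) survive while
# a grey background (about 200) is blacked out.
```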
- similar_image_threshold: The number of non-similar pixels allowed before the program considers two consecutive frames to be different. If a frame is not different from the previous frame, the OCR result from the previous frame is reused, which can save a lot of time depending on how long each OCR inference takes.
- similar_pixel_threshold: Brightness threshold (0-255) used together with similar_image_threshold to determine whether two consecutive frames are different. If the brightness difference between two corresponding pixels exceeds this threshold, they are considered non-similar.
- frames_to_skip: The number of frames to skip before sampling the next frame for OCR. Take the frame rate (fps) of the input video into account before increasing this value.
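The interaction between similar_pixel_threshold and similar_image_threshold, and the reuse of the previous OCR result, can be sketched as below. This follows the descriptions above but is not videocr's code: frames are modeled as lists of grayscale values, frames_differ and ocr_with_reuse are hypothetical helpers, and ocr stands in for whatever OCR call is made per frame.

```python
from typing import Callable, List, Optional

Frame = List[List[int]]  # grayscale pixel rows, values 0-255

def frames_differ(prev: Frame, cur: Frame,
                  similar_image_threshold: int = 100,
                  similar_pixel_threshold: int = 25) -> bool:
    """True if more than similar_image_threshold pixels changed brightness
    by more than similar_pixel_threshold."""
    non_similar = sum(
        1
        for row_a, row_b in zip(prev, cur)
        for a, b in zip(row_a, row_b)
        if abs(a - b) > similar_pixel_threshold
    )
    return non_similar > similar_image_threshold

def ocr_with_reuse(frames: List[Frame], ocr: Callable[[Frame], str],
                   **thresholds) -> List[str]:
    """Call `ocr` only when a frame differs from the previous frame;
    otherwise reuse the previous result."""
    results: List[str] = []
    last_frame: Optional[Frame] = None
    last_text = ''
    for frame in frames:
        if last_frame is None or frames_differ(last_frame, frame, **thresholds):
            last_text = ocr(frame)
        results.append(last_text)
        last_frame = frame
    return results
```

Raising similar_image_threshold (as the example.py call does with 1000) makes more frames count as "unchanged", so fewer OCR calls are made.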
Future improvements:
- parallel processing
- handle multiple lines of text in the same frame
- publish to pypi
- commandline interface
- user-friendly application for non-devs