Update OCR Plugin features and whitelist characters (#20)
royshil authored Apr 11, 2024
1 parent af1e5e4 commit ab93e5d
Showing 14 changed files with 28 additions and 7 deletions.
5 changes: 3 additions & 2 deletions README.md
@@ -47,7 +47,7 @@ OCR Plugin enables many use cases for enhancing your stream or recording:
### Features
Available now:
- Add OCR Filter to any source with image or video output
-- Choose from English model or Scoreboard model
+- Choose from Scoreboard model or English, French, Spanish, German, Chinese, Japanese, Arabic, Turkish, Portuguese, Hindi, Russian and Italian
- Output OCR result to an OBS Text Source
- Choose the segmentation mode: Word, Line, Page, etc.
- "Semantic Smoothing": getting more consistent outputs with higher accuracy and confidence by "averaging" several text outputs
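
The README does not say how "Semantic Smoothing" combines several text outputs. As a rough illustration only, one way to get steadier output is a majority vote over the last few OCR results. The class name `MajoritySmoother`, its window size and the tie-breaking rule below are all assumptions made for this sketch, not the plugin's implementation.

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <string>

// Hypothetical helper (not taken from the plugin): keeps the last `window`
// OCR results and reports the text that occurs most often among them.
class MajoritySmoother {
public:
    explicit MajoritySmoother(std::size_t window) : window_(window) {}

    // Record a new OCR result and return the current majority text.
    std::string push(const std::string &text)
    {
        history_.push_back(text);
        if (history_.size() > window_)
            history_.pop_front();

        // Count how often each distinct text appears in the window.
        std::map<std::string, std::size_t> counts;
        for (const auto &t : history_)
            ++counts[t];

        // Walk from newest to oldest so that ties go to the most recent text.
        std::string best;
        std::size_t best_count = 0;
        for (auto it = history_.rbegin(); it != history_.rend(); ++it) {
            if (counts[*it] > best_count) {
                best_count = counts[*it];
                best = *it;
            }
        }
        return best;
    }

private:
    std::size_t window_;
    std::deque<std::string> history_;
};
```

With a window of 3, a single misread frame between two correct reads is outvoted as soon as the correct text appears again.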
@@ -71,10 +71,11 @@ Coming soon:

Check out our other plugins:
- [Background Removal](https://github.com/occ-ai/obs-backgroundremoval) removes background from webcam without a green screen.
- 🚧 Experimental 🚧 [CleanStream](https://github.com/occ-ai/obs-cleanstream) for real-time filler word (uh,um) and profanity removal from live audio stream
- [Detect](https://github.com/occ-ai/obs-detect) will detect and track >80 types of objects in any OBS source.
- [LocalVocal](https://github.com/occ-ai/obs-localvocal) speech AI assistant plugin for real-time, local transcription (captions), translation and more language functions
- [Polyglot](https://github.com/occ-ai/obs-polyglot) translation AI plugin for real-time, local translation to hundreds of languages
- [URL/API Source](https://github.com/occ-ai/obs-urlsource) will connect to any URL/API HTTP and get the data/image/audio to your scene.
- 🚧 Experimental 🚧 [CleanStream](https://github.com/occ-ai/obs-cleanstream) for real-time filler word (uh,um) and profanity removal from live audio stream

If you like this work, which is given to you completely free of charge, please consider supporting it https://github.com/sponsors/royshil or https://www.patreon.com/RoyShilkrot

Binary file added data/tessdata/ara.traineddata
Binary file not shown.
Binary file added data/tessdata/chi_sim.traineddata
Binary file not shown.
Binary file added data/tessdata/deu.traineddata
Binary file not shown.
Binary file added data/tessdata/fra.traineddata
Binary file not shown.
Binary file added data/tessdata/hin.traineddata
Binary file not shown.
Binary file added data/tessdata/ita.traineddata
Binary file not shown.
Binary file added data/tessdata/jpn.traineddata
Binary file not shown.
Binary file added data/tessdata/por.traineddata
Binary file not shown.
Binary file added data/tessdata/rus.traineddata
Binary file not shown.
Binary file modified data/tessdata/scoreboard.traineddata
Binary file not shown.
Binary file added data/tessdata/spa.traineddata
Binary file not shown.
21 changes: 21 additions & 0 deletions src/consts.h
@@ -6,4 +6,25 @@ const char *const PLUGIN_INFO_TEMPLATE =
"<a href=\"https://github.com/occ-ai\">OCC AI</a> ❤️ "
"<a href=\"https://www.patreon.com/RoyShilkrot\">Support & Follow</a>";

const char *const WHITELIST_CHARS_ENGLISH =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*()_+-=[]{}|;':\",./<>?`~\\ ";
// add french characters with accents
const char *const WHITELIST_CHARS_FRENCH =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*()_+-=[]{}|;':\",./<>?`~\\éèêàâùûç ";
// add german characters with umlauts
const char *const WHITELIST_CHARS_GERMAN =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*()_+-=[]{}|;':\",./<>?`~\\äöüß ";
// add spanish characters with accents
const char *const WHITELIST_CHARS_SPANISH =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*()_+-=[]{}|;':\",./<>?`~\\áéíóúüñ ";
// add italian characters with accents
const char *const WHITELIST_CHARS_ITALIAN =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*()_+-=[]{}|;':\",./<>?`~\\àèéìòù ";
// add portuguese characters with accents
const char *const WHITELIST_CHARS_PORTUGUESE =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*()_+-=[]{}|;':\",./<>?`~\\áàãâéêíóôõúüç ";
// add russian characters
const char *const WHITELIST_CHARS_RUSSIAN =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*()_+-=[]{}|;':\",./<>?`~\\абвгдеёжзийклмнопрстуфхцчшщъыьэюя ";

#endif /* CONSTS_H */
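
This commit only wires `WHITELIST_CHARS_ENGLISH` in as the default; the diff does not show how the other constants are selected. A minimal sketch of one possible lookup, keyed by the tessdata language code, might look like the following. The function `whitelist_for_language`, the `EXAMPLE_*` stand-in strings (shortened versions of the constants above) and the `"eng"` code are assumptions for illustration.

```cpp
#include <string>
#include <unordered_map>

// Shortened stand-ins for the constants in consts.h, so the example
// compiles on its own.
static const char *const EXAMPLE_WHITELIST_ENGLISH = "ABCabc0123 ";
static const char *const EXAMPLE_WHITELIST_FRENCH = "ABCabc0123éèêàâùûç ";
static const char *const EXAMPLE_WHITELIST_GERMAN = "ABCabc0123äöüß ";

// Hypothetical helper: map a tessdata language code to a whitelist and
// fall back to the English set for codes without a dedicated entry.
const char *whitelist_for_language(const std::string &lang)
{
    static const std::unordered_map<std::string, const char *> table = {
        {"eng", EXAMPLE_WHITELIST_ENGLISH}, // "eng" is an assumed code
        {"fra", EXAMPLE_WHITELIST_FRENCH},
        {"deu", EXAMPLE_WHITELIST_GERMAN},
    };
    const auto it = table.find(lang);
    return it != table.end() ? it->second : EXAMPLE_WHITELIST_ENGLISH;
}
```

Falling back to the English set keeps the behaviour of the new default for languages such as `jpn` or `ara`, for which this commit defines no whitelist constant.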
9 changes: 4 additions & 5 deletions src/ocr-filter.cpp
@@ -83,11 +83,11 @@ obs_properties_t *ocr_filter_properties(void *data)

// scan the tessdata folder for files using std::filesystem
std::string tessdata_folder = obs_module_file("tessdata");
-obs_log(LOG_INFO, "Scanning tessdata folder: %s", tessdata_folder.c_str());
+obs_log(LOG_DEBUG, "Scanning tessdata folder: %s", tessdata_folder.c_str());
for (const auto &entry : std::filesystem::directory_iterator(tessdata_folder)) {
std::string filename = entry.path().filename().string();
if (filename.find(".traineddata") != std::string::npos) {
-obs_log(LOG_INFO, "Found traineddata file: %s", filename.c_str());
+obs_log(LOG_DEBUG, "Found traineddata file: %s", filename.c_str());
std::string language = filename.substr(0, filename.find(".traineddata"));
obs_property_list_add_string(lang_list, language.c_str(), language.c_str());
}
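
The loop above derives the language code with `find(".traineddata")`, which also matches names where that substring is not the final extension (for example a backup copy ending in `.bak`). A stricter variant, sketched here under the assumption that only files named `<code>.traineddata` should be listed, compares the extension directly. The helper name `language_from_traineddata` is hypothetical and not part of the plugin.

```cpp
#include <filesystem>
#include <optional>
#include <string>

// Hypothetical helper: returns the language code for "<code>.traineddata",
// or std::nullopt for any other filename.
std::optional<std::string> language_from_traineddata(const std::filesystem::path &file)
{
    // extension() yields only the last suffix, so "eng.traineddata.bak"
    // reports ".bak" and is rejected here.
    if (file.extension() != ".traineddata")
        return std::nullopt;
    // stem() is the filename without that last suffix, e.g. "chi_sim".
    return file.stem().string();
}
```

Inside the directory loop this could be called as `language_from_traineddata(entry.path().filename())`, adding the returned code to the list only when a value is present.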
@@ -295,9 +295,8 @@ void ocr_filter_defaults(obs_data_t *settings)
obs_data_set_default_int(settings, "rescale_target_size", 35);
obs_data_set_default_string(settings, "text_sources", "none");
obs_data_set_default_string(settings, "text_detection_mask_sources", "none");
-obs_data_set_default_string(
-settings, "char_whitelist",
-"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.,:;!?-()[]{}<>|/@#$%&*+=_~ ");
+obs_data_set_default_string(settings, "char_whitelist",
+WHITELIST_CHARS_ENGLISH); // default to english characters
obs_data_set_default_int(settings, "conf_threshold", 50);
obs_data_set_default_bool(settings, "enable_smoothing", false);
obs_data_set_default_int(settings, "word_length", 5);