Releases: Sharrnah/whispering
v1.0.5.1
Standalone Release File (2.30 GB):
Download Server:
Changelog:
- [FEATURE] Added OCR to recognize and translate text written in games. (Still a bit hard/annoying to use. I hope to improve on that later.)
- [FEATURE] Added Audio Loopback support. (In theory this should make it easier to capture game audio, but I haven't had success with it myself yet.)
- [FEATURE] The speaker language can now be defined, so the AI does not need to guess it. This should improve recognition quality.
- [FEATURE] Added M2M100 text translation AI. (Only needs a single model file and supports more languages than ARGOS. Both are still available.)
- [BUGFIX] Added missing OCR dependency in Standalone Release.
OCR Usage:
- Select a window title, either with the --ocr_window_name start argument or inside the websocket remote client (websocket_clients/websocket-remote/index.html).
- Select the OCR language in the remote client.
- Click on "OCR transl."
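As a rough illustration of the first step, the start argument can be passed as part of an argument list. This is only a sketch: the executable path is taken from the v1.0.0 notes further down and may differ in newer releases, the helper name build_start_command is made up for this example, and any other arguments from the included start-*.bat files are left out.

```python
import ntpath

# Assumption: executable location as described in the v1.0.0 notes;
# newer releases may use a different layout.
EXECUTABLE = ntpath.join("audioWhisper", "audioWhisper.exe")


def build_start_command(window_title):
    """Return an argument list that preselects the OCR window by title.

    Hypothetical helper for illustration only; it is not part of the project.
    """
    if not window_title.strip():
        raise ValueError("window title must not be empty")
    # Passing a list avoids quoting problems for titles that contain spaces.
    return [EXECUTABLE, "--ocr_window_name", window_title]


if __name__ == "__main__":
    print(build_start_command("My Game Window"))
```

The resulting list could be handed to subprocess.run from the release folder, or the same argument could simply be appended to the command in one of the start-*.bat files.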
If the OCR AI model is not already downloaded, it is downloaded first (this might take a bit).
It then tries to focus the window with that title and takes a screenshot.
After that, the screenshot is sent to the OCR model and the result is sent back to the remote client, including the text translation into the selected target language.
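The order of these steps can be sketched roughly as follows. This is not the project's actual code: every callable passed in (model_is_cached, download_model, focus_window, take_screenshot, recognize_text, translate_text) is a hypothetical placeholder, and only the sequence follows the description above.

```python
from dataclasses import dataclass


@dataclass
class OcrResult:
    text: str         # text recognized in the screenshot
    translation: str  # the same text translated into the target language


def run_ocr_translation(window_title, target_language, *, model_is_cached,
                        download_model, focus_window, take_screenshot,
                        recognize_text, translate_text):
    """Run the described OCR steps in order, using injected placeholder callables."""
    # 1. Download the OCR model only if it is not available yet.
    if not model_is_cached():
        download_model()
    # 2. Bring the window with the given title to the front and capture it.
    focus_window(window_title)
    image = take_screenshot()
    # 3. Recognize the text, then translate it into the selected target language.
    text = recognize_text(image)
    translation = translate_text(text, target_language)
    # 4. This result is what would be sent back to the remote client.
    return OcrResult(text=text, translation=translation)


if __name__ == "__main__":
    result = run_ocr_translation(
        "My Game Window", "en",
        model_is_cached=lambda: True,
        download_model=lambda: None,
        focus_window=lambda title: None,
        take_screenshot=lambda: b"fake image bytes",
        recognize_text=lambda image: "Hallo Welt",
        translate_text=lambda text, lang: f"[{lang}] {text}",
    )
    print(result)
```

Passing the steps in as callables keeps the sketch independent of any particular OCR or screenshot library.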
v1.0.5
Standalone Release File:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2Fwhispering-tiger1.0.5_win.zip (2.29 GB)
The release file was missing a dependency that OCR needs to work. Fixed in v1.0.5.1.
Changelog:
- [FEATURE] Added OCR to recognize and translate text written in games. (Still a bit hard/annoying to use. I hope to improve on that later.)
- [FEATURE] Added Audio Loopback support. (In theory this should make it easier to capture game audio, but I haven't had success with it myself yet.)
- [FEATURE] The speaker language can now be defined, so the AI does not need to guess it. This should improve recognition quality.
- [FEATURE] Added M2M100 text translation AI. (Only needs a single model file and supports more languages than ARGOS. Both are still available.)
OCR Usage:
- Select a window title, either with the --ocr_window_name start argument or inside the websocket remote client (websocket_clients/websocket-remote/index.html).
- Select the OCR language in the remote client.
- Click on "OCR transl."
If the OCR AI model is not already downloaded, it is downloaded first (this might take a bit).
It then tries to focus the window with that title and takes a screenshot.
After that, the screenshot is sent to the OCR model and the result is sent back to the remote client, including the text translation into the selected target language.
v1.0.4
Standalone Release File:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2Fwhispering-tiger1.0.4_win.zip (2.23 GB)
Changelog:
- [TASK] Changed default recording sample rate to 16000, since the Whisper AI down-sampled it anyway.
- [TASK] Added audio conversion using pydub (should remove the ffmpeg dependency and allow audio processing in RAM).
- [FEATURE] Added threaded queue handling for Whisper AI. This should speed up processing and avoid delayed audio recordings.
- [FEATURE] Added an option to swap the text translation languages in the websocket client.
- [FEATURE] Made "condition on previous text" configurable without needing a restart.
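To illustrate the general idea behind threaded queue handling, here is a minimal sketch using only the Python standard library. It is not the project's implementation: start_worker and process_clip are hypothetical names, and the transcription step is replaced by a stand-in function. The point is that recording code can keep putting clips on the queue while a worker thread processes them in order.

```python
import queue
import threading


def start_worker(process_clip, results):
    """Start a background thread that processes queued clips in FIFO order.

    process_clip stands in for the (slow) speech recognition call.
    Putting None on the queue tells the worker to stop.
    """
    clips = queue.Queue()

    def worker():
        while True:
            clip = clips.get()
            try:
                if clip is None:  # sentinel: no more work
                    break
                results.append(process_clip(clip))
            finally:
                clips.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return clips, thread


if __name__ == "__main__":
    results = []
    clips, thread = start_worker(lambda clip: f"transcribed {clip}", results)
    # The recording side only enqueues and never waits for processing.
    for name in ("clip-1", "clip-2", "clip-3"):
        clips.put(name)
    clips.put(None)
    thread.join()
    print(results)
```

A single worker keeps the results in recording order; several workers would finish faster but could return clips out of order.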
v1.0.3
Standalone Release File:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2Fwhispering-tiger1.0.3_win.zip (2.21 GB)
Changelog:
- [BUGFIX] Attention caching fix for Whisper AI, giving a speed improvement of 30% or even more on CPU.
- [BUGFIX] Fixed the open_browser argument using a wrong path.
- [FEATURE] Option to disable OSC ASCII conversion. (so it does not need a new release if VRC supports non-ASCII)
- [FEATURE] Activate typing indicator on audio processing start + send processing start event over websocket.
- [FEATURE] Show processing indicator on websocket clients.
- [FEATURE] Broadcast setting changes to all websocket clients.
- [FEATURE] Added show_transl_results argument to websocket clients to configure display of translations / transcriptions.
v1.0.2
Standalone Release File:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2Fwhispering-tiger1.0.2_win.zip (2.20 GB)
v1.0.1
Standalone Release File:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2Fwhispering-tiger1.0.1_win.zip (2.20 GB)
v1.0.0
Standalone Windows Version (Python + ffmpeg included)
Can be downloaded here:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2FWhispering_win32.zip (2.21 GB)
The only additional install recommended is CUDA, for GPU acceleration.
See the included start-*.bat and get-device-list.bat files for how to run it (same as mentioned in the readme, except that python audioWhisper.py is replaced with audioWhisper\audioWhisper.exe).
Do not run audioWhisper.exe directly, or it will create a new .cache directory and download the Whisper AI model again.
websocket_remote/ and websocket_clients/* are included as well.