Releases · Sharrnah/whispering

01 Nov 00:56

Sharrnah

v1.0.5.1

00f4d50

v1.0.5.1

Standalone Release File (2.30 GB):
Download Server:

Changelog:

[FEATURE] Added OCR to recognize and translate text written in games. (Still a bit hard/annoying to use. I hope to improve on that later.)
[FEATURE] Added Audio Loopback support. (Should in theory make it easier to capture game audio, but I haven't been successful with it myself yet.)
[FEATURE] Added an option to define the speaker language, so the AI does not need to guess it. This should improve recognition quality.
[FEATURE] Added M2M100 text translation AI. (Only needs a single model file and supports more languages than ARGOS. Both are still available.)
[BUGFIX] Added missing OCR dependency in Standalone Release.

OCR Usage:

Select a window title, either with the --ocr_window_name start argument
or inside the websocket remote client (websocket_clients/websocket-remote/index.html).
Select the OCR language in the remote client.
Click on OCR transl..
If the OCR AI model is not already downloaded, it will be downloaded first (this might take a bit).
It then tries to focus the window with that title and takes a screenshot.
After that, the screenshot is sent to the OCR model and the result is sent back to the remote client, including the text translation into the selected target language.
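
The flow described above (capture the window, run OCR, translate, send the result to the remote client) can be pictured with a minimal Python sketch. It is not the project's actual code: `OcrResult`, `run_ocr_translation` and the four callables passed into it are hypothetical names chosen for illustration, and the real screenshot, OCR and translation steps are replaced by whatever functions the caller supplies.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class OcrResult:
    """Hypothetical container for what is sent back to the remote client."""
    window_title: str
    text: str
    translation: str


def run_ocr_translation(
    window_title: str,
    capture: Callable[[str], bytes],    # assumed: focuses the window, returns a screenshot
    recognize: Callable[[bytes], str],  # assumed: runs the OCR model on the image
    translate: Callable[[str], str],    # assumed: translates into the target language
    send: Callable[[OcrResult], None],  # assumed: delivers the result to the remote client
) -> OcrResult:
    image = capture(window_title)
    text = recognize(image)
    translation = translate(text)
    result = OcrResult(window_title, text, translation)
    send(result)
    return result
```

Passing the steps in as functions keeps the sketch independent of any particular screenshot or OCR library; in a quick experiment each one can be a simple stand-in such as a lambda.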

31 Oct 23:11

Sharrnah

v1.0.5

837e611

v1.0.5

Standalone Release File:
~~https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2Fwhispering-tiger1.0.5_win.zip (2.29 GB)~~

The release file was missing a dependency required for OCR to work. Fixed in v1.0.5.1.

Changelog:

[FEATURE] Added OCR to recognize and translate text written in games. (Still a bit hard/annoying to use. I hope to improve on that later.)
[FEATURE] Added Audio Loopback support. (Should in theory make it easier to capture game audio, but I haven't been successful with it myself yet.)
[FEATURE] Added an option to define the speaker language, so the AI does not need to guess it. This should improve recognition quality.
[FEATURE] Added M2M100 text translation AI. (Only needs a single model file and supports more languages than ARGOS. Both are still available.)

OCR Usage:

Select a window title, either with the --ocr_window_name start argument
or inside the websocket remote client (websocket_clients/websocket-remote/index.html).
Select the OCR language in the remote client.
Click on OCR transl..
If the OCR AI model is not already downloaded, it will be downloaded first (this might take a bit).
It then tries to focus the window with that title and takes a screenshot.
After that, the screenshot is sent to the OCR model and the result is sent back to the remote client, including the text translation into the selected target language.

26 Oct 17:09

Sharrnah

v1.0.4

c39627c

v1.0.4

Standalone Release File:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2Fwhispering-tiger1.0.4_win.zip (2.23 GB)

Changelog:

[TASK] Changed default recording sample rate to 16000, since the Whisper AI down-sampled it anyway.
[TASK] Added audio conversion using pydub. (Should remove the ffmpeg dependency and allow audio processing in RAM.)
[FEATURE] Added threaded queue handling for Whisper AI. This should speed up processing and prevent delayed audio recordings.
[FEATURE] Added swapping of the text translation languages to the websocket client.
[FEATURE] Made "condition on previous text" configurable without needing restart.
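
To illustrate the idea behind the threaded queue handling mentioned in this changelog, here is a minimal sketch using only the Python standard library. It is an assumption about the general pattern, not the project's implementation: `start_worker`, `process` and `results` are hypothetical names, and `process` stands in for the actual Whisper transcription call.

```python
import queue
import threading


def start_worker(process, results):
    """Start one background thread that processes queued audio chunks in order."""
    jobs = queue.Queue()

    def worker():
        while True:
            chunk = jobs.get()
            if chunk is None:  # sentinel: stop the worker
                jobs.task_done()
                break
            try:
                results.append(process(chunk))
            finally:
                jobs.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return jobs, thread
```

The recording side only calls `jobs.put(chunk)` and returns immediately, so capturing audio is not blocked while an earlier chunk is still being processed. Putting `None` on the queue stops the worker.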

23 Oct 16:48

Sharrnah

v1.0.3

71a1f0b

v1.0.3

Standalone Release File:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2Fwhispering-tiger1.0.3_win.zip (2.21 GB)

Changelog:

[BUGFIX] Attention caching fix for Whisper AI. Speed improvement of 30% or even more on CPU.
[BUGFIX] Fixed the open_browser argument using a wrong path.
[FEATURE] Option to disable OSC ASCII conversion. (so it does not need a new release if VRC supports non-ASCII)
[FEATURE] Activate typing indicator on audio processing start + send processing start event over websocket.
[FEATURE] Show processing indicator on websocket clients.
[FEATURE] Broadcast setting changes to all websocket clients.
[FEATURE] Added show_transl_results argument to websocket clients to configure display of translations / transcriptions.
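
As a rough illustration of what an optional ASCII conversion for OSC text could look like, the sketch below uses Unicode decomposition from the Python standard library. This is an assumption for illustration only: the release notes do not say how the conversion is implemented, and `to_ascii`, `prepare_osc_text` and the `ascii_only` flag are hypothetical names.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Decompose accented characters and drop everything that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def prepare_osc_text(text: str, ascii_only: bool = True) -> str:
    # ascii_only=False mirrors the idea of disabling the conversion,
    # so the text is passed through unchanged.
    return to_ascii(text) if ascii_only else text
```

With this approach accented Latin letters lose their accents, while characters without an ASCII base form are dropped entirely, which is why being able to switch the conversion off matters once the receiver handles non-ASCII text.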

21 Oct 10:27

Sharrnah

v1.0.2

5fe9a7a

v1.0.2

Standalone Release File:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2Fwhispering-tiger1.0.2_win.zip (2.20 GB)

19 Oct 19:46

Sharrnah

v1.0.1

6f9db3a

v1.0.1

Standalone Release File:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2Fwhispering-tiger1.0.1_win.zip (2.20 GB)

18 Oct 18:12

Sharrnah

v1.0.0

997f27b

v1.0.0

Standalone Windows Version (Python + ffmpeg included)
Can be downloaded here:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2FWhispering_win32.zip (2.21 GB)

The only additional install recommended is CUDA, for GPU acceleration.

See included start-*.bat and get-device-list.bat for how to run it.
(same as mentioned in readme except python audioWhisper.py replaced with audioWhisper\audioWhisper.exe.)

Do not run audioWhisper.exe directly, or it will create a new .cache directory and download the Whisper AI model again.

websocket_remote/ and websocket_clients/* are included as well.

Read README.md for more information.
