
Releases: Sharrnah/whispering

v1.0.5.1

01 Nov 00:56
00f4d50

Standalone Release File (2.30 GB):
Download Server:

Changelog:

  • [FEATURE] Added OCR to recognize and translate text written in games. (Still a bit hard/annoying to use. I hope to improve on that later.)
  • [FEATURE] Added Audio Loopback support. (In theory, this should make capturing game audio easier, but I have not yet gotten it to work myself.)
  • [FEATURE] Allow defining the speaker language, so the AI does not need to guess it. This should improve recognition quality.
  • [FEATURE] Added M2M100 text translation AI. (Only needs a single model file and supports more languages than ARGOS. Both are still available.)
  • [BUGFIX] Added missing OCR dependency to the Standalone Release.
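
Setting the speaker language lets Whisper skip its automatic language-detection step. As a rough sketch, assuming the openai-whisper package (whose `transcribe()` accepts a `language` keyword), the options could be assembled as below; `build_transcribe_options` and the `"auto"` sentinel are hypothetical names, not taken from this project.

```python
from typing import Any, Dict, Optional


# Hypothetical helper: the function name and the "auto" sentinel are illustrative only.
def build_transcribe_options(language: Optional[str]) -> Dict[str, Any]:
    """Return keyword arguments for openai-whisper's model.transcribe()."""
    options: Dict[str, Any] = {"task": "transcribe"}
    if language and language != "auto":
        # A fixed speaker language skips Whisper's automatic language detection.
        options["language"] = language
    # Otherwise "language" stays unset and Whisper detects it from the audio.
    return options


# Usage with openai-whisper (requires the package and a downloaded model):
#   import whisper
#   model = whisper.load_model("base")
#   result = model.transcribe("speech.wav", **build_transcribe_options("de"))
print(build_transcribe_options("de"))
print(build_transcribe_options("auto"))
```

Leaving the language unset keeps the previous behavior, so the option can default to automatic detection.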

OCR Usage:

  • Select a window title either with the --ocr_window_name start argument
    or inside the websocket remote client (websocket_clients/websocket-remote/index.html).
  • Select the OCR language in the remote client.
  • Click on OCR transl..
    If the OCR AI model is not downloaded yet, it is downloaded first (this might take a while).
    It then tries to focus the window with that title and takes a screenshot.
    The screenshot is sent to the OCR model, and the result is sent back to the remote client, together with its text translation into the selected target language.
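
The OCR steps above form a simple pipeline. The sketch below only illustrates that order of operations under stated assumptions: `ocr_translate`, the model path, and every injected callable (`download_model`, `capture_window`, `run_ocr`, `translate`) are hypothetical stand-ins, not this project's actual functions or OCR backend.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def ocr_translate(
    window_title: str,
    ocr_language: str,
    target_language: str,
    model_path: Path,
    download_model: Callable[[Path], None],
    capture_window: Callable[[str], bytes],
    run_ocr: Callable[[bytes, str], str],
    translate: Callable[[str, str], str],
) -> Dict[str, str]:
    """Hypothetical OCR-translate pipeline; every callable is a stand-in."""
    # 1. Download the OCR model only if it is not cached yet (slow on first use).
    if not model_path.exists():
        download_model(model_path)
    # 2. Focus the window with the given title and take a screenshot.
    screenshot = capture_window(window_title)
    # 3. Recognize the text in the screenshot.
    text = run_ocr(screenshot, ocr_language)
    # 4. Translate it and return both results to the remote client.
    return {"text": text, "translation": translate(text, target_language)}


# Demo with fake components: the "download" just creates the model file.
downloads: List[Path] = []


def fake_download(path: Path) -> None:
    downloads.append(path)
    path.write_bytes(b"model")


with tempfile.TemporaryDirectory() as tmp:
    model = Path(tmp) / "ocr-model.bin"
    for _ in range(2):  # the second run reuses the cached model
        result = ocr_translate(
            "My Game", "fr", "en", model,
            fake_download,
            capture_window=lambda title: b"screenshot",
            run_ocr=lambda image, lang: "Bonjour",
            translate=lambda text, target: "Hello",
        )
print(result, len(downloads))
```

Injecting the components keeps the ordering testable without a real game window or OCR model.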

v1.0.5

31 Oct 23:11
837e611

Standalone Release File:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2Fwhispering-tiger1.0.5_win.zip (2.29 GB)

The release file was missing a dependency required for OCR. Fixed in v1.0.5.1.

Changelog:

  • [FEATURE] Added OCR to recognize and translate text written in games. (Still a bit hard/annoying to use. I hope to improve on that later.)
  • [FEATURE] Added Audio Loopback support. (In theory, this should make capturing game audio easier, but I have not yet gotten it to work myself.)
  • [FEATURE] Allow defining the speaker language, so the AI does not need to guess it. This should improve recognition quality.
  • [FEATURE] Added M2M100 text translation AI. (Only needs a single model file and supports more languages than ARGOS. Both are still available.)

OCR Usage:

  • Select a window title either with the --ocr_window_name start argument
    or inside the websocket remote client (websocket_clients/websocket-remote/index.html).
  • Select the OCR language in the remote client.
  • Click on OCR transl..
    If the OCR AI model is not downloaded yet, it is downloaded first (this might take a while).
    It then tries to focus the window with that title and takes a screenshot.
    The screenshot is sent to the OCR model, and the result is sent back to the remote client, together with its text translation into the selected target language.

v1.0.4

26 Oct 17:09
c39627c

Standalone Release File:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2Fwhispering-tiger1.0.4_win.zip (2.23 GB)

Changelog:

  • [TASK] Changed the default recording sample rate to 16000 Hz, since Whisper AI downsamples the audio to that rate anyway.
  • [TASK] Added audio conversion using pydub (should remove the ffmpeg dependency and allows audio processing in RAM).
  • [FEATURE] Added threaded queue handling for Whisper AI. This should speed up processing and prevent delayed audio recordings.
  • [FEATURE] Added swapping of the text translation languages to the websocket client.
  • [FEATURE] Made "condition on previous text" configurable without needing a restart.
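
Whisper models work on 16 kHz mono audio, so recording at 16000 Hz avoids a resampling step. The stdlib-only sketch below illustrates three of the items above under stated assumptions: raw 16-bit PCM chunks are converted to floats in memory, a worker thread drains a queue so recording never waits on transcription, and the worker re-reads `condition_on_previous_text` for every chunk so a change applies without a restart. `transcribe()` is a placeholder for the real Whisper call, the settings dict and function names are hypothetical, and the project itself uses pydub rather than the `array` module for conversion.

```python
import array
import queue
import sys
import threading
from typing import Dict, List, Optional

SAMPLE_RATE = 16000  # Whisper models work on 16 kHz mono audio

# Hypothetical shared settings; the worker reads them per chunk,
# so changing a value takes effect without restarting anything.
settings: Dict[str, object] = {"condition_on_previous_text": True}
settings_lock = threading.Lock()

audio_queue: "queue.Queue[Optional[bytes]]" = queue.Queue()
results: List[str] = []


def pcm16_to_float(raw: bytes) -> List[float]:
    """Convert little-endian 16-bit PCM bytes to floats in [-1.0, 1.0), in RAM."""
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # PCM data is little-endian; fix up on big-endian hosts
    return [s / 32768.0 for s in samples]


def transcribe(samples: List[float], condition_on_previous_text: bool) -> str:
    # Placeholder for e.g. model.transcribe(np.array(samples, dtype=np.float32),
    #                                        condition_on_previous_text=...)
    seconds = len(samples) / SAMPLE_RATE
    return f"{seconds:.2f}s, condition_on_previous_text={condition_on_previous_text}"


def worker() -> None:
    while True:
        raw = audio_queue.get()
        try:
            if raw is None:  # sentinel: stop the worker
                return
            with settings_lock:
                cond = bool(settings["condition_on_previous_text"])
            results.append(transcribe(pcm16_to_float(raw), cond))
        finally:
            audio_queue.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

# The recording side only enqueues chunks, so it is never blocked by transcription.
one_second = b"\x00\x00" * SAMPLE_RATE
audio_queue.put(one_second)
audio_queue.join()  # wait until the first chunk has been processed
with settings_lock:
    settings["condition_on_previous_text"] = False  # changed at runtime
audio_queue.put(one_second)
audio_queue.put(None)
thread.join()
print(results)
```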

v1.0.3

23 Oct 16:48
71a1f0b

Standalone Release File:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2Fwhispering-tiger1.0.3_win.zip (2.21 GB)

Changelog:

  • [BUGFIX] Fixed attention caching in Whisper AI for a speed improvement (30% or even more on CPU).
  • [BUGFIX] Fixed the open_browser argument using a wrong path.
  • [FEATURE] Option to disable OSC ASCII conversion (so no new release is needed if VRC supports non-ASCII).
  • [FEATURE] Activate the typing indicator when audio processing starts, and send a processing start event over websocket.
  • [FEATURE] Show processing indicator on websocket clients.
  • [FEATURE] Broadcast setting changes to all websocket clients.
  • [FEATURE] Added show_transl_results argument to websocket clients to configure display of translations / transcriptions.
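
Broadcasting setting changes and the processing start event amounts to keeping a set of connected clients and sending the same message to each. The asyncio sketch below assumes hypothetical JSON message shapes (`"setting_change"`, `"processing_start"`) and uses a fake client in place of a real websocket connection; connections from the `websockets` package also expose an awaitable `send()`, but this is not the project's actual server code.

```python
import asyncio
import json
from typing import Any, Dict, List, Set


class FakeClient:
    """Stand-in for a websocket connection with an awaitable send()."""

    def __init__(self) -> None:
        self.received: List[Dict[str, Any]] = []

    async def send(self, message: str) -> None:
        self.received.append(json.loads(message))


class Hub:
    """Tracks connected clients and fans every message out to all of them."""

    def __init__(self) -> None:
        self.clients: Set[Any] = set()
        self.settings: Dict[str, Any] = {}

    async def broadcast(self, payload: Dict[str, Any]) -> None:
        message = json.dumps(payload)
        # Send concurrently; an exception from one client does not stop the others.
        await asyncio.gather(
            *(client.send(message) for client in self.clients),
            return_exceptions=True,
        )

    async def set_setting(self, name: str, value: Any) -> None:
        self.settings[name] = value
        await self.broadcast({"type": "setting_change", "name": name, "value": value})

    async def processing_started(self) -> None:
        # Lets every client show its processing indicator.
        await self.broadcast({"type": "processing_start"})


async def demo() -> List[FakeClient]:
    hub = Hub()
    clients = [FakeClient(), FakeClient()]
    hub.clients.update(clients)
    await hub.processing_started()
    await hub.set_setting("condition_on_previous_text", False)
    return clients


clients = asyncio.run(demo())
print(clients[1].received)
```

Because every change goes through `broadcast()`, all open clients stay in sync, not just the one that made the change.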

v1.0.2

21 Oct 10:27
5fe9a7a

v1.0.1

19 Oct 19:46
6f9db3a

v1.0.0

18 Oct 18:12
997f27b

Standalone Windows Version (Python + ffmpeg included)
Can be downloaded here:
https://eu2.contabostorage.com/bf1a89517e2643359087e5d8219c0c67:projects/whispering%2FWhispering_win32.zip (2.21 GB)

The only additional recommended install is CUDA, for GPU acceleration.

See included start-*.bat and get-device-list.bat for how to run it.
(Same as described in the README, except that python audioWhisper.py is replaced with audioWhisper\audioWhisper.exe.)

Do not run audioWhisper.exe directly, or it will create a new .cache directory and download the Whisper AI model again.

websocket_remote/ and websocket_clients/* are included as well.

Read README.md for more info.