Releases: Sharrnah/whispering

v1.3.4.1

06 Apr 22:53

Standalone Release File (2.55 GB):

Download Server:

Changelog (v1.3.4.1)

  • [FEATURE] Send errors on stderr in JSON format for the UI to parse
  • [FEATURE] Added download function that works in threads
  • [TASK] Added download arguments to the plugin tts method (Plugin incompatible change!)
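
As a rough illustration of the stderr/JSON error reporting mentioned above, a backend could write one JSON object per line to stderr so that a UI process can parse it. This is only a sketch: the field names "type" and "message" and the helper names format_error and report_error are assumptions made for this example, not the actual format or functions used by Whispering.

```python
import json
import sys


def format_error(message, error_type="error"):
    """Serialize an error as a single-line JSON string.

    The keys "type" and "message" are hypothetical; the real
    backend may use a different schema.
    """
    return json.dumps({"type": error_type, "message": message})


def report_error(message, error_type="error", stream=None):
    """Write the JSON-encoded error as one line to stderr (or a given stream)."""
    stream = sys.stderr if stream is None else stream
    stream.write(format_error(message, error_type) + "\n")
    stream.flush()  # flush so a parent process reading the pipe sees it promptly


if __name__ == "__main__":
    report_error("model download failed")
```

A UI that launches the backend as a subprocess could then read stderr line by line and call json.loads on each line.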

Full Changelog: v1.3.4.0...v1.3.4.1

v1.3.4.0

01 Apr 16:33

Standalone Release File (2.55 GB):

Download Server:

Changelog (v1.3.4.0)

  • [FEATURE] Add support for showing the source/target language of a translation in the OSC prefix
  • [BUGFIX] Update CTranslate2 + faster-whisper to the latest version to fix #10
  • [BUGFIX] Also use sentencepiece as tokenizer for better multi-sentence translation
  • [BUGFIX] Fixed error for plugins that define on_enable but not on_disable
  • [TASK] Renamed txt_ascii to txt_romaji to better reflect what it does
  • [TASK] Added realtime_temperature_fallback option.
  • [TASK] Separated ignorelist into external ignorelist.txt
  • [TASK] Added Volume direction OSC plugin to readme
  • [TASK] Add new command control VRChat Parameters Plugin to Readme

Full Changelog: v1.3.3.2...v1.3.4.0

v1.3.3.2

25 Mar 22:40

Standalone Release File (2.55 GB):

Download Server:

Changelog (v1.3.3.2)

  • [FEATURE] Added quit signal to quit running backend
  • [TASK] Improved the NLLB-200 CTranslate2-based text translator for texts containing multiple sentences.
  • [TASK] Updated CTranslate2
  • [BUGFIX] Make sure audio frames are cleared on new start of recording
  • [BUGFIX] Fix case where the FindWindow Windows API call returns the wrong window handle for a window title. (should fix OCR failing without an error message in these cases)

Full Changelog: v1.3.3.1...v1.3.3.2

v1.3.3.1

24 Mar 00:35

Standalone Release File (2.55 GB):

Download Server:

Changelog (v1.3.3.1)

  • [FEATURE] Added NLLB-200 using CTranslate2. (same as faster-whisper)
  • [FEATURE] Added new streaming overlay (completely rewritten and looks more like traditional subtitles)
  • [FEATURE] Added option to set OSC Chat Prefix
  • [FEATURE] Added mirror setting to html websocket overlays
  • [TASK] Added option to set NLLB-200 precision.
  • [TASK] Show typing indicator when starting to speak, even without realtime mode
  • [TASK] Updated some dependencies
  • [TASK] Allow CTranslate2 to use float16 even on devices without efficient FP16 support
  • [BUGFIX] invalidating TTS data when using SSML break tag.

Full Changelog: v1.3.3.0...v1.3.3.1

v1.3.3.0

18 Mar 20:36

Standalone Release File (2.55 GB):

Download Server:

Changelog (v1.3.3.0)

  • [FEATURE] Added realtime transcription feature (only available when using VAD)
    • [FEATURE] Added optional separate realtime Whisper model. (allows using a smaller+faster model for realtime transcriptions. Only the final full-clip transcription uses the regular selected whisper model.)
    • [FEATURE] Updated websocket clients to show realtime transcriptions
  • [TASK] Make phrase_time_limit, pause and energy values configurable at runtime
  • [TASK] Remove LLM/FLAN-T5 Large Language Model functions from main code and split them into a separate plugin. (see https://gist.github.com/Sharrnah/eeaf2acda3e92d8eed1747f05a3f4102 )
  • [FEATURE] Added optional on_enable, on_disable methods for plugins
  • [TASK] cleaned up some ARGOS translation remains in code.
  • [TASK] set faster-whisper as default.
  • [BUGFIX] reactivate channel downsampling to improve detection when more than 1 channel is sent
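
The optional on_enable / on_disable plugin methods listed above could be handled on the host side roughly as follows. This is a minimal sketch, not the project's actual plugin loader: the helper call_hook and the example class VolumePlugin are hypothetical names introduced for illustration. The point is that a hook is looked up defensively, so a plugin may define either method, both, or neither.

```python
def call_hook(plugin, hook_name):
    """Call an optional hook on a plugin if it exists.

    Returns True if the hook was found and called, False otherwise.
    Looking the hook up with getattr avoids an AttributeError when a
    plugin defines on_enable but not on_disable (or the reverse).
    """
    hook = getattr(plugin, hook_name, None)
    if callable(hook):
        hook()
        return True
    return False


class VolumePlugin:
    """Hypothetical plugin that only implements on_enable."""

    def __init__(self):
        self.active = False

    def on_enable(self):
        self.active = True


if __name__ == "__main__":
    plugin = VolumePlugin()
    call_hook(plugin, "on_enable")   # runs VolumePlugin.on_enable
    call_hook(plugin, "on_disable")  # silently skipped, no error
```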

Full Changelog: v1.3.2.2...v1.3.3.0

v1.3.2.2

13 Mar 14:03

Standalone Release File (2.55 GB):

Download Server:

Changelog (v1.3.2.2)

  • [TASK] Added Option to set more precision types (for faster-whisper). int8_float16 should improve memory footprint and speed even more without sacrificing much of the precision
  • [TASK] Added beam_size option to increase speed even more at the cost of quality. Default is 5. A beam_size of 2 or 1 can make it really fast.
  • [TASK] Added cpu_threads and num_workers options. num_workers is not really used yet, but cpu_threads can improve performance when running on CPU if the CPU has enough cores/threads.
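
For orientation, the options above correspond to parameters of the faster-whisper library itself: compute_type, cpu_threads and num_workers are arguments of WhisperModel, and beam_size is an argument of its transcribe method. The sketch below shows how they fit together when calling the library directly. How Whispering passes its settings through internally is not shown in this changelog, so the helper names model_options, transcribe_options and load_and_transcribe, as well as the chosen default values, are assumptions for illustration only.

```python
def model_options(device="cuda", compute_type="int8_float16",
                  cpu_threads=4, num_workers=1):
    """Collect keyword arguments for faster_whisper.WhisperModel.

    cpu_threads only has an effect when the model runs on the CPU.
    """
    return {
        "device": device,
        "compute_type": compute_type,
        "cpu_threads": cpu_threads,
        "num_workers": num_workers,
    }


def transcribe_options(beam_size=5):
    """Collect keyword arguments for WhisperModel.transcribe.

    Smaller beam sizes are faster but can reduce quality.
    """
    if beam_size < 1:
        raise ValueError("beam_size must be at least 1")
    return {"beam_size": beam_size}


def load_and_transcribe(audio_path, model_size="small"):
    """Load a model and transcribe a file (requires faster-whisper installed)."""
    # Imported lazily so the option helpers work without the library.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, **model_options())
    segments, _info = model.transcribe(audio_path, **transcribe_options(beam_size=2))
    return [segment.text for segment in segments]
```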

Full Changelog: v1.3.2.1...v1.3.2.2

v1.3.2.1 (hotfix)

12 Mar 01:46

Standalone Release File (2.55 GB):

Download Server:

Changelog (hotfix):

  • [BUGFIX] Downloading of faster-whisper models

Changelog (v1.3.2.0)

  • [FEATURE] Added faster-whisper (smaller memory footprint + can be about 3x faster)
  • [BUGFIX] Updated whisper with bugfix of repeating sentences
  • [TASK] Improved on the Plugin system
  • [TASK] removed ARGOS translate because of incompatibility with faster-whisper
  • [BUGFIX] lock scikit-image to version 0.19.3 because of build bug in 0.20.0

Full Changelog: v1.3.1.0...v1.3.2.1

v1.3.2.0

11 Mar 21:04

Standalone Release File (2.55 GB):

Download Server:

This release file fails to download faster-whisper models. See the v1.3.2.1 hotfix.

Changelog:

  • [FEATURE] Added faster-whisper (smaller memory footprint + can be about 3x faster)
  • [BUGFIX] Updated whisper with bugfix of repeating sentences
  • [TASK] Improved on the Plugin system
  • [TASK] removed ARGOS translate because of incompatibility with faster-whisper
  • [BUGFIX] lock scikit-image to version 0.19.3 because of build bug in 0.20.0

Full Changelog: v1.3.1.0...v1.3.2.0

v1.3.1.0

06 Mar 18:02

Standalone Release File (2.55 GB):

Download Server:

Changelog:

  • [FEATURE] Added Option for additional VAD Check on full Clip in addition to each frame
  • [TASK] Reduced default VAD confidence threshold to 0.4
  • [TASK] Expose FP16 Option for Whisper Model
  • [TASK] Skip NLLB-200 translation if source and target language are the same
  • [FEATURE] Simple Plugin System added.
  • [FEATURE] Proof of concept for additional LLM models.
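
The "skip translation if source and target language are the same" task above amounts to a simple guard in front of the translator. A minimal sketch, assuming a hypothetical translate callable that takes the text plus source and target language codes (the real NLLB-200 wrapper in Whispering may look different):

```python
def translate_if_needed(text, source_lang, target_lang, translate):
    """Return text unchanged when both languages match, else translate it.

    `translate` is a placeholder for the actual translation function;
    it is only called when a translation is really required.
    """
    if source_lang == target_lang:
        return text
    return translate(text, source_lang, target_lang)


if __name__ == "__main__":
    # Stand-in translator used only for this demonstration.
    def fake_translate(text, src, tgt):
        return f"[{src}->{tgt}] {text}"

    print(translate_if_needed("Hello", "eng_Latn", "eng_Latn", fake_translate))
    print(translate_if_needed("Hello", "eng_Latn", "deu_Latn", fake_translate))
```

Skipping the call avoids loading or running the translation model for text that would come back unchanged anyway.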

Full Changelog: v1.3.0.1...v1.3.1.0

v1.3.0.1

09 Jan 21:10

Standalone Release File (2.55 GB):

Download Server:

Changelog:

  • [FEATURE] Added VAD (Voice Activity Detection)
  • [BUGFIX] Fix error on logging special language characters
  • [TASK] Reverted logprob_threshold and no_speech_threshold to the old defaults because the new values gave less consistent recognition

Full Changelog: v1.2.0.6...v1.3.0.1