Releases: Sharrnah/whispering
v1.3.4.1
Standalone Release File (2.55 GB):
Download Server:
Changelog (v1.3.4.1)
- [FEATURE] Send errors on stderr in JSON format for the UI to parse
- [FEATURE] Added download function that works in threads
- [TASK] Added download arguments to the plugin tts method (plugin-incompatible change!)
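
The stderr entry above describes a common pattern: the backend writes each error as one JSON line and the UI parses it. The following is a minimal sketch of that pattern, not the project's actual code. The field names `type` and `message` and the helper names `report_error` and `parse_error_line` are assumptions made for illustration.

```python
import json
import sys


def report_error(message, error_type="error", stream=None):
    """Write one error as a single JSON line to stderr (or to the given stream)."""
    stream = stream if stream is not None else sys.stderr
    # "type" and "message" are placeholder field names, not the real schema.
    payload = {"type": error_type, "message": str(message)}
    stream.write(json.dumps(payload) + "\n")
    stream.flush()
    return payload


def parse_error_line(line):
    """What a UI process might do with one line read from the backend's stderr."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        # Not JSON (for example a plain traceback line): keep it as raw text.
        return {"type": "raw", "message": line.rstrip("\n")}
```

Writing one JSON object per line keeps parsing simple on the UI side, and the fallback branch means non-JSON output on stderr is still shown rather than lost.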
Full Changelog: v1.3.4.0...v1.3.4.1
v1.3.4.0
Standalone Release File (2.55 GB):
Download Server:
Changelog (v1.3.4.0)
- [FEATURE] Added support for showing the from/to language of a translation in the OSC prefix
- [BUGFIX] Updated CTranslate2 + faster-whisper to latest version to fix #10
- [BUGFIX] Also use sentencepiece as tokenizer for better multi-sentence translation
- [BUGFIX] Fixed error for plugins where on_enable is defined but not on_disable
- [TASK] Renamed txt_ascii to txt_romaji to reflect better what it does
- [TASK] Added realtime_temperature_fallback option.
- [TASK] Moved the ignorelist into an external ignorelist.txt
- [TASK] Added Volume direction OSC plugin to readme
- [TASK] Add new command control VRChat Parameters Plugin to Readme
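
The on_enable/on_disable bugfix above concerns plugins that define only one of the two optional methods. One way a host can handle this safely is sketched below. Only the method names `on_enable` and `on_disable` come from the changelog; `ExamplePlugin`, `call_optional_hook` and `set_plugin_enabled` are hypothetical names, and the real plugin loader may work differently.

```python
class ExamplePlugin:
    """Hypothetical plugin that defines on_enable but not on_disable."""

    def __init__(self):
        self.events = []

    def on_enable(self):
        self.events.append("enabled")


def call_optional_hook(plugin, hook_name):
    """Call plugin.<hook_name>() if it exists and is callable, else do nothing.

    Returns True when the hook was called, False when it was missing.
    """
    hook = getattr(plugin, hook_name, None)
    if callable(hook):
        hook()
        return True
    return False


def set_plugin_enabled(plugin, enabled):
    """Trigger the matching optional hook when a plugin is toggled."""
    return call_optional_hook(plugin, "on_enable" if enabled else "on_disable")
```

Looking each hook up separately with `getattr` means a missing `on_disable` is skipped instead of raising an `AttributeError`.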
Full Changelog: v1.3.3.2...v1.3.4.0
v1.3.3.2
Standalone Release File (2.55 GB):
Download Server:
Changelog (v1.3.3.2)
- [FEATURE] Added quit signal to quit running backend
- [TASK] Improved the CTranslate2-based NLLB-200 text translator for texts containing multiple sentences.
- [TASK] Updated CTranslate2
- [BUGFIX] Make sure audio frames are cleared on new start of recording
- [BUGFIX] Fix case where the FindWindow Windows API call returns the wrong window handle for a window title. (should fix cases where OCR failed without showing an error)
Full Changelog: v1.3.3.1...v1.3.3.2
v1.3.3.1
Standalone Release File (2.55 GB):
Download Server:
Changelog (v1.3.3.1)
- [FEATURE] Added NLLB-200 using CTranslate2. (same as faster-whisper)
- [FEATURE] Added new streaming overlay (completely rewritten and looks more like traditional subtitles)
- [FEATURE] Added option to set OSC Chat Prefix
- [FEATURE] Added mirror setting to html websocket overlays
- [TASK] Added option to set NLLB-200 precision.
- [TASK] Show typing indicator when starting speaking even without realtime mode
- [TASK] Updated some dependencies
- [TASK] Allow CTranslate2 to use float16 even on devices without efficient FP16 support
- [BUGFIX] Fixed invalid TTS data when using the SSML break tag.
Full Changelog: v1.3.3.0...v1.3.3.1
v1.3.3.0
Standalone Release File (2.55 GB):
Download Server:
Changelog (v1.3.3.0)
- [FEATURE] Added realtime transcription feature (only available when using VAD)
- [FEATURE] Added optional separate realtime Whisper model. (allows using a smaller+faster model for realtime transcriptions. Only the final full-clip transcription uses the regular selected whisper model.)
- [FEATURE] Updated websocket clients to show realtime transcriptions
- [TASK] Make phrase_time_limit, pause and energy values configurable at runtime
- [TASK] Removed LLM/FLAN-T5 Large Language Model functions from main code and split them into a separate plugin. (see https://gist.github.com/Sharrnah/eeaf2acda3e92d8eed1747f05a3f4102 )
- [FEATURE] Added optional on_enable, on_disable methods for plugins
- [TASK] Cleaned up some remnants of ARGOS translation in the code.
- [TASK] Set faster-whisper as default.
- [BUGFIX] Reactivated channel downsampling to improve detection when more than 1 channel is sent
Full Changelog: v1.3.2.2...v1.3.3.0
v1.3.2.2
Standalone Release File (2.55 GB):
Download Server:
Changelog (v1.3.2.2)
- [TASK] Added option to set more precision types (for faster-whisper). int8_float16 should reduce the memory footprint and improve speed even more without sacrificing much precision.
- [TASK] Added beam_size option to increase speed even more at the cost of quality. Default is 5. A beam_size of 2 or 1 can make it really fast.
- [TASK] Added cpu_threads and num_workers options. num_workers is not really used yet, but cpu_threads can improve performance when running on CPU if the CPU has enough cores/threads.
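
For orientation, the options named in this release map onto parameters of the faster-whisper library itself: `compute_type`, `cpu_threads` and `num_workers` are arguments of `WhisperModel`, and `beam_size` is an argument of `WhisperModel.transcribe`. The sketch below shows that mapping. How Whispering passes its own settings through is not stated in the changelog, so `build_whisper_options`, `transcribe_file` and the default values chosen here are assumptions.

```python
def build_whisper_options(compute_type="int8_float16", beam_size=5,
                          cpu_threads=4, num_workers=1):
    """Split the settings into model-level and transcription-level options."""
    if beam_size < 1:
        raise ValueError("beam_size must be at least 1")
    model_options = {
        "compute_type": compute_type,  # precision type, e.g. "float16", "int8_float16"
        "cpu_threads": cpu_threads,    # only relevant when running on CPU
        "num_workers": num_workers,
    }
    transcribe_options = {"beam_size": beam_size}  # smaller is faster, lower quality
    return model_options, transcribe_options


def transcribe_file(audio_path, model_size="small", device="cuda", **settings):
    """Transcribe one file with faster-whisper using the options above."""
    # Third-party dependency, imported lazily so the helper above works without it.
    from faster_whisper import WhisperModel

    model_options, transcribe_options = build_whisper_options(**settings)
    model = WhisperModel(model_size, device=device, **model_options)
    segments, _info = model.transcribe(audio_path, **transcribe_options)
    return " ".join(segment.text.strip() for segment in segments)
```

Note that `int8_float16` typically requires a GPU. On CPU a type such as `int8` would be the usual choice, for example `transcribe_file("clip.wav", device="cpu", compute_type="int8", cpu_threads=8, beam_size=1)`.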
Full Changelog: v1.3.2.1...v1.3.2.2
v1.3.2.1 (hotfix)
Standalone Release File (2.55 GB):
Download Server:
Changelog (hotfix):
- [BUGFIX] Fixed downloading of faster-whisper models
Changelog (v1.3.2.0)
- [FEATURE] Added faster-whisper (smaller memory footprint + can be about 3x faster)
- [BUGFIX] Updated whisper with bugfix of repeating sentences
- [TASK] Improved on the Plugin system
- [TASK] removed ARGOS translate because of incompatibility with faster-whisper
- [BUGFIX] lock scikit-image to version 0.19.3 because of build bug in 0.20.0
Full Changelog: v1.3.1.0...v1.3.2.1
v1.3.2.0
Standalone Release File (2.55 GB):
Download Server:
This release file fails to download faster-whisper models. See the v1.3.2.1 hotfix.
Changelog:
- [FEATURE] Added faster-whisper (smaller memory footprint + can be about 3x faster)
- [BUGFIX] Updated whisper with bugfix of repeating sentences
- [TASK] Improved on the Plugin system
- [TASK] removed ARGOS translate because of incompatibility with faster-whisper
- [BUGFIX] lock scikit-image to version 0.19.3 because of build bug in 0.20.0
Full Changelog: v1.3.1.0...v1.3.2.0
v1.3.1.0
Standalone Release File (2.55 GB):
Download Server:
Changelog:
- [FEATURE] Added Option for additional VAD Check on full Clip in addition to each frame
- [TASK] Reduced default VAD confidence threshold to 0.4
- [TASK] Expose FP16 Option for Whisper Model
- [TASK] Skip NLLB-200 translation if source and target language are the same
- [FEATURE] Simple Plugin System added.
- [FEATURE] Proof of concept for additional LLM models.
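
The two VAD entries in this release (the additional full-clip check and the 0.4 default confidence threshold) can be pictured with a small sketch. It assumes the VAD yields a speech confidence between 0 and 1 per frame and, optionally, one for the whole clip. The acceptance rule used here (at least one frame at or above the threshold) and the names `frame_is_speech` and `clip_passes_vad` are illustrative assumptions, not the project's actual logic.

```python
VAD_CONFIDENCE_THRESHOLD = 0.4  # default named in the changelog entry


def frame_is_speech(confidence, threshold=VAD_CONFIDENCE_THRESHOLD):
    """Per-frame check: treat the frame as speech at or above the threshold."""
    return confidence >= threshold


def clip_passes_vad(frame_confidences, clip_confidence=None,
                    threshold=VAD_CONFIDENCE_THRESHOLD):
    """Decide whether a recorded clip should be sent on for transcription.

    frame_confidences: per-frame speech confidences collected while recording.
    clip_confidence:   confidence for the full clip, or None when the
                       additional full-clip check is disabled.
    """
    # Assumed rule: at least one frame must look like speech.
    if not any(frame_is_speech(c, threshold) for c in frame_confidences):
        return False
    # Optional second check on the full clip.
    if clip_confidence is not None and clip_confidence < threshold:
        return False
    return True
```

With the full-clip check enabled, a clip whose individual frames triggered the detector can still be rejected when the clip as a whole scores below the threshold.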
Full Changelog: v1.3.0.1...v1.3.1.0
v1.3.0.1
Standalone Release File (2.55 GB):
Download Server:
Changelog:
- [FEATURE] Added VAD (Voice Activity Detection)
- [BUGFIX] Fix error on logging special language characters
- [TASK] Reverted logprob_threshold and no_speech_threshold to their old defaults because recognition was less consistent with the new values
Full Changelog: v1.2.0.6...v1.3.0.1