Releases: Sharrnah/whispering
v1.3.14.8
Important:
This requires a lot of configuration if run directly. The recommended way is to use the UI Application: https://github.com/Sharrnah/whispering-ui which downloads this automatically.
Standalone Release File (3.1 GB):
Download Server:
Changelog (v1.3.14.8)
- [FEATURE] Add F5 TTS
- [FEATURE] Add option to translate to more than one target language
- [FEATURE] Add OSC Server to synchronize with VRChat Mute state
- [FEATURE] Add support to load a user custom model
- [TASK] Add reload voices event
- [TASK] Update dependencies
- [TASK] Initialize TTS after UI connected
- [TASK] Only send source + translation if both actually exist
- [TASK] remove direct-ml for linux
- [TASK] Add large-v3-turbo model for faster-whisper
- [TASK] Open playback audio device directly with detected information instead of trying multiple options
- [TASK] return audio segments in faster whisper
- [TASK] additional translation improvements
- [TASK] Update ctranslate library
- [BUGFIX] use defined exclude_client for BroadcastMessage
- [BUGFIX] Add possible stream playback fix
- [BUGFIX] Add linux build portaudio dependency
- [BUGFIX] Return correct download status on fallback download
- [BUGFIX] Error if invalid F5/E5 model is requested
Full Changelog: v1.3.14.6...v1.3.14.8
v1.3.14.6
Important:
This requires a lot of configuration if run directly. The recommended way is to use the UI Application: https://github.com/Sharrnah/whispering-ui which downloads this automatically.
Standalone Release File (3.2 GB):
Download Server:
Changelog (v1.3.14.6)
- [FEATURE] Add stt_processing plugin function
- [TASK] Return detected language from transformer whisper
- [TASK] Update transformers library
- [BUGFIX] NLLB200 transformers based implementation
- [BUGFIX] Calculate correct chunk size for VAD v3 model (Fixes #26)
- [BUGFIX] vad_frames_per_buffer validity check
- [BUGFIX] API of M4T and NLLB-200 models with newer transformers library version
- [BUGFIX] disabled VAD error "KeyError: 'plugins'"
Full Changelog: v1.3.14.5...v1.3.14.6
v1.3.14.5
Important:
This requires a lot of configuration if run directly. The recommended way is to use the UI Application: https://github.com/Sharrnah/whispering-ui which downloads this automatically.
Standalone Release File (3.2 GB):
Download Server:
Changelog (v1.3.14.5)
- [FEATURE] Direct-ML support. (Should allow running many AI models on any DirectX 12 compatible GPU, including Intel and AMD)
- [TASK] Update dependencies
- [BUGFIX] Return original text in case TranslateLanguage thinks a text-translator is active but fails.
- [BUGFIX] voice markers reusing the initial audio data every time.
Full Changelog: v1.3.14.4...v1.3.14.5
v1.3.14.4
Important:
This requires a lot of configuration if run directly. The recommended way is to use the UI Application: https://github.com/Sharrnah/whispering-ui which downloads this automatically.
Standalone Release File (3.1 GB):
Download Server:
Changelog (v1.3.14.4)
- [FEATURE] Write TTS result to file directly if path is provided.
- [TASK] Enable 'thread_per_transcription' by default again.
- [TASK] Show traceback on plugin error.
- [TASK] play_audio supporting bytes, torch.Tensor or numpy array
- [TASK] Add frozendict library (used by ChatTTS plugin)
Full Changelog: v1.3.14.2...v1.3.14.4
v1.3.14.2
Important:
This requires a lot of configuration if run directly. The recommended way is to use the UI Application: https://github.com/Sharrnah/whispering-ui which downloads this automatically.
Standalone Release File (3.1 GB):
Download Server:
Changelog (v1.3.14.2)
- [FEATURE] Support for audio with more than 2 channels.
- [FEATURE] Add MMS STT model
- [FEATURE] clipboard image OCR support
- [FEATURE] Add select_audio widget for Plugins
- [FEATURE] Add textfield widget type
- [FEATURE] Add Speaker diarization class (experimental)
- [FEATURE] Add noisereduce algorithm
- [TASK] Improve streamed audio playback
- [TASK] use romaji setting for translation requests
- [TASK] Update ignorelist
- [TASK] add playback hook, simplify buffer size setting
- [TASK] Separation of audio processing for recording
- [TASK] Add get languages plugin method
- [TASK] Update dependencies
- [TASK] remove downloaded zip renaming
- [TASK] Add multiple file hash check utility function
- [TASK] Send loading message over stdout instead of websocket
- [TASK] Add plugin name to plugin errors
- [TASK] Upgrade dependencies + VAD model to v5
- [BUGFIX] catch plugin exceptions to not break whole application
- [BUGFIX] Fix possible process management error if process could not be run
- [BUGFIX] error on modified value in websocket message
- [BUGFIX] streamed playback of dynamic chunk size
- [BUGFIX] tagged streamed playback
- [BUGFIX] buffer element size calculation.
- [BUGFIX] Wait for resampling until full chunk is ready for streamed playback
- [BUGFIX] resample_audio function on gpu tensors, reshaping audio data
- [BUGFIX] Faster whisper handling of non-available precision model files
- [BUGFIX] plugin on_*_call calls not returning anything.
Full Changelog: v1.3.13.1...v1.3.14.2
v1.3.13.1
Important:
This requires a lot of configuration if run directly. The recommended way is to use the UI Application: https://github.com/Sharrnah/whispering-ui which downloads this automatically.
Standalone Release File (3.1 GB):
Download Server:
Changelog (v1.3.13.1)
- [FEATURE] Add seamlessM4T v2 model
- [FEATURE] Add Wav2Vec Bert2.0 STT Models
- [FEATURE] Add TextCorrection model (for Wav2Vec Bert2.0 models)
- [FEATURE] Add Whisper using Transformer library
- [FEATURE] Add NVIDIA NeMo Canary STT model
- [FEATURE] Add streaming overlay03 with 2 columns
- [FEATURE] plugin event methods
- [TASK] Load settings from Profile folder
- [TASK] Support more datatypes in audio processing methods
- [TASK] use pyaudio pool for audio streamer playback
- [TASK] Update libraries
- [TASK] Add annotated-types lib to build
- [TASK] Switch to CUDA 12.1
- [TASK] split pytorch requirements
- [TASK] Add triton for windows again
- [TASK] Change default vad_frames_per_buffer value
- [TASK] Add TTS playback over html to streaming overlay03
- [TASK] Make Whisper Voice Marker class a singleton
- [TASK] Process transcription in single thread by default
- [BUGFIX] use filename from provided url instead of last redirect
- [BUGFIX] Audio processing without VAD
- [BUGFIX] Multiprocess tasks running main python code
- [BUGFIX] loading whisper large-v3 in different precisions
- [BUGFIX] download whisper large-v3 when set to float32
- [BUGFIX] allow downloads without checksum
Full Changelog: v1.3.12.2...v1.3.13.1
v1.3.12.2
Standalone Release File (3.1 GB):
Download Server:
Changelog (v1.3.12.2)
- [FEATURE] Add Whisper V3 Support
- [FEATURE] Add Whisper Distilled Support
- [FEATURE] Add Option to write transcriptions continuously to file (transcription_auto_save_continous_text setting)
- [FEATURE] Add Option to write the audio file of each final transcription (transcription_save_audio_dir setting)
- [FEATURE] Add buffered streamed audio playback
- [TASK] Replaced Icon
- [TASK] Replaced download library
- [TASK] Add grpcio library
- [TASK] Update omegaconf
- [TASK] Made DeepFilterNet model class a singleton
Full Changelog: v1.3.12.1...v1.3.12.2
v1.3.12.1
Standalone Release File (3.1 GB):
Download Server:
Changelog (v1.3.12.1)
- [FEATURE] Python 3.11 support.
- [BUGFIX] possible deadlock on adding to transcription_list via thread
- [BUGFIX] prevent issue on audio callback with incoming data = Null
- [TASK] early exit audio callback if no audio on realtime process
- [TASK] remove obsolete nltk parts from M2M100_CTranslate
- [TASK] Display current platform version on startup
- [TASK] Send websocket messages in threads
- [TASK] Add option to add environment variables to processmanager started processes
- [TASK] Add RVC dependencies
- [TASK] Update Ignorelist
With this update I also released an RVC Voice-Conversion Plugin.
https://github.com/Sharrnah/whispering/blob/main/documentation/plugins.md#list-of-plugins
Full Changelog: v1.3.11.4...v1.3.12.1
v1.3.11.4
Standalone Release File (3.1 GB):
Download Server:
Changelog (v1.3.11.4)
- [FEATURE] Add M4T Quantization
- [FEATURE] Add marker support for Seamless-M4T model
- [FEATURE] Add option to save transcriptions + translations as CSV
- [FEATURE] Add option to remove repetitions in results
- [FEATURE] Add support for text translation plugins
- [TASK] Add some M4T settings (repetition_penalty, length_penalty, no_repeat_ngram_size)
- [TASK] Add bitsandbytes library
- [TASK] Update pytorch
- [TASK] remove deprecated set_audio_backend() call
- [TASK] separate sentence splitting model
- [TASK] make most plugin functions optional.
- [TASK] Add target language to result object for M4T
- [BUGFIX] Fix M4T model loading without using hf hash
- [BUGFIX] silero TTS resampling
- [BUGFIX] Error when loading medium M4T Model with custom SeamlessM4TConfig instance
- [BUGFIX] OSC prefix building for Auto source language
- [BUGFIX] Download Seamless-M4T to correct subfolder
Full Changelog: v1.3.11.2...v1.3.11.4
v1.3.11.2
Standalone Release File (2.6 GB):
Download Server:
Changelog (v1.3.11.2)
- [FEATURE] Add source + target language option for M4T Model
- [TASK] Make faster-whisper class a singleton
Full Changelog: v1.3.11.1...v1.3.11.2