Releases: Sharrnah/whispering
v1.2.0.6
Standalone Release File (2.54 GB):
Download Server:
Changelog:
- [FEATURE] Added loading state message for loading dialogues.
- [FEATURE] Send OCR processed image.
- [TASK] Exposed some additional advanced settings, such as TTS rate and pitch, and Whisper logprob and no_speech thresholds
- [TASK] Small stability improvements
Full Changelog: v1.2.0.5...v1.2.0.6
v1.2.0.5
Standalone Release File (2.31 GB):
Download Server:
Changelog:
- [TASK] Added available settings values for UI display
- [TASK] Added settings update request event
- [TASK] Hide processing loader in websocket html clients if Whisper returns without result
- [BUGFIX] Fixed fallback of Silero model list requiring internet connection
- [FEATURE] expose initial_prompt setting for Whisper
Full Changelog: v1.2.0.4...v1.2.0.5
initial_prompt can be used to give Whisper a writing style to try to follow.
For example, setting initial_prompt to "Umm, let me think like, hmm... Okay, here's what I'm, like, thinking." will let Whisper transcribe filler words if they appear in the audio.
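As a rough illustration, the sketch below turns such a setting into a keyword argument for the openai-whisper transcribe() call. The settings dictionary and the build_transcribe_options helper are hypothetical and not taken from the project's code; initial_prompt itself is a parameter of openai-whisper's transcribe().

```python
from typing import Any, Dict


def build_transcribe_options(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Collect keyword arguments for Whisper's transcribe() (hypothetical helper)."""
    options: Dict[str, Any] = {}
    prompt = (settings.get("initial_prompt") or "").strip()
    if prompt:
        # Only pass a prompt when one is set; otherwise Whisper's default applies.
        options["initial_prompt"] = prompt
    return options


settings = {
    "initial_prompt": "Umm, let me think like, hmm... Okay, here's what I'm, like, thinking."
}
options = build_transcribe_options(settings)

# With openai-whisper installed, the options could then be passed on, e.g.:
#   import whisper
#   model = whisper.load_model("base")
#   result = model.transcribe("speech.wav", **options)
```

The actual Whisper call is left as a comment so the sketch runs without the model being downloaded.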
v1.2.0.4
Standalone Release File (2.31 GB):
Download Server:
Changelog:
- [TASK] Make most long-running tasks threaded.
- [TASK] Only send OCR + Translation results to requesting client
- [BUGFIX] Only send updated settings to other clients
- [FEATURE] Send stop processing event with no whisper result
Full Changelog: v1.2.0.3...v1.2.0.4
Find the new UI here:
https://github.com/Sharrnah/whispering-ui/releases/latest
After downloading the UI, place the exe and ttf files in the root of your Whispering Tiger folder (where the other .bat files, README.md, etc. are located).
v1.2.0.3
Standalone Release File (2.31 GB):
Download Server:
Changelog:
- [TASK] Make A.I. Model device for NLLB200 configurable
- [BUGFIX] Add missing default settings
- [BUGFIX] Wrong model download links
- [BUGFIX] Signal only works in main thread of the main interpreter
  - As a result, the A.I. models are downloaded on startup instead of when a model is requested.
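The error text in this bugfix is the ValueError that Python's signal.signal() raises when it is called outside the main thread. The sketch below shows one generic way to guard against it. install_sigint_handler is a hypothetical helper, and this guard is not the project's fix, which moved the model downloads to startup.

```python
import signal
import threading


def install_sigint_handler(handler) -> bool:
    """Register a SIGINT handler only when running in the main thread."""
    if threading.current_thread() is not threading.main_thread():
        # signal.signal() would raise ValueError here, so skip registration.
        return False
    signal.signal(signal.SIGINT, handler)
    return True


def run_in_worker(results: list) -> None:
    # Called from a worker thread, so registration is expected to be skipped.
    results.append(install_sigint_handler(signal.default_int_handler))


results: list = []
worker = threading.Thread(target=run_in_worker, args=(results,))
worker.start()
worker.join()
```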
Find the new UI here:
https://github.com/Sharrnah/whispering-ui/releases/latest
After downloading the UI, place the exe and ttf files in the root of your Whispering Tiger folder (where the other .bat files, README.md, etc. are located).
v1.2.0.2
Standalone Release File (2.31 GB):
Download Server:
Changelog:
- [BUGFIX] silero init error with invalid settings
- [TASK] Set OSC IP to a better default
- [TASK] Allow phrase_time_limit, pause and energy to be read from settings
Find the new UI here:
https://github.com/Sharrnah/whispering-ui/releases/latest
After downloading the UI, place the exe and ttf files in the root of your Whispering Tiger folder (where the other .bat files, README.md, etc. are located).
v1.2.0.1
Standalone Release File (2.31 GB):
Download Server:
Changelog:
- [BUGFIX] Some more stability improvements
- [BUGFIX] Fixed "Invalid language." error thrown in combination with the new UI.
Find the new UI here:
https://github.com/Sharrnah/whispering-ui/releases/latest
After downloading the UI, place the exe and ttf files in the root of your Whispering Tiger folder (where the other .bat files, README.md, etc. are located).
v1.2.0.0
Standalone Release File (2.31 GB):
Download Server:
Changelog:
- [TASK] Updated libraries
  - Including the Whisper project, which now features a large-v2 model.
- [BUGFIX] General stability improvements
- [BUGFIX] TTS Silero loading on CPU and fallback if CUDA is not available
- [BUGFIX] TTS Silero error if internet connection has issues. (Was caused by a forced online check which is now disabled)
- [TASK] Added preprocessing of the text sent to Silero. (So now numbers can be spoken and multiple punctuation marks don't freak out the TTS)
- [TASK] Websocket remote only shows the working Silero V3 models.
- [BUGFIX] fixed issue with multiline text language recognition
- [BUGFIX] CLI argument and settings-file options fallback.
- [TASK] Improved websocket transfer of specific messages so they are no longer sent to all clients. (Prevents multiple browsers from playing TTS etc.)
- [TASK] Some general preparations for the upcoming new UI.
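The Silero text preprocessing listed above can be pictured roughly as follows. This is a minimal sketch, not the project's implementation: preprocess_for_tts is a hypothetical name, digits are spelled out one at a time rather than as full numbers, and only runs of ".", "!" and "?" are collapsed.

```python
import re

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def preprocess_for_tts(text: str) -> str:
    """Normalize text before handing it to a TTS engine (illustrative only)."""
    # Collapse runs of sentence punctuation ("...", "?!!") to the first mark.
    text = re.sub(r"([.!?])[.!?]+", r"\1", text)
    # Spell out every digit so the TTS has a word to pronounce.
    text = re.sub(r"\d", lambda m: f" {DIGIT_WORDS[int(m.group())]} ", text)
    # Squeeze the whitespace introduced by the substitutions above.
    return re.sub(r"\s+", " ", text).strip()


cleaned = preprocess_for_tts("Wait... what?!! Room 42")
```

A fuller version would convert whole numbers ("forty-two") and handle the target language, which this sketch leaves out.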
v1.1.0.0
Standalone Release File (2.30 GB):
Download Server:
Changelog:
- [FEATURE] Added TTS (Text 2 Speech) using Silero
- [FEATURE] Added model download retry, fallback and checksum check.
- [FEATURE] Added FLAN-T5 conditioning.
- [TASK] Code restructuring.
Text 2 Speech example:
fvzyuMpe.mp4
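A download routine with retry, fallback and checksum check, as listed in this release's changelog, could look roughly like the sketch below. The function names, the example URLs and the injected fetch callable are assumptions for illustration; the project's real downloader may differ.

```python
import hashlib
from typing import Callable, Iterable, List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def download_with_fallback(urls: Iterable[str], expected_sha256: str,
                           fetch: Callable[[str], bytes],
                           retries: int = 3) -> bytes:
    """Try each URL up to `retries` times; return the first payload whose checksum matches."""
    for url in urls:
        for _attempt in range(retries):
            try:
                data = fetch(url)
            except OSError:
                continue  # network error: retry the same URL
            if sha256_hex(data) == expected_sha256:
                return data
            # Checksum mismatch: treat as a corrupt download and retry.
    raise RuntimeError("all download sources failed or returned corrupt data")


# Demo with a fake fetcher so the sketch runs without network access.
payload = b"model weights"
calls: List[str] = []


def fake_fetch(url: str) -> bytes:
    calls.append(url)
    if "primary" in url:
        raise OSError("connection reset")
    return payload


result = download_with_fallback(
    ["https://primary.example/model.pt", "https://mirror.example/model.pt"],
    sha256_hex(payload), fake_fetch, retries=2)
```

In real use, fetch could wrap urllib.request.urlopen(url).read(), whose URLError is a subclass of OSError and would be caught by the retry loop.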
v1.0.7.1
Standalone Release File (2.30 GB):
Download Server:
Changelog:
- [BUGFIX] translate to speaker if flan-t5 question processing is disabled
- [TASK] Added OSC-auto-processing option (To toggle OSC temporarily while app is running)
About FLAN-T5:
flan_process_only_questions and flan_whisper_answer can be enabled to have FLAN-T5 only answer spoken questions.
That means the text recognized by Whisper should include a typical question word and a question mark.
Since FLAN-T5 can do much more, there might be more possibilities to use this A.I. model in the future.
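The question check described above might be sketched as follows. The word list and the looks_like_question helper are assumptions for illustration; the list the project actually uses is not given here.

```python
import re

# Hypothetical list of typical English question words.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how",
                  "which", "is", "are", "can", "do", "does"}


def looks_like_question(text: str) -> bool:
    """Return True if text has a question mark and a typical question word."""
    if "?" not in text:
        return False
    words = re.findall(r"[a-z']+", text.lower())
    return any(word in QUESTION_WORDS for word in words)
```

Requiring both signals keeps statements such as "What a nice day." and bare interjections such as "Really?" from being sent to the model.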
v1.0.7.0
Standalone Release File (2.30 GB):
Download Server:
Changelog:
- [FEATURE] Added experimental FLAN-T5 A.I., supporting automatic answering and continuation of questions or phrases, spoken or written (see more at https://analyticsindiamag.com/google-ai-introduces-flan-t5-a-new-open-source-language-model/).
- [FEATURE] Added LID language classifier for auto-detecting the language of text.
- [FEATURE] Added NLLB200 text translator. Supporting around 200 languages in a single model.
- [FEATURE] Added config file. (To support more settings without having to add many more command-line flags)
- [FEATURE] Added bottom_align HTML parameter to websocket clients. (To make it easier to align streaming overlays at the bottom of the image)
- [TASK] Updated dependencies
- [CHANGE] (Breaking change if used as command-line flag!) Renamed m2m100_size and m2m100_device to txt_translator_size and txt_translator_device, respectively.
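For setups that still pass the old names, a small compatibility shim along these lines could map them to the new ones. This is a hypothetical sketch; migrate_settings is not part of the project.

```python
RENAMED_SETTINGS = {
    "m2m100_size": "txt_translator_size",
    "m2m100_device": "txt_translator_device",
}


def migrate_settings(settings: dict) -> dict:
    """Return a copy with old m2m100_* keys renamed; existing new keys win."""
    migrated = {}
    for key, value in settings.items():
        new_key = RENAMED_SETTINGS.get(key, key)
        if new_key != key and new_key in settings:
            # The new name is already set explicitly, so drop the old one.
            continue
        migrated[new_key] = value
    return migrated


old = {"m2m100_size": "small", "m2m100_device": "cuda"}
new = migrate_settings(old)
```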
About FLAN-T5:
flan_process_only_questions and flan_whisper_answer can be enabled to have FLAN-T5 only answer spoken questions.
That means the text recognized by Whisper should include a typical question word and a question mark.
Since FLAN-T5 can do much more, there might be more possibilities to use this A.I. model in the future.