v0.4.0 Release
I was going to add documentation for GPT-SoVITS, the upgrade script, and the installation scripts before releasing this version. I also planned to have the installation script detect whether the user needs a proxy to download models from Hugging Face. However, I realized that if I did those things first, I would never release v0.4.0, and it would keep getting bigger every day.
So yeah, another 2 weeks have passed (after five pre-releases), and here is the v0.4.0 release.
🚀 What's New
💬 Text Input in the Browser
You can now interact with the AI directly by typing in the browser.
🎉 GPT-SoVITS Support
Added GPT-SoVITS support by @YveMU in PR #40.
⚙️ Auto Installation Script (Experimental)
Introduced an experimental auto-installation script to simplify setup. This script:
- Is cross-platform (at least, that's the intent).
- Installs Miniconda into the project directory and creates a conda environment there.
- Installs FFmpeg and the correct Python version into that environment.
- Automatically configures dependencies for FunASR, edgeTTS, and Ollama (Ollama itself still has to be installed separately).
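To illustrate the cross-platform part, here is a minimal sketch of how an installer could pick the right Miniconda build and install it into a project-local prefix. This is not the project's actual script: `miniconda_installer_url`, `install_command`, and the supported-platform table are hypothetical. The installer file names follow the public Miniconda download naming, and `-b -p` (shell installer) and `/S /D=` (Windows installer) are Miniconda's documented silent-install options.

```python
from __future__ import annotations

import platform
from pathlib import PurePath

# Hypothetical mapping of (OS, CPU architecture) to a Miniconda installer name.
# platform.machine() reports "AMD64" on 64-bit Windows and "arm64" on Apple Silicon.
INSTALLERS = {
    ("Windows", "AMD64"): "Miniconda3-latest-Windows-x86_64.exe",
    ("Linux", "x86_64"): "Miniconda3-latest-Linux-x86_64.sh",
    ("Linux", "aarch64"): "Miniconda3-latest-Linux-aarch64.sh",
    ("Darwin", "x86_64"): "Miniconda3-latest-MacOSX-x86_64.sh",
    ("Darwin", "arm64"): "Miniconda3-latest-MacOSX-arm64.sh",
}
BASE_URL = "https://repo.anaconda.com/miniconda/"


def miniconda_installer_url(system: str | None = None, machine: str | None = None) -> str:
    """Return the download URL for the current (or given) platform."""
    system = system or platform.system()
    machine = machine or platform.machine()
    try:
        return BASE_URL + INSTALLERS[(system, machine)]
    except KeyError:
        raise RuntimeError(f"Unsupported platform: {system} {machine}") from None


def install_command(installer: PurePath, prefix: PurePath) -> list[str]:
    """Build a silent-install command that puts Miniconda under `prefix`."""
    if installer.suffix == ".exe":
        # Windows: /S = silent; /D= must be the last argument.
        return [str(installer), "/S", f"/D={prefix}"]
    # Linux/macOS: -b = batch mode (no prompts), -p = install prefix.
    return ["bash", str(installer), "-b", "-p", str(prefix)]
```

Keeping the prefix inside the project directory means deleting the project folder removes the whole environment, which is presumably why the script installs Miniconda locally instead of system-wide.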
⚡ ASR/TTS Preloading & Caching
ASR and TTS models now preload when the server launches (enabled by default, but optional), significantly reducing the wait time when the webpage is opened.
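The idea behind preloading and caching can be sketched as a load-once cache that is warmed at startup. This is a minimal illustration, not the server's real code: `load_model`, `get_model`, `preload`, and the model identifiers are hypothetical placeholders, and `load_model` only records calls instead of loading real weights.

```python
from functools import lru_cache

# Records every real (slow) load so the caching effect is visible.
load_calls: list = []


def load_model(name: str) -> dict:
    """Hypothetical stand-in for an expensive ASR/TTS model load."""
    load_calls.append(name)
    return {"name": name}


@lru_cache(maxsize=None)
def get_model(name: str) -> dict:
    """Load a model once; later calls return the same cached object."""
    return load_model(name)


def preload(names, enabled: bool = True) -> None:
    """Warm the cache at server launch so the first page load doesn't wait."""
    if enabled:
        for name in names:
            get_model(name)
```

With preloading on, the loading cost is paid once at launch, and every later request (including opening the webpage) hits the cache. With it off, the first request pays that cost instead.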
🖱️ Pointer Interaction Toggle
Added a Pointer Interactive button that stops the Live2D model from following your cursor.
🔧 Adjustable VAD Confidence Threshold
Introduced a Voice Activity Detection (VAD) confidence threshold setting:
- Configure how confident the VAD must be before it treats audio as speech.
- Example: at 98%, the AI only listens when it is 98% certain you're speaking.
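Here is a minimal sketch of how such a threshold gates listening. It assumes the VAD produces one speech probability per audio frame. The function name and the `min_frames` smoothing parameter are hypothetical additions for illustration and are not described in this release.

```python
from typing import Iterable, Optional


def detect_speech_start(
    probs: Iterable[float], threshold: float = 0.98, min_frames: int = 1
) -> Optional[int]:
    """Return the index where speech starts, or None if no speech is found.

    A frame counts as speech when its probability is >= threshold.
    `min_frames` (hypothetical) requires that many consecutive speech
    frames, which helps ignore short bursts of background noise.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    run = 0
    for i, p in enumerate(probs):
        if p >= threshold:
            run += 1
            if run == min_frames:
                return i - min_frames + 1  # first frame of the qualifying run
        else:
            run = 0  # a non-speech frame breaks the run
    return None
```

Raising the threshold trades sensitivity for robustness: quiet or mumbled speech may be missed, but fewer noises are mistaken for you talking.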
✨ Special Character Filtering
By default, TTS will no longer vocalize special characters such as emojis (you can re-enable this in conf.yaml).
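One way to do this kind of filtering is by Unicode category before the text reaches the TTS engine. This is a sketch under assumptions, not the project's filter: `strip_special_chars` and the set of dropped categories are hypothetical choices. Note that it also drops symbols like "©" and "°", which a real filter might want to keep.

```python
import re
import unicodedata

# Hypothetical choice of Unicode categories to drop before TTS:
# So = other symbols (most emoji), Sk = modifier symbols (e.g. skin tones),
# Cf = format chars (e.g. zero-width joiner), Co/Cs/Cn = private-use,
# surrogate, unassigned.
_DROP = {"So", "Sk", "Cf", "Co", "Cs", "Cn"}


def strip_special_chars(text: str) -> str:
    """Remove emoji-like characters and tidy up the leftover whitespace."""
    kept = []
    for ch in text:
        if unicodedata.category(ch) in _DROP:
            continue
        if 0xFE00 <= ord(ch) <= 0xFE0F:  # variation selectors, e.g. in "❤️"
            continue
        kept.append(ch)
    # Removing an emoji between words leaves double spaces; collapse them.
    return re.sub(r"\s+", " ", "".join(kept)).strip()
```

For example, `strip_special_chars("Hello 😊 world!")` returns `"Hello world!"`, while letters, digits, and punctuation in any script are left untouched.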
🔄 What's Changed
- Voice interruption is turned off by default: you can turn it back on with the "Voice Interruption" button. This change addresses two common issues:
  - The AI got interrupted by background noise.
  - The system would go crazy when you interrupted yourself (that is, interrupted before the AI said anything).
- Default ASR: FunASR is now the default ASR.
- ASR/TTS Visibility: The server shows the active ASR and TTS on launch.
- New Prompt: Added a fun English prompt for discussing nuclear proliferation.
🎉 New Contributors
Thanks to our new contributor:
📜 Full Changelog
View the complete list of changes: v0.3.1...v0.4.0