Release v0.4.0

I was going to add documentation for GPT-SoVITS, the upgrade script, and the installation scripts before releasing this version. I also planned to have the installation script detect whether the user needs a proxy to download models from Hugging Face. However, I realized that I would never release v0.4.0 if I waited for all of that, and v0.4.0 would only get bigger and bigger every day.

So yeah, another 2 weeks have passed (after five pre-releases), and here is the v0.4.0 release.

🚀 What's New

💬 Text Input in the Browser

You can now interact with the AI directly by typing in the browser.

🎉 GPT-SoVITS Support

Added GPT-SoVITS support by @YveMU in PR #40.

⚙️ Auto Installation Script (Experimental)

Introduced an experimental auto-installation script to simplify setup. This script:

- Is cross-platform (at least it's intended to be).
- Creates a miniconda environment in the project directory (miniconda itself is also installed to the project directory).
- Installs FFmpeg and the correct Python version in the miniconda environment.
- Automatically configures dependencies for FunASR, edgeTTS, and ollama (excluding the ollama installation itself).

⚡ ASR/TTS Preloading & Caching

ASR and TTS models now preload when the server launches (default but optional), significantly reducing the wait time when opening the webpage.
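
The idea behind preloading can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual code: `ModelCache`, its `loaders` argument, and the `preload` flag are hypothetical names made up for this example.

```python
from typing import Any, Callable, Dict


class ModelCache:
    """Hypothetical cache that loads each model once and reuses it."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]], preload: bool = True):
        self._loaders = dict(loaders)
        self._models: Dict[str, Any] = {}
        if preload:
            # Pay the loading cost at server launch instead of on first use.
            for name in self._loaders:
                self.get(name)

    def is_loaded(self, name: str) -> bool:
        return name in self._models

    def get(self, name: str) -> Any:
        # Load lazily when preloading was skipped; otherwise reuse the cached model.
        if name not in self._models:
            self._models[name] = self._loaders[name]()
        return self._models[name]
```

With `preload=False`, the same cache falls back to loading a model the first time it is requested, which mirrors the "default but optional" behavior described above.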

🖱️ Pointer Interaction Toggle

Added a Pointer Interactive Button to prevent Live2D from following your cursor.

🔧 Adjustable VAD Confidence Threshold

Introduced a Voice Activity Detection (VAD) Confidence Threshold field:

- Configure how confident the AI must be in detecting speech.
- Example: At 98%, the AI will only listen when it's 98% certain you're speaking.
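
As a rough sketch of what such a threshold means, assuming the VAD reports a speech probability between 0 and 1 for each audio frame: the function name `should_listen` and its parameters are hypothetical and are not taken from the project.

```python
def should_listen(speech_probability: float, threshold: float = 0.98) -> bool:
    """Return True when the VAD's speech probability reaches the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    # A higher threshold ignores more background noise but may miss quiet speech.
    return speech_probability >= threshold
```

For example, `should_listen(0.90)` is rejected at the 98% setting, while `should_listen(0.90, threshold=0.80)` is accepted.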

✨ Special Character Filtering

By default, TTS will no longer vocalize special characters like emojis. (You can turn the filtering off in conf.yaml.)
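
One way such filtering could work is sketched below, using Unicode categories from Python's standard library. This is an assumption for illustration, not the project's actual filter, and `strip_unspeakable` is a hypothetical name.

```python
import unicodedata

# Unicode categories dropped before text reaches TTS:
# So = other symbols (most emojis), Sk = modifier symbols (skin tones),
# Cf = format characters (zero-width joiner), Co/Cn = private use / unassigned.
_DROPPED_CATEGORIES = {"So", "Sk", "Cf", "Co", "Cn"}


def strip_unspeakable(text: str) -> str:
    """Remove emoji-like characters and collapse the leftover whitespace."""
    kept = []
    for ch in text:
        if "\ufe00" <= ch <= "\ufe0f":
            continue  # variation selectors attached to emojis
        if unicodedata.category(ch) in _DROPPED_CATEGORIES:
            continue
        kept.append(ch)
    return " ".join("".join(kept).split())
```

Ordinary punctuation is kept, so sentence rhythm in the synthesized speech is unaffected.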

🔄 What's Changed

Voice interruption turned off by default: You can turn it back on with the "Voice Interruption Button". This change is motivated by the following prevalent issues:
- The AI gets interrupted by background noise.
- The system goes crazy when you interrupt yourself (that is, interrupt before the AI has said anything).
Default ASR: FunASR is now the default ASR.
ASR/TTS Visibility: The server shows the active ASR and TTS on launch.
New Prompt: Added a fun English prompt for discussing nuclear proliferation.

🎉 New Contributors

Thanks to our new contributor:

@YveMU for their first contribution in PR #40.

📜 Full Changelog

View the complete list of changes: v0.3.1...v0.4.0
