v0.4.0 Release

@t41372 released this 28 Nov 03:50
· 27 commits to main since this release
c160418

Release v0.4.0

I was going to add documentation for GPT-SoVITS, the upgrade script, and the installation scripts before releasing this version. I also planned to have the installation script detect whether the user needs a proxy to download models from Hugging Face. However, I realized that I would never release v0.4.0 if I did all of that, and v0.4.0 would just get bigger and bigger every day.

So yeah, another 2 weeks have passed (after five pre-releases), and here is the v0.4.0 release.

🚀 What's New

💬 Text Input in the Browser

You can now interact with the AI directly by typing in the browser.

🎉 GPT-SoVITS Support

Added GPT-SoVITS support, contributed by @YveMU in PR #40.

⚙️ Auto Installation Script (Experimental)

Introduced an experimental auto-installation script to simplify setup. This script:

  • Is cross-platform (at least, it's intended to be).
  • Installs miniconda into the project directory and creates a miniconda environment there.
  • Installs FFmpeg and the correct Python version in the miniconda environment.
  • Automatically configures dependencies for FunASR, edgeTTS, and ollama (excluding the ollama installation itself).

⚡ ASR/TTS Preloading & Caching

ASR and TTS models now preload when the server launches (enabled by default, but optional), significantly reducing the wait time when opening the webpage.
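To illustrate the idea (not the project's actual code), here is a minimal sketch of preloading with a cache. The names `load_engine`, `start_server`, and `LOAD_COUNT` are hypothetical, and the "model load" is a stand-in dictionary rather than a real ASR or TTS engine.

```python
from functools import lru_cache

# Hypothetical counter that tracks how often each loader really runs.
LOAD_COUNT = {"asr": 0, "tts": 0}


@lru_cache(maxsize=None)
def load_engine(kind: str) -> dict:
    """Stand-in for an expensive model load; cached so it runs once per kind."""
    LOAD_COUNT[kind] += 1
    return {"kind": kind, "ready": True}


def start_server(preload: bool = True) -> None:
    """Warm the cache at launch when preloading is enabled (assumed flag name)."""
    if preload:
        for kind in ("asr", "tts"):
            load_engine(kind)


start_server(preload=True)
# A later request reuses the cached engine instead of loading it again.
first_request_engine = load_engine("asr")
```

With preloading disabled, the first call to `load_engine` would pay the loading cost instead, which is the wait the release describes.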

🖱️ Pointer Interaction Toggle

Added a Pointer Interactive button that lets you stop the Live2D model from following your cursor.

🔧 Adjustable VAD Confidence Threshold

Introduced a Voice Activity Detection (VAD) Confidence Threshold field:

  • Configure how confident the AI must be that it is detecting speech.
  • Example: At 98%, the AI will only listen when it's at least 98% certain you're speaking.
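As a rough sketch of what such a threshold does, assuming the detector reports a speech probability between 0 and 1 (the function name `should_listen` and the default value are hypothetical, not taken from the project):

```python
def should_listen(speech_probability: float, threshold: float = 0.98) -> bool:
    """Return True when the detector's speech probability meets the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return speech_probability >= threshold


# Background noise scored at 0.60 is ignored; clear speech at 0.99 is accepted.
noise_accepted = should_listen(0.60)
speech_accepted = should_listen(0.99)
```

A higher threshold makes the AI less likely to react to background noise, at the cost of occasionally missing quiet speech.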

✨ Special Character Filtering

By default, TTS will no longer vocalize special characters such as emojis. (You can re-enable this in conf.yaml.)
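One way such a filter could work, as a sketch rather than the project's implementation, is to drop characters by Unicode category before the text reaches the TTS engine. The function name `strip_special_characters` and the choice of categories are assumptions made for illustration.

```python
import unicodedata

# Assumed filter rules: "So" (other symbols, which covers most emojis) and
# "Sk" (modifier symbols, which covers emoji skin-tone modifiers).
_DROPPED_CATEGORIES = {"So", "Sk"}
# Variation selector-16 and zero-width joiner, which often accompany emojis.
_DROPPED_CHARS = {"\ufe0f", "\u200d"}


def strip_special_characters(text: str) -> str:
    """Remove emoji-like symbols and collapse the whitespace they leave behind."""
    kept = [
        ch
        for ch in text
        if ch not in _DROPPED_CHARS
        and unicodedata.category(ch) not in _DROPPED_CATEGORIES
    ]
    return " ".join("".join(kept).split())


cleaned = strip_special_characters("Hello \U0001F600 world!")
```

Ordinary punctuation is kept, so sentence boundaries still reach the TTS engine. Note that this rule also removes other symbols in those categories, such as © or °.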


🔄 What's Changed

  • Voice interruption is turned off by default: You can turn it back on with the "Voice Interruption" button. This change is motivated by two prevalent issues:
    • The AI gets interrupted by background noise.
    • The system goes crazy when you interrupt yourself (that is, when you interrupt before the AI says anything).
  • Default ASR: FunASR is now the default ASR.
  • ASR/TTS Visibility: The server shows the active ASR and TTS on launch.
  • New Prompt: Added a fun English prompt for discussing nuclear proliferation.

🎉 New Contributors

Thanks to our new contributor:


📜 Full Changelog

View the complete list of changes: v0.3.1...v0.4.0