Releases: t41372/Open-LLM-VTuber
v0.2.3-beta
I haven't really touched the project for two weeks now. I thought I would make the new release once I fixed Piper, but that didn't happen because I got sick (and felt very lazy when I wasn't). However, there have been some important bug fixes since the last release, so I'm making a new release in case anyone downloads the zip files from the releases page instead of cloning the repository with git.
New Feature:
- Audio dubbing in a different language through translation: you can now talk to the LLM in English (and the LLM thinks in English) while hearing the audio in Japanese (or any other language, really). This is implemented by adding a translation layer right before the TTS, so only the text sent to the TTS is translated; everything else stays in the original language.
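To illustrate the idea of a translation layer that sits right before the TTS, here is a minimal sketch. It is not the project's actual code: `speak`, `fake_translate`, and `fake_tts` are hypothetical names, and the translator and TTS engine are replaced with trivial stand-ins so the example runs on its own.

```python
from typing import Callable, Tuple


def speak(
    llm_reply: str,
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Return (text to display, audio), translating only the TTS input."""
    display_text = llm_reply            # the screen and the LLM keep the original language
    tts_text = translate(llm_reply)     # only the text handed to the TTS is translated
    audio = synthesize(tts_text)
    return display_text, audio


# Hypothetical stand-ins for a real translator and a real TTS engine.
def fake_translate(text: str) -> str:
    phrasebook = {"Hello!": "こんにちは!"}
    return phrasebook.get(text, text)


def fake_tts(text: str) -> bytes:
    # A real engine would return audio samples; here we just encode the text.
    return text.encode("utf-8")


shown, audio = speak("Hello!", fake_translate, fake_tts)
print(shown)                    # text on screen stays in English
print(audio.decode("utf-8"))    # the "audio" was generated from the Japanese text
```

Keeping the translation step this late means the conversation history and the on-screen text are never affected by it.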
Bug fix:
- Missing lines: Under certain circumstances, the audio for some sentences was skipped even though the corresponding text appeared on the screen. This fix is quite important because the bug is surprisingly easy to trigger...
Unfixed bug that was supposed to be fixed:
- Piper TTS is currently not working. It never worked, really. I just didn't test it enough before releasing it. It turns out the current way of interacting with Piper TTS has a big problem: it mixes up the audio for different sentences and crashes. This wouldn't be a problem if the Piper TTS custom filename flag for the CLI worked as expected, but Piper TTS has been more or less dead and unmaintained for months now, with hundreds of open issues.
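As a rough sketch of why per-sentence filenames matter, the snippet below gives every sentence its own output file so audio from different sentences cannot overwrite each other. This is an illustration only: it does not call Piper or use any Piper flag, and `unique_audio_path`, `synthesize_all`, and `fake_synthesize_to_file` are hypothetical names invented for the example.

```python
import tempfile
import uuid
from pathlib import Path
from typing import Callable, List, Tuple


def unique_audio_path(directory: Path, suffix: str = ".wav") -> Path:
    """Build a collision-free output path for one sentence."""
    return directory / f"{uuid.uuid4().hex}{suffix}"


def synthesize_all(
    sentences: List[str],
    synthesize_to_file: Callable[[str, Path], None],
    directory: Path,
) -> List[Tuple[str, Path]]:
    """Synthesize each sentence into its own file and keep the pairing."""
    results = []
    for sentence in sentences:
        path = unique_audio_path(directory)
        synthesize_to_file(sentence, path)
        results.append((sentence, path))
    return results


# Hypothetical stand-in for a TTS backend: writes the text instead of audio.
def fake_synthesize_to_file(text: str, path: Path) -> None:
    path.write_text(text, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    pairs = synthesize_all(["Hello.", "How are you?"], fake_synthesize_to_file, Path(tmp))
    contents = [path.read_text(encoding="utf-8") for _, path in pairs]
```

If the backend always writes to one shared filename instead, a later sentence can replace the audio of an earlier one before it is played.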
v0.2.2-beta
Added features
- New docker support with Nvidia GPU passthrough
- Add: GroqWhisperASR
Bug Fix:
- Fix: Local interruption with "i".
- Fix: uvicorn missing websocket
- Fix: MeloTTS nltk download issue
Changes
- removed dependency: halo
v0.2.1-beta
v0.2.0-beta Voice Interruption
Voice interruption!
Demo video: OLM.interrupt.demo.3.2024-09-02.3.43.34.mp4
Some notable differences about this version (compared to the previous version):
Implemented:
- voice interruption
- Implemented buttons to turn on/off the microphone and the voice interruption.
Changes:
- to use live2d, you must use the mic in the browser now. The MIC_IN_BROWSER option is now deprecated and useless.
- The LIVE2D option in the config.yaml is deprecated and useless now.
- to use live2d, just run the server and open localhost:12393 (or whatever port you use) with your browser. When the page is loaded, everything is loaded. When the page is closed, the session will end.
- you can no longer click the live2d figure to make it do some weird reactions
Bug fix:
- Fixed: Live2D lips have some bugs; sometimes, they won't move. Now using the "RaSan147/pixi-live2d-display" fork instead of the original "guansss/pixi-live2d-display" library.
v0.1.0 Pre-voice-interruption release
I redesigned the backend structure while implementing the voice interruption feature and introduced many breaking changes. It might be a good idea to release a version before the voice interruption version (which would be v0.2.0) gets merged into the main branch so people can easily find and download the pre-interruption version.
Also, I didn't really do versioning because I didn't think about it. I guess now is a good time to do so.
Some notable differences about this version (compared to the next version to be released):
- no voice interruption
- support live2d while the microphone is NOT in the browser (but in the terminal on the server side)
- requires users to run server.py, open the browser page, and launch main.py for live2d features.
- live2d lips have some bugs and sometimes the lips won't move
- you can still click the live2d figure to make it do some weird reactions
- two of the buttons at the bottom of the page were not implemented and were greyed out
- can't think of any...