This is a personal assistant with a twist: a strong sarcastic personality. It might refuse to answer, or reply with witty, humorous responses.
This is a pet project to explore the great HuggingFace libraries and the available open-source models. My objective was to run the personal assistant fully locally on the GPU, which I achieved with just 6 GB of GPU memory.
Current model stack:
- microsoft/Phi-3.5-mini-instruct as the LLM
- openai/whisper-base as the speech-to-text model
- microsoft/speecht5_tts as the text-to-speech model
- "microsoft/speecht5_hifigan" as vocoder. Phi-3.5 and Whisper were quantized to 4 bits to fit my GPU.
Speech-to-text models tried:
- Whisper: great quality, but very slow. Unfortunately, it does not fit in my GPU, and quantization is only supported on the CPU.
  - Changed to Whisper-base.
- facebook/wav2vec2-large-960h: very poor transcription quality; I was not able to make it work.
- DeepSpeech: not supported with my stack; pip can't satisfy its requirements (didn't check why).
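A sketch of how whisper-base can be wired up through the `transformers` ASR pipeline (the file name is just a placeholder):

```python
from transformers import pipeline

# device=0 puts the pipeline on the first CUDA GPU
asr = pipeline("automatic-speech-recognition", model="openai/whisper-base", device=0)

# Transcribe from a file path ("recording.wav" is a placeholder).
# Raw numpy audio also works in principle, though it crashed in my
# setup -- see the TODO list.
text = asr("recording.wav")["text"]
```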
Text-to-speech models tried:
- suno/bark: uses almost all of my GPU resources. It works most of the time, but sometimes produces random output.
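For completeness, a sketch of the SpeechT5 + HiFi-GAN combination from the current stack, following the standard `transformers` usage (the x-vector dataset and speaker index are the usual example values, not necessarily what this project uses):

```python
import torch
import soundfile as sf
from datasets import load_dataset
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
tts_model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# SpeechT5 needs a speaker embedding; the CMU ARCTIC x-vectors are the
# usual source, and picking a different row selects a different voice.
xvectors = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embedding = torch.tensor(xvectors[7306]["xvector"]).unsqueeze(0)

inputs = processor(text="Oh, another question. How thrilling.", return_tensors="pt")
speech = tts_model.generate_speech(
    inputs["input_ids"], speaker_embedding, vocoder=vocoder
)
sf.write("reply.wav", speech.numpy(), samplerate=16000)
```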
TODO:
- The microphone should be active only while the user is talking, instead of recording for a fixed duration.
- Pass audio data directly to the model instead of going through a file. For some reason this was crashing the model.
- Use a quantized version of Phi-3? The problem is that the text-to-speech model is already using the whole GPU, but I could maybe move the LLM from GPU to CPU while idle and back when it is needed (sketched below, after this list).
- Quantize Whisper.
- Bark does not allow quantization and takes about 5 GB of GPU memory. Replace it with another model.
  - Replaced with SpeechT5. It also doesn't allow quantization, but it is significantly smaller and supports vocoder integration.
- Audio cuts off for long answers.
  - Fixed with SpeechT5.
- Improve the vocoder to allow selecting the speaker
- Add the conversation history to the model, to allow continuous chat (also sketched below).
- Initialize the models at startup instead of lazy-loading them.
- Clean up the code.
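For the GPU/CPU idea above, here is a rough sketch of shuttling the LLM between devices. Note this assumes an unquantized fp16 model: a bitsandbytes 4-bit model generally cannot be moved with `.to(device)` after loading, so the two ideas in that TODO item may be mutually exclusive.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load in fp16 on the CPU first; the GPU stays free for the TTS model.
tok = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-instruct")
llm = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct", torch_dtype=torch.float16
)

def answer(prompt: str) -> str:
    llm.to("cuda")  # claim the GPU only for generation
    inputs = tok(prompt, return_tensors="pt").to("cuda")
    output_ids = llm.generate(**inputs, max_new_tokens=256)
    llm.to("cpu")  # hand the GPU back to the TTS model while idle
    torch.cuda.empty_cache()
    return tok.decode(output_ids[0], skip_special_tokens=True)
```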
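And for continuous chat, a sketch of keeping the history as chat-template messages, reusing the `tokenizer` and `model` from the quantized-loading snippet above (the system prompt is illustrative):

```python
history = [
    {"role": "system", "content": "You are a sarcastic personal assistant."},
]

def chat(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    # Re-encode the whole conversation each turn via the chat template.
    input_ids = tokenizer.apply_chat_template(
        history, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=256)
    reply = tokenizer.decode(
        output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True
    )
    history.append({"role": "assistant", "content": reply})
    return reply
```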