This is a module for the MagicMirror².
How it works 👉 https://nikro.me/articles/professional/crafting-our-ai-assistant/
The goal of the module is to create a custom interactive widget that uses the following AI tools:
- Whisper - self-hosted model for voice-to-text transcription.
- LangChain - used with the ChatGPT API to process the requests.
- Picovoice Porcupine - used for offline (self-hosted) wake-word detection, with an emphasis on privacy.
- also... mimic3 for text-to-speech (TTS) :)
The idea is the following:
- Wake word (Porcupine).
- ...record the query (show a sexy animation, will be done later)
- ...pass it to the self-hosted Whisper
- ...transcribe voice-to-text
- Show the transcribed question as rendered text (in the module render)
- ...pass it through LangChain to ChatGPT
- ...pass the textual reply back to the module and render it on-screen
- ...use TTS (mimic3), self-hosted on the network, to send back a WAV file to play.
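The flow above can be pictured as a small state machine. This is only a sketch to make the order of the steps concrete: the state and event names (`idle`, `wakeWord`, `playbackDone`, etc.) and the `nextState` helper are made up for illustration and are not taken from the module's code.

```javascript
// Hypothetical state and event names; the module's real internals may differ.
const TRANSITIONS = {
  idle:         { wakeWord: 'recording' },     // Porcupine heard the wake word
  recording:    { silence: 'transcribing' },   // the user stopped talking
  transcribing: { transcript: 'thinking' },    // Whisper returned the text
  thinking:     { reply: 'speaking' },         // ChatGPT (via LangChain) answered
  speaking:     { playbackDone: 'idle' },      // the mimic3 audio finished playing
};

// Return the next state, or stay in the current one if the event does not apply.
function nextState(state, event) {
  const allowed = TRANSITIONS[state] || {};
  return allowed[event] || state;
}

// Walk through one full interaction, starting and ending in "idle".
const events = ['wakeWord', 'silence', 'transcript', 'reply', 'playbackDone'];
let state = 'idle';
const visited = [state];
for (const event of events) {
  state = nextState(state, event);
  visited.push(state);
}
console.log(visited.join(' -> '));
```

Ignoring events that do not apply to the current state means, for example, that a second wake word while a reply is being spoken changes nothing.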
To use this module, add the following configuration block to the modules array in the `config/config.js` file:

    var config = {
      modules: [
        {
          module: 'MMM-WhisperGPT',
          config: {
            // See below for configurable options
            picovoiceKey: 'xxx',
            picovoiceWord: 'JARVIS',
            picovoiceSilenceTime: 3,
            picovoiceSilenceThreshold: 600,
            audioDeviceIndex: 3,
            openAiKey: 'xxx',
            openAiSystemMsg: 'xxx',
            whisperUrl: '192.168.1.5:9000/asr',
            whisperMethod: 'openai-whisper',
            // With protocol and port, without /api/tts
            mimic3Url: 'http://192.168.1.6:59125'
          }
        }
      ]
    }
Option | Required? | Description |
---|---|---|
`picovoiceKey` | Required | Picovoice access key. You have to register to obtain it. It is used for the trigger word. |
`picovoiceWord` | Optional | Picovoice trigger word, e.g. BUMBLEBEE, JARVIS, etc. Defaults to JARVIS. |
`picovoiceSilenceTime` | Optional | Silence period in seconds. Defaults to 3. |
`picovoiceSilenceThreshold` | Optional | Silence threshold, usually the background noise multiplied by this number. Default value is 1.1 (i.e. 10% above the background noise). |
`audioDeviceIndex` | Optional | Audio device index, e.g. 3. The available devices are printed out when you're using debug mode. Defaults to 0. |
`whisperUrl` | Required | URL (or IP address) of the self-hosted Whisper instance. |
`whisperMethod` | Optional | Whisper method: `openai-whisper` or `faster-whisper`. Defaults to `faster-whisper`. |
`whisperLanguage` | Optional | Defaults to `en`. |
`openAiKey` | Required | OpenAI API key. |
`openAiSystemMsg` | Optional | System message: how the AI should behave. |
`mimic3Url` | Required | Mimic3 server URL, with protocol and port, without `/api/tts`. |
`mimic3Voice` | Optional | Mimic3 voice. Default: `en_US/cmu-arctic_low%23gka`. |
`debug` | Optional | Set to `true` if you want to debug. Default is `false`. |
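
To make the table concrete, here is a minimal sketch of how the required options and the documented defaults could be combined. The `resolveConfig` helper and the `DEFAULTS` / `REQUIRED` constants are hypothetical names used only for this example (they are not the module's code); the values themselves come from the table above.

```javascript
// Defaults as documented in the options table.
const DEFAULTS = {
  picovoiceWord: 'JARVIS',
  picovoiceSilenceTime: 3,
  picovoiceSilenceThreshold: 1.1,
  audioDeviceIndex: 0,
  whisperMethod: 'faster-whisper',
  whisperLanguage: 'en',
  mimic3Voice: 'en_US/cmu-arctic_low%23gka',
  debug: false,
};

// Options marked "Required" in the table.
const REQUIRED = ['picovoiceKey', 'whisperUrl', 'openAiKey', 'mimic3Url'];

// Hypothetical helper: fail early on missing options, then apply the defaults.
function resolveConfig(userConfig) {
  const missing = REQUIRED.filter((key) => !userConfig[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required options: ${missing.join(', ')}`);
  }
  // User values win over defaults.
  return { ...DEFAULTS, ...userConfig };
}

const resolved = resolveConfig({
  picovoiceKey: 'xxx',
  openAiKey: 'xxx',
  whisperUrl: '192.168.1.5:9000/asr',
  mimic3Url: 'http://192.168.1.6:59125',
  picovoiceWord: 'BUMBLEBEE',
});
console.log(resolved.picovoiceWord, resolved.whisperMethod);
```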
Picovoice / Porcupine is used for the wake ("trigger") word. It's a small, self-hosted neural network (NN). Picovoice offers a range of services, including a license for this offline engine. It only sends usage statistics, not the actual audio conversations.
Whisper is an open-source model from OpenAI. It's an automatic speech recognition (ASR) model that handles speech-to-text (transcription). In my case, I have it self-hosted on my local network.
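
After the wake word fires, something has to decide when the spoken query has ended, which is what `picovoiceSilenceTime` and `picovoiceSilenceThreshold` are for. Below is a rough sketch of one way to do that, assuming loudness is measured as RMS over frames of 16-bit samples. The `rms` and `findStopFrame` helpers and the `framesPerSecond` parameter are assumptions made for this example; the module's actual detection logic may work differently.

```javascript
// Root-mean-square loudness of one frame of PCM samples.
function rms(frame) {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (const sample of frame) sumSquares += sample * sample;
  return Math.sqrt(sumSquares / frame.length);
}

// Return the index of the frame at which recording should stop,
// or -1 if the silence period was never reached.
function findStopFrame(frames, backgroundLevel, options) {
  const { silenceThreshold, silenceTime, framesPerSecond } = options;
  const limit = backgroundLevel * silenceThreshold;        // "background noise * THIS NUMBER"
  const needed = Math.ceil(silenceTime * framesPerSecond); // consecutive quiet frames required
  let quietRun = 0;
  for (let i = 0; i < frames.length; i++) {
    // Any loud frame resets the run of quiet frames.
    quietRun = rms(frames[i]) < limit ? quietRun + 1 : 0;
    if (quietRun >= needed) return i;
  }
  return -1;
}

// Example: two loud frames, then three seconds of quiet at 2 frames per second.
const loud = Int16Array.of(1000, -1000);
const quiet = Int16Array.of(10, -10);
const frames = [loud, loud, quiet, quiet, quiet, quiet, quiet, quiet];
const stopAt = findStopFrame(frames, 10, {
  silenceThreshold: 1.1,
  silenceTime: 3,
  framesPerSecond: 2,
});
console.log(stopAt);
```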
I used this: https://github.com/ahmetoner/whisper-asr-webservice
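
For reference, a hedged sketch of sending a recording to that web service. It assumes the service's `/asr` endpoint takes `task`, `language` and `output` query parameters and a multipart field named `audio_file`, and that the JSON response carries a `text` field; double-check these against the version you run. `buildWhisperUrl` and `transcribe` are hypothetical helper names, not the module's code, and `transcribe` needs Node 18+ for the global `fetch`, `FormData` and `Blob`.

```javascript
// Build the request URL from the whisperUrl and whisperLanguage options.
function buildWhisperUrl(whisperUrl, language) {
  // The example config omits the protocol, so assume plain HTTP in that case.
  const withProtocol = /^https?:\/\//.test(whisperUrl) ? whisperUrl : `http://${whisperUrl}`;
  const url = new URL(withProtocol);
  url.searchParams.set('task', 'transcribe');
  url.searchParams.set('language', language);
  url.searchParams.set('output', 'json');
  return url.toString();
}

// Upload a WAV buffer and return the transcribed text.
async function transcribe(whisperUrl, language, wavBuffer) {
  const form = new FormData();
  form.append('audio_file', new Blob([wavBuffer], { type: 'audio/wav' }), 'query.wav');
  const response = await fetch(buildWhisperUrl(whisperUrl, language), {
    method: 'POST',
    body: form,
  });
  if (!response.ok) {
    throw new Error(`Whisper request failed: ${response.status}`);
  }
  const result = await response.json();
  return result.text; // assumed field name in the JSON output
}

console.log(buildWhisperUrl('192.168.1.5:9000/asr', 'en'));
```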
ChatGPT is another product from OpenAI. It's a Large Language Model (LLM) AI. You will need to register and get an API Key to use it.
LangChain is a library built around LLMs that allows for extra functionality, such as long-term memory.
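
To show what "memory" means in practice, here is a plain JavaScript stand-in. It is deliberately not LangChain's API: `ConversationMemory` is a hypothetical class that only keeps the last few exchanges and rebuilds the message list (in the `role` / `content` format used by OpenAI's chat API) for each new question, with `openAiSystemMsg` as the system message.

```javascript
// Hypothetical, simplified stand-in for a conversation memory.
class ConversationMemory {
  constructor(systemMsg, maxTurns = 10) {
    this.systemMsg = systemMsg;
    this.maxTurns = maxTurns; // one turn = one question plus one reply
    this.history = [];
  }

  // Messages to send for a new question: system message, past turns, new question.
  buildMessages(userText) {
    return [
      { role: 'system', content: this.systemMsg },
      ...this.history,
      { role: 'user', content: userText },
    ];
  }

  // Store a finished exchange and drop the oldest entries beyond maxTurns.
  remember(userText, reply) {
    this.history.push({ role: 'user', content: userText });
    this.history.push({ role: 'assistant', content: reply });
    while (this.history.length > this.maxTurns * 2) {
      this.history.shift();
    }
  }
}

const memory = new ConversationMemory('You are a helpful mirror.', 1);
memory.remember('What is the weather?', 'Sunny.');
memory.remember('And tomorrow?', 'Rainy.');
const messages = memory.buildMessages('Should I take an umbrella?');
console.log(messages.map((m) => m.role).join(', '));
```

With `maxTurns` set to 1, only the most recent exchange is replayed, which keeps the request small at the cost of forgetting earlier context.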
Mycroft's Mimic3 is a neural Text-to-Speech (TTS) system. It offers realistic TTS that can run on somewhat resource-restricted systems. I initially tried to set it up on my OrangePi, but ended up installing it on the same machine as Whisper and use it over the network.
I used this docker-compose.yml 😉

    version: '3.7'
    services:
      mimic3:
        image: mycroftai/mimic3
        ports:
          - 59125:59125
        volumes:
          - .:/home/mimic3/.local/share/mycroft/mimic3
        stdin_open: true
        tty: true
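
With the server running, the module only needs a URL to request speech from. The sketch below shows how `mimic3Url` and `mimic3Voice` could be turned into such a URL. It assumes the `/api/tts` endpoint accepts `voice` and `text` query parameters, and `buildMimic3TtsUrl` is a hypothetical helper, not the module's code. Note that the default voice is already percent-encoded (`%23` is `#`), so it is decoded first to avoid encoding it twice.

```javascript
// Build a TTS request URL from the mimic3Url and mimic3Voice options.
// mimic3Url must include the protocol (as the options table asks),
// otherwise `new URL` throws.
function buildMimic3TtsUrl(mimic3Url, voice, text) {
  const base = mimic3Url.replace(/\/+$/, ''); // drop trailing slashes
  const url = new URL(`${base}/api/tts`);
  // The configured voice is percent-encoded; decode it before re-encoding.
  url.searchParams.set('voice', decodeURIComponent(voice));
  url.searchParams.set('text', text);
  return url.toString();
}

const ttsUrl = buildMimic3TtsUrl(
  'http://192.168.1.6:59125',
  'en_US/cmu-arctic_low%23gka',
  'Hello world'
);
console.log(ttsUrl);
```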
- If your audio doesn't work, check whether you're using ALSA or PulseAudio. You might need to install `mpg123`. You can install it using the command `sudo apt-get install mpg123`.
- You might also need to install `lame` for audio encoding. You can install it using the command `sudo apt-get install lame`.