2. Settings
There are multiple settings that you can change by modifying variables in the SETTINGS.json database used by nala.py. Here is a brief list of them, along with their options and descriptions.
| Variable | Options | Description |
| --- | --- | --- |
| alarm | True, False | Whether the alarm is turned on or off at the designated time. |
| alarm time | 8 | The time the alarm goes off (in 24-hour time: 8 = 8 AM, 13 = 1 PM) if the alarm action is turned on. |
| greeting | True, False | If True (default), Nala will greet you and get the weather every time you log in. If False, she will not. |
| end | 1531914937.172238 | The last time you updated the database (useful for understanding sessions). |
| transcription_type | 'sphinx', 'google' | The type of transcription. Default is 'sphinx'; if 'google' is set but the path in the environment variable cannot be found, it reverts to 'sphinx'. |
| wake_type | 'sphinx', 'snowboy', 'porcupine' | Wake-word detector used to detect user queries. Default is 'porcupine', as it is the most accurate wake-word detector. |
| query_time | 2 | Time in seconds of each query when Nala is activated. The default is 2 seconds (from trial and error). |
| multi_query | True, False | Allows you to separate queries with AND in the transcript, so Nala doesn't stop after one query. Default is True. |
| query_save | True, False | Save queries once they have been propagated; otherwise they are deleted. Useful if you want to cache query data or build a dataset. Default is True. |
| register_face | True, False | Store the user's face at registration to authenticate later with facial recognition. Default is True. |
| sleep_time | 30 | The time (in minutes) that Nala will sleep if you trigger the "Go to sleep" action query. Default is 30 minutes. |
| query_json | True, False | Also save .json files in the data/queries folder to match the audio (e.g. sample.wav --> sample.json with query info). |
| budget | 30 | Budget the user has to go out with friends (for actions). |
| genre | 'classical' | Music genre the user prefers (for actions). |
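As a minimal sketch (assuming SETTINGS.json is a flat JSON object holding the variables above — the helper name here is hypothetical), a setting could be read and updated like this:

```python
import json

def update_setting(path, key, value):
    """Load the settings database, change one variable, and write it back."""
    with open(path) as f:
        settings = json.load(f)
    settings[key] = value
    with open(path, 'w') as f:
        json.dump(settings, f, indent=2)
    return settings

# Example: turn the login greeting off
# settings = update_setting('SETTINGS.json', 'greeting', False)
```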
You also need to set some environment variables if you'd like to use a few of the actions. In particular, actions such as shutting down or restarting the computer require access to the root account. If you'd rather not grant this, that's fine too; you just won't be able to use those commands.
You can modify the voice type by changing the speak.py script in the ./actions/ folder. Nala uses the pyttsx3 library, and the default voice is Fiona ('com.apple.speech.synthesis.voice.fiona'). You only need to modify this script for the voice to change across all Nala experiences.
```python
import sys
import pyttsx3 as pyttsx

def say(text):
    engine = pyttsx.init()
    engine.setProperty('voice', 'com.apple.speech.synthesis.voice.fiona')
    engine.say(text)
    engine.runAndWait()

say(str(sys.argv[1]))
```
Check out the Pyttsx3 documentation for more information on how to modify things like the speech rate and pitch.
By default, Nala transcribes wake words with PocketSphinx to keep costs down (Google charges $0.006 per query). Therefore, as you code more actions into Nala, you may need to retrain the transcription model on a new language model.
This is quite easy to do. Here are some quick instructions.
- First, we need to create a text document (.txt) with the keywords used to train the language model. These are the words the transcription model will be trained to recognize. The fewer words in the master corpus, the better the accuracy you'll likely achieve. Below are the words the current model uses, which you can add to; you should keep these core words in there if you'd like to use the default actions. You can find this text file at ./data/models/nala_2.txt.
```
play music
get the weather
get social
get coffee
get the news
get sports
get food
get ice cream
get beers
get beer
get social
get food
get nightlife
find a bar
plan trip
set alarm
stop alarm
make a poem
make a joke
record audio
record video
open atom
open sublime
open spotify
open twitter
open linkedin
open facebook
open github
chill out
exercise
I love you
search
be grateful
meditate
shut down
restart
log out
sleep
```
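As a sketch, you could also extend this corpus programmatically — for instance, appending a new action phrase while avoiding duplicates (the helper name is hypothetical; the path is the one mentioned above):

```python
def add_keyword(corpus_path, phrase):
    """Append a phrase to the language-model corpus if it isn't already there."""
    with open(corpus_path) as f:
        phrases = [line.strip() for line in f if line.strip()]
    if phrase not in phrases:
        phrases.append(phrase)
        with open(corpus_path, 'w') as f:
            f.write('\n'.join(phrases) + '\n')
    return phrases

# Example: add_keyword('./data/models/nala_2.txt', 'get tacos')
```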
- Now that we have our text corpus, we can go to the LMTool page. Simply click the "Browse..." button, select the corpus .txt file you created, then click "COMPILE KNOWLEDGE BASE" and download all the files (Figure 7.3.5.2). In this case, there is a TAR4311.tgz file at the top that can easily be downloaded into the downloads folder.
- Now that we have all the required files, all we need to do is load them into PocketSphinx via the ./data/models/ps_transcribe.py script. You just need to change the 4437.lm and 4437.dic filenames to whatever .lm and .dic files you just trained. Now you're good to go with a new transcription model!
```python
from os import environ, path
import sys

from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *

def transcribe(HOSTDIR, SAMPLE):
    # Fix the host directory if it doesn't end with a '/'
    if HOSTDIR[-1] != '/':
        HOSTDIR = HOSTDIR + '/'
    SAMPLEDIR = HOSTDIR + SAMPLE
    MODELDIR = HOSTDIR + "data/models"
    DATADIR = HOSTDIR + "data/wakewords"

    # Create a decoder with the trained language model and dictionary
    config = Decoder.default_config()
    config.set_string('-hmm', MODELDIR + '/en-us')
    config.set_string('-lm', MODELDIR + '/4437.lm')
    config.set_string('-dict', MODELDIR + '/4437.dic')
    decoder = Decoder(config)

    # Decode the audio file in streaming chunks
    decoder.start_utt()
    with open(SAMPLEDIR, 'rb') as stream:
        while True:
            buf = stream.read(1024)
            if buf:
                decoder.process_raw(buf, False, False)
            else:
                break
    decoder.end_utt()
    print('Best hypothesis segments: ', [seg.word for seg in decoder.seg()])

    # Drop the sentence-boundary tokens and join the words into a transcript
    output = [seg.word for seg in decoder.seg()]
    output.remove('<s>')
    output.remove('</s>')
    transcript = ' '.join(output).lower()
    print('transcript: ' + transcript)
    return transcript
```
The best wake-word engine to use is Porcupine (as shown in the following figure, taken from this repo). What's great is that Porcupine is completely open source for the macOS, Windows, and Linux operating systems.
If you'd like to train a new wakeword, you are first going to need to clone Porcupine's repository:
```shell
git clone https://github.com/Picovoice/Porcupine.git
cd Porcupine
```
Porcupine enables developers to build models for any wake word. This is done using Porcupine's optimizer utility, which finds optimal model hyper-parameters for a given hotword and stores them in a so-called keyword file. You can create your own keyword file with the optimizer from the command line:
```shell
tools/optimizer/${SYSTEM}/${MACHINE}/pv_porcupine_optimizer -r resources/ -w ${WAKE_WORD} \
-p ${TARGET_SYSTEM} -o ${OUTPUT_DIRECTORY}
```
In the above example, replace ${SYSTEM} and ${TARGET_SYSTEM} with the current and target (runtime) operating systems (linux, mac, or windows). ${MACHINE} is the CPU architecture of the current machine (x86_64 or i386). ${WAKE_WORD} is the chosen wake word. Finally, ${OUTPUT_DIRECTORY} is the output directory where the keyword file will be stored.
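For concreteness, here is a small sketch of how those placeholders get filled in — say, on a mac x86_64 machine building a keyword file for mac with the hypothetical wake word "nala" (the values are illustrative only):

```python
# Fill in the optimizer command template described above.
template = (
    "tools/optimizer/{system}/{machine}/pv_porcupine_optimizer "
    "-r resources/ -w {wake_word} -p {target_system} -o {output_dir}"
)

cmd = template.format(
    system="mac",          # current operating system
    machine="x86_64",      # current CPU architecture
    wake_word="nala",      # chosen wake word (hypothetical)
    target_system="mac",   # target (runtime) operating system
    output_dir="output/",  # where the keyword file will be stored
)
print(cmd)
```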
Now that you have your wake word, all you need to do is edit the ./data/models/porcupine file so that it matches your wake word of interest, and Nala will respond to that wake word.
If you'd instead like to train models with PocketSphinx or Snowboy, check out their respective documentation.