Modularized STT implementation #118
Conversation
@@ -1,6 +1,8 @@
 import yaml
 import sys
+import speaker
+import stt
Can we combine these into a single import? I prefer that practice, and all it requires is that line 32 use `stt.PocketSphinxSTT()`.
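For illustration, a minimal sketch of the suggested style, assuming the `stt` module exposes a `PocketSphinxSTT` class as discussed in this PR:

```python
# Single module import rather than importing individual classes.
import stt

# ...later, at the call site (line 32 in the file under review):
stt_engine = stt.PocketSphinxSTT()
```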
This is great! I had this on my TODO list for the next few weeks, so thank you very much for sending over the pull request! I've gone through and added some comments, but once those are addressed, I don't think we'll be far away from merging this in. As an aside: once this is merged in, can you send a pull request over to the docs site with amendments?
Sure, all of these comments make sense. I'll try to get around to making these changes in the next few days and send an updated pull request. And yes, post-merge I'll send a pull request for the docs.
Just a suggestion regarding Google's STT v2 implementation: FLAC files are only required with v1; v2 additionally allows WAV or MP3. You can even stream audio directly to the Google Speech API. (It works simply enough with Node.js, but I haven't been able to stream using Python just yet.) Google Speech API v2 works very similarly to Wit.Ai, so if you'll be enhancing your work, why not also include support for Wit.Ai? Ref: https://www.npmjs.org/package/node-record-lpcm16
@willondubs thanks for the suggestion - I didn't know that v2 accepted .wav files. This allowed me to eliminate the implicit dependency on either ffmpeg or libav. wit.ai looks nice as well; I think it should be fairly easy for someone to integrate with their API after this change. @crm416, I believe I've addressed your comments. Upon running populate.py, the user will now be prompted to choose their STT engine (or hit Enter to default to 'sphinx'). If the user chooses 'google', he or she will then be prompted for an API key, which is added to profile["keys"]. We will default to PocketSphinx if no STT engine is specified in the profile.
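A rough sketch of the populate.py flow described above; the prompt wording and the `stt_engine` / `GOOGLE_SPEECH` field names are assumptions for illustration, not the exact code:

```python
# Sketch of the interactive STT selection (Python 2 era, hence raw_input).
profile = {}

engine = raw_input("Choose an STT engine ['sphinx' or 'google', default 'sphinx']: ").strip() or "sphinx"
profile["stt_engine"] = engine  # hypothetical field name

if engine == "google":
    # The Google Speech API needs a key; store it with the user's other keys.
    key = raw_input("Enter your Google API key: ").strip()
    profile.setdefault("keys", {})["GOOGLE_SPEECH"] = key  # hypothetical key name
```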
Thanks for the update, @astahlman. I'm going to run this through testing this weekend, and we'll merge it in if it looks good! Great work.
Great!
LGTM! Thanks for the fine work and revisions.
@astahlman Can you send over a pull request to the docs repo outlining the changes here?
@willondubs @astahlman thanks for the great work - has anyone already integrated wit.ai as an STT engine in Jasper?
This was asked for in a comment at PR jasperproject#118
@bsinfo523 Check out Pull Request #273.
Any idea how to renew the Google API key automatically after it burns through the 50-queries-per-day quota?
Overview
This commit abstracts out the Speech To Text engine into a new module: stt.py. Users now have the option to specify a Google API key in their profile. If this key is present, Jasper will rely on the Google Speech API to transcribe audio during the active listen phase. The default behavior still uses the PocketSphinx engine for audio transcription.
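As a rough sketch of the selection behavior described above (the `GoogleSTT` class name and the profile key name are assumptions for illustration; only `PocketSphinxSTT` is confirmed in the review comments):

```python
import stt  # the module introduced by this PR

def get_stt_engine(profile):
    # Prefer the Google Speech API when the profile carries an API key;
    # otherwise fall back to the default PocketSphinx engine.
    api_key = profile.get("keys", {}).get("GOOGLE_SPEECH")  # hypothetical key name
    if api_key:
        return stt.GoogleSTT(api_key)  # hypothetical class name
    return stt.PocketSphinxSTT()
```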
Motivation
There is a stark difference in performance between the Google Speech API and PocketSphinx. I rarely ever need to repeat myself anymore.
Testing
Prerequisites
The new STT implementation requires a Google API key to be present in `profile.yml`.
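For example, the relevant `profile.yml` entry might look like the following; the exact key name is an assumption, so check what populate.py writes:

```yaml
# profile.yml (excerpt) -- the key name below is illustrative
keys:
    GOOGLE_SPEECH: "your-google-api-key"
```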
To obtain an API key:
This implementation also requires that either the `ffmpeg` or `avconv` audio utility be present on your `$PATH`. To install on RPi, simply run the command below.
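One plausible install command on Raspbian (an assumption; `avconv` ships in the `libav-tools` package there):

```bash
# Assumed command -- installs avconv via the libav-tools package on Raspbian.
sudo apt-get update && sudo apt-get install libav-tools
```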
Acknowledgements
This was inspired by @fritz-fritz's fork.