Modularized STT implementation #118
Conversation
@@ -1,6 +1,8 @@
 import yaml
 import sys
+import speaker
+import stt
Can we combine these into a single import? I prefer that practice, and all it requires is that line 32 use `stt.PocketSphinxSTT()`.
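For illustration, a minimal sketch of the suggested style, assuming the `stt` module exposes a `PocketSphinxSTT` class as discussed in this PR:

```python
# Single module import rather than importing individual classes.
import stt

# ...later, at the call site (line 32 in the file under review):
stt_engine = stt.PocketSphinxSTT()
```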
This is great! I had this on my TODO list for the next few weeks, so thank you very much for sending over the pull request! I've gone through and added some comments, but once those are addressed, I don't think we'll be far away from merging this in. As an aside: once this is merged in, can you send a pull request over to the docs site with amendments?
Sure, all of these comments make sense. I'll try to get around to making these changes in the next few days and send an updated pull request. And yes, post-merge I'll send a pull request for the docs.
Just a suggestion regarding Google's STT v2 implementation: FLAC files are only required with v1; v2 additionally allows WAV or MP3. You can even stream audio directly to the Google Speech API. (It works simply enough with Node.js, but I haven't been able to stream using Python just yet.) Google Speech API v2 works very similarly to Wit.Ai, so if you'll be enhancing your work, why not also include support for Wit.Ai? Ref: https://www.npmjs.org/package/node-record-lpcm16
@willondubs thanks for the suggestion - I didn't know that v2 accepted .wav files. This allowed me to eliminate the implicit dependency on either ffmpeg or libav. wit.ai looks nice as well; I think it should be fairly easy for someone to integrate with their API after this change. @crm416, I believe I've addressed your comments. Upon running populate.py, the user will now be prompted to choose their STT engine (or hit Enter to default to 'sphinx'). If the user chooses 'google', he or she will then be prompted for an API key, which is added to profile["keys"]. We will default to PocketSphinx if no STT engine is specified in the profile.
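A rough sketch of the populate.py flow described above; the prompt wording and the `stt_engine` / `GOOGLE_SPEECH` field names are assumptions for illustration, not the exact code:

```python
# Sketch of the interactive STT selection (Python 2 era, hence raw_input).
profile = {}

engine = raw_input("Choose an STT engine ['sphinx' or 'google', default 'sphinx']: ").strip() or "sphinx"
profile["stt_engine"] = engine  # hypothetical field name

if engine == "google":
    # The Google Speech API needs a key; store it with the user's other keys.
    key = raw_input("Enter your Google API key: ").strip()
    profile.setdefault("keys", {})["GOOGLE_SPEECH"] = key  # hypothetical key name
```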
Thanks for the update, @astahlman. I'm going to run this through testing this weekend, and we'll merge it in if it looks good! Great work.
Great!
LGTM! Thanks for the fine work and revisions.
@astahlman Can you send over a pull request to the docs repo outlining the changes here?
@willondubs @astahlman thanks for the great work - has anyone already integrated wit.ai as an STT engine in Jasper?
This was asked for in a comment at PR jasperproject#118
@bsinfo523 Check out Pull Request #273.
Any idea how to renew the Google API key automatically after it burns through the 50-queries-per-day quota?
Overview
This commit abstracts out the Speech To Text engine into a new module: stt.py. Users now have the option to specify a Google API key in their profile. If this key is present, Jasper will rely on the Google Speech API to transcribe audio during the active listen phase. The default behavior still uses the PocketSphinx engine for audio transcription.
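As a rough sketch of the selection behavior described above (the `GoogleSTT` class name and the profile key name are assumptions for illustration; only `PocketSphinxSTT` is confirmed in the review comments):

```python
import stt  # the module introduced by this PR

def get_stt_engine(profile):
    # Prefer the Google Speech API when the profile carries an API key;
    # otherwise fall back to the default PocketSphinx engine.
    api_key = profile.get("keys", {}).get("GOOGLE_SPEECH")  # hypothetical key name
    if api_key:
        return stt.GoogleSTT(api_key)  # hypothetical class name
    return stt.PocketSphinxSTT()
```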
Motivation
There is a stark difference in performance between the Google Speech API and PocketSphinx. I rarely ever need to repeat myself anymore.
Testing
Prerequisites
The new STT implementation requires a Google API key to be present in `profile.yml`.
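For example, the relevant `profile.yml` entry might look like the following; the exact key name is an assumption, so check what populate.py writes:

```yaml
# profile.yml (excerpt) -- the key name below is illustrative
keys:
    GOOGLE_SPEECH: "your-google-api-key"
```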
To obtain an API key:
This implementation also requires that either the `ffmpeg` or `avconv` audio utility be present on your `$PATH`. To install on RPi, simply run the command below.
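One plausible install command on Raspbian (an assumption; `avconv` ships in the `libav-tools` package there):

```bash
# Assumed command -- installs avconv via the libav-tools package on Raspbian.
sudo apt-get update && sudo apt-get install libav-tools
```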
Acknowledgements
This was inspired by @fritz-fritz's fork.