
"Alexa style" continuous speech instruction #1

Open
mph070770 opened this issue May 13, 2016 · 19 comments
@mph070770

mph070770 commented May 13, 2016

Hi - great software!

I have your demo working with Ubuntu. What I'd like to do is detect the keyword in continuous speech in a similar way to the Amazon echo. Is that possible? For example, this:

"Alexa, turn on the lights"

instead of

"Alexa" [ding] "turn on the lights"

Ideally, I'd also want to know where in the audio the keyword was spoken so that it can be removed from audio before I send it to an online engine (such as api.ai or AVS).

Any suggestions would be great.

Thanks

@xuchen
Collaborator

xuchen commented May 13, 2016

The [ding] sound is actually a callback function you can define yourself. Here's an idea:

  1. keep an audio buffer and a global variable is_triggered = False
  2. when triggered, set is_triggered = True in your callback
  3. send any audio after this point in your buffer to AVS for speech recognition.

Does it make sense?
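The three steps above can be sketched in Python. The detector callback and the AVS call are stand-ins (not Snowboy's actual API); only the flag-and-buffer logic is illustrated:

```python
# Sketch of the trigger-flag pattern described in the steps above.
class CommandCapture:
    def __init__(self):
        self.is_triggered = False   # step 1: flag starts False
        self.command_audio = []     # buffer for audio after the hotword

    def on_hotword(self):
        # step 2: the detection callback flips the flag
        # (this is where the [ding] would otherwise play)
        self.is_triggered = True

    def feed(self, chunk):
        # step 3: once triggered, keep everything for the AVS/ASR request
        if self.is_triggered:
            self.command_audio.append(chunk)

cap = CommandCapture()
cap.feed(b'earlier audio')    # ignored: hotword not detected yet
cap.on_hotword()
cap.feed(b'turn on ')
cap.feed(b'the lights')
command = b''.join(cap.command_audio)
```

In a real setup, `feed` would be driven by the microphone stream and `command` would be streamed to AVS instead of collected in a list.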

@chenguoguo
Collaborator

What Xuchen said is correct. You may have to play with the audio buffer a little bit, to make sure you send all the audio after hotword detection to the ASR.

Guoguo


@chenguoguo
Collaborator

Looks like it has been resolved, so closing this.

@mph070770
Author

Thanks for the feedback. Are you suggesting a new audio buffer or utilising the ring buffer?

@chenguoguo
Collaborator

Re-opening this since there's ongoing discussion... Let me write in more detail.

  1. In order to remove the [ding] sound, you only have to modify the callback function as Xuchen said. You do not need another buffer. If your ASR server does online decoding, then you can start transmitting your audio data to the server right after the triggering of the hotword.
  2. You may need another buffer if:
    2.1. There is a delay in hotword detection. In this case, you need a buffer to keep some data from before the triggering of the hotword, so that you will have a "complete" sentence for your ASR.
    2.2. Your ASR server can only do offline decoding. In this case you need a buffer for the whole sentence after the triggering of the hotword. You will have to detect the end of the sentence (I can explain more on this if necessary), and then send the whole sentence to your ASR server (this may not be your case).

Does this solve your problem?
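Case 2.1 can be sketched with a fixed-size ring buffer. The window length and chunk contents below are illustrative, not Snowboy's API:

```python
from collections import deque

# Sketch of case 2.1: keep a short rolling window of audio chunks so
# that, when hotword detection fires slightly late, the chunks from
# just before the trigger are still available.
PRE_TRIGGER_CHUNKS = 5
ring = deque(maxlen=PRE_TRIGGER_CHUNKS)   # old chunks fall off the front

for i in range(10):                       # pretend these are mic chunks
    ring.append(('chunk%d' % i).encode())

# Hotword detected here: prepend the ring's contents to the utterance
# before streaming the rest to the ASR.
pre_audio = list(ring)
```

After ten chunks, the deque holds only the five most recent ones, which is exactly the "some data before the triggering" that gives the ASR a complete sentence.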

@chenguoguo chenguoguo reopened this May 16, 2016
@chenguoguo
Collaborator

Closing this as it has been integrated into AlexaPI. See:

https://youtu.be/wLbsAQDmN-c

https://github.com/sammachin/AlexaPi/pull/85

@jwhite

jwhite commented Dec 31, 2016

I don't think this is closed. This issue is about continuous detection using a buffer. As far as I can tell, Alexa-Pi only uses the hotword-then-record method at this time.

@chenguoguo
Collaborator

OK, re-opening it. What I suggested above should still stand.

@dmc6297

dmc6297 commented Mar 7, 2017

I did this by customizing the snowboy_index.js. In the processDetectionResult function I set a "command" flag once the hotword is detected and emit all chunks until silence is detected. Another script builds a buffer from all the chunks and sends them to Microsoft LUIS for recognition.

So you can say "Alexa turn off the lights" all in one phrase without pausing.

_write(chunk, encoding, callback) {
    const index = this.nativeInstance.RunDetection(chunk);
    // processDetectionResult sets this.bufferingCommand once the hotword fires
    this.processDetectionResult(index, chunk);
    if (this.bufferingCommand === true) {
        this.emit('chunk', chunk, encoding);
    }
    return callback();
}

@evancohen
Contributor

@dmc6297 you might want to check out Sonus. There's an implementation on the audio-buffer branch which uses a ringbuffer + stream transformation (basically what @chenguoguo described in this thread).

The only drawback with my ring buffer implementation is that it doesn't perform super well on low powered devices (Like the Pi Zero, where detection lag increases by about 1/3 of a second).

@Stan92

Stan92 commented Mar 8, 2017

Hi,

I'm looking for something like this too, using Node.js, but less sophisticated :-)

@evancohen, I've seen your project; it seems it could satisfy my needs (except for MS Cognitive Services).

There are several steps that I can manage using 2 "audio buffers" (one for Snowboy, one for Bing), but I don't think I'm on the right path.

This is the workflow I'd like to implement. I have several hotwords:
- some for local actions (Time, Light, etc.)
- 1 for activating online mode ("Go Online")
- 1 for stopping online mode ("Bye")

a) If it's "Time", "Light", ..., then I run my local action.
b) If "Go Online" is detected, then I tell the user I'm listening.

c.1) If the word/sentence doesn't exist within the Snowboy model and I'm in "listening mode", I would like to send the word/sentence online (using MS Cognitive Services).

c.2) If the word/sentence exists within the model and I'm in "listening mode", I don't want to send the data online.

d) If it's "Bye", no word/sentence will be sent online until the user says "Go Online" again.
e) When a silence of x seconds is detected, I need to go back "offline" (meaning no word/sentence will be sent online until the user says "Go Online").
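A minimal sketch of that offline/online workflow as a tiny state machine. The hotword names come from the description above; the returned action strings and the "<silence>" token are placeholders, not any real API:

```python
# Hypothetical state machine for the offline/online workflow above.
class Assistant:
    LOCAL = {"Time", "Light"}      # hotwords handled locally (case a)

    def __init__(self):
        self.online = False

    def handle(self, word):
        if word == "Go Online":
            self.online = True                  # case b: start listening
            return "listening"
        if word in ("Bye", "<silence>"):
            self.online = False                 # cases d/e: back offline
            return "offline"
        if word in self.LOCAL:
            return "local action: " + word      # cases a, c.2: never sent online
        if self.online:
            return "send online"                # case c.1: unknown word while listening
        return "ignored"

a = Assistant()
results = [a.handle(w) for w in
           ["Time", "Go Online", "hello", "Bye", "hello"]]
```

Keeping all the mode logic in one `handle` function makes it easy to verify that unknown words are only sent online between "Go Online" and "Bye"/silence.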

@Stan92

Stan92 commented Mar 12, 2017

@dmc6297 I tried your customized snowboy_index.js, but it doesn't work for me. When I save the chunks into a buffer and concatenate them into an array of bytes, the resulting WAV file is inaudible.

    detector.on('chunk', function (chunk, encoding) {
        if (chunk){
            buffers.push(chunk);
            if ((new Date()-timeStart)/1000 > timerInSecond ) {
                detector.bufferingCommand=false;
                getText(buffers); 
            }
        }
    });

The getText function transforms the buffer into an array of bytes and sends it to an API:

    var bytes = Buffer.concat(buffers);

Could you please give me a hand?
Thanks

@dmc6297

dmc6297 commented Mar 13, 2017

@Stan92 The data is raw PCM audio; you will need to prepend a WAV header to the buffer, or convert it to another format. This is how I made it work.

Start the command buffer (the header is a standard 44-byte PCM WAV header; the two size fields are patched just before the file is written):

detector.on('commandStart', function (hotwordChunk) {
    var header = Buffer.alloc(44);
    header.write('RIFF', 0);
    header.writeUInt32LE(0, 4);                           // file length, patched later
    header.write('WAVE', 8);
    header.write('fmt ', 12);                             // format chunk identifier
    header.writeUInt32LE(16, 16);                         // format chunk length
    header.writeUInt16LE(1, 20);                          // sample format (raw PCM)
    header.writeUInt16LE(detector.numChannels(), 22);     // channel count
    header.writeUInt32LE(detector.sampleRate(), 24);      // sample rate
    // byte rate = sample rate * channel count * bytes per sample
    header.writeUInt32LE(detector.sampleRate() * detector.numChannels() * 2, 28);
    header.writeUInt16LE(detector.numChannels() * 2, 32); // block align
    header.writeUInt16LE(16, 34);                         // bits per sample
    header.write('data', 36);
    header.writeUInt32LE(0, 40);                          // data chunk length, patched later

    audioCommandBuffer = header;

    // Comment this out to omit the hotword chunk of audio
    audioCommandBuffer = Buffer.concat([audioCommandBuffer, hotwordChunk]);
});

Append to the buffer:

detector.on('chunk', function (chunk, encoding) {
    audioCommandBuffer = Buffer.concat([audioCommandBuffer, chunk]);
});

And to output the buffer to a file, filling in the two size fields first:

detector.on('commandStop', function () {
    var dataLength = audioCommandBuffer.length - 44;
    audioCommandBuffer.writeUInt32LE(36 + dataLength, 4); // RIFF chunk size
    audioCommandBuffer.writeUInt32LE(dataLength, 40);     // data chunk length
    fs.writeFile('/home/pi/Speech/audio.wav', audioCommandBuffer, function (err) {
        if (err) throw err;
    });
});

@Stan92

Stan92 commented Mar 13, 2017

@dmc6297 ... I don't know how to thank you :-) I'll give it a try ASAP.
Thanks once again

@zikphil

zikphil commented Oct 15, 2017

Hey, I think this thread is exactly what I'm trying to do, but in Python. On top of being able to say the full sentence without stopping, I'd also like to keep a 3-second buffer from before hotword detection kicks in, so I can say things like "Goodnight Snowboy" or "What do you think Snowboy" through the Google Speech API. Any suggestions on how to achieve that?

@chenguoguo
Collaborator

As you said, you can maintain a buffer before the hotword; when the hotword is detected, you send that buffer to the Google Speech API and see if there's anything meaningful there.
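A rough way to size such a ~3-second pre-hotword buffer in Python. The audio format (16 kHz, 16-bit, mono) and the 2048-byte chunk size are assumed values, not Snowboy defaults:

```python
from collections import deque

# Rough sizing for a ~3-second pre-hotword buffer.
SAMPLE_RATE = 16000        # samples per second (assumed)
BYTES_PER_SAMPLE = 2       # 16-bit mono audio (assumed)
CHUNK_BYTES = 2048         # bytes per chunk read from the mic (assumed)
SECONDS = 3

chunks_needed = (SAMPLE_RATE * BYTES_PER_SAMPLE * SECONDS) // CHUNK_BYTES
pre_buffer = deque(maxlen=chunks_needed)
# Every mic chunk is appended to pre_buffer; when the hotword fires,
# it holds roughly the last 3 seconds of audio to send along with
# the rest of the utterance.
```

Since the deque drops the oldest chunk automatically, the recording loop never has to manage the window by hand.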

@sintetico82

Can someone write an example for Node.js?

@evancohen
Contributor

evancohen commented Oct 22, 2017 via email

@uchagani

uchagani commented Mar 3, 2018

@zikphil Were you able to get this working? I am trying to do the same thing. Any help is appreciated. thanks.
