feat: add support for MP3 bitrates and hint boost (#363)
yoshi-automation authored and JustinBeckwith committed May 23, 2019
1 parent 7cd98b4 commit 9aa1bc2
Showing 3 changed files with 36 additions and 4 deletions.
@@ -200,6 +200,11 @@ message RecognitionConfig {
// is replaced with a single byte containing the block length. Only Speex
// wideband is supported. `sample_rate_hertz` must be 16000.
SPEEX_WITH_HEADER_BYTE = 7;

+ // MP3 audio. Support all standard MP3 bitrates (which range from 32-320
+ // kbps). When using this encoding, `sample_rate_hertz` can be optionally
+ // unset if not known.
+ MP3 = 8;
}

// Encoding of audio data sent in all `RecognitionAudio` messages.
@@ -511,6 +516,16 @@ message SpeechContext {
// to add additional words to the vocabulary of the recognizer. See
// [usage limits](/speech-to-text/quotas#content).
repeated string phrases = 1;

+ // Hint Boost. Positive value will increase the probability that a specific
+ // phrase will be recognized over other similar sounding phrases. The higher
+ // the boost, the higher the chance of false positive recognition as well.
+ // Negative boost values would correspond to anti-biasing. Anti-biasing is not
+ // enabled, so negative boost will simply be ignored. Though `boost` can
+ // accept a wide range of positive values, most use cases are best served with
+ // values between 0 and 20. We recommend using a binary search approach to
+ // finding the optimal value for your use case.
+ float boost = 4;
}

// Contains audio data in the encoding specified in the `RecognitionConfig`.
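
The `boost` comment above recommends a binary search for tuning the value. A minimal sketch of one way to read that advice follows. It assumes the goal is the smallest boost in the suggested 0 to 20 range at which a phrase is recognized, and that recognition is monotonic in `boost`. `findMinimalBoost` and `phraseRecognized` are hypothetical names for illustration and are not part of the client library or this commit.

```javascript
// Hypothetical helper, not part of the Speech-to-Text client library.
// Assumption: `phraseRecognized(boost)` is monotonic, i.e. once a boost is
// high enough for the phrase to be recognized, every higher boost is too.
function findMinimalBoost(phraseRecognized, lo = 0, hi = 20, tolerance = 0.5) {
  if (!phraseRecognized(hi)) {
    return null; // even the largest boost in the range was not enough
  }
  // Invariant: `hi` is always a boost known to be sufficient.
  while (hi - lo > tolerance) {
    const mid = (lo + hi) / 2;
    if (phraseRecognized(mid)) {
      hi = mid; // mid is sufficient, so try a smaller boost
    } else {
      lo = mid; // mid is insufficient, so a larger boost is needed
    }
  }
  return hi;
}

// Stand-in predicate for illustration: pretend recognition starts at 7.3.
const fakeThreshold = 7.3;
const minimalBoost = findMinimalBoost((boost) => boost >= fakeThreshold);
```

In real use the predicate would run a recognition request against representative audio and would likely be asynchronous. Because the proto comment warns that higher boosts raise the false positive rate, settling on the smallest sufficient value is one reasonable choice, not the only one.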
@@ -398,7 +398,14 @@ const RecognitionConfig = {
* is replaced with a single byte containing the block length. Only Speex
* wideband is supported. `sample_rate_hertz` must be 16000.
*/
- SPEEX_WITH_HEADER_BYTE: 7
+ SPEEX_WITH_HEADER_BYTE: 7,
+
+ /**
+ * MP3 audio. Support all standard MP3 bitrates (which range from 32-320
+ * kbps). When using this encoding, `sample_rate_hertz` can be optionally
+ * unset if not known.
+ */
+ MP3: 8
}
};

@@ -631,6 +638,16 @@ const RecognitionMetadata = {
* to add additional words to the vocabulary of the recognizer. See
* [usage limits](https://cloud.google.com/speech-to-text/quotas#content).
*
+ * @property {number} boost
+ * Hint Boost. Positive value will increase the probability that a specific
+ * phrase will be recognized over other similar sounding phrases. The higher
+ * the boost, the higher the chance of false positive recognition as well.
+ * Negative boost values would correspond to anti-biasing. Anti-biasing is not
+ * enabled, so negative boost will simply be ignored. Though `boost` can
+ * accept a wide range of positive values, most use cases are best served with
+ * values between 0 and 20. We recommend using a binary search approach to
+ * finding the optimal value for your use case.
+ *
* @typedef SpeechContext
* @memberof google.cloud.speech.v1p1beta1
* @see [google.cloud.speech.v1p1beta1.SpeechContext definition in proto format]{@link https://github.com/googleapis/googleapis/blob/master/google/cloud/speech/v1p1beta1/cloud_speech.proto}
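
As a rough illustration of how the two additions in this commit might be used together, the sketch below builds a plain request object with the new `MP3` encoding and a `boost` on a speech context. It makes no API call. `buildRecognizeRequest` is a hypothetical helper, the `en-US` language code is an assumption, and the camelCase field names assume the usual mapping of the proto fields in the Node.js client.

```javascript
// Sketch only: builds a plain request object and makes no API call.
// `buildRecognizeRequest` is a hypothetical helper, not a library function.
function buildRecognizeRequest(audioContentBase64, phrases, boost) {
  return {
    config: {
      // New enum value added by this commit.
      encoding: 'MP3',
      // `sampleRateHertz` is deliberately omitted: per the MP3 comment,
      // it can be left unset when the sample rate is not known.
      languageCode: 'en-US', // assumption for illustration
      speechContexts: [
        {
          phrases, // hint phrases, as before this commit
          boost, // new field; negative values are ignored per the proto comment
        },
      ],
    },
    audio: {content: audioContentBase64},
  };
}

const request = buildRecognizeRequest('BASE64_AUDIO', ['hint boost'], 10);
```

An object of this shape would presumably be passed to the `recognize` method of the `v1p1beta1` client, since both fields are defined in the `v1p1beta1` proto referenced above.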
6 changes: 3 additions & 3 deletions packages/google-cloud-speech/synth.metadata
@@ -1,5 +1,5 @@
{
- "updateTime": "2019-05-21T11:25:25.116275Z",
+ "updateTime": "2019-05-23T11:22:43.094942Z",
"sources": [
{
"generator": {
@@ -12,8 +12,8 @@
"git": {
"name": "googleapis",
"remote": "https://github.com/googleapis/googleapis.git",
- "sha": "32a10f69e2c9ce15bba13ab1ff928bacebb25160",
- "internalRef": "249058354"
+ "sha": "f792303254ea54442d03ca470ffb38930bda7806",
+ "internalRef": "249516437"
}
},
{