
Merge pull request #502 from deepgram/feat/keyterms+nova-3
Feat/keyterms+nova 3
naomi-lgbt authored Feb 11, 2025
2 parents 203733a + f48103f commit 3c477f2
Showing 84 changed files with 1,681 additions and 131 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -85,7 +85,7 @@ deepgram: DeepgramClient = DeepgramClient("", ClientOptionsFromEnv())

## STEP 2 Call the transcribe_url method on the prerecorded class
options: PrerecordedOptions = PrerecordedOptions(
-    model="nova-2",
+    model="nova-3",
    smart_format=True,
)
response = deepgram.listen.rest.v("1").transcribe_url(AUDIO_URL, options)
@@ -134,7 +134,7 @@ dg_connection.on(LiveTranscriptionEvents.Error, on_error)
dg_connection.on(LiveTranscriptionEvents.Close, on_close)

options: LiveOptions = LiveOptions(
-    model="nova-2",
+    model="nova-3",
    punctuate=True,
    language="en-US",
    encoding="linear16",
4 changes: 3 additions & 1 deletion deepgram/clients/agent/v1/websocket/options.py
@@ -23,7 +23,9 @@ class Listen(BaseResponse):
    This class defines any configuration settings for the Listen model.
    """

-    model: Optional[str] = field(default="nova-2")
+    model: Optional[str] = field(
+        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
+    )


@dataclass
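The new default above pairs a None value with an exclude predicate, so an unset model is dropped from the serialized request entirely and the Agent API chooses its own default instead of receiving a hard-coded "nova-2". A minimal sketch of that pattern, assuming plain dataclasses_json (the SDK wraps it in its own dataclass_config helper; this Listen class is a stand-in, not the SDK's):

from dataclasses import dataclass, field
from typing import Optional

from dataclasses_json import config, dataclass_json


@dataclass_json
@dataclass
class Listen:
    # Excluded from to_dict() while the value is still None, so the
    # server applies its own default model.
    model: Optional[str] = field(
        default=None, metadata=config(exclude=lambda f: f is None)
    )


print(Listen().to_dict())                # {} -- model omitted entirely
print(Listen(model="nova-3").to_dict())  # {'model': 'nova-3'}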
5 changes: 4 additions & 1 deletion deepgram/clients/listen/v1/rest/options.py
@@ -82,6 +82,9 @@ class PrerecordedOptions(BaseResponse): # pylint: disable=too-many-instance-att
    intents: Optional[bool] = field(
        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
    )
+    keyterm: Optional[List[str]] = field(
+        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
+    )
    keywords: Optional[Union[List[str], str]] = field(
        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
    )
@@ -92,7 +95,7 @@ class PrerecordedOptions(BaseResponse): # pylint: disable=too-many-instance-att
        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
    )
    model: Optional[str] = field(
-        default="nova-2", metadata=dataclass_config(exclude=lambda f: f is None)
+        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
    )
    multichannel: Optional[bool] = field(
        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
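With keyterm now on PrerecordedOptions, a pre-recorded request can boost domain-specific terms alongside Nova-3. A hedged usage sketch mirroring the README's setup; the audio URL and the term list are illustrative placeholders, and keyterm is documented by Deepgram as a Nova-3 feature:

from deepgram import ClientOptionsFromEnv, DeepgramClient, PrerecordedOptions

# Reads DEEPGRAM_API_KEY from the environment, as in the README example.
deepgram: DeepgramClient = DeepgramClient("", ClientOptionsFromEnv())

AUDIO_URL = {"url": "https://dpgr.am/spacewalk.wav"}  # placeholder sample file

options: PrerecordedOptions = PrerecordedOptions(
    model="nova-3",
    smart_format=True,
    keyterm=["Deepgram", "Aura"],  # illustrative key terms to boost
)

response = deepgram.listen.rest.v("1").transcribe_url(AUDIO_URL, options)
print(response.results.channels[0].alternatives[0].transcript)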
5 changes: 4 additions & 1 deletion deepgram/clients/listen/v1/websocket/options.py
@@ -68,11 +68,14 @@ class LiveOptions(BaseResponse): # pylint: disable=too-many-instance-attributes
    keywords: Optional[Union[List[str], str]] = field(
        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
    )
+    keyterm: Optional[List[str]] = field(
+        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
+    )
    language: Optional[str] = field(
        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
    )
    model: Optional[str] = field(
-        default="nova-2", metadata=dataclass_config(exclude=lambda f: f is None)
+        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
    )
    multichannel: Optional[bool] = field(
        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
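The streaming options gain the same field, so key terms carry over to live transcription. A short sketch of passing it when opening a websocket connection (the non-keyterm parameters follow the README; event handlers and the audio feed are elided):

from deepgram import ClientOptionsFromEnv, DeepgramClient, LiveOptions

deepgram: DeepgramClient = DeepgramClient("", ClientOptionsFromEnv())
dg_connection = deepgram.listen.websocket.v("1")

options: LiveOptions = LiveOptions(
    model="nova-3",
    language="en-US",
    encoding="linear16",
    sample_rate=16000,
    keyterm=["Deepgram", "Aura"],  # illustrative key terms to boost
)

# start() returns False when the websocket handshake fails.
if dg_connection.start(options) is False:
    print("Failed to connect to Deepgram")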
2 changes: 1 addition & 1 deletion examples/advanced/rest/direct_invocation/main.py
@@ -29,7 +29,7 @@ def main():

    # STEP 2 Call the transcribe_url method on the prerecorded class
    options: PrerecordedOptions = PrerecordedOptions(
-        model="nova-2",
+        model="nova-3",
        smart_format=True,
        summarize="v2",
    )
2 changes: 1 addition & 1 deletion examples/advanced/websocket/direct_invocation/main.py
@@ -58,7 +58,7 @@ def on_error(self, error, **kwargs):
    liveClient.on(LiveTranscriptionEvents.Error, on_error)

    # connect to websocket
-    options: LiveOptions = LiveOptions(model="nova-2", language="en-US")
+    options: LiveOptions = LiveOptions(model="nova-3", language="en-US")

    if liveClient.start(options) is False:
        print("Failed to connect to Deepgram")
2 changes: 1 addition & 1 deletion examples/advanced/websocket/microphone_inheritance/main.py
@@ -79,7 +79,7 @@ def main():
    liveClient: MyLiveClient = MyLiveClient(ClientOptionsFromEnv())

    options: LiveOptions = LiveOptions(
-        model="nova-2",
+        model="nova-3",
        punctuate=True,
        language="en-US",
        encoding="linear16",
2 changes: 1 addition & 1 deletion examples/advanced/websocket/mute-microphone/main.py
@@ -66,7 +66,7 @@ def on_error(self, error, **kwargs):
    dg_connection.on(LiveTranscriptionEvents.Error, on_error)

    options: LiveOptions = LiveOptions(
-        model="nova-2",
+        model="nova-3",
        punctuate=True,
        language="en-US",
        encoding="linear16",
10 changes: 5 additions & 5 deletions examples/analyze/intent/conversation.txt
@@ -16,7 +16,7 @@ Thanks to ChatGPT and the advent of the LLM era, the conversational AI tech stac

While these AI agents hold immense potential, many customers have expressed their dissatisfaction with the current crop of voice AI vendors, citing roadblocks related to speed, cost, reliability, and conversational quality. That’s why we’re excited to introduce our own text-to-speech (TTS) API, Deepgram Aura, built for real-time, conversational voice AI agents.

-Whether used on its own or in conjunction with our industry-leading Nova-2 speech-to-text API, we’ll soon provide developers with a complete speech AI platform, giving them the essential building blocks they need to build high throughput, real-time AI agents of the future.
+Whether used on its own or in conjunction with our industry-leading Nova-3 speech-to-text API, we’ll soon provide developers with a complete speech AI platform, giving them the essential building blocks they need to build high throughput, real-time AI agents of the future.

We are thrilled about the progress our initial group of developers has made using Aura, so much so that we are extending limited access to a select few partners who will be free to begin integrating with Aura immediately. With their feedback, we’ll continue to enhance our suite of voices and API features, as well as ensure a smooth launch of their production-grade applications.

@@ -51,15 +51,15 @@ Here are some sample clips generated by one of the earliest iterations of Aura.

Our Approach
----------
For nearly a decade, we’ve worked tirelessly to advance the art of the possible in speech recognition and spoken language understanding. Along the way, we’ve transcribed trillions of spoken words into highly accurate transcriptions. Our model research team has developed novel transformer architectures equipped to deal with the nuances of conversational audio–across different languages, accents, and dialects, while handling disfluencies and the changing rhythms, tones, cadences, and inflections that occur in natural, back-and-forth conversations.

And all the while, we’ve purposefully built our models under limited constraints to optimize their speed and efficiency. With support for dozens of languages and custom model training, our technical team has trained and deployed thousands of speech AI models (more than anybody else) which we operate and manage for our customers each day using our own computing infrastructure.

We also have our own in-house data labeling and data ops team with years of experience building bespoke workflows to record, store, and transfer vast amounts of audio in order to label it and continuously grow our bank of high-quality data (millions of hours and counting) used in our model training.

These combined experiences have made us experts in processing and modeling speech audio, especially in support of streaming use cases with our real-time STT models. Our customers have been asking if we could apply the same approach for TTS, and we can.

-So what can you expect from Aura? Delivering the same market-leading value and performance as Nova-2 does for STT. Aura is built to be the panacea for speed, quality, and efficiency–the fastest of the high-quality options, and the best quality of the fast ones. And that’s really what end users need and what our customers have been asking us to build.
+So what can you expect from Aura? Delivering the same market-leading value and performance as Nova-3 does for STT. Aura is built to be the panacea for speed, quality, and efficiency–the fastest of the high-quality options, and the best quality of the fast ones. And that’s really what end users need and what our customers have been asking us to build.

"Deepgram is a valued partner, providing our customers with high throughput speech-to-text that delivers unrivaled performance without tradeoffs between quality, speed, and cost. We're excited to see Deepgram extend their speech AI platform and bring this approach to the text-to-speech market." - Richard Dumas, VP AI Product Strategy at Five9

@@ -68,4 +68,4 @@ What's Next
----------
As we’ve discussed, scaled voice agents are a high throughput use case, and we believe their success will ultimately depend on a unified approach to audio, one that strikes the right balance between natural voice quality, responsiveness, and cost-efficiency. And with Aura, we’re just getting started. We’re looking forward to continuing to work with customers like Asurion and partners like Five9 across speech-to-text AND text-to-speech as we help them define the future of AI agents, and we invite you to join us on this journey.

We expect to release generally early next year, but if you’re working on any real-time AI agent use cases, join our waitlist today to jumpstart your development in production as we continue to refine our model and API features with your direct feedback.
10 changes: 5 additions & 5 deletions examples/analyze/legacy_dict_intent/conversation.txt
@@ -16,7 +16,7 @@ Thanks to ChatGPT and the advent of the LLM era, the conversational AI tech stac

While these AI agents hold immense potential, many customers have expressed their dissatisfaction with the current crop of voice AI vendors, citing roadblocks related to speed, cost, reliability, and conversational quality. That’s why we’re excited to introduce our own text-to-speech (TTS) API, Deepgram Aura, built for real-time, conversational voice AI agents.

-Whether used on its own or in conjunction with our industry-leading Nova-2 speech-to-text API, we’ll soon provide developers with a complete speech AI platform, giving them the essential building blocks they need to build high throughput, real-time AI agents of the future.
+Whether used on its own or in conjunction with our industry-leading Nova-3 speech-to-text API, we’ll soon provide developers with a complete speech AI platform, giving them the essential building blocks they need to build high throughput, real-time AI agents of the future.

We are thrilled about the progress our initial group of developers has made using Aura, so much so that we are extending limited access to a select few partners who will be free to begin integrating with Aura immediately. With their feedback, we’ll continue to enhance our suite of voices and API features, as well as ensure a smooth launch of their production-grade applications.

@@ -51,15 +51,15 @@ Here are some sample clips generated by one of the earliest iterations of Aura.

Our Approach
----------
For nearly a decade, we’ve worked tirelessly to advance the art of the possible in speech recognition and spoken language understanding. Along the way, we’ve transcribed trillions of spoken words into highly accurate transcriptions. Our model research team has developed novel transformer architectures equipped to deal with the nuances of conversational audio–across different languages, accents, and dialects, while handling disfluencies and the changing rhythms, tones, cadences, and inflections that occur in natural, back-and-forth conversations.

And all the while, we’ve purposefully built our models under limited constraints to optimize their speed and efficiency. With support for dozens of languages and custom model training, our technical team has trained and deployed thousands of speech AI models (more than anybody else) which we operate and manage for our customers each day using our own computing infrastructure.

We also have our own in-house data labeling and data ops team with years of experience building bespoke workflows to record, store, and transfer vast amounts of audio in order to label it and continuously grow our bank of high-quality data (millions of hours and counting) used in our model training.

These combined experiences have made us experts in processing and modeling speech audio, especially in support of streaming use cases with our real-time STT models. Our customers have been asking if we could apply the same approach for TTS, and we can.

-So what can you expect from Aura? Delivering the same market-leading value and performance as Nova-2 does for STT. Aura is built to be the panacea for speed, quality, and efficiency–the fastest of the high-quality options, and the best quality of the fast ones. And that’s really what end users need and what our customers have been asking us to build.
+So what can you expect from Aura? Delivering the same market-leading value and performance as Nova-3 does for STT. Aura is built to be the panacea for speed, quality, and efficiency–the fastest of the high-quality options, and the best quality of the fast ones. And that’s really what end users need and what our customers have been asking us to build.

"Deepgram is a valued partner, providing our customers with high throughput speech-to-text that delivers unrivaled performance without tradeoffs between quality, speed, and cost. We're excited to see Deepgram extend their speech AI platform and bring this approach to the text-to-speech market." - Richard Dumas, VP AI Product Strategy at Five9

@@ -68,4 +68,4 @@ What's Next
----------
As we’ve discussed, scaled voice agents are a high throughput use case, and we believe their success will ultimately depend on a unified approach to audio, one that strikes the right balance between natural voice quality, responsiveness, and cost-efficiency. And with Aura, we’re just getting started. We’re looking forward to continuing to work with customers like Asurion and partners like Five9 across speech-to-text AND text-to-speech as we help them define the future of AI agents, and we invite you to join us on this journey.

We expect to release generally early next year, but if you’re working on any real-time AI agent use cases, join our waitlist today to jumpstart your development in production as we continue to refine our model and API features with your direct feedback.