Feat/keyterms+nova 3 #502

Merged 3 commits on Feb 11, 2025
4 changes: 2 additions & 2 deletions README.md
@@ -85,7 +85,7 @@ deepgram: DeepgramClient = DeepgramClient("", ClientOptionsFromEnv())

## STEP 2 Call the transcribe_url method on the prerecorded class
options: PrerecordedOptions = PrerecordedOptions(
-    model="nova-2",
+    model="nova-3",
    smart_format=True,
)
response = deepgram.listen.rest.v("1").transcribe_url(AUDIO_URL, options)
@@ -134,7 +134,7 @@ dg_connection.on(LiveTranscriptionEvents.Error, on_error)
dg_connection.on(LiveTranscriptionEvents.Close, on_close)

options: LiveOptions = LiveOptions(
-    model="nova-2",
+    model="nova-3",
    punctuate=True,
    language="en-US",
    encoding="linear16",
4 changes: 3 additions & 1 deletion deepgram/clients/agent/v1/websocket/options.py
@@ -23,7 +23,9 @@ class Listen(BaseResponse):
This class defines any configuration settings for the Listen model.
"""

-    model: Optional[str] = field(default="nova-2")
+    model: Optional[str] = field(
+        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
+    )


@dataclass
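The hunk above replaces the agent Listen model's hard-coded "nova-2" default with None plus an exclude predicate, so an unset model is omitted from the serialized request and the API falls back to its own server-side default. A minimal sketch of that pattern, assuming the SDK's dataclass_config is a thin wrapper over dataclasses_json's config (the class name below is illustrative, not the SDK's):

```python
from dataclasses import dataclass, field
from typing import Optional

from dataclasses_json import config, dataclass_json


@dataclass_json
@dataclass
class ListenSketch:
    """Illustrative stand-in for the agent Listen options."""

    # A None value is excluded from the JSON payload entirely, so the
    # server picks the default model instead of the client pinning one.
    model: Optional[str] = field(
        default=None, metadata=config(exclude=lambda f: f is None)
    )


print(ListenSketch().to_json())                # {}
print(ListenSketch(model="nova-3").to_json())  # {"model": "nova-3"}
```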
5 changes: 4 additions & 1 deletion deepgram/clients/listen/v1/rest/options.py
@@ -82,6 +82,9 @@ class PrerecordedOptions(BaseResponse):  # pylint: disable=too-many-instance-attributes
    intents: Optional[bool] = field(
        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
    )
+    keyterm: Optional[List[str]] = field(
+        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
+    )
    keywords: Optional[Union[List[str], str]] = field(
        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
    )
@@ -92,7 +95,7 @@ class PrerecordedOptions(BaseResponse):  # pylint: disable=too-many-instance-attributes
        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
    )
    model: Optional[str] = field(
-        default="nova-2", metadata=dataclass_config(exclude=lambda f: f is None)
+        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
    )
    multichannel: Optional[bool] = field(
        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
    )
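The substantive addition in this file is the keyterm field: Nova-3's keyterm prompting, a list of terms to bias transcription toward, sitting alongside the older keywords parameter kept for earlier models. A hedged usage sketch mirroring the README's REST flow (the audio URL and term list are placeholders):

```python
from deepgram import ClientOptionsFromEnv, DeepgramClient, PrerecordedOptions

AUDIO_URL = {"url": "https://dpgr.am/spacewalk.wav"}  # placeholder audio

# API key is read from the DEEPGRAM_API_KEY environment variable.
deepgram: DeepgramClient = DeepgramClient("", ClientOptionsFromEnv())

options: PrerecordedOptions = PrerecordedOptions(
    model="nova-3",                  # keyterm prompting assumes Nova-3
    smart_format=True,
    keyterm=["Deepgram", "Nova-3"],  # terms to bias recognition toward
)

response = deepgram.listen.rest.v("1").transcribe_url(AUDIO_URL, options)
print(response.results.channels[0].alternatives[0].transcript)
```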
5 changes: 4 additions & 1 deletion deepgram/clients/listen/v1/websocket/options.py
@@ -68,11 +68,14 @@ class LiveOptions(BaseResponse):  # pylint: disable=too-many-instance-attributes
    keywords: Optional[Union[List[str], str]] = field(
        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
    )
+    keyterm: Optional[List[str]] = field(
+        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
+    )
    language: Optional[str] = field(
        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
    )
    model: Optional[str] = field(
-        default="nova-2", metadata=dataclass_config(exclude=lambda f: f is None)
+        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
    )
    multichannel: Optional[bool] = field(
        default=None, metadata=dataclass_config(exclude=lambda f: f is None)
    )
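The streaming options gain the same keyterm field. A brief sketch of wiring it into LiveOptions, assuming raw linear16 audio at 16 kHz as in the README's websocket example (the term list is again illustrative):

```python
from deepgram import LiveOptions

options: LiveOptions = LiveOptions(
    model="nova-3",
    language="en-US",
    punctuate=True,
    encoding="linear16",
    sample_rate=16000,
    keyterm=["Deepgram", "Nova-3"],  # bias live transcription toward these terms
)
```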
2 changes: 1 addition & 1 deletion examples/advanced/rest/direct_invocation/main.py
@@ -29,7 +29,7 @@ def main():

# STEP 2 Call the transcribe_url method on the prerecorded class
options: PrerecordedOptions = PrerecordedOptions(
-    model="nova-2",
+    model="nova-3",
    smart_format=True,
    summarize="v2",
)
2 changes: 1 addition & 1 deletion examples/advanced/websocket/direct_invocation/main.py
@@ -58,7 +58,7 @@ def on_error(self, error, **kwargs):
liveClient.on(LiveTranscriptionEvents.Error, on_error)

# connect to websocket
-options: LiveOptions = LiveOptions(model="nova-2", language="en-US")
+options: LiveOptions = LiveOptions(model="nova-3", language="en-US")

if liveClient.start(options) is False:
print("Failed to connect to Deepgram")
2 changes: 1 addition & 1 deletion examples/advanced/websocket/microphone_inheritance/main.py
@@ -79,7 +79,7 @@ def main():
liveClient: MyLiveClient = MyLiveClient(ClientOptionsFromEnv())

options: LiveOptions = LiveOptions(
-    model="nova-2",
+    model="nova-3",
    punctuate=True,
    language="en-US",
    encoding="linear16",
2 changes: 1 addition & 1 deletion examples/advanced/websocket/mute-microphone/main.py
@@ -66,7 +66,7 @@ def on_error(self, error, **kwargs):
dg_connection.on(LiveTranscriptionEvents.Error, on_error)

options: LiveOptions = LiveOptions(
-    model="nova-2",
+    model="nova-3",
    punctuate=True,
    language="en-US",
    encoding="linear16",
10 changes: 5 additions & 5 deletions examples/analyze/intent/conversation.txt
@@ -16,7 +16,7 @@ Thanks to ChatGPT and the advent of the LLM era, the conversational AI tech stack

While these AI agents hold immense potential, many customers have expressed their dissatisfaction with the current crop of voice AI vendors, citing roadblocks related to speed, cost, reliability, and conversational quality. That’s why we’re excited to introduce our own text-to-speech (TTS) API, Deepgram Aura, built for real-time, conversational voice AI agents.

-Whether used on its own or in conjunction with our industry-leading Nova-2 speech-to-text API, we’ll soon provide developers with a complete speech AI platform, giving them the essential building blocks they need to build high throughput, real-time AI agents of the future.
+Whether used on its own or in conjunction with our industry-leading Nova-3 speech-to-text API, we’ll soon provide developers with a complete speech AI platform, giving them the essential building blocks they need to build high throughput, real-time AI agents of the future.

We are thrilled about the progress our initial group of developers has made using Aura, so much so that we are extending limited access to a select few partners who will be free to begin integrating with Aura immediately. With their feedback, we’ll continue to enhance our suite of voices and API features, as well as ensure a smooth launch of their production-grade applications.

@@ -51,15 +51,15 @@ Here are some sample clips generated by one of the earliest iterations of Aura.

Our Approach
----------
For nearly a decade, we’ve worked tirelessly to advance the art of the possible in speech recognition and spoken language understanding. Along the way, we’ve transcribed trillions of spoken words into highly accurate transcriptions. Our model research team has developed novel transformer architectures equipped to deal with the nuances of conversational audio–across different languages, accents, and dialects, while handling disfluencies and the changing rhythms, tones, cadences, and inflections that occur in natural, back-and-forth conversations.

And all the while, we’ve purposefully built our models under limited constraints to optimize their speed and efficiency. With support for dozens of languages and custom model training, our technical team has trained and deployed thousands of speech AI models (more than anybody else) which we operate and manage for our customers each day using our own computing infrastructure.

We also have our own in-house data labeling and data ops team with years of experience building bespoke workflows to record, store, and transfer vast amounts of audio in order to label it and continuously grow our bank of high-quality data (millions of hours and counting) used in our model training.

These combined experiences have made us experts in processing and modeling speech audio, especially in support of streaming use cases with our real-time STT models. Our customers have been asking if we could apply the same approach for TTS, and we can.

-So what can you expect from Aura? Delivering the same market-leading value and performance as Nova-2 does for STT. Aura is built to be the panacea for speed, quality, and efficiency–the fastest of the high-quality options, and the best quality of the fast ones. And that’s really what end users need and what our customers have been asking us to build.
+So what can you expect from Aura? Delivering the same market-leading value and performance as Nova-3 does for STT. Aura is built to be the panacea for speed, quality, and efficiency–the fastest of the high-quality options, and the best quality of the fast ones. And that’s really what end users need and what our customers have been asking us to build.

"Deepgram is a valued partner, providing our customers with high throughput speech-to-text that delivers unrivaled performance without tradeoffs between quality, speed, and cost. We're excited to see Deepgram extend their speech AI platform and bring this approach to the text-to-speech market." - Richard Dumas, VP AI Product Strategy at Five9

@@ -68,4 +68,4 @@ What's Next
----------
As we’ve discussed, scaled voice agents are a high throughput use case, and we believe their success will ultimately depend on a unified approach to audio, one that strikes the right balance between natural voice quality, responsiveness, and cost-efficiency. And with Aura, we’re just getting started. We’re looking forward to continuing to work with customers like Asurion and partners like Five9 across speech-to-text AND text-to-speech as we help them define the future of AI agents, and we invite you to join us on this journey.

We expect to release generally early next year, but if you’re working on any real-time AI agent use cases, join our waitlist today to jumpstart your development in production as we continue to refine our model and API features with your direct feedback.
10 changes: 5 additions & 5 deletions examples/analyze/legacy_dict_intent/conversation.txt
@@ -16,7 +16,7 @@ Thanks to ChatGPT and the advent of the LLM era, the conversational AI tech stack

While these AI agents hold immense potential, many customers have expressed their dissatisfaction with the current crop of voice AI vendors, citing roadblocks related to speed, cost, reliability, and conversational quality. That’s why we’re excited to introduce our own text-to-speech (TTS) API, Deepgram Aura, built for real-time, conversational voice AI agents.

-Whether used on its own or in conjunction with our industry-leading Nova-2 speech-to-text API, we’ll soon provide developers with a complete speech AI platform, giving them the essential building blocks they need to build high throughput, real-time AI agents of the future.
+Whether used on its own or in conjunction with our industry-leading Nova-3 speech-to-text API, we’ll soon provide developers with a complete speech AI platform, giving them the essential building blocks they need to build high throughput, real-time AI agents of the future.

We are thrilled about the progress our initial group of developers has made using Aura, so much so that we are extending limited access to a select few partners who will be free to begin integrating with Aura immediately. With their feedback, we’ll continue to enhance our suite of voices and API features, as well as ensure a smooth launch of their production-grade applications.

@@ -51,15 +51,15 @@ Here are some sample clips generated by one of the earliest iterations of Aura.

Our Approach
----------
For nearly a decade, we’ve worked tirelessly to advance the art of the possible in speech recognition and spoken language understanding. Along the way, we’ve transcribed trillions of spoken words into highly accurate transcriptions. Our model research team has developed novel transformer architectures equipped to deal with the nuances of conversational audio–across different languages, accents, and dialects, while handling disfluencies and the changing rhythms, tones, cadences, and inflections that occur in natural, back-and-forth conversations.

And all the while, we’ve purposefully built our models under limited constraints to optimize their speed and efficiency. With support for dozens of languages and custom model training, our technical team has trained and deployed thousands of speech AI models (more than anybody else) which we operate and manage for our customers each day using our own computing infrastructure.

We also have our own in-house data labeling and data ops team with years of experience building bespoke workflows to record, store, and transfer vast amounts of audio in order to label it and continuously grow our bank of high-quality data (millions of hours and counting) used in our model training.

These combined experiences have made us experts in processing and modeling speech audio, especially in support of streaming use cases with our real-time STT models. Our customers have been asking if we could apply the same approach for TTS, and we can.

-So what can you expect from Aura? Delivering the same market-leading value and performance as Nova-2 does for STT. Aura is built to be the panacea for speed, quality, and efficiency–the fastest of the high-quality options, and the best quality of the fast ones. And that’s really what end users need and what our customers have been asking us to build.
+So what can you expect from Aura? Delivering the same market-leading value and performance as Nova-3 does for STT. Aura is built to be the panacea for speed, quality, and efficiency–the fastest of the high-quality options, and the best quality of the fast ones. And that’s really what end users need and what our customers have been asking us to build.

"Deepgram is a valued partner, providing our customers with high throughput speech-to-text that delivers unrivaled performance without tradeoffs between quality, speed, and cost. We're excited to see Deepgram extend their speech AI platform and bring this approach to the text-to-speech market." - Richard Dumas, VP AI Product Strategy at Five9

@@ -68,4 +68,4 @@ What's Next
----------
As we’ve discussed, scaled voice agents are a high throughput use case, and we believe their success will ultimately depend on a unified approach to audio, one that strikes the right balance between natural voice quality, responsiveness, and cost-efficiency. And with Aura, we’re just getting started. We’re looking forward to continuing to work with customers like Asurion and partners like Five9 across speech-to-text AND text-to-speech as we help them define the future of AI agents, and we invite you to join us on this journey.

We expect to release generally early next year, but if you’re working on any real-time AI agent use cases, join our waitlist today to jumpstart your development in production as we continue to refine our model and API features with your direct feedback.