diff --git a/README.md b/README.md index 5100c7cf..17dae665 100755 --- a/README.md +++ b/README.md @@ -4,7 +4,7 @@ The best way to get started with Riva is to start with the tutorials. -## Tutorials +## Tutorials | Domain | Tutorial | Key Words | Github URL | |--------|----------|-----------|------------| @@ -15,12 +15,12 @@ The best way to get started with Riva is to start with the tutorials. | ASR | How to pretrain a Riva ASR Language Modeling (n-gram) with TAO Toolkit | ASR, Customization, Language Model pretraining, n-gram, TAO Toolkit | [Riva ASR - Customization - Language Model (n-gram) pretraining with TAO Toolkit](asr-python-advanced-tao-ngram-pretrain.ipynb) | | ASR | How to fine-tune a Riva ASR Acoustic Model (Citrinet) with TAO Toolkit | ASR, Customization, Acoustic Model fine-tuning, Citrinet, TAO Toolkit | [Riva ASR - Customization - Acoustic Model (Citrinet) fine-tuning with TAO Toolkit](asr-python-advanced-finetune-am-citrinet-tao-finetuning.ipynb) | | ASR | How to deploy custom Acoustic Model (Citrinet) trained with TAO Toolkit on Riva | ASR, Customization, Acoustic Model deployment, Citrinet | [Riva ASR - Customization - Acoustic Model (Citrinet) deployment on Riva](asr-python-advanced-finetune-am-citrinet-tao-deployment.ipynb) | -| ASR | The Making of RIVA German ASR Service | ASR, New Language Adaptation, German | [Riva ASR - German](New-language-adaptation/German) | -| ASR | The Making of RIVA Hindi ASR Service | ASR, New Language Adaptation, Hindi | [Riva ASR - Hindi](New-language-adaptation/Hindi) | -| ASR | The Making of RIVA Mandarin ASR Service | ASR, New Language Adaptation, Mandarin | [Riva ASR - Mandarin](New-language-adaptation/Mandarin) | +| ASR | The Making of RIVA German ASR Service | ASR, New Language Adaptation, German | [Riva ASR - German](New-language-adaptation/German) | +| ASR | The Making of RIVA Hindi ASR Service | ASR, New Language Adaptation, Hindi | [Riva ASR - Hindi](New-language-adaptation/Hindi) | +| ASR | The Making of RIVA Mandarin ASR Service | ASR, New Language Adaptation, Mandarin | [Riva ASR - Mandarin](New-language-adaptation/Mandarin) | | TTS | How do I use Riva TTS APIs with out-of-the-box models? | TTS, API Basics | [Riva TTS - API Basics](tts-python-basics.ipynb) | | TTS | How do I customize Riva TTS audio output with SSML? 
| TTS, Customization, SSML, Pitch, Rate, Pronunciation | [Riva TTS - Customization - Customization with SSML](tts-python-advanced-customization-with-ssml.ipynb) | -| TTS | How to train Riva TTS models (FastPitch and HiFiGAN) with TAO Toolkit | TTS, Customization, FastPitch, HiFiGAN, Training, TAO Toolkit | [Riva TTS - Customization - FastPitch and HiFiGAN training with TAO Toolkit](tts-python-advanced-pretrain-tts-tao-training.ipynb) | +| TTS | How to train Riva TTS models (FastPitch and HiFiGAN) with TAO Toolkit | TTS, Customization, FastPitch, HiFiGAN, Training, TAO Toolkit | [Riva TTS - Customization - FastPitch and HiFiGAN training with TAO Toolkit](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/resources/texttospeech_notebook/files) | | TTS | How to Deploy a custom TTS Models (FastPitch and HiFi-GAN) trained with TAO Toolkit Riva | TTS, Customization, FastPitch, HiFiGAN, Deployment | [Riva TTS - Customization - FastPitch and HiFiGAN deployment on Riva](tts-python-advanced-pretrain-tts-tao-deployment.ipynb) | | Deploy | How to Deploy Riva at Scale on AWS with EKS | Deploy, AWS EKS | [Riva - Deploy - AWS EKS](deploy-eks.md) | @@ -30,41 +30,41 @@ The best way to get started with Riva is to start with the tutorials. This section covers the Requirements and Setup needed to run all Riva Tutorials. #### Requirements -Before you try running the NVIDIA Riva tutorials, ensure you meet the following requirements: -- [Python 3](https://www.python.org/download/releases/3.0/) +Before you try running the NVIDIA Riva tutorials, ensure you meet the following requirements: +- [Python 3](https://www.python.org/download/releases/3.0/) #### Setup -1. Clone the NVIDIA Riva tutorials repository. -``git clone https://github.com/nvidia-riva/tutorials.git`` +1. Clone the NVIDIA Riva tutorials repository. +``git clone https://github.com/nvidia-riva/tutorials.git`` ``cd tutorials`` -2. Create a Python virtual environment - We will be using this virtual environment to install all the depencies needed for Riva tutorials. +2. Create a Python virtual environment - We will be using this virtual environment to install all the depencies needed for Riva tutorials. ``python3 -m venv venv-riva-tutorials`` -3. Activate the Python virtual environment we just created. +3. Activate the Python virtual environment we just created. ``. venv-riva-tutorials/bin/activate`` -4. Install Jupyter notebook. -``pip3 install jupyter`` +4. Install Jupyter notebook. +``pip3 install jupyter`` -5. Create an IPython kernel - The Riva tutorials Jupyter notebooks will be using this kernel in the next step. +5. Create an IPython kernel - The Riva tutorials Jupyter notebooks will be using this kernel in the next step. ``ipython kernel install --user --name=venv-riva-tutorials`` -6. Start the Jupyter notebooks server. -``jupyter notebook --allow-root --port 8888`` -If you have a browser installed on your machine, the notebook should automatically open. If you do not have a browser, copy/paste the URL from the command. +6. Start the Jupyter notebooks server. +``jupyter notebook --allow-root --port 8888`` +If you have a browser installed on your machine, the notebook should automatically open. If you do not have a browser, copy/paste the URL from the command. 
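To confirm that the `venv-riva-tutorials` kernel from step 5 was registered (an optional check, not part of the original steps), you can list the installed Jupyter kernels and look for it in the output:
``jupyter kernelspec list``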
Once you open a Riva tutorial notebook on a browser, choose the `venv-riva-tutorials` kernel by `Kernel` -> `Change kernel` -> `venv-riva-tutorials` ### Running the Riva Client #### Requirements -Before you try running the Riva client, ensure you meet the following requirements: +Before you try running the Riva client, ensure you meet the following requirements: - You have access and are logged into NVIDIA NGC. For step-by-step instructions, refer to the [NGC Getting Started Guide](https://docs.nvidia.com/ngc/ngc-overview/index.html#registering-activating-ngc-account). -- [Python 3](https://www.python.org/download/releases/3.0/) +- [Python 3](https://www.python.org/download/releases/3.0/) #### Setup -1. [Optional] If using the `venv-riva-tutorials` (or another) Python virtual environment, activate it. +1. [Optional] If using the `venv-riva-tutorials` (or another) Python virtual environment, activate it. ``. /venv-riva-tutorials/bin/activate`` 2. Install `nvidia-riva-client` using `pip`. diff --git a/tts-python-advanced-pretrain-tts-tao-training.ipynb b/tts-python-advanced-pretrain-tts-tao-training.ipynb deleted file mode 100644 index c6876ca4..00000000 --- a/tts-python-advanced-pretrain-tts-tao-training.ipynb +++ /dev/null @@ -1,779 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "\n", - "# How to train Riva TTS models (FastPitch and HiFiGAN) with TAO Toolkit" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This tutorial walks you through the steps to train Riva TTS models (FastPitch and HiFiGAN) from scratch with LJSpeech dataset using TAO Toolkit." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## NVIDIA Riva Overview\n", - "\n", - "NVIDIA Riva is a GPU-accelerated SDK for building Speech AI applications that are customized for your use case and deliver real-time performance.
\n", - "Riva offers a rich set of speech and natural language understanding services such as:\n", - "\n", - "- Automated speech recognition (ASR)\n", - "- Text-to-Speech synthesis (TTS)\n", - "- A collection of natural language processing (NLP) services, such as named entity recognition (NER), punctuation, and intent classification.\n", - "\n", - "In this tutorial, we will customize the Riva TTS pipeline by training Riva TTS models with NVIDIA's TAO Toolkit. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## NVIDIA TAO Toolkit Overview\n", - "\n", - "NVIDIA Train Adapt Optimize (TAO) Toolkit is a python-based AI toolkit for transfer learning that takes purpose-built pre-trained AI models and customizes them on your own data. TAO enables developers with limited AI expertise to create highly accurate AI models for production deployments. \n", - "TAO follows zero coding paradigm. There is no need to write any code to train models with TAO. Training can be done by just running a few commands with the TAO command-line interface. \n", - "\n", - "Riva supports fine-tuning with TAO. The fine-tuned TAO model can easily be deployed for real-time inference on the Riva Speech Skills server.\n", - "\n", - "For more information about the NVIDIA TAO framework, refer to the documentation [here](https://docs.nvidia.com/tao/tao-toolkit/text/overview.html).\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Text to Speech" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Text to Speech (TTS) is often the last step in building a conversational AI model. A TTS model converts text into audible speech. The main objective is to synthesize reasonable and natural speech for given text. Since there are no universal standards to measure quality of synthesized speech, you will need to listen to some inferred speech to tell whether a TTS model is well trained.\n", - "\n", - "TTS consists of two models: [FastPitch](https://arxiv.org/pdf/2006.06873.pdf) and [HiFi-GAN](https://arxiv.org/pdf/2010.05646.pdf).\n", - "\n", - "* FastPitch is spectrogram model generates a Mel spectrogram from text input\n", - "* HiFiGAN is a vocoder model to generate an audio output from the Mel spectrograms generated using FastPitch" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "## TTS using TAO\n", - "\n", - "In this tutorial, we will train RIVA TTS models (FastPitch and HiFiGAN) on LJSpeech from scratch." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Installing and setting up TAO\n", - "\n", - "Install TAO inside a Python virtual environment. We recommend performing this step first and then launching the tutorial from the virtual environment.\n", - "\n", - "In addition to installing the TAO Python package, ensure you meet the following software requirements:\n", - "\n", - "1. `python` 3.8.13\n", - "2. `docker-ce` > 19.03.5\n", - "3. `docker-API` 1.40\n", - "4. `nvidia-container-toolkit` > 1.3.0-1\n", - "5. `nvidia-container-runtime` > 3.4.0-1\n", - "6. `nvidia-docker2` > 2.5.0-1\n", - "7. `nvidia-driver` >= 470.57" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Installing TAO is a simple `pip` install." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "! 
pip install nvidia-tao" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "After installing TAO, the next step is to setup the mounts for TAO. The TAO launcher uses Docker containers under the hood, and **for our data and results directory to be visible to Docker, they need to be mapped**. The launcher can be configured using the config file `~/.tao_mounts.json`. Apart from the mounts, you can also configure additional options like the environment variables and the amount of shared memory available to the TAO launcher.
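As an optional extension (a sketch only, assuming your TAO launcher version supports an `Envs` section as described in the TAO Launcher documentation), environment variables can be passed into the container through the same `~/.tao_mounts.json` file, for example:

```python
import json
import os

mounts_file = os.path.expanduser("~/.tao_mounts.json")

# Read back the mounts file written by the configuration cell below
# and add an optional "Envs" section.
with open(mounts_file) as f:
    tao_configs = json.load(f)

# Assumption: the launcher accepts a list of {"variable", "value"} pairs;
# here we pin the container to GPU 0 purely as an illustration.
tao_configs["Envs"] = [{"variable": "CUDA_VISIBLE_DEVICES", "value": "0"}]

with open(mounts_file, "w") as f:
    json.dump(tao_configs, f, indent=4)
```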
\n", - "\n", - "Replace the `FIXME` variables with the required paths enclosed in `\"\"` as a string.\n", - "\n", - "`IMPORTANT NOTE:` The following code creates a sample `~/.tao_mounts.json` file. Here, we can map directories in which we save the data, specs, results, and cache. You should configure it for your specific use case so these directories are correctly visible to the Docker container." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# please define these paths on your local host machine\n", - "import os\n", - "\n", - "os.environ[\"HOST_DATA_DIR\"] = FIXME\n", - "os.environ[\"HOST_SPECS_DIR\"] = FIXME\n", - "os.environ[\"HOST_RESULTS_DIR\"] = FIXME" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "! mkdir -p $HOST_DATA_DIR\n", - "! mkdir -p $HOST_SPECS_DIR\n", - "! mkdir -p $HOST_RESULTS_DIR" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Mapping up the local directories to the TAO docker.\n", - "import json\n", - "import os\n", - "mounts_file = os.path.expanduser(\"~/.tao_mounts.json\")\n", - "tao_configs = {\n", - " \"Mounts\":[\n", - " {\n", - " \"source\": os.environ[\"HOST_DATA_DIR\"],\n", - " \"destination\": \"/data\"\n", - " },\n", - " {\n", - " \"source\": os.environ[\"HOST_SPECS_DIR\"],\n", - " \"destination\": \"/specs\"\n", - " },\n", - " {\n", - " \"source\": os.environ[\"HOST_RESULTS_DIR\"],\n", - " \"destination\": \"/results\"\n", - " },\n", - " {\n", - " \"source\": os.path.expanduser(\"~/.cache\"),\n", - " \"destination\": \"/root/.cache\"\n", - " }\n", - " ],\n", - " \"DockerOptions\": {\n", - " \"shm_size\": \"16G\",\n", - " \"ulimits\": {\n", - " \"memlock\": -1,\n", - " \"stack\": 67108864\n", - " }\n", - " }\n", - "}\n", - "# Writing the mounts file.\n", - "with open(mounts_file, \"w\") as mfile:\n", - " json.dump(tao_configs, mfile, indent=4)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can check the Docker image versions and the tasks that it performs. You can also check by issuing `tao --help` or:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "! tao info --verbose" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Set Relevant Paths" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# NOTE: The following paths are set from the perspective of the TAO Docker.\n", - "\n", - "# The data is saved here:\n", - "DATA_DIR = \"/data\"\n", - "SPECS_DIR = \"/specs\"\n", - "RESULTS_DIR = \"/results\"\n", - "\n", - "# Set your encryption key and use the same key for all commands:\n", - "KEY = 'tlt_encode'" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The command structure for the TAO interface can be broken down as follows: `tao `
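That is, every invocation takes the form `tao <task> <subcommand> <args>`. As a minimal illustration (the same pattern as the spec-download and training cells used later in this notebook):

```python
# <task> = spectro_gen (FastPitch), <subcommand> = download_specs,
# <args> = the -r/-o paths; run from a notebook cell with the "!" shell escape.
! tao spectro_gen download_specs -r $RESULTS_DIR/spectro_gen -o $SPECS_DIR/spectro_gen
```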
\n", - "\n", - "Let's see this in further detail." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "### Downloading Specs\n", - "TAO's conversational AI toolkit works off of spec files which make it easy to edit hyperparameters on the fly. We can proceed to downloading the spec files. You may choose to modify/rewrite these specs or even individually override them through the launcher. You can download the default spec files by using the `download_specs` command.
\n", - "\n", - "The `-o` argument indicates the folder where the default specification files will be downloaded. The `-r` argument instructs the script on where to save the logs. **Ensure the `-o` points to an empty folder.**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# download spec files for FastPitch\n", - "! tao spectro_gen download_specs \\\n", - " -r $RESULTS_DIR/spectro_gen \\\n", - " -o $SPECS_DIR/spectro_gen" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# download spec files for HiFiGAN\n", - "! tao vocoder download_specs \\\n", - " -r $RESULTS_DIR/vocoder \\\n", - " -o $SPECS_DIR/vocoder" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Download Data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this tutorial we will use the popular LJSpeech dataset. Let's download it." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "! wget -O $HOST_DATA_DIR/ljspeech.tar.bz2 https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "After downloading, untar the dataset and move it to the correct directory." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "! tar -xvf $HOST_DATA_DIR/ljspeech.tar.bz2\n", - "! rm -rf $HOST_DATA_DIR/ljspeech\n", - "! mv LJSpeech-1.1 $HOST_DATA_DIR/ljspeech" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Using your own dataset\n", - "\n", - "If you want to use your own dataset, you'll have to organize your own dataset following the LJSpeech format." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Pre-Processing" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This step downloads audio to text file lists from NVIDIA for LJSpeech and generates the manifest files. If you use your own dataset, you'll have to generate three files: `ljs_audio_text_train_filelist.txt`, `ljs_audio_text_val_filelist.txt`, and `ljs_audio_text_test_filelist.txt` yourself. Those files correspond to your train `/ val /` test split. For each text file, the number of rows should be equal to the number of samples in this split. Each row should look similar to:\n", - "\n", - "```\n", - "DUMMY/.wav|\n", - "```\n", - "\n", - "An example row is:\n", - "\n", - "```\n", - "DUMMY/LJ045-0096.wav|Mrs. De Mohrenschildt thought that Oswald,\n", - "```\n", - "\n", - "After having those three files in your `data_dir`, run the following command as you would do for the LJSpeech dataset:\n", - "\n", - "Be patient! This step can take several minutes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "! tao spectro_gen dataset_convert \\\n", - " -e $SPECS_DIR/spectro_gen/dataset_convert_ljs.yaml \\\n", - " -r $RESULTS_DIR/spectro_gen/dataset_convert \\\n", - " data_dir=$DATA_DIR/ljspeech \\\n", - " dataset_name=ljspeech" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Training " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The TAO interface enables you to configure the training parameters from the command-line interface.
\n", - "\n", - "The process of opening the training script, finding the parameters of interest (which might be spread across multiple files), and making the changes needed, is being replaced by a simple command-line interface.\n", - "\n", - "For example, if the number of epochs are needed to be modified along with a change in the learning rate, you can add `trainer.max_epochs=10` and `optim.lr=0.02` and train the model. Sample commands are given below.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "For training TTS models in TAO, we use the `tao spectro_gen train` and `tao vocoder train` commands with the following arguments:\n", - "
\n",
- "- `-e`: Path to the spec file\n",
- "- `-g`: Number of GPUs to use\n",
- "- `-r`: Path to the results folder\n",
- "- `-k`: User specified encryption key to use while saving/loading the model\n",
- "- Any overrides to the spec file. For example, `trainer.max_epochs`.\n",
- "
\n", - "\n", - "NOTE: In order to get a TTS pipeline, you need to train **BOTH** FastPitch (`spectro_gen`) and HiFi-GAN (vocoder). For HiFi-GAN, since it's universal for a specific language, you might just download the pretrained weights from NGC and it will give you good performance." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Training FastPitch" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Prior is needed for FastPitch training. If an empty folder is provided, prior will generate on-the-fly.\n", - "! mkdir -p $RESULTS_DIR/spectro_gen/train/prior_folder" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If you provided an empty prior folder, this may take some time." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!tao spectro_gen train \\\n", - " -e $SPECS_DIR/spectro_gen/train.yaml \\\n", - " -g 1 \\\n", - " -k $KEY \\\n", - " -r $RESULTS_DIR/spectro_gen/train \\\n", - " train_dataset=$DATA_DIR/ljspeech/ljspeech_train.json \\\n", - " validation_dataset=$DATA_DIR/ljspeech/ljspeech_val.json \\\n", - " prior_folder=$RESULTS_DIR/spectro_gen/train/prior_folder \\\n", - " trainer.max_epochs=5" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Training HiFi-GAN" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Instead of passing `trainer.max_epochs`, HiFi-GAN requires the definition of `trainer.max_steps`. Defining `trainer.max_epochs` for HiFi-GAN has no effect." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!tao vocoder train \\\n", - " -e $SPECS_DIR/vocoder/train.yaml \\\n", - " -g 1 \\\n", - " -k $KEY \\\n", - " -r $RESULTS_DIR/vocoder/train \\\n", - " train_dataset=$DATA_DIR/ljspeech/ljspeech_train.json \\\n", - " validation_dataset=$DATA_DIR/ljspeech/ljspeech_val.json \\\n", - " trainer.max_steps=10000" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### TTS model export" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "With TAO, you can also export your model in a format that can deployed using NVIDIA Riva; a highly performant application framework for multi-modal conversational AI services using GPUs. The same command for exporting to ONNX can be used here. The only small variation is the configuration for `export_format` in the spec file." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Export to RIVA" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!tao spectro_gen export \\\n", - " -e $SPECS_DIR/spectro_gen/export.yaml \\\n", - " -g 1 \\\n", - " -k $KEY \\\n", - " -m $RESULTS_DIR/spectro_gen/train/checkpoints/trained-model.tlt \\\n", - " -r $RESULTS_DIR/spectro_gen/export \\\n", - " export_format=RIVA \\\n", - " export_to=spectro_gen.riva" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!tao vocoder export \\\n", - " -e $SPECS_DIR/vocoder/export.yaml \\\n", - " -g 1 \\\n", - " -k $KEY \\\n", - " -m $RESULTS_DIR/vocoder/train/checkpoints/trained-model.tlt \\\n", - " -r $RESULTS_DIR/vocoder/export \\\n", - " export_format=RIVA \\\n", - " export_to=vocoder.riva" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Export to ONNX (Export to ONNX is not needed for RIVA)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!tao spectro_gen export \\\n", - " -e $SPECS_DIR/spectro_gen/export.yaml \\\n", - " -g 1 \\\n", - " -k $KEY \\\n", - " -m $RESULTS_DIR/spectro_gen/train/checkpoints/trained-model.tlt \\\n", - " -r $RESULTS_DIR/spectro_gen/export \\\n", - " export_format=ONNX \\\n", - " export_to=spectro_gen.eonnx" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!tao vocoder export \\\n", - " -e $SPECS_DIR/vocoder/export.yaml \\\n", - " -g 1 \\\n", - " -k $KEY \\\n", - " -m $RESULTS_DIR/vocoder/train/checkpoints/trained-model.tlt \\\n", - " -r $RESULTS_DIR/vocoder/export \\\n", - " export_format=ONNX \\\n", - " export_to=vocoder.eonnx" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "## TTS Inference with TAO Toolkit\n", - "\n", - "In this section, we are going to run inference on the trained TTS models. As previously mentioned, since there are no universal standards to measure quality of synthesized speech, you will need to listen to some inferred speech to tell whether a TTS model is well trained. Therefore, we do not provide `evaluate` functionality in TAO Toolkit for TTS but only provide `infer` functionality.\n", - "\n", - "The inference in the following cells is not optimized for real-time performance. For real-time inference and best latency, you should deploy this model using RIVA. Refer to the [How to deploy custom TTS models (FastPitch and HiFiGAN) trained with TAO Toolkit on Riva](tts-python-advanced-pretrain-tts-tao-deployment.ipynb) tutorial." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### TTS Inference with TLT checkpoint\n", - "\n", - "In this section, we will run inference on the `.tlt` checkpoint trained with TAO Toolkit." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Generate spectrogram\n", - "\n", - "The first step for inference is generating a spectrogram. That's a NumPy array (saved as `.npy` file) for a sentence which can be converted to voice by a vocoder. We use the FastPitch model we just trained to generate a spectrogram.\n", - "\n", - "You may have to work with the `infer.yaml` file to set the texts you want for inference." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!tao spectro_gen infer \\\n", - " -e $SPECS_DIR/spectro_gen/infer.yaml \\\n", - " -g 1 \\\n", - " -k $KEY \\\n", - " -m $RESULTS_DIR/spectro_gen/train/checkpoints/trained-model.tlt \\\n", - " -r $RESULTS_DIR/spectro_gen/infer \\\n", - " output_path=$RESULTS_DIR/spectro_gen/infer/spectro" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Generate sound file\n", - "\n", - "The second step for inference is generating a `.wav` sound file based on a spectrogram you generated in the previous step." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!tao vocoder infer \\\n", - " -e $SPECS_DIR/vocoder/infer.yaml \\\n", - " -g 1 \\\n", - " -k $KEY \\\n", - " -m $RESULTS_DIR/vocoder/train/checkpoints/trained-model.tlt \\\n", - " -r $RESULTS_DIR/vocoder/infer \\\n", - " input_path=$RESULTS_DIR/spectro_gen/infer/spectro \\\n", - " output_path=$RESULTS_DIR/vocoder/infer/wav" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "import IPython.display as ipd\n", - "# change path of the file here\n", - "ipd.Audio(os.environ[\"HOST_RESULTS_DIR\"] + '/vocoder/infer/wav/0.wav')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Debug\n", - "\n", - "If the above sound file does not have good quality, you probably need to first figure out whether it's a FastPitch or HiFi-GAN problem. Then, retrain or fine-tune the problematic model. For this purpose, you can download pre-trained HiFi-GAN from NVIDIA NGC and (1) generate the spectrogram with your trained FastPitch (2) generate the `.wav` file with NVIDIA pretrained HiFi-GAN. If the `.wav` file generated in this manner is good, you know your HiFi-GAN is not well-trained. Otherwise, the problem is with FastPitch." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### TTS Inference using ONNX" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "TAO Toolkti also provides the capability to run inference with the exported `.eonnx` model. The commands are very similar to the inference command for `.tlt` models. Again, the inputs in the spec file used is just for demo purposes, you may choose to try out your custom input." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Generate spectrogram\n", - "\n", - "The first step for inference is generating a spectrogram. That's a NumPy array (saved as a `.npy` file) for a sentence which can be converted to voice by a vocoder. We use the FastPitch model we just trained to generate a spectrogram.\n", - "\n", - "You may have to work with the `infer.yaml` file to set the texts you want for inference." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!tao spectro_gen infer_onnx \\\n", - " -e $SPECS_DIR/spectro_gen/infer.yaml \\\n", - " -g 1 \\\n", - " -k $KEY \\\n", - " -m $RESULTS_DIR/spectro_gen/export/spectro_gen.eonnx \\\n", - " -r $RESULTS_DIR/spectro_gen/infer_onnx \\\n", - " output_path=$RESULTS_DIR/spectro_gen/infer_onnx/spectro" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Generate the Sound File\n", - "\n", - "The second step for inference is generating a `.wav` sound file based on the spectrogram you generated in the previous step." 
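Before running the vocoder, you can optionally sanity-check the spectrograms produced by the previous step (a minimal sketch; it assumes `spectro_gen infer_onnx` wrote one `.npy` file per input sentence into the `output_path` directory used above):

```python
import glob
import os

import numpy as np

# List the generated spectrograms and print their shapes.
# Assumption: each file holds a mel spectrogram shaped roughly (n_mel_channels, n_frames).
spectro_dir = os.path.join(os.environ["HOST_RESULTS_DIR"], "spectro_gen/infer_onnx/spectro")
for path in sorted(glob.glob(os.path.join(spectro_dir, "*.npy"))):
    print(path, np.load(path).shape)
```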
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!tao vocoder infer_onnx \\\n", - " -e $SPECS_DIR/vocoder/infer.yaml \\\n", - " -g 1 \\\n", - " -k $KEY \\\n", - " -m $RESULTS_DIR/vocoder/export/vocoder.eonnx \\\n", - " -r $RESULTS_DIR/vocoder/infer_onnx \\\n", - " input_path=$RESULTS_DIR/spectro_gen/infer_onnx/spectro \\\n", - " output_path=$RESULTS_DIR/vocoder/infer_onnx/wav" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If everything works properly, the `.wav` file below should sound exactly the same as the `.wav` file in the previous section." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "import IPython.display as ipd\n", - "# change path of the file here\n", - "ipd.Audio(os.environ[\"HOST_RESULTS_DIR\"] + '/vocoder/infer_onnx/wav/0.wav')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## What's Next?" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can use TAO to build custom models for your own applications, or you could [deploy the custom model to NVIDIA Riva](tts-python-advanced-pretrain-tts-tao-deployment.ipynb)" - ] - } - ], - "metadata": { - "interpreter": { - "hash": "741d73fab70d7eb29e7b56260ebaa567f0620f4d2780830ca385f600e5120e14" - }, - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.10" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/tts-python-advanced-pretrain-tts-tao-deployment.ipynb b/tts-python-tao-deployment.ipynb similarity index 52% rename from tts-python-advanced-pretrain-tts-tao-deployment.ipynb rename to tts-python-tao-deployment.ipynb index af3a220e..71c543aa 100644 --- a/tts-python-advanced-pretrain-tts-tao-deployment.ipynb +++ b/tts-python-tao-deployment.ipynb @@ -4,52 +4,28 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\n", - "\n", - "# How to Deploy a custom TTS Models (FastPitch and HiFi-GAN) trained with TAO Toolkit Riva" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This tutorial walks you through the steps to deploy custom TTS models (FastPitch and HiFiGAN) trained with TAO Toolkit on RIVA for real-time inference." + "# TAO - TTS FastPitch/HiFi-GAN Riva Deployment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## NVIDIA Riva Overview\n", - "\n", - "NVIDIA Riva is a GPU-accelerated SDK for building Speech AI applications that are customized for your use case and deliver real-time performance.
\n", - "Riva offers a rich set of speech and natural language understanding services such as:\n", + "[Train Adapt Optimize (TAO) Toolkit](https://developer.nvidia.com/tao-toolkit) provides the capability to export your model in a format that can deployed using [NVIDIA Riva](https://developer.nvidia.com/riva), a highly performant application framework for multi-modal conversational AI services using GPUs. \n", "\n", - "- Automated speech recognition (ASR)\n", - "- Text-to-Speech synthesis (TTS)\n", - "- A collection of natural language processing (NLP) services, such as named entity recognition (NER), punctuation, and intent classification.\n", - "\n", - "In this tutorial, we will deploy Riva TTS models (FastPitch and HiFiGAN) trained with TAO Toolkit on RIVA.\n", - "To understand the basics of Riva TTS APIs, refer to [How do I use Riva TTS APIs with out-of-the-box models?](https://github.com/nvidia-riva/tutorials/blob/dev/22.04/tts-python-basics.ipynb).
\n", - "\n", - "For more information about Riva, refer to the [Riva developer documentation](https://developer.nvidia.com/riva)." + "This tutorial explores taking 2 .riva models, the result of `tao spectro_gen` and `tao vocoder` commands, and leveraging the Riva ServiceMaker framework to aggregate all the necessary artifacts for Riva deployment to a target environment. Once the models are deployed in Riva, you can issue inference requests to the server. We will demonstrate how quick and straightforward this whole process is. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Train, Adapt and Optimize (TAO) Toolkit\n", - "\n", - "[Train Adapt Optimize (TAO) Toolkit](https://developer.nvidia.com/tao-toolkit) provides the capability to export your model in a format that can deployed using [NVIDIA Riva](https://developer.nvidia.com/riva), a highly performant application framework for multi-modal conversational AI services using GPUs.\n", - "\n", - "This tutorial explores taking 2 `.riva` models, the result of `tao spectro_gen` and `tao vocoder` commands ([finetune notebook](tts-python-advanced-pretrain-tts-tao-training.ipynb)), and leveraging the Riva ServiceMaker framework to aggregate all the necessary artifacts for Riva deployment to a target environment. Once the model is deployed in Riva, you can issue inference requests to the server. We will demonstrate how quick and straightforward this whole process is.\n", - "\n", - "In this notebook, you will learn how to:\n", - "\n", - "* Use Riva ServiceMaker to take TAO exported `.riva` files and convert it to `.rmir`\n", - "* Deploy the model(s) locally on the Riva server\n", - "* Send inference requests from a demo client using Riva API bindings" + "---\n", + "## Learning Objectives\n", + "In this notebook, you will learn how to: \n", + "- Use Riva ServiceMaker to take a TAO exported .riva and convert it to .rmir\n", + "- Deploy the model(s) locally on the Riva Server\n", + "- Send inference requests from a demo client using Riva API bindings." ] }, { @@ -57,34 +33,16 @@ "metadata": {}, "source": [ "---\n", - "## Speech generation with Riva TTS APIs\n", - "\n", - "The Riva TTS service is based on a two-stage pipeline: Riva first generates a mel spectrogram using the first model (FastPitch), then generates speech using the second model (HiFiGAN). This pipeline forms a text-to-speech system that enables you to synthesize natural sounding speech from raw transcripts without any additional information such as patterns or rhythms of speech.\n", - "\n", - "Refer to the [Riva TTS documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-overview.html) for more information." + "## Pre-requisites" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Requirements and setup\n", - "\n", - "Before we get started, ensure you have:\n", - "\n", - "- Installed `docker-ce` to instantiate the riva docker containers\n", - "- access to NVIDIA NGC and are able to download the Riva Quick Start [resources](https://ngc.nvidia.com/catalog/resources/nvidia:riva:riva_quickstart).\n", - "- a `.riva` model file that you want to deploy. You can obtain this from `tao export` (with `export_format=RIVA`). For more information on training and exporting a `.riva` model, refer to the [Speech Synthesis using TAO Toolkit]().\n", - "- Install numpy by running the cell below" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "! 
pip install numpy" + "To follow along, please make sure:\n", + "- You have access to NVIDIA NGC, and are able to download the Riva Quickstart [resources](https://ngc.nvidia.com/catalog/resources/nvidia:riva:riva_quickstart)\n", + "- Have a .riva model file that you wish to deploy. You can obtain this from `tao export` (with `export_format=RIVA`). Please refer the tutorial on *Speech Synthesis using Train Adapt Optimize (TAO) Toolkit* for more details on training and exporting a .riva model. The tutorial can be found at https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/resources/texttospeech_notebook" ] }, { @@ -93,7 +51,7 @@ "source": [ "---\n", "## Riva ServiceMaker\n", - "Riva ServiceMaker is a set of tools that aggregates all the necessary artifacts (models, files, configurations, and user settings)\n", + "Servicemaker is the set of tools that aggregates all the necessary artifacts (models, files, configurations, and user settings)\n", "for Riva deployment to a target environment. It has two main components:\n", "\n", "* `riva-build`\n", @@ -101,16 +59,29 @@ "\n", "### Riva-build\n", "\n", - "This step helps build a Riva-ready version of the model. It’s only output is an intermediate format (called an RMIR)\n", - "of an end-to-end pipeline for the supported services within Riva. Let's consider two TTS models.\n", + "This step helps build a Riva-ready version of the model. It’s only output is an intermediate format (called a RMIR)\n", + "of an end to end pipeline for the supported services within Riva. We are taking 2 TTS models in consideration -\n", "\n", "* [FastPitch](https://ngc.nvidia.com/catalog/models/nvidia:tao:speechsynthesis_english_fastpitch) (spectrogram generator)\n", - "* [HiFi-GAN](https://ngc.nvidia.com/catalog/models/nvidia:tao:speechsynthesis_hifigan) (vocoder)
\n", + "* [HiFi-GAN](https://ngc.nvidia.com/catalog/models/nvidia:tao:speechsynthesis_hifigan) (vocoder).
\n", "\n", - "`riva-build` is responsible for the combination of one or more exported models (`.riva` files) into a single file\n", - "containing an intermediate format called Riva Model Intermediate Representation (`.rmir`). This file contains a\n", + "`riva-build` is responsible for the combination of one or more exported models (.riva files) into a single file\n", + "containing an intermediate format called Riva Model Intermediate Representation (.rmir). This file contains a\n", "deployment-agnostic specification of the whole end-to-end pipeline along with all the assets required for the\n", - "final deployment and inference. For more information, refer to the [documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/service-tts.html#fastpitch-and-hifi-gan-pipeline-configuration)." + "final deployment and inference. Please checkout the [documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/service-tts.html#fastpitch-and-hifi-gan-pipeline-configuration) to find out more.\n", + "\n", + "### Riva-deploy\n", + "\n", + "The deployment tool takes as input one or more Riva Model Intermediate Representation (RMIR) files and\n", + "a target model repository directory. It creates an ensemble configuration specifying the pipeline for\n", + "the execution and finally writes all those assets to the output model repository directory." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For the purpose of this notebook, we will only be using the `riva-build` component." ] }, { @@ -119,7 +90,7 @@ "metadata": {}, "outputs": [], "source": [ - "# Important: Update these paths to point to the RIVA ServiceMaker docker and input models.\n", + "# IMPORTANT: UPDATE THESE PATHS \n", "\n", "# ServiceMaker Docker\n", "RIVA_SM_CONTAINER = \"\"\n", @@ -142,8 +113,8 @@ "metadata": {}, "outputs": [], "source": [ - "# Get the ServiceMaker docker\n", - "! docker pull $RIVA_SM_CONTAINER" + "# Download the auxillary files for RIVA to help enhance the quality of the audio output.\n", + "!ngc registry model download-version \"nvidia/tao/speechsynthesis_en_us_auxiliary_files:deployable_v1.0\" --dest $MODEL_LOC" ] }, { @@ -152,22 +123,26 @@ "metadata": {}, "outputs": [], "source": [ - "# Syntax: riva-build output-dir-for-rmir/model.rmir:key dir-for-riva/model.riva:key\n", - "! docker run --rm --gpus 0 -v $MODEL_LOC:/data $RIVA_SM_CONTAINER -- \\\n", - " riva-build speech_synthesis /data/tts.rmir:$KEY /data/$SPECTRO_GEN_MODEL_NAME:$KEY /data/$VOCODER_MODEL_NAME:$KEY" + "# Get the ServiceMaker docker\n", + "! docker pull $RIVA_SM_CONTAINER" ] }, { - "cell_type": "markdown", + "cell_type": "code", + "execution_count": null, "metadata": {}, + "outputs": [], "source": [ - "### Riva-deploy\n", - "\n", - "The deployment tool takes as input one or more Riva Model Intermediate Representation (RMIR) files and\n", - "a target model repository directory. It creates an ensemble configuration specifying the pipeline for\n", - "the execution and finally writes all those assets to the output model repository directory.\n", - "\n", - "For the purpose of this tutorial, we will only be using the `riva-build` component." + "# For a multi-speaker model, please un-comment the command below and run the following command.\n", + "! mkdir -p $MODEL_LOC/rmir\n", + "! 
docker run --rm --gpus 0 -v $MODEL_LOC:/data $RIVA_SM_CONTAINER \\\n", + " riva-build speech_synthesis /data/rmir/new_speaker.rmir:$KEY \\\n", + " /data/$SPECTRO_GEN_MODEL_NAME:$KEY \\\n", + " /data/$VOCODER_MODEL_NAME:$KEY \\\n", + " --voice_name=new_speaker \\\n", + " --subvoices=ljspeech:0,new_voice:1 \\\n", + " --abbreviations_file=/data/speechsynthesis_en_us_auxiliary_files_vdeployable_v1.0/abbr.txt \\\n", + " --arpabet_file=/data/speechsynthesis_en_us_auxiliary_files_vdeployable_v1.0/cmudict-0.7b-nv0.01" ] }, { @@ -175,9 +150,9 @@ "metadata": {}, "source": [ "---\n", - "## Start the Riva Server\n", - "\n", - "Once the model repository is generated, we are ready to start the Riva server. From this step onwards you need to download the [Riva Quick Start resource](https://ngc.nvidia.com/catalog/resources/nvidia:riva:riva_quickstart) from NGC. Please follow the instructions [here] to download the Quick Start resource." + "## Start Riva Server\n", + "Once the model repository is generated, we are ready to start the Riva server. From this step onwards you need to download the Riva QuickStart Resource from NGC. \n", + "Set the path to the directory here:" ] }, { @@ -186,7 +161,7 @@ "metadata": {}, "outputs": [], "source": [ - "# Set the Riva Quick Start directory\n", + "# Set the Riva QuickStart directory\n", "RIVA_DIR = \"\"" ] }, @@ -194,11 +169,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Next, we modify the `config.sh` file to enable relevant Riva services (TTS for the FastPitch/HiFi-GAN models), provide the encryption key, and path to the model repository (`riva_model_loc`) generated in the previous step among other configurations. \n", + "Next, we modify config.sh to enable relevant Riva services (tts in this case for fastpitch/hifigan), provide the encryption key, and path to the model repository (`riva_model_loc`) generated in the previous step among other configurations. \n", "\n", - "For example, if above the model repository is generated at `$MODEL_LOC/models`, then you can specify `riva_model_loc` as the same directory as `MODEL_LOC`.
\n", +    "For instance, if the model repository above is generated at `$MODEL_LOC/models`, then you can specify `riva_model_loc` as the same directory as `MODEL_LOC`.
\n", "\n", - "Pretrained versions of models specified in `models_asr/nlp/tts` are fetched from NGC. Since we are using our custom model, we can comment it in `models_tts` (and any others that are not relevant to your use case).
" + "Pretrained versions of models specified in models_asr/nlp/tts are fetched from NGC. Since we are using our custom model, we can comment it in models_tts (and any others that are not relevant to your use case).
" ] }, { @@ -241,19 +216,14 @@ "# are inspected and optimized for deployment. The optimized versions are\n", "# stored in $riva_model_loc/models. The riva server exclusively uses these\n", "# optimized versions.\n", - "riva_model_loc=\"\" ## MAKE CHANGES HERE (Replace with MODEL_LOC)\n", - "\n", - "if [[ $riva_target_arch == \"arm64\" ]]; then\n", - " riva_model_loc=\"`pwd`/model_repository\"\n", - "fi\n", + "riva_model_loc=\"\" ## MAKE CHANGES HERE (Replace with MODEL_LOC) \n", "\n", "# The default RMIRs are downloaded from NGC by default in the above $riva_rmir_loc directory\n", "# If you'd like to skip the download from NGC and use the existing RMIRs in the $riva_rmir_loc\n", "# then set the below $use_existing_rmirs flag to true. You can also deploy your set of custom\n", "# RMIRs by keeping them in the riva_rmir_loc dir and use this quickstart script with the\n", "# below flag to deploy them all together.\n", - "use_existing_rmirs=false ## MAKE CHANGES HERE (Replace with true)\n", - "\n", + "use_existing_rmirs=false ## MAKE CHANGES HERE (Set to true)\n", "```" ] }, @@ -294,9 +264,9 @@ "source": [ "---\n", "## Run Inference\n", - "Once the Riva server is up-and-running with your models, you can send inference requests querying the server. \n", + "Once the Riva server is up and running with your models, you can send inference requests querying the server. \n", "\n", - "To send gRPC requests, you can install the Riva Python API bindings for the client by running the cell below. This is available as a `pip` [package](https://pypi.org/project/nvidia-riva-client/)." + "To send GRPC requests, you can install Riva Python API bindings for client. This is available as a pip .whl with the QuickStart.\n" ] }, { @@ -305,16 +275,16 @@ "metadata": {}, "outputs": [], "source": [ - "# Install the Client API Bindings\n", - "! pip install nvidia-riva-client" + "# Install client API bindings\n", + "! cd $RIVA_DIR && pip install nvidia-riva-client" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Connect to the Riva Server and Run Inference\n", - "Now we can actually query the Riva server. The following cell queries the Riva server (using gRPC) to yield a result." + "### Connect to Riva server and run inference\n", + "Now we actually query the Riva server, let's get started. The following cell queries the riva server(using grpc) to yield a result." 
] }, { @@ -323,32 +293,39 @@ "metadata": {}, "outputs": [], "source": [ + "import os\n", + "import soundfile\n", "import riva.client\n", "import IPython.display as ipd\n", "import numpy as np\n", "\n", - "server = \"localhost:50051\"\n", - "\n", + "server = \"localhost:50051\" # location of riva server\n", "auth = riva.client.Auth(uri=server)\n", - "client = riva.client.SpeechSynthesisService(auth)\n", + "tts_service = riva.client.SpeechSynthesisService(auth)\n", + "\n", + "\n", + "text = \"Is it recognize speech or wreck a nice beach?\"\n", + "language_code = \"en-US\" # currently required to be \"en-US\"\n", + "sample_rate_hz = 22050 # the desired sample rate\n", + "voice_name = \"new_speaker.new_voice\" # subvoice to generate the audio output.\n", + "data_type = np.int16 # For RIVA version < 1.10.0 please set this to np.float32\n", "\n", - "resp = client.synthesize(\n", - " text=\"Is it recognize speech or wreck a nice beach?\",\n", - " language_code=\"en-US\",\n", - " encoding=riva.client.AudioEncoding.LINEAR_PCM,\n", - " sample_rate_hz=22050,\n", - " # For a multispeaker model, please set uncomment the line below:\n", - " # voice_name = \"new_speaker.new_voice\",\n", - ")\n", - "audio_samples = np.frombuffer(resp.audio, dtype=np.int16)\n", - "ipd.Audio(audio_samples, rate=22050)" + "resp = tts_service.synthesize(text, voice_name=voice_name, language_code=language_code, sample_rate_hz=sample_rate_hz)\n", + "audio = resp.audio\n", + "meta = resp.meta\n", + "processed_text = meta.processed_text\n", + "predicted_durations = meta.predicted_durations\n", + "\n", + "audio_samples = np.frombuffer(resp.audio, dtype=data_type)\n", + "print(processed_text)\n", + "ipd.Audio(audio_samples, rate=sample_rate_hz)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "You can stop all Docker containers before shutting down the Jupyter kernel. **Caution: The following command will stop all running containers.**" + "You can stop all docker container before shutting down the jupyter kernel. **Caution: The following command will stop all running containers**" ] }, { @@ -362,9 +339,6 @@ } ], "metadata": { - "interpreter": { - "hash": "077e12def6a32a2831f4c2d38aa8beb0e767e75f30960b225b1e4928bdda737d" - }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", @@ -381,6 +355,11 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" + }, + "vscode": { + "interpreter": { + "hash": "36cf16204b8548560b1c020c4e8fb5b57f0e4c58016f52f2d4be01e192833930" + } } }, "nbformat": 4,