Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tutorial for ASR Adapter training #115

Merged
merged 36 commits into from
Jan 19, 2023
Merged
Changes from all commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
68cbdb0
Add asr adapter tutorial
titu1994 Jan 11, 2023
3004c38
Finalize tutorial notebook for adapters
titu1994 Jan 11, 2023
59d028d
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
a9e49b3
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
4f50a78
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
ff8ac21
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
b3ccb54
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
93e8b56
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
1caed00
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
0edeee1
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
b11e342
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
bcbb427
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
200304c
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
faf64af
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
2aa5086
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
636fa41
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
ebf3eb2
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
4d65ed8
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
9b36ad6
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
9930a06
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
75d69f8
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
fb307f9
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
b40f59e
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
81073e2
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
113ac21
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
813322a
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
89dc4d0
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
ee8bbc2
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
193e84e
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
c245701
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
91c4757
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
79766b3
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
f8a44d1
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
da29173
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
71d4777
Update asr-python-adapter-finetune-am-conformer-nemo.ipynb
svenchilton Jan 19, 2023
db44eba
Renamed Adapter fine-tuning tutorial, modified title header, added tr…
svenchilton Jan 19, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
181 changes: 181 additions & 0 deletions asr-finetune-conformer-ctc-adapter-nemo.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,181 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"http://developer.download.nvidia.com/notebooks/dlsw-notebooks/rivaasrasr-finetune-conformer-ctc-adapter-nemo/nvidia_logo.png\" style=\"width: 90px; float: right;\">\n",
"\n",
"# How to Customize a Riva ASR Acoustic Model (Conformer-CTC) with Adapters\n",
"This tutorial walks you through how to customize a Riva ASR acoustic model (Conformer-CTC) with Adapter modules, using NVIDIA NeMo."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## NVIDIA Riva Overview\n",
"\n",
"NVIDIA Riva is a GPU-accelerated SDK for building speech AI applications that are customized for your use case and deliver real-time performance. <br/>\n",
"Riva offers a rich set of speech and natural language understanding (NLU) services such as:\n",
"\n",
"- Automated speech recognition (ASR)\n",
"- Text-to-Speech synthesis (TTS)\n",
"- A collection of natural language processing (NLP) services, such as named entity recognition (NER), punctuation, and intent classification.\n",
"\n",
"In this tutorial, we will customize a Riva ASR acoustic model (Conformer) with Adapter modules, using NeMo. <br> \n",
"To understand the basics of Riva ASR APIs, refer to [Getting started with Riva ASR in Python](https://github.com/nvidia-riva/tutorials/blob/stable/asr-python-basics.ipynb). <br>\n",
"\n",
"For more information about Riva, refer to the [Riva developer documentation](https://developer.nvidia.com/riva)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Neural Module (NeMo)\n",
"NVIDIA Neural Module (NeMo) Toolkit is an open-source framework for building, training, and fine-tuning GPU-accelerated speech AI and NLU models with a simple Python interface. Developers, researchers, and software partners building intelligent conversational AI applications and services, can bring their own data to fine-tune pre-trained models instead of going through the hassle of training the models from scratch."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ASR with Adapters\n",
"\n",
"The following tutorial heavily references the [NeMo tutorial on ASR Domain Adaptation with Adapters](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/asr_adapters/ASR_with_Adapters.ipynb).\n",
"\n",
"We advise to keep both tutorials open side-by-side to refer to the contents effectively by using the **Table of Contents**."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# What are Adapters?\n",
"\n",
"Adapters are trainable neural network modules that are attached to pretrained models, such that we freeze the weights of the original model and only train the adapter parameters. This reduces the amount of data required to customize a model substantially, while imposing the limitation that the model's vocabulary cannot be changed.\n",
"\n",
"In short,\n",
"\n",
"- Adapter modules form a residual bridge over the output of each layer they adapt, such that the model's original performance is not lost. \n",
"- The original parameters of the model are frozen in their entirety, so that we can recover the original model by disabling all adapters.\n",
"- We train only the new adapter parameters (an insignificant fraction of the total number of parameters). This allows fast experimentation with very little data and compute.\n",
"\n",
"-----\n",
"\n",
"Adapters are a straightforward concept, as shown in the following diagram. At their simplest, they are residual Feedforward layers that compress the input dimension ($D$) to a small bottleneck dimension ($H$), such that $R^D \\text{->} R^H$, compute an activation (such as ReLU), finally mapping $R^H \\text{->} R^D$ with another Feedforward layer. This output is then added to the input through a residual connection.\n",
"\n",
"<div align=\"center\">\n",
" <img src=\"https://mermaid.ink/img/pako:eNptkLFqwzAQhl9F3ORAPDSjA4EUx6RgXEjbycpwWOdG1JaMfEoakrx7ZcfpUKrlxH_fz4d0gcoqggTqxp6qAzoW76k0Ipx1-WI6z3sRxyuRF1GOZ3KisK6d3YG8GFdZ9hRJeLbMDRmqvkRGpDLrTuiUiEWUigBtlyIVqzBnEqZ66I39dcX6iKytKXeUf-wn-286QoFeBMvmu0PTD-EfyXaQpP9JFmP_1XN4S3kfD8W4ue6o18pjc52gYQlzaMm1qFX4msuQSOADtSQhCdfaOupZgjS3QPpOIdNGabYOkhqbnuaAnu3b2VSQsPP0gFKNnw7bibr9AJkZdXU\" height=100% />\n",
"</div>\n",
"\n",
"-----"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Advantages and Limitations of Adapter Training\n",
"\n",
"Adapters can be used with limited amounts of training data and compute budget, however, they impose the restriction that the model's original vocabulary must be used. Therefore, a new character set vocabulary/tokenizer cannot be used.\n",
"\n",
"Refer to the **Advantages of Adapters** and **Limitations of Adapters** sections in the [NeMo tutorial](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/asr_adapters/ASR_with_Adapters.ipynb) for further details."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Preparing the Acoustic Encoder for Adapter Training\n",
"\n",
"Pre-trained models do not automatically support Adapter modules, so we must prepare the acoustic encoder prior to creating the model.\n",
"The steps for this procedure are detailed in the **Prepare the \"base\" model** section of the [NeMo tutorial](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/asr_adapters/ASR_with_Adapters.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Preparing the Model and Dataset for Adaptation\n",
"\n",
"There are very few differences between fine-tuning and adapter training with respect to the preparation of the model, data loaders, or optimization method.\n",
"\n",
"One significant difference is that there is no need to change the vocabulary of the pre-trained network, therefore, there is no need to construct a new tokenizer or update the character vocabulary. If your customization dataset contains tokens that requires such changes (for example tokens that don't exist in the model), then you will need to either preprocess the text to remove such tokens, or instead perform fine-tuning (which may require substantially more data).\n",
"\n",
"Another minor difference is the use of spectrogram augmentation (SpecAugment). During fine-tuning, the amount of data is very little compared to when the model was pre-trained with thousands of hours of speech, so SpecAugment may have been used to prevent overfitting. For adapters, we recommend first disabling spec augmentation and trying to see if results are good enough, then adding it back slowly, as needed, to see if any improvements can be obtained.\n",
"\n",
"Refer to the **Setup training and evaluation of the model** section, and particularly the **Setup Spectrogram Augmentation** section of the [NeMo tutorial](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/asr_adapters/ASR_with_Adapters.ipynb) for further details."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Creating and Training an Adapter\n",
"\n",
"The **Adapters: Creation and Preparation** section of the [NeMo tutorial](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/asr_adapters/ASR_with_Adapters.ipynb) details how to create an config object for a **LinearAdapter**, add it to the pretrained model, then freeze the original parameters of the model to train just the new parameters.\n",
"\n",
"All of this is a few lines of code, which will initialize your model so that it is prepared to train the adapter on new speech data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Evaluating the Model\n",
"\n",
"Since adapters contain new parameters and are dynamic in nature, they can be enabled/disabled prior to evaluation. The **Evaluate the adapted model** section in the [NeMo tutorial](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/asr_adapters/ASR_with_Adapters.ipynb) showcases how you can load the model, enable or disable an adapter in the model, save the model and then evaluate it with some data. \n",
"\n",
"Adapters are trained such that if one disables all adapters, the model will revert back to the pretrained model, which allows you to select whether you want to do well on general speech (with original pretrained model) or do better on your adaptation domain (with an adapter enabled)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Export the Model to Riva\n",
"\n",
"The `nemo2riva` command-line tool provides the capability to export your `.nemo` model in a format that can be deployed using [NVIDIA Riva](https://www.nvidia.com/en-us/ai-data-science/products/riva/), a highly performant application framework for multi-modal conversational AI services using GPUs. A Python `.whl` file for `nemo2riva` is included in the [Riva Skills Quick Start](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/resources/riva_quickstart) resource folder. You can also install `nemo2riva` with `pip`, as shown in the [Conformer-CTC fine-tuning tutorial](https://github.com/nvidia-riva/tutorials/blob/main/asr-finetuning-conformer-ctc-nemo.ipynb).\n",
"\n",
"Using `nemo2riva`, you can export your trained NeMo model for use in Riva by following the instructions in the [Riva documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/model-overview.html?highlight=nemo2riva#model-development-with-nemo)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# What's Next?\n",
"\n",
"You can use NeMo to build custom models for your own applications, or you could [deploy the custom model to NVIDIA Riva](https://github.com/nvidia-riva/tutorials/blob/main/asr-deployment-conformer-ctc.ipynb)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
},
"vscode": {
"interpreter": {
"hash": "16a4caf9b5d7385b3583eafc1422122ec4165938306acb2efbd3097c15cf1709"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}