
add parakeet finetuning tutorials #201

Merged
merged 5 commits into nvidia-riva:main
Nov 14, 2024

Conversation

jmayank1511
Contributor

No description provided.

@rmittal-github rmittal-github force-pushed the parakeet_finetuning_demo branch from 82996de to 76018fd Compare October 11, 2024 05:23
@rmittal-github
Collaborator

Rebased and added a minor fix found in another thread. @nv-uvaidya, could you review the new tutorial as well?

@myungjongk

I think it is better to rename the finetuning tutorial file from `asr_finetune_parakeet_nemo.ipynb` to `asr-finetune-parakeet-nemo.ipynb`.

"# How to Fine-Tune a Riva ASR Acoustic Model with NVIDIA NeMo\n",
"This tutorial walks you through how to fine-tune an NVIDIA Riva ASR acoustic model with NVIDIA NeMo.\n",
"\n",
"**Important**: If you plan to fine-tune an ASR acoustic model using the same tokenizer with which the model was trained, skip this tutorial and refer to the \"Sub-word Encoding CTC Model\" section (starting with the \"Load pre-trained model\" subsection) of the [NeMo ASR Language Finetuning tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/ASR_CTC_Language_Finetuning.ipynb)."


We can remove this line, as we already handle finetuning with the same tokenizer in this tutorial. Or we can just refer to NeMo's tutorial as an additional finetuning resource.


"\n",
"Hybrid RNNT-CTC models are a group of models with both RNNT and CTC decoders. Training a hybrid model speeds up convergence for the CTC model and lets you use a single model that works as both a CTC and an RNNT model. This category can be used with any of the ASR models. Hybrid models use two decoders, CTC and RNNT, on top of the encoder.\n",
"\n",
"NeMo uses `.yml` files to configure the training parameters. You may update them directly by editing the configuration file or override them from the command-line interface. For example, if the number of epochs needs to be modified along with a change in the learning rate, you can add `trainer.max_epochs=100` and `optim.lr=0.02` to the command and train the model.\n",
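As a side note for reviewers: NeMo routes command-line overrides like these through Hydra/OmegaConf. A minimal pure-Python sketch (illustrative only, not NeMo's actual implementation) of how dotted-path overrides such as `trainer.max_epochs=100` behave:

```python
def apply_overrides(cfg, overrides):
    """Apply Hydra-style dotted overrides like 'trainer.max_epochs=100' to a nested dict."""
    for item in overrides:
        path, _, raw = item.partition("=")
        keys = path.split(".")
        node = cfg
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        # naive value coercion: try int, then float, else keep the raw string
        try:
            value = int(raw)
        except ValueError:
            try:
                value = float(raw)
            except ValueError:
                value = raw
        node[keys[-1]] = value
    return cfg

config = {"trainer": {"max_epochs": 50}, "optim": {"lr": 0.01}}
config = apply_overrides(config, ["trainer.max_epochs=100", "optim.lr=0.02"])
print(config)  # {'trainer': {'max_epochs': 100}, 'optim': {'lr': 0.02}}
```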


`.yml` can be replaced with `.yaml`.


"\n",
"NeMo uses `.yml` files to configure the training parameters. You may update them directly by editing the configuration file or override them from the command-line interface. For example, if the number of epochs needs to be modified along with a change in the learning rate, you can add `trainer.max_epochs=100` and `optim.lr=0.02` to the command and train the model.\n",
"\n",
"The following sample command uses the `speech_to_text_hybrid_rnnt_ctc_bpe.py` script in the `examples` folder to train/fine-tune a Parakeet-Hybrid ASR model for 1 epoch. For other ASR models, such as Citrinet or Conformer, you can find the appropriate config files in the NeMo GitHub repo under [examples/asr/conf/](https://github.com/NVIDIA/NeMo/tree/main/examples/asr/conf).\n"


`speech_to_text_hybrid_rnnt_ctc_bpe.py` needs to be replaced with `speech_to_text_finetune.py`.
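If this suggestion is adopted, the cell might look something like the following sketch. The config path/name and manifest placeholders are assumptions, not taken from the tutorial:

```shell
# Sketch only: config path/name and manifest placeholders are assumptions.
python examples/asr/speech_to_text_finetune.py \
    --config-path=conf/asr_finetune \
    --config-name=speech_to_text_finetune \
    model.train_ds.manifest_filepath=<path/to/train_manifest.json> \
    model.validation_ds.manifest_filepath=<path/to/val_manifest.json> \
    trainer.max_epochs=1
```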


"source": [
"#### Convert to Riva\n",
"\n",
"Convert the downloaded model to the `.riva` format. We will set the encryption key with `--key=nemotoriva`. Choose a different encryption key value when generating `.riva` models for production.\n",


I think we can mention that `--onnx-opset 18` is needed for Riva version 2.15.0 and above.


Contributor Author


Not needed for RNNT models.

"outputs": [],
"source": [
"riva_file_path = ctc_model_path[:-5]+\".riva\"\n",
"!nemo2riva --key=nemotoriva --out $riva_file_path $ctc_model_path"


`--onnx-opset 18` can be added.
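Putting the suggestions in this thread together, the export cell might become something like this sketch. The variables and key value are the tutorial's own; only the flag placement is assumed:

```shell
# Sketch: adds the suggested --onnx-opset 18 (needed for Riva 2.15.0+ per the
# review; the author notes it is not needed for RNNT models).
nemo2riva --key=nemotoriva --onnx-opset 18 --out $riva_file_path $ctc_model_path
```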



@myungjongk myungjongk left a comment


Looks good!

@rmittal-github rmittal-github merged commit 915f9e7 into nvidia-riva:main Nov 14, 2024