These are tutorials on deep-neural-network vocoders, written in PyTorch and Python.
Features of these tutorials:
- Pre-trained models are provided to produce audio samples.
- No painful dependency installation. Just run the notebooks directly on Google Colab.
- Very detailed implementations, for example, how to cache intermediate outputs in a causal dilated convolution.
- Not only DNN but also DSP techniques are explained, e.g., linear prediction and overlap-add.
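
To give a flavor of the caching idea mentioned in the list, here is a minimal NumPy sketch. It is not the code used in the notebooks: the names `causal_dilated_conv` and `CachedCausalConv` are hypothetical, and it assumes a single channel with no bias. It compares a full-sequence causal dilated convolution with a step-by-step version that keeps past inputs in a buffer, which is roughly what an autoregressive model needs during sample-by-sample generation.

```python
import numpy as np


def causal_dilated_conv(x, w, dilation):
    """Full-sequence causal dilated convolution (single channel, no bias).

    y[t] = sum_k w[k] * x[t - (K - 1 - k) * dilation], with zeros before t = 0.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    K = len(w)
    pad = (K - 1) * dilation
    # Left-pad with zeros so that the output never depends on future samples.
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros(len(x))
    for k in range(K):
        # Tap k looks (K - 1 - k) * dilation samples into the past.
        y += w[k] * xp[k * dilation: k * dilation + len(x)]
    return y


class CachedCausalConv:
    """Same layer, but computed one sample at a time with a cached buffer."""

    def __init__(self, w, dilation):
        self.w = np.asarray(w, dtype=float)
        self.dilation = dilation
        # The buffer holds the last (K - 1) * dilation + 1 inputs, initially zero.
        self.buf = np.zeros((len(self.w) - 1) * dilation + 1)

    def step(self, x_t):
        # Drop the oldest sample and append the newest one.
        self.buf = np.roll(self.buf, -1)
        self.buf[-1] = x_t
        # Taps are spaced `dilation` apart; the last tap is the newest sample.
        return float(self.w @ self.buf[:: self.dilation])


# Demo: both paths should produce the same output sequence.
rng = np.random.default_rng(0)
x = rng.standard_normal(32)
w = rng.standard_normal(3)
y_full = causal_dilated_conv(x, w, dilation=4)
layer = CachedCausalConv(w, dilation=4)
y_step = np.array([layer.step(x_t) for x_t in x])
print(np.allclose(y_full, y_step))
```

With the buffer, each generation step costs one dot product over the kernel taps instead of recomputing the convolution over the whole history. A multi-channel PyTorch layer would cache a tensor per layer in a similar way, but the details there are left to the notebooks.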
All notebooks are hosted on the Google Colab platform.
Link | Chapter |
---|---|
Introduction and basics | |
chapter_1_introduction.ipynb | Entry point and Python/PyTorch conventions |
chapter_2_DSP_tools_Python.ipynb | Selected DSP tools for speech processing |
chapter_3_DSP_tools_in_DNN_Pytorch.ipynb | Selected DSP tools implemented as layers in neural networks |
DSP-based Vocoder | |
chapter_4_DSP-based_Vocoder | Traditional DSP-based vocoder included in the SPTK toolkit |
Neural vocoders | |
chapter_5_DSP+DNN_NSF.ipynb | Neural source-filter (NSF) model |
chapter_6_AR_WaveNet.ipynb | Autoregressive WaveNet vocoder |
chapter_7_AR_iLPCNet.ipynb | Autoregressive iLPCNet |
chapter_8_Flow_WaveGlow.ipynb | Flow-based WaveGlow model |
chapter_9_GAN_HiFiGAN_NSFw/GAN.ipynb | HiFiGAN, and NSF + HiFiGAN |
Appendix | |
chapter_a1_Linear_prediction.ipynb | Details on a naive implementation of linear prediction |
chapter_a2_Music_NSF.ipynb | Application of NSF to musical instrument audio |
chapter_a3_pretrained_vocoders.ipynb | Pretrained neural vocoders on a few speech datasets |
Clicking "Open in Colab" opens the notebook. You can also download the notebooks from Google Drive.
The models and implementations are written for the tutorials and therefore lack intensive tuning and optimization, which I am not good at either. If you have ideas for improvement, your feedback is appreciated!
The above notebooks were used in the ICASSP 2022 short course and the ISCA Speech Processing Course in Crete.
@misc{Stylianou2022,
author = {Stylianou, Yannis and Tsiaras, Vassilis and Conkie, Alistair and Maiti, Soumi and Yamagishi, Junichi and Wang, Xin and Chen, Yutian and Slaney, Malcolm and Petkov, Petko and Padinjaru, Shifas and Kafentzis, George},
title = {{ICASSP2022 Short Course: Inclusive Neural Speech Synthesis -iNSS}},
year = {2022}
}
By Xin Wang