Tutorials on DNN-based Vocoders

These are tutorials on several deep-neural-network (DNN) vocoders, implemented in Python and PyTorch.

Features of these tutorials:

Pre-trained models are provided to produce audio samples.
No painful installation of dependencies. Just run the notebooks directly on Google Colab.
Very detailed implementations, for example, how to cache intermediate outputs in a causal dilated convolution.
DSP techniques are explained alongside the DNNs, e.g., linear prediction and overlap-add.
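
As a rough illustration of the caching idea mentioned above, here is a minimal sketch in plain Python. It is not taken from the notebooks: the function and class names are hypothetical, and it assumes a kernel size of 2, scalar weights, and a dilation of at least 1. The point is that step-by-step generation only needs the input from `dilation` steps ago, so a small buffer replaces recomputing the convolution over the whole history.

```python
from collections import deque


def causal_dilated_conv(x, w_past, w_now, dilation):
    """Full-sequence reference: y[t] = w_past * x[t - dilation] + w_now * x[t].

    Inputs before t = 0 are treated as zeros (left padding only), so the
    output at time t never depends on future samples.
    """
    y = []
    for t in range(len(x)):
        past = x[t - dilation] if t - dilation >= 0 else 0.0
        y.append(w_past * past + w_now * x[t])
    return y


class CachedCausalDilatedConv:
    """Step-by-step version that caches the last `dilation` inputs.

    Hypothetical helper for illustration; assumes dilation >= 1.
    """

    def __init__(self, w_past, w_now, dilation):
        self.w_past = w_past
        self.w_now = w_now
        # The cache starts as zeros, which matches the zero left padding.
        self.cache = deque([0.0] * dilation, maxlen=dilation)

    def step(self, x_t):
        # The oldest cached value is exactly x[t - dilation].
        past = self.cache[0]
        # Appending to a full deque drops the oldest entry automatically.
        self.cache.append(x_t)
        return self.w_past * past + self.w_now * x_t


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
full = causal_dilated_conv(x, w_past=0.5, w_now=1.0, dilation=2)

layer = CachedCausalDilatedConv(w_past=0.5, w_now=1.0, dilation=2)
stepwise = [layer.step(x_t) for x_t in x]

# Generating one sample at a time gives the same result as the full pass.
assert stepwise == full
```

In an autoregressive vocoder, each dilated layer would keep its own cache of this kind, so generating a new sample costs one small update per layer rather than a pass over all past samples.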

All notebooks are hosted on Google Colab.

Link	Chapter
	Introduction and basics
	chapter_1_introduction.ipynb	entry point and Python/PyTorch conventions
	chapter_2_DSP_tools_Python.ipynb	selected DSP tools for speech processing
	chapter_3_DSP_tools_in_DNN_Pytorch.ipynb	selected DSP tools implemented as layers in neural networks
	DSP-based Vocoder
	chapter_4_DSP-based_Vocoder	traditional DSP-based vocoder included in the SPTK toolkit
	Neural vocoders
	chapter_5_DSP+DNN_NSF.ipynb	neural source-filter model
	chapter_6_AR_WaveNet.ipynb	Autoregressive WaveNet vocoder
	chapter_7_AR_iLPCNet.ipynb	Autoregressive iLPCNet
	chapter_8_Flow_WaveGlow.ipynb	Flow-based WaveGlow model
	chapter_9_GAN_HiFiGAN_NSFw/GAN.ipynb	HiFiGAN, and NSF + HiFiGAN
	Appendix
	chapter_a1_Linear_prediction.ipynb	Details on a naive implementation of linear prediction
	chapter_a2_Music_NSF.ipynb	Application of NSF to musical instrument audio
	chapter_a3_pretrained_vocoders.ipynb	Pretrained neural vocoders on a few speech datasets

Clicking "Open in Colab" opens the notebook. You can also download the notebooks from Google Drive.

The models and implementations are written for the tutorials and therefore lack intensive tuning and optimization, which is not my strength either. If you have ideas on how to improve them, your feedback is appreciated!

The notebooks above were used in the ICASSP 2022 short course and the ISCA Speech Processing Course in Crete.

@misc{Stylianou2022,
author = {Stylianou, Yannis and Tsiaras, Vassilis and Conkie, Alistair and Maiti, Soumi and Yamagishi, Junichi and Wang, Xin and Chen, Yutian and Slaney, Malcolm and Petkov, Petko and Padinjaru, Shifas and Kafentzis, George},
mendeley-groups = {misc,self-arxiv},
title = {{ICASSP2022 Short Course: Inclusive Neural Speech Synthesis -iNSS}},
year = {2022}
}

By Xin Wang
