Official implementation of the paper:
Fast, High-Quality and Parameter-Efficient Articulatory Synthesis Using Differentiable DSP (SLT 2024)
DDSP code is based on https://github.com/sweetcocoa/ddsp-pytorch and https://intro2ddsp.github.io/intro.html
- Install conda.
- Run `conda env create -f environment.yml` to create the conda environment.
- Download paired EMA and speech data, such as HPRC.
- Resample the audio to 16 kHz and the EMA data to 200 Hz.
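The repository does not prescribe a resampling tool; one minimal sketch uses `scipy.signal.resample_poly`. The source rates below (48 kHz audio, 400 Hz EMA) and the 12-channel EMA layout are illustrative assumptions — your corpus may differ.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

def resample_signal(x, orig_sr, target_sr, axis=0):
    """Resample along `axis` from orig_sr to target_sr with polyphase filtering."""
    g = gcd(orig_sr, target_sr)
    return resample_poly(x, target_sr // g, orig_sr // g, axis=axis)

# Illustrative: 1 s of 48 kHz audio down to 16 kHz.
wav_48k = np.random.randn(48000)
wav_16k = resample_signal(wav_48k, 48000, 16000)

# Illustrative: EMA at an assumed 400 Hz, 12 channels, down to 200 Hz.
ema_400 = np.random.randn(400, 12)  # (frames, channels)
ema_200 = resample_signal(ema_400, 400, 200, axis=0)
```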
- Use `data_prep/batch_invert.ipynb` or other methods to extract pitch and loudness from the waveform at 200 Hz.
- Use `data_split.ipynb` to generate JSON files that define the train/val/test splits.
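The notebook above covers the feature extraction; purely as a rough illustration of the loudness half of that step, here is a frame-wise log-RMS sketch in plain NumPy, with the hop size chosen so the output matches the 200 Hz EMA rate. (Pitch extraction needs a dedicated tracker, e.g. pYIN, and is not shown; this is not the notebook's actual implementation.)

```python
import numpy as np

def frame_loudness(wav, sr=16000, frame_rate=200):
    """Frame-wise log-RMS energy at `frame_rate` frames per second."""
    hop = sr // frame_rate  # 80 samples per frame at 16 kHz / 200 Hz
    n_frames = len(wav) // hop
    frames = wav[: n_frames * hop].reshape(n_frames, hop)
    rms = np.sqrt(np.mean(frames ** 2, axis=1) + 1e-10)  # avoid log(0)
    return 20 * np.log10(rms)  # in dB

wav = np.random.randn(16000)   # 1 s of audio at 16 kHz
loud = frame_loudness(wav)     # loud.shape == (200,), i.e. 200 Hz
```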
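The split files come from `data_split.ipynb`; the sketch below is only a guess at what such JSONs might contain (utterance IDs, file names, and the 80/10/10 ratio are all hypothetical, not the notebook's actual schema):

```python
import json
import random

# Hypothetical utterance IDs; the real ones would come from the prepared HPRC files.
utt_ids = [f"utt_{i:03d}" for i in range(100)]
random.seed(0)
random.shuffle(utt_ids)

# Illustrative 80/10/10 partition, one JSON file per split.
splits = {"train": utt_ids[:80], "val": utt_ids[80:90], "test": utt_ids[90:]}
for name, ids in splits.items():
    with open(f"{name}_split.json", "w") as f:
        json.dump(ids, f, indent=2)
```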
- Edit the config YAML, which defines hyperparameters, training settings, and file directories.
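For a sense of what such a config covers, here is an illustrative fragment; the field names are assumptions, not the actual schema of `yamls/config.yaml`:

```yaml
# Illustrative only — consult yamls/config.yaml for the real keys.
data:
  wav_dir: /path/to/wav_16k
  ema_dir: /path/to/ema_200hz
  split_json: train_split.json
audio:
  sample_rate: 16000
  frame_rate: 200
training:
  batch_size: 16
  learning_rate: 1.0e-4
  num_epochs: 100
```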
- Run `python vocoder/main.py --config yamls/config.yaml` from the source directory to train.
If you find this repository useful, please cite our work with the following BibTeX entry:
@misc{louis24ddsp,
      title={Fast, High-Quality and Parameter-Efficient Articulatory Synthesis using Differentiable DSP},
      author={Yisi Liu and Bohan Yu and Drake Lin and Peter Wu and Cheol Jun Cho and Gopala Krishna Anumanchipalli},
      year={2024},
      eprint={2409.02451},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}