- [30/03/2024]: The evaluation code is updated.
- [07/02/2024]: The inference script is released.
- [06/02/2024]: The model weights are released.
conda create --name diffspeaker python=3.9
conda activate diffspeaker
Install the MPI-IS mesh package by following the instructions in the MPI-IS repository. The exact commands depend on whether the /usr/include/boost/ directory exists on your system; they are likely to be:
git clone https://github.com/MPI-IS/mesh.git
cd mesh
sudo apt-get install libboost-dev
python -m pip install pip==20.2.4
BOOST_INCLUDE_DIRS=/usr/include/boost/ make all
python -m pip install --upgrade pip
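Because the build step depends on where the Boost headers live, it can help to locate them before running make. The following is a minimal sketch, not part of the MPI-IS instructions: the helper name find_boost_include_dir and the second candidate path are assumptions for illustration.

```shell
# Hypothetical helper: print the first existing directory among the candidates.
# The candidate paths are passed in by the caller; none are hard-coded here.
find_boost_include_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    # No candidate exists; the caller should install libboost-dev first.
    return 1
}

# Usage, from inside the cloned mesh directory (second path is an assumption):
#   BOOST_DIR=$(find_boost_include_dir /usr/include/boost /usr/local/include/boost) \
#       && BOOST_INCLUDE_DIRS="$BOOST_DIR" make all
```

If the helper returns a non-zero status, the headers were not found in any of the listed locations and the make step is skipped.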
Then install the rest of the dependencies.
cd ..
git clone https://github.com/theEricMa/DiffSpeaker.git
cd DiffSpeaker
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install imageio-ffmpeg
pip install -r requirements.txt
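Since the PyTorch version is pinned above, a quick check that the environment picked up the pinned build can save debugging time later. This is only a sketch; the helper name matches_pinned_torch is hypothetical and not part of the repository.

```shell
# Hypothetical helper: succeed when a version string starts with the pinned
# PyTorch release (1.11.0), regardless of the CUDA suffix.
matches_pinned_torch() {
    case "$1" in
        1.11.0*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage, inside the activated diffspeaker environment:
#   matches_pinned_torch "$(python -c 'import torch; print(torch.__version__)')" \
#       || echo "unexpected torch version" >&2
```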
You can access the model parameters by clicking here. Place the checkpoints folder into the root directory of your project. This folder includes the models that have been trained on the BIWI and vocaset datasets, utilizing wav2vec and hubert as the backbones.
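To confirm the checkpoints folder ended up in the right place, a small check like the one below may be useful. It is a sketch under the assumption that the folder sits directly in the project root; the helper name check_checkpoints is hypothetical, and the check does not inspect the files inside the folder.

```shell
# Hypothetical helper: verify that <root>/checkpoints exists and is not empty.
# The root defaults to the current directory.
check_checkpoints() {
    root=${1:-.}
    if [ ! -d "$root/checkpoints" ]; then
        echo "missing: $root/checkpoints" >&2
        return 1
    fi
    # An empty folder usually means the download or extraction failed.
    if [ -z "$(ls -A "$root/checkpoints")" ]; then
        echo "empty: $root/checkpoints" >&2
        return 1
    fi
    echo "ok: $root/checkpoints"
}

# Usage, from the DiffSpeaker root directory:
#   check_checkpoints .
```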
For the BIWI model, use the script below to perform inference on your chosen audio files. Specify the audio file using the --example argument.
sh scripts/demo/demo_biwi.sh
For the vocaset model, run the following script.
sh scripts/demo/demo_vocaset.sh
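To see where a demo script sets the --example argument before editing it, one option is to search the script for that flag. The sketch below assumes the flag appears literally inside the script file, which has not been verified here; the helper name show_example_arg is hypothetical.

```shell
# Hypothetical helper: print the lines of a script that mention --example,
# prefixed with their line numbers.
show_example_arg() {
    script=$1
    if [ ! -f "$script" ]; then
        echo "no such script: $script" >&2
        return 1
    fi
    # "--" ends option parsing so grep treats --example as the pattern.
    grep -n -- '--example' "$script"
}

# Usage, from the DiffSpeaker root directory:
#   show_example_arg scripts/demo/demo_biwi.sh
#   show_example_arg scripts/demo/demo_vocaset.sh
```

If the helper prints nothing and returns a non-zero status, the flag is not set in that file and is probably expected elsewhere.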
To obtain the metrics reported in the paper, use the scripts in scripts/diffusion/biwi_evaluation and scripts/diffusion/vocaset_evaluation. For example, to evaluate DiffSpeaker on the BIWI dataset with the hubert backbone, use the following script.
sh scripts/diffusion/biwi_evaluation/diffspeaker_hubert_biwi.sh
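To run every evaluation script for one dataset in sequence, a loop along these lines could work. It is a sketch that assumes each .sh file in the directory is a standalone evaluation script that can be launched from the repository root; the helper name run_eval_scripts and its optional runner argument are hypothetical.

```shell
# Hypothetical helper: run each *.sh file in a directory, stopping at the
# first failure. The second argument overrides the runner (default: sh),
# which allows a dry run with "echo".
run_eval_scripts() {
    dir=$1
    runner=${2:-sh}
    for script in "$dir"/*.sh; do
        # Skip the unexpanded pattern when the directory has no .sh files.
        [ -e "$script" ] || continue
        echo "running $script"
        "$runner" "$script" || return 1
    done
}

# Usage, from the DiffSpeaker root directory:
#   run_eval_scripts scripts/diffusion/biwi_evaluation echo   # dry run
#   run_eval_scripts scripts/diffusion/biwi_evaluation        # real run
```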
mkdir experiments