Official implementation of the paper https://doi.org/10.3390/electronics10030228.
Required packages: pandas, tqdm, pydub, parselmouth, numpy, pyquaternion, pytorch
ffmpeg is required for example_scripts/gen_all_models.sh
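A quick way to confirm the environment before running anything is to probe for each dependency. The sketch below is not part of this repository. It assumes the usual import names for the packages listed above (for example, pytorch is imported as `torch`), and the helper names `missing_modules` and `has_ffmpeg` are illustrative only.

```python
import importlib.util
import shutil

# Import names assumed for the packages listed above
# (pytorch is imported as "torch").
REQUIRED_MODULES = [
    "pandas", "tqdm", "pydub", "parselmouth",
    "numpy", "pyquaternion", "torch",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def has_ffmpeg():
    """Return True if an ffmpeg executable is available on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing Python packages:", ", ".join(missing) or "none")
    print("ffmpeg found:", has_ffmpeg())
```

The check only looks modules up without importing them, so it stays fast even when heavy packages such as pytorch are installed.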
The model we selected is stored as `pre_trained_model/unroll_1000.pt`. Its parameters are compatible with the class `Generator` in `models.py`.
- Clone this repository
- Download a dataset from https://www.dropbox.com/sh/j419kp4m8hkt9nd/AAC_pIcS1b_WFBqUp5ofBG1Ia?dl=0
- Create a directory named `dataset` and put the two directories `motion/` and `speech/` under `dataset/`
- Run `python data_processing/prepare_data.py data` to split the dataset into train, dev, and test sets (data in the train set is used to scale the inputs and outputs). Then run `python data_processing/create_vector.py`
- Play with `generating.py`. Its arguments are: (1) the wav file path, (2) the output save path, and (3) (optional) the model parameters path. The script processes the wav file and produces motion using the provided parameters. You can specify the noise vector used to generate motions as an argument of the function `generate_motion`. The result is saved in .npy format. E.g.,
python generating.py data/test/inputs/audio1094.wav test_output.npy -mp pre_trained_model/unroll_1000.pt
- If you want to see a video, use `example_scripts/make_mute_mp4.py`. The first argument specifies the .npy file and the second sets the save path. Note that the resulting video has no audio. E.g.,
python example_scripts/make_mute_mp4.py test_output.npy test_video.mp4
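- Attach audio to the mute video (ffmpeg required), or find your own way. The output path must differ from the input video, because ffmpeg cannot write to a file it is also reading:
ffmpeg -i test_video.mp4 -i data/test/inputs/audio1094.wav -c:v copy -map 0:v:0 -map 1:a:0 test_video_with_audio.mp4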
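The three commands in the steps above (generate motion, render a mute video, attach audio) can be chained from one script. The following is a minimal sketch rather than code from this repository: the function names `build_pipeline` and `run_pipeline` are hypothetical, and it only reuses the command-line arguments shown in the examples above. It insists on a separate final output file, because ffmpeg exits when the output path is the same as one of its inputs.

```python
import subprocess
import sys


def build_pipeline(wav_path, model_path, motion_path,
                   mute_video_path, final_video_path):
    """Return the three commands from the steps above as argument lists."""
    if final_video_path == mute_video_path:
        # ffmpeg refuses to write to a file it is also reading.
        raise ValueError("final_video_path must differ from mute_video_path")
    return [
        # 1. Speech -> motion (.npy), using the given generator parameters.
        [sys.executable, "generating.py", wav_path, motion_path,
         "-mp", model_path],
        # 2. Motion -> video without audio.
        [sys.executable, "example_scripts/make_mute_mp4.py",
         motion_path, mute_video_path],
        # 3. Copy the video stream and add the wav file as the audio track.
        ["ffmpeg", "-i", mute_video_path, "-i", wav_path,
         "-c:v", "copy", "-map", "0:v:0", "-map", "1:a:0", final_video_path],
    ]


def run_pipeline(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, `run_pipeline(build_pipeline("data/test/inputs/audio1094.wav", "pre_trained_model/unroll_1000.pt", "test_output.npy", "test_video.mp4", "test_video_with_audio.mp4"))` would reproduce the walkthrough when run from the repository root.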
Note that GAN training is not guaranteed to produce the same result on every run, so a model you train may give results that differ from ours.
- Follow the first four steps above (from cloning the repository to running `data_processing/create_vector.py`).
- Run `python train.py` to train a new model. The model architecture is defined in `models.py`, and the hyper-parameters are defined in `train.py`. The generator's parameters are saved periodically during training.
- You may want to check your results. `example_scripts/gen_all_models.sh` is a script that generates videos with audio for all saved models from one audio input.
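The idea of checking every saved model against one audio input can also be sketched in Python. This is not the content of `example_scripts/gen_all_models.sh`. It assumes the saved generator parameters are `.pt` files in a single directory and that their names end in a step number, a pattern inferred only from the file name `unroll_1000.pt`; the directory you pass in and all function names are hypothetical.

```python
import re
import sys
from pathlib import Path


def checkpoint_step(path):
    """Return the last integer in the file stem, or -1 if there is none."""
    numbers = re.findall(r"\d+", path.stem)
    return int(numbers[-1]) if numbers else -1


def list_checkpoints(checkpoint_dir):
    """Return the .pt files in checkpoint_dir, ordered by step number."""
    return sorted(Path(checkpoint_dir).glob("*.pt"),
                  key=lambda p: (checkpoint_step(p), p.name))


def generation_commands(checkpoint_dir, wav_path, output_dir):
    """Build one generating.py command per saved checkpoint."""
    commands = []
    for checkpoint in list_checkpoints(checkpoint_dir):
        # Name each motion file after the checkpoint that produced it.
        motion_path = Path(output_dir) / f"{checkpoint.stem}.npy"
        commands.append([sys.executable, "generating.py", str(wav_path),
                         str(motion_path), "-mp", str(checkpoint)])
    return commands
```

Sorting by the numeric step rather than by name keeps `unroll_200.pt` ahead of `unroll_1000.pt`, which makes it easier to watch the motion quality change over training.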
To cite our work, you can use the following BibTeX reference:
@article{wu2021modeling,
title={Modeling the conditional distribution of co-speech upper body gesture jointly using conditional-GAN and unrolled-GAN},
author={Wu, Bowen and Liu, Chaoran and Ishi, Carlos Toshinori and Ishiguro, Hiroshi},
journal={Electronics},
volume={10},
number={3},
pages={228},
year={2021},
publisher={Multidisciplinary Digital Publishing Institute}
}
For any questions, please contact wu.bowen@irl.sys.es.osaka-u.ac.jp