To see examples of non-verbal facial behavior generation, go here.
The code contains a model that jointly and automatically generates the rhythmic head, facial, and gaze movements (non-verbal behaviors) of a virtual agent from acoustic speech features. The architecture is an Adversarial Encoder-Decoder. Head movements and gaze orientation are generated as 3D coordinates, while facial expressions are generated as action units based on the Facial Action Coding System (FACS).
github.mp4
The video below is generated in English with speech synthesized from text (TTS). Please note that the model was trained exclusively on natural French voices. To see examples with natural voices, go here.
- Clone the repository
- In a conda console, execute 'conda env create -f environment.yml' to create the right conda environment.
This section explains how to directly retrieve the features that have already been extracted and aligned.
We extract the speech and visual features automatically from videos using existing tools, namely OpenFace and openSMILE. You can also use the code in the pre_processing folder to extract your own features from videos of your choice. Please contact the authors to obtain the videos of the Trueness and/or Cheese datasets.
- Create a directory './data'.
- Download the files found in this drive for Trueness and in this drive for Cheese and place them in that directory.
Files with the suffix "moveSpeakerOnly" are those in which behaviors are set to 0 whenever the person is not speaking.
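As an illustration of what the "moveSpeakerOnly" variant means, here is a minimal sketch that zeroes every behavior frame in which the person is not speaking. The function name, the per-frame list layout, and the boolean speaking flags are assumptions made for this example; they are not taken from the project's pre-processing code.

```python
def mask_non_speaking(behaviors, is_speaking):
    """Return a copy of `behaviors` with frames zeroed where nobody speaks.

    behaviors   : list of frames, each frame a list of feature values
    is_speaking : list of booleans, one per frame (hypothetical layout)
    """
    if len(behaviors) != len(is_speaking):
        raise ValueError("one speaking flag is required per frame")
    masked = []
    for frame, speaking in zip(behaviors, is_speaking):
        # Keep the frame while the person speaks, otherwise replace it with zeros.
        masked.append(list(frame) if speaking else [0.0] * len(frame))
    return masked


# Three frames with two features each; the person is silent in the second frame.
frames = [[0.2, 1.0], [0.4, 0.5], [0.1, 0.3]]
speaking = [True, False, True]
masked_frames = mask_non_speaking(frames, speaking)
```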
- "params.cfg" is the configuration file used to customize the model before training. Refer to the configuration file itself to learn which sections need to be changed.
- You can keep the existing file or create a new one.
- In the conda console, train the model by executing:
python PATH/TO/PROJECT/generation/train.py -params PATH/TO/CONFIG/FILE.cfg [-id NAME_OF_MODEL]
During training, you can view the generated plots in the directory set by [saved_path] in your config file. By default, this is "./generation/saved_models".
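For readers who want to inspect the path settings programmatically, the sketch below reads them with Python's standard `configparser`. It assumes that "params.cfg" is an INI-style file; the section name `PATH` and the example values are hypothetical and only illustrate the idea, so check the real file for the actual layout.

```python
import configparser

# Hypothetical excerpt of a config file. The section name "PATH" is an
# assumption for this example; only the key names appear in this README.
EXAMPLE_CFG = """
[PATH]
saved_path = ./generation/saved_models
output_path = ./generation/data/output
"""


def read_paths(cfg_text):
    """Return (saved_path, output_path) from INI-style config text."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    section = parser["PATH"]
    return section["saved_path"], section["output_path"]


saved_path, output_path = read_paths(EXAMPLE_CFG)
```

To read a file on disk instead of a string, `parser.read("PATH/TO/CONFIG/FILE.cfg")` can replace the `read_string` call.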
In the conda console, generate behaviors by executing:
python PATH/TO/PROJECT/generation/generate.py -epoch [integer] -params PATH/TO/CONFIG/FILE.cfg -dataset [dataset]
The behaviors are generated as 3D coordinates and facial action unit intensities. They are saved as .csv files in the directory set by [output_path] in your config file. By default, this is "./generation/data/output/MODEL_PATH".
- -epoch: the epoch of the saved model to use. It must match an epoch at which the model was recorded during training. For example, if you trained for 1000 epochs and recorded every 100 epochs, you must enter one of [100;200;300;400;500;600;700;800;900;1000].
- -params: path to the config file.
- -dataset: name of the considered dataset.
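The constraint on `-epoch` can be checked before launching generation. The following is a minimal sketch, assuming the model is recorded at exact multiples of the recording interval, as in the example above; the function names are hypothetical and not part of the project.

```python
def saved_epochs(total_epochs, save_every):
    """List the epochs at which a model would have been recorded."""
    if total_epochs <= 0 or save_every <= 0:
        raise ValueError("total_epochs and save_every must be positive")
    return list(range(save_every, total_epochs + 1, save_every))


def check_epoch(epoch, total_epochs, save_every):
    """Return `epoch` if a model was recorded at that epoch, else raise."""
    valid = saved_epochs(total_epochs, save_every)
    if epoch not in valid:
        raise ValueError(f"epoch {epoch} was not recorded; choose one of {valid}")
    return epoch


# Training for 1000 epochs while recording every 100 epochs.
valid_epochs = saved_epochs(1000, 100)
chosen_epoch = check_epoch(300, 1000, 100)
```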
The objective evaluation of these models relies on measures such as dynamic time warping (DTW), curve visualization, visualization after PCA reduction, and jerk and acceleration measurements. In the conda console, evaluate the model objectively by executing:
python generation/evaluate.py -params PATH/TO/CONFIG/FILE.cfg -epoch [integer] -[PARAMETERS]
- -params: path to the config file.
PARAMETERS:
- -dtw
- -pca
- -curve
- -curveVideo
- -acceleration
- -jerk
You will find the results in the directory "./generation/evaluation".
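To make the DTW, acceleration, and jerk measures concrete, here is a minimal, self-contained sketch for one-dimensional signals. It uses the textbook DTW recurrence with an absolute-difference cost and plain forward finite differences; these choices, the function names, and the time step `dt` are assumptions for illustration, and evaluate.py may compute these measures differently.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1D sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def finite_difference(signal, dt):
    """Forward difference of a signal sampled every `dt` seconds."""
    return [(signal[i + 1] - signal[i]) / dt for i in range(len(signal) - 1)]


def acceleration(position, dt):
    """Second time derivative of a position signal."""
    return finite_difference(finite_difference(position, dt), dt)


def jerk(position, dt):
    """Third time derivative of a position signal."""
    return finite_difference(acceleration(position, dt), dt)


# A generated curve that only differs from the reference by a repeated sample.
reference = [1, 2, 3]
generated = [1, 2, 2, 3]
distance = dtw_distance(reference, generated)

# A cubic trajectory sampled with dt = 1 has a constant third difference.
cubic = [t ** 3 for t in range(6)]
cubic_jerk = jerk(cubic, 1.0)
```

A lower DTW distance means the generated curve follows the reference more closely, even when the two are slightly shifted in time. Jerk and acceleration summarize how smooth a generated movement is.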
To animate a virtual agent with the generated behaviors, we use the GRETA platform.
- Download and install GRETA with "gpl-grimaldi-release.7z" at https://github.com/isir/greta/releases/tag/v1.0.1.
- Open GRETA. Open the configuration "Greta - Record AU.xml" which is already present in GRETA.
- Use the blocks "AU Parser File Reader" and "Parser Capture Controller AU" to create the video from the generated .csv file.
You can add the audio from the original videos directly to the .avi videos generated by GRETA:
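import os
import moviepy.editor as mp  # assumes `mp` refers to moviepy.editor (MoviePy 1.x API)

input_video = "video_path.avi"
input_audio = "audio_path.wav"
output = "video_path_with_sound.mp4"

# Only merge when both input files exist
if os.path.isfile(input_video) and os.path.isfile(input_audio):
    audio = mp.AudioFileClip(input_audio)
    video = mp.VideoFileClip(input_video)
    # Attach the audio track to the video and write the result
    final = video.set_audio(audio)
    final.write_videofile(output)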