v2.1.0: Fine-tuning and parameter freezing, pitch expressiveness control, DS files training, minor feature improvements and bug fixes
Fine-tuning and parameter freezing (#108, #120)
If you already have pre-trained checkpoints and need to adapt them to other datasets while keeping their functionalities unchanged, fine-tuning may save training steps and time. Configuration example:
```yaml
finetune_enabled: true  # the main switch to enable fine-tuning
finetune_ckpt_path: checkpoints/pretrained/model_ckpt_steps_320000.ckpt  # path to your pre-trained checkpoint
finetune_ignored_params:  # prefix rules to exclude specific parameters when loading the checkpoint
  - model.fs2.encoder.embed_tokens  # in case the phoneme set is changed
  - model.fs2.txt_embed  # same as above
  - model.fs2.spk_embed  # in case the speaker set is changed
finetune_strict_shapes: true  # whether to raise an error when parameter shapes mismatch
```
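The prefix rules are matched against parameter names in the checkpoint. As a rough mental model only (an illustrative sketch, not the project's actual loading code; the function name and the `state_dict` key layout are assumptions):

```python
import torch

def load_finetune_checkpoint(model, ckpt_path, ignored_prefixes, strict_shapes=True):
    """Load a pre-trained checkpoint, skipping parameters matched by prefix rules."""
    state_dict = torch.load(ckpt_path, map_location='cpu')['state_dict']
    current_params = dict(model.named_parameters())
    filtered = {}
    for name, tensor in state_dict.items():
        # Drop any parameter whose full name starts with an ignored prefix,
        # e.g. model.fs2.txt_embed when the phoneme set has changed.
        if any(name.startswith(prefix) for prefix in ignored_prefixes):
            continue
        current = current_params.get(name)
        if current is not None and current.shape != tensor.shape:
            if strict_shapes:
                raise ValueError(f'shape mismatch for {name}: {tuple(current.shape)} vs {tuple(tensor.shape)}')
            continue  # with strict_shapes disabled, mismatched parameters keep their fresh initialization
        filtered[name] = tensor
    # strict=False: ignored and mismatched parameters are simply not loaded
    model.load_state_dict(filtered, strict=False)
```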
Freezing part of the model parameters during training and fine-tuning can save GPU memory, accelerate training, or help avoid catastrophic forgetting. Configuration example:
```yaml
freezing_enabled: true  # main switch to enable parameter freezing
frozen_params:  # prefix rules to freeze specific parameters during training
  - model.fs2.encoder
  - model.fs2.pitch_embed
```
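Conceptually, freezing a prefix just turns gradients off for the matched parameters, along the lines of this illustrative sketch (not the project's actual implementation):

```python
def freeze_params(model, frozen_prefixes):
    """Disable gradients for all parameters whose names start with a frozen prefix."""
    for name, param in model.named_parameters():
        if any(name.startswith(prefix) for prefix in frozen_prefixes):
            # Frozen parameters are excluded from optimization, which also skips
            # their gradient and optimizer-state memory.
            param.requires_grad = False
```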
Please see the documentation for detailed usages of these two features.
Pitch expressiveness control mechanism (#97)
Expressiveness controls how freely the variance model generates pitch curves. By default, the variance model predicts pitch at 100% expressiveness, which means it fully follows the style of the voice provider. Correspondingly, 0% expressiveness produces pitch that closely follows the smoothened music score. Expressiveness can be freely adjusted from 0% to 100%, either statically or dynamically at the frame level.
Pitch expressiveness control is compatible with all variance models that have a pitch predictor; no re-training is required.
Control pitch expressiveness in CLI
```bash
python scripts/infer.py variance my_project.ds --exp my_pitch_exp --predict pitch --expr 0.8  # a value between 0 and 1
```
Control pitch expressiveness in DS files
```json
{
  "expr": 0.8  // static control
}
```
or
```json
{
  "expr": "0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0",  // dynamic control
  "expr_timestep": "0.005"
}
```
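Conceptually, a frame-level expressiveness value interpolates between the smoothened score pitch and the fully expressive predicted pitch. The sketch below illustrates that idea with NumPy; the resampling and blending details are assumptions for illustration, not the project's exact implementation:

```python
import numpy as np

def apply_expressiveness(pred_pitch, score_pitch, expr_curve, expr_timestep, frame_timestep):
    """Blend predicted pitch with smoothened score pitch using a frame-level expr curve.

    pred_pitch / score_pitch: per-frame pitch values of the same length.
    expr_curve: expressiveness samples taken every expr_timestep seconds, values in [0, 1].
    """
    n_frames = len(pred_pitch)
    # Resample the expr curve onto the model's frame grid.
    frame_times = np.arange(n_frames) * frame_timestep
    expr_times = np.arange(len(expr_curve)) * expr_timestep
    expr = np.interp(frame_times, expr_times, expr_curve)
    # expr == 1.0 -> follow the predicted (fully expressive) pitch;
    # expr == 0.0 -> follow the smoothened score pitch.
    return expr * np.asarray(pred_pitch) + (1.0 - expr) * np.asarray(score_pitch)
```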
Expose pitch expressiveness control in ONNX models
```bash
python scripts/export.py variance --exp my_pitch_exp --expose_expr
```
This will add an additional input named `expr` in `my_pitch_exp.pitch.onnx`.
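If you want to confirm the new input from downstream code, the exported graph can be inspected with onnxruntime, for example:

```python
import onnxruntime as ort

# List the inputs of the exported pitch model; 'expr' should appear among them.
session = ort.InferenceSession('my_pitch_exp.pitch.onnx')
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)
```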
DS files training (#132)
Using DS files to train variance models is now supported - this means users of voicebanks can tune projects in their own styles without recording any real singing voice. The only things that need to be done are: copy the DS files into the `ds/` folder in the raw dataset directory, write a single-column `transcriptions.csv` to declare them, and turn on the main switch of DS file binarization in the configuration file:
```yaml
binarization_args:
  prefer_ds: true  # prefer loading from DS files
```
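For reference, a single-column `transcriptions.csv` in this setup only needs to declare the item names that the DS files in `ds/` correspond to. A hypothetical example, assuming the single column is the item name (the names below are made up):

```csv
name
item_001
item_002
item_003
```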
Please see the documentation for more detailed usage and information about DS file binarization.
Other minor feature improvements
- Support the state-of-the-art RMVPE pitch extractor (#118, #122)
- Show objective evaluation metrics on TensorBoard (#123, #127)
- Support composite LR schedulers (#125)
- Perform graceful exit on keyboard interrupt during binarization and inference (#119)
- Improve logging format of learning rate (#115)
- Add more documentation for old and new features
Major bug fixes
- Fixed wrong speaker ID assignment in fixed pitch shifting augmentation
- Fixed illegal access to `None` when training dur predictor
- Fixed slur mistakes in a sample DS file
- Fixed wrong model loading logic when using `--mel`
- Fixed noisy output of ONNX models on DirectML
- Fixed missing `spk_embed` input of multi-speaker duration predictor ONNX models
Some changes may not be listed above. See full change log: v2.0.0...v2.1.0