v2.1.0: Fine-tuning and parameter freezing, pitch expressiveness control, DS files training, minor feature improvements and bug fixes
Fine-tuning and parameter freezing (#108, #120)
If you already have pre-trained checkpoints and need to adapt them to other datasets while keeping their functionalities unchanged, fine-tuning may save training steps and time. Configuration example:
```yaml
finetune_enabled: true  # the main switch to enable fine-tuning
finetune_ckpt_path: checkpoints/pretrained/model_ckpt_steps_320000.ckpt  # path to your pre-trained checkpoint
finetune_ignored_params:  # prefix rules to exclude specific parameters when loading the checkpoint
  - model.fs2.encoder.embed_tokens  # in case the phoneme set is changed
  - model.fs2.txt_embed  # same as above
  - model.fs2.spk_embed  # in case the speaker set is changed
finetune_strict_shapes: true  # whether to raise an error when parameter shapes mismatch
```
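The prefix rules are matched against parameter names in the checkpoint. As a rough mental model only (an illustrative sketch, not the project's actual loading code; the function name and the `state_dict` key layout are assumptions):

```python
import torch

def load_finetune_checkpoint(model, ckpt_path, ignored_prefixes, strict_shapes=True):
    """Load a pre-trained checkpoint, skipping parameters matched by prefix rules."""
    state_dict = torch.load(ckpt_path, map_location='cpu')['state_dict']
    current_params = dict(model.named_parameters())
    filtered = {}
    for name, tensor in state_dict.items():
        # Drop any parameter whose full name starts with an ignored prefix,
        # e.g. model.fs2.txt_embed when the phoneme set has changed.
        if any(name.startswith(prefix) for prefix in ignored_prefixes):
            continue
        current = current_params.get(name)
        if current is not None and current.shape != tensor.shape:
            if strict_shapes:
                raise ValueError(f'shape mismatch for {name}: {tuple(current.shape)} vs {tuple(tensor.shape)}')
            continue  # with strict_shapes disabled, mismatched parameters keep their fresh initialization
        filtered[name] = tensor
    # strict=False: ignored and mismatched parameters are simply not loaded
    model.load_state_dict(filtered, strict=False)
```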
Freezing part of the model parameters during training and fine-tuning can save GPU memory, accelerate training, or help avoid catastrophic forgetting. Configuration example:
```yaml
freezing_enabled: true  # main switch to enable parameter freezing
frozen_params:  # prefix rules to freeze specific parameters during training
  - model.fs2.encoder
  - model.fs2.pitch_embed
```
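Conceptually, freezing a prefix just turns gradients off for the matched parameters, along the lines of this illustrative sketch (not the project's actual implementation):

```python
def freeze_params(model, frozen_prefixes):
    """Disable gradients for all parameters whose names start with a frozen prefix."""
    for name, param in model.named_parameters():
        if any(name.startswith(prefix) for prefix in frozen_prefixes):
            # Frozen parameters are excluded from optimization, which also skips
            # their gradient and optimizer-state memory.
            param.requires_grad = False
```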
Please see the documentation for detailed usages of these two features.
Pitch expressiveness control mechanism (#97)
Expressiveness controls how freely the variance model generates pitch curves. By default, the variance model predicts pitch at 100% expressiveness, which means it fully follows the style of the voice provider. Correspondingly, 0% expressiveness produces pitch that closely follows the smoothened music score. Expressiveness can be freely adjusted from 0% to 100%, either statically or dynamically at the frame level.
Pitch expressiveness control is compatible with all variance models that have a pitch predictor; no re-training is required.
Control pitch expressiveness in CLI
```bash
python scripts/infer.py variance my_project.ds --exp my_pitch_exp --predict pitch --expr 0.8  # a value between 0 and 1
```
Control pitch expressiveness in DS files
```json
{
  "expr": 0.8  // static control
}
```
or
```json
{
  "expr": "0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0",  // dynamic control
  "expr_timestep": "0.005"
}
```
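Conceptually, a frame-level expressiveness value interpolates between the smoothened score pitch and the fully expressive predicted pitch. The sketch below illustrates that idea with NumPy; the resampling and blending details are assumptions for illustration, not the project's exact implementation:

```python
import numpy as np

def apply_expressiveness(pred_pitch, score_pitch, expr_curve, expr_timestep, frame_timestep):
    """Blend predicted pitch with smoothened score pitch using a frame-level expr curve.

    pred_pitch / score_pitch: per-frame pitch values of the same length.
    expr_curve: expressiveness samples taken every expr_timestep seconds, values in [0, 1].
    """
    n_frames = len(pred_pitch)
    # Resample the expr curve onto the model's frame grid.
    frame_times = np.arange(n_frames) * frame_timestep
    expr_times = np.arange(len(expr_curve)) * expr_timestep
    expr = np.interp(frame_times, expr_times, expr_curve)
    # expr == 1.0 -> follow the predicted (fully expressive) pitch;
    # expr == 0.0 -> follow the smoothened score pitch.
    return expr * np.asarray(pred_pitch) + (1.0 - expr) * np.asarray(score_pitch)
```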
Expose pitch expressiveness control in ONNX models
```bash
python scripts/export.py variance --exp my_pitch_exp --expose_expr
```
This will add an additional input named `expr` in `my_pitch_exp.pitch.onnx`.
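If you want to confirm the new input from downstream code, the exported graph can be inspected with onnxruntime, for example:

```python
import onnxruntime as ort

# List the inputs of the exported pitch model; 'expr' should appear among them.
session = ort.InferenceSession('my_pitch_exp.pitch.onnx')
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)
```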
DS files training (#132)
Using DS files to train variance models is now supported - this means users of voicebanks can tune projects in their own styles without recording any real singing voice. The only things that need to be done are: copy the DS files into the `ds/` folder in the raw dataset directory, write a single-column `transcriptions.csv` to declare them, and turn on the main switch of DS file binarization in the configuration file:
```yaml
binarization_args:
  prefer_ds: true  # prefer loading from DS files
```
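For reference, a single-column `transcriptions.csv` in this setup only needs to declare the item names that the DS files in `ds/` correspond to. A hypothetical example, assuming the single column is the item name (the names below are made up):

```csv
name
item_001
item_002
item_003
```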
Please see the documentation for more detailed usage and information about DS file binarization.
Other minor feature improvements
- Support the state-of-the-art RMVPE pitch extractor (#118, #122)
- Show objective evaluation metrics on TensorBoard (#123, #127)
- Support composite LR schedulers (#125)
- Perform graceful exit on keyboard interrupt during binarization and inference (#119)
- Improve logging format of learning rate (#115)
- Add more documentation for old and new features
Major bug fixes
- Fixed wrong speaker ID assignment in fixed pitch shifting augmentation
- Fixed illegal access to `None` when training dur predictor
- Fixed slur mistakes in a sample DS file
- Fixed wrong model loading logic when using `--mel`
- Fixed noisy output of ONNX models on DirectML
- Fixed missing `spk_embed` input of multi-speaker duration predictor ONNX models
Some changes may not be listed above. See full change log: v2.0.0...v2.1.0