Voicebank development
This page is a work in progress. Contributions are welcome.
Anyone can make a voicebank with their own voice and use it in OpenUtau. There are two main types of voicebanks: UTAU concatenative voicebanks and machine learning voicebanks.
To make a UTAU voicebank, you need to record all the syllables in a language and label them.
When a song is rendered, a concatenative voicebank is used as follows:
- The user inputs lyrics and notes.
- The phonemizer converts the notes and lyrics into a list of concatenation units.
- For each concatenation unit, the resampler loads the corresponding sample from the voicebank, changes its duration and pitch, and applies flags to it.
- The wavtool joins all the audio slices produced by the resampler and outputs the final audio.
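The steps above can be sketched as a toy pipeline. This is only an illustration of the data flow, not OpenUtau's implementation: the function names `phonemize`, `resample`, `wavtool`, and `render` are hypothetical, samples are plain lists of floats, the "resampler" only stretches duration (no pitch change or flags), and the "wavtool" concatenates without crossfading.

```python
from typing import Dict, List

Sample = List[float]


def phonemize(lyrics: List[str]) -> List[str]:
    """Toy phonemizer: one unit per lyric. Real phonemizers are language-specific."""
    return list(lyrics)


def resample(sample: Sample, target_len: int) -> Sample:
    """Toy resampler: nearest-neighbour stretch to target_len samples."""
    if target_len <= 0 or not sample:
        return []
    return [sample[i * len(sample) // target_len] for i in range(target_len)]


def wavtool(slices: List[Sample]) -> Sample:
    """Toy wavtool: plain concatenation of the rendered slices."""
    out: Sample = []
    for s in slices:
        out.extend(s)
    return out


def render(lyrics: List[str], lengths: List[int], voicebank: Dict[str, Sample]) -> Sample:
    """Run the three stages in order: phonemizer, resampler, wavtool."""
    units = phonemize(lyrics)
    slices = [resample(voicebank[u], n) for u, n in zip(units, lengths)]
    return wavtool(slices)


# Hypothetical two-sample voicebank; the values stand in for audio data.
voicebank = {"ka": [0.1, 0.2], "sa": [0.3, 0.4, 0.5]}
audio = render(["ka", "sa"], [4, 3], voicebank)
print(audio)
```

The point of the sketch is that the voicebank only supplies the raw samples; everything else happens per note at render time.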
First, find a reclist suitable for your language. A reclist is a text file that lists all the syllables or phonemes of a language and their combinations. Here are some publicly available reclists:
Phonemizer | Reclist |
---|---|
EN VCCV | Core American English VCCV by PaintedCz |
EN ARPA | ARPAsing resource website |
EN XSAMPA | Delta-style English reclists |
JA VCV & CVVC | Japanese reclists |
ZH CVVC | Hr.J Chinese CVVC by haru |
- English voicebanks recorded with any of these methods have various pros and cons. You may want to try recording a language with a smaller set of vowels first, or recording and training an AI voicebank for DiffSinger.
In theory, you can record a voicebank with any software you like. However, a dedicated reclist recorder application can prompt you to record line by line and save each recording using the text of the line as its file name. The recommended software is RecStar.
Many people record with a "guide BGM" so that their samples are at a consistent BPM and pitch. You can find some guide BGMs to start with here and here (Japanese site). ("#-Mora" is the number of syllables in each string.)
You can record multiple subbanks for one voicebank. For example, you can record in multiple pitches to make its range wider, or record in multiple vocal modes to let the user choose between different singing styles. Each subbank is equivalent to a full single-pitch voicebank and should contain all the voice lines in your reclist.
You can start developing your voicebank with only one subbank, and add more subbanks in the future.
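Because each subbank should contain every line of the reclist, and recorders save each line under its own text as the file name, completeness can be checked mechanically. The following is a minimal sketch under those assumptions: one folder per subbank, one `<line>.wav` per reclist line. The helper names and the demo folder names (`C4`, `G4`) are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def missing_lines(reclist: List[str], subbank_dir: Path) -> List[str]:
    """Return reclist lines that have no matching <line>.wav in subbank_dir."""
    recorded = {p.stem for p in subbank_dir.glob("*.wav")}
    return [line for line in reclist if line not in recorded]


def check_subbanks(reclist: List[str], voicebank_dir: Path) -> Dict[str, List[str]]:
    """Map each subbank folder name to the reclist lines it is still missing."""
    return {
        sub.name: missing_lines(reclist, sub)
        for sub in sorted(voicebank_dir.iterdir())
        if sub.is_dir()
    }


# Demo on a throwaway folder: subbank "C4" is complete, "G4" lacks one line.
reclist = ["_a_ka_sa", "_i_ki_shi"]
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, lines in {"C4": reclist, "G4": reclist[:1]}.items():
        (root / name).mkdir()
        for line in lines:
            (root / name / f"{line}.wav").touch()
    report = check_subbanks(reclist, root)

print(report)
```

For a real voicebank, read the reclist from its text file and pass the voicebank folder instead of the temporary one.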
After recording, you will need to make oto.ini files.
oto.ini is a configuration file that tells OpenUtau where each phoneme is in the voicebank's recordings and how to manipulate it.
The recommended way to make oto.ini for a voicebank is using vLabeler.
If you want a tutorial that covers how to configure oto.ini for most styles of voicebank, Yin's tutorial is commonly recommended by the community. It uses older tools than vLabeler, but the basic logic is the same.
Let's look at an example oto.ini line from Kasane Teto in vLabeler and dissect it.
- Yellow line: Left blank. The start of the phoneme.
- Green line: Overlap. Everything between the left blank and overlap will be blended with the previous note.
- Red line: Preutterance. The start of the note.
- Blue line: Consonant. Everything before this line will not be looped.
- White line: Right blank. The end of the phoneme. Everything between the consonant and right blank will be looped.
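In the file itself, each entry is a single line of the form `file.wav=alias,offset,consonant,cutoff,preutterance,overlap`, with all values in milliseconds. The sketch below parses one such line and converts the values into the five marker positions listed above, measured from the start of the wav file. The example line and its numbers are made up for illustration (they are not taken from Kasane Teto), and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class OtoEntry:
    wav: str
    alias: str
    offset: float        # left blank, ms from the start of the file
    consonant: float     # length of the non-looped region, ms from offset
    cutoff: float        # right blank: >= 0 is ms from the end of the file,
                         # < 0 is a distance in ms measured from offset
    preutterance: float  # ms from offset
    overlap: float       # ms from offset


def parse_oto_line(line: str) -> OtoEntry:
    """Parse 'file.wav=alias,offset,consonant,cutoff,preutterance,overlap'."""
    wav, sep, rest = line.strip().partition("=")
    fields = rest.split(",")
    if not sep or len(fields) != 6:
        raise ValueError(f"not an oto.ini entry: {line!r}")
    # Empty numeric fields are treated as 0 in this sketch.
    numbers = [float(f) if f.strip() else 0.0 for f in fields[1:]]
    return OtoEntry(wav, fields[0], *numbers)


def marker_positions(entry: OtoEntry, wav_length_ms: float) -> Dict[str, float]:
    """Return each marker's position in ms from the start of the wav file."""
    if entry.cutoff < 0:
        right_blank = entry.offset - entry.cutoff
    else:
        right_blank = wav_length_ms - entry.cutoff
    return {
        "left_blank": entry.offset,
        "overlap": entry.offset + entry.overlap,
        "preutterance": entry.offset + entry.preutterance,
        "consonant": entry.offset + entry.consonant,
        "right_blank": right_blank,
    }


entry = parse_oto_line("ka.wav=ka,100,120,-400,80,30")
positions = marker_positions(entry, wav_length_ms=1000.0)
print(positions)
```

Note that every value except the left blank is stored relative to it, so moving the left blank in an editor shifts the other markers unless the editor compensates.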
See also: Anatomy of the OTO on UtaForum.
Create a subfolder inside OpenUtau's Singers folder, and put your voice folders (wav files and oto.ini) into it. Your voicebank still needs the following files and information:
character.txt is the file that tells OpenUtau this is a voicebank, and the name of the voicebank. Create a new text file in your voicebank named character.txt and edit it. Here is a minimal example of your character.txt:
name=name_of_your_voicebank
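To confirm that a voicebank folder has a usable character.txt, a short script can look for the `name=` line. This is a sketch, not how OpenUtau reads the file: the function name is hypothetical, and the `utf-8-sig` default is an assumption (older UTAU voicebanks often use Shift-JIS, in which case pass `encoding="shift_jis"`).

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_voicebank_name(voicebank_dir: Path, encoding: str = "utf-8-sig") -> Optional[str]:
    """Return the value of the name= line in character.txt, or None if absent."""
    path = voicebank_dir / "character.txt"
    if not path.is_file():
        return None
    for line in path.read_text(encoding=encoding).splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "name":
            return value.strip()
    return None


# Demo on a throwaway folder containing only the minimal character.txt.
with tempfile.TemporaryDirectory() as tmp:
    bank = Path(tmp)
    missing = read_voicebank_name(bank)  # no character.txt yet
    (bank / "character.txt").write_text("name=name_of_your_voicebank\n", encoding="utf-8")
    name = read_voicebank_name(bank)

print(missing, name)
```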
If your voicebank contains multiple subbanks, you'll need to set them up in OpenUtau with Tools → Singers → Edit subbanks, where you can assign each subbank to a certain pitch range in a voice color.
Launch OpenUtau. In Tools → Singers, click ⚙ → Default Phonemizer and select the phonemizer that your voicebank supports. When a user then chooses your voicebank, this phonemizer will be selected automatically.
Subbank and default phonemizer information is stored in character.yaml inside the voicebank. See tech note ‐ character.yaml.
In Tools → Singers, click ⚙ → Publish Singer. You'll get a zip file of your singer for distribution.
Machine learning voicebanks produce more fluent singing with fewer manual edits, but you'll need a GPU to train them. OpenUtau supports two engines that let you make voicebanks yourself: NNSVS and DiffSinger. To make a machine learning voicebank, you need to record your singing voice, label the recordings, and train a machine learning model.
Record songs in your target language with any recording software you like. Just ensure that:
- all the lyrics are in the language your voicebank supports
- your dataset contains all the phonemes in the language.
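Once the recordings have phoneme-level labels, the second requirement can be checked by comparing the labels against the phoneme inventory of the language. The sketch below assumes labels in a plain `start end phoneme` text format, one segment per line; your labeling tool may use a different format. The inventory and label contents are invented for the demo, and the function names are hypothetical.

```python
from typing import Iterable, List, Set


def phonemes_in_label(text: str) -> Set[str]:
    """Collect the phoneme column from lines of the form 'start end phoneme'."""
    found: Set[str] = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            found.add(parts[2])
    return found


def missing_phonemes(inventory: Iterable[str], label_texts: Iterable[str]) -> List[str]:
    """Return inventory phonemes that never occur in any label, sorted."""
    seen: Set[str] = set()
    for text in label_texts:
        seen |= phonemes_in_label(text)
    return sorted(set(inventory) - seen)


# Demo: a four-phoneme inventory and two tiny label files held as strings.
inventory = ["a", "i", "k", "s"]
labels = [
    "0 1000000 k\n1000000 3000000 a\n",
    "0 500000 s\n500000 2000000 a\n",
]
gaps = missing_phonemes(inventory, labels)
print(gaps)
```

An empty result means every phoneme occurs at least once; it says nothing about whether each one occurs often enough to train on.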
After recording, make phoneme-level labels for your dataset. You can use vLabeler to make the labels.
SVS Singing voice database - tutorial by PixPrucer
There are also automated tools that make labels for you:
After labelling your dataset, you can train either a DiffSinger voicebank or an ENUNU (NNSVS) voicebank.