
DeepForcedAligner

With this tool you can create accurate text-audio alignments given a set of audio files and their transcriptions. The alignments can, for example, be used to train text-to-speech models such as FastSpeech. Compared to other forced alignment tools, this repo has the following advantages:

  • Multilingual: By design, the DFA is language-agnostic and can align either characters or phonemes.
  • Robustness: The alignment extraction is highly tolerant of text errors and silent characters.
  • Convenience: Easy installation with no extra dependencies. You can provide your own data in the standard LJSpeech format without special preprocessing (such as applying phonetic dictionaries, non-speech annotations, etc.).

The approach is based on training a simple speech recognition model with CTC loss on mel spectrograms extracted from the wav files.
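To make the idea concrete, here is a minimal sketch of a recognizer trained with CTC loss on mel spectrograms, assuming PyTorch. The TinyAligner class and all shapes and sizes are hypothetical and do not reflect this repo's actual architecture:

  import torch
  import torch.nn as nn

  class TinyAligner(nn.Module):
      def __init__(self, n_mels=80, n_symbols=40, hidden=128):
          super().__init__()
          self.rnn = nn.LSTM(n_mels, hidden, batch_first=True, bidirectional=True)
          self.proj = nn.Linear(2 * hidden, n_symbols + 1)  # +1 for the CTC blank

      def forward(self, mels):                   # mels: (batch, time, n_mels)
          out, _ = self.rnn(mels)
          return self.proj(out).log_softmax(-1)  # per-frame symbol log-probs

  model = TinyAligner()
  ctc = nn.CTCLoss(blank=0)

  mels = torch.randn(2, 500, 80)             # two dummy mel spectrograms
  targets = torch.randint(1, 41, (2, 60))    # dummy symbol ids (0 = blank)
  log_probs = model(mels).transpose(0, 1)    # CTCLoss expects (time, batch, classes)
  loss = ctc(log_probs, targets,
             torch.full((2,), 500, dtype=torch.long),  # frame lengths
             torch.full((2,), 60, dtype=torch.long))   # symbol lengths
  loss.backward()

Once such a model is trained, durations can be read off by assigning each target symbol the span of mel frames the model attributes to it.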

Installation

Runs on Python >= 3.6.

pip install -r requirements.txt

Example Training and Extraction

Check out the Colab demo notebook for training and character duration extraction on the LJSpeech dataset.


(1) Download the LJSpeech dataset and set the paths in config.yaml:

  dataset_dir: LJSpeech
  metadata_path: LJSpeech/metadata.csv

(2) Preprocess the data and train the aligner:

  python preprocess.py
  python train.py

(3) Extract durations with the latest model checkpoint (60k steps should be sufficient):

  python extract_durations.py

By default, durations are written as numpy files to:

  output/durations 

Each character duration corresponds to one mel time step, which translates to hop_length / sample_rate seconds in the wav file.
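For example, a duration file can be loaded and converted to seconds as follows. This is a hedged sketch: the file name and the hop_length / sample_rate values are placeholders; use the settings from your config.yaml.

  import numpy as np

  durations = np.load('output/durations/00001.npy')  # one frame count per character
  hop_length, sample_rate = 256, 22050               # assumed audio settings
  seconds = durations * hop_length / sample_rate     # per-character duration in seconds
  print(f'{durations.sum()} mel frames = {seconds.sum():.2f} s of audio')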

Tensorboard

You can monitor the training with

  tensorboard --logdir dfa_checkpoints

Using Your Own Dataset

Simply bring your dataset into the LJSpeech format. We recommend cleaning and preprocessing the text in metadata.csv before running DFA, e.g. lower-casing, phonemization, etc.
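As a purely illustrative example of such a cleaning step (the paths and the lower-casing are placeholders, not part of the repo):

  # Lower-case the transcriptions in an LJSpeech-style metadata file.
  in_path, out_path = 'LJSpeech/metadata.csv', 'LJSpeech/metadata_clean.csv'
  with open(in_path, encoding='utf-8') as fin, open(out_path, 'w', encoding='utf-8') as fout:
      for line in fin:
          file_id, text = line.rstrip('\n').split('|', 1)
          fout.write(f'{file_id}|{text.lower()}\n')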

Using Preprocessed Mel Spectrograms

You can provide your own mel spectrograms by setting the following in config.yaml:

  precomputed_mels: /path/to/mels

Make sure that the mel file names match the ids in the metadata file, e.g.

  00001.mel ---> 00001|First sample text
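A small sanity check along these lines could look like the following; this is an assumption-laden sketch (the .mel suffix and paths mirror the examples above and are not guaranteed by the repo):

  from pathlib import Path

  mel_dir = Path('/path/to/mels')
  with open('LJSpeech/metadata.csv', encoding='utf-8') as f:
      ids = [line.split('|')[0] for line in f if line.strip()]

  # Report metadata ids that have no matching precomputed mel file.
  missing = [i for i in ids if not (mel_dir / f'{i}.mel').exists()]
  print(f'{len(missing)} of {len(ids)} mels missing:', missing[:5])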