☕🇧🇷 Scripts for Kaldi in Brazilian Portuguese

falabrasil/kaldi-br

FalaBrasil Scripts for Kaldi 🇧🇷

This repo contains instructions and scripts for training acoustic models with Kaldi on Brazilian Portuguese datasets (or just "general Portuguese"). You may also find some scripts for forced alignment and speaker diarization.

🗣️ :octocat: Looking for speech datasets in Brazilian Portuguese? Check out our "Speech Datasets" GitHub repo (based on DVC for storage): https://github.com/falabrasil/speech-datasets

📝 :octocat: Looking for text datasets in Brazilian Portuguese? Check out our "Text Datasets" GitHub repo: https://github.com/falabrasil/text-datasets

🎙️ 🦊 Looking for acoustic models (AM, probably for Vosk)? Check out the following GitLab repo (with LFS storage): https://gitlab.com/fb-resources/kaldi-br

🗒️ :octocat: 🦊 Looking for language models (LM)? Check out the following GitHub repo (notice there's a pair repo on GitLab for LFS storage): https://github.com/falabrasil/lm-br

📰 :octocat: 🦊 Looking for phonetic dictionaries (lexicon)? Check out the following GitHub repo (notice there's a pair repo on GitLab for LFS storage): https://github.com/falabrasil/dicts-br

🏷️ :octocat: 🐳 Wanna create your own phonetic dictionary? Check out our annotator tool's GitHub repo (there's also a dockerized version): https://github.com/falabrasil/annotator

☕ Looking for Kaldi installation instructions? Check out our install guide in the INSTALL.md file, or follow the Kaldi documentation directly: https://github.com/kaldi-asr/kaldi

👣 If you're looking for a tutorial on data preparation and a step-by-step guide on how to train your own acoustic models from scratch using Kaldi, the best we can offer is this written tutorial.

Model training for speech recognition (Vosk + LapsBM)

See the fb-lapsbm/ dir. Based on the Mini-librispeech nnet3 recipe (local/chain/tuning/run_tdnn_1j.sh), adapted for a quick training run over LapsBenchmark.

$ ./prep_lapsbm.sh /path/to/kaldi/egs/myproject
$ cd /path/to/kaldi/egs/myproject/s5/
$ ./run.sh

For online decoding, please check fb-lapsbm/local/vosk/ dir.
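The prep scripts in this README all take a project path of the form /path/to/kaldi/egs/myproject. As a minimal sketch, assuming the project is meant to sit directly inside Kaldi's egs/ directory (as the example paths suggest), a small pre-flight check could look like the following. The function name `is_kaldi_egs_project` is hypothetical and not part of this repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): returns 0 when the given
# project path sits directly inside a directory named "egs".
is_kaldi_egs_project() {
  local proj="$1"
  local parent
  parent="$(basename "$(dirname "$proj")")"
  [ "$parent" = "egs" ]
}

# Usage: the path below is a placeholder, replace it with your own.
proj=/path/to/kaldi/egs/myproject
if is_kaldi_egs_project "$proj"; then
  echo "ok: $proj"
else
  echo "warning: $proj is not directly under an egs/ dir" >&2
fi
```

The check only looks at the path string, so it does not confirm that Kaldi itself is installed or compiled at that location.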

Model training for speech recognition (Vosk + Datasets)

See fb-falabrasil/ dir. This is expected to become the main recipe for Brazilian Portuguese, as we are planning on releasing the acoustic models as well.

Also based on the Mini-librispeech recipe, same as above, but now it runs over all the public speech datasets in Portuguese (NOTE: not only "Brazilian" Portuguese!) that we are aware of, which have been gathered here: https://github.com/falabrasil/speech-datasets

$ ./prep_falabrasil.sh /path/to/kaldi/egs/myproject
$ cd /path/to/kaldi/egs/myproject/s5/
$ ./run.sh

For online decoding, please check fb-falabrasil/local/vosk/ dir.

Model training for phonetic alignment (Gentle)

See fb-gentle/ dir. Based on ASpIRE nnet3 recipe.

$ ./prep_gentle.sh /path/to/kaldi/egs/myproject
$ cd /path/to/kaldi/egs/myproject/s5/
$ ./run.sh

⚠️ This recipe didn't work. See the README inside the fb-gentle/ dir.

Model training for phonetic alignment (UFPAlign)

See fb-ufpalign/ dir. Based on LibriSpeech nnet3 recipe, in the hopes of future compatibility with MFA.

$ ./prep_ufpalign.sh /path/to/kaldi/egs/myproject
$ cd /path/to/kaldi/egs/myproject/s5/
$ ./run_all.sh

Speaker diarization (CallHome)

See fb-callhome/ dir. Based on CALLHOME v2 recipe. This uses pre-trained models on English data for inference only rather than training one from scratch.

$ ./prep_callhome.sh /path/to/kaldi/egs/myproject
$ cd /path/to/kaldi/egs/myproject/v2/
$ ./run.sh

A standalone clustering procedure based on the pyannote.audio library can also be found under the utils/clustering/_diarization dir.

Citation

If you use this code or want to mention the paper referred to above, please cite us as one of the following:

Batista, C., Dias, A.L., Sampaio Neto, N. (2018) Baseline Acoustic Models for Brazilian Portuguese Using Kaldi Tools. Proc. IberSPEECH 2018, 77-81, DOI: 10.21437/IberSPEECH.2018-17.

@inproceedings{Batista18,
  author     = {Cassio Batista and Ana Larissa Dias and Nelson {Sampaio Neto}},
  title      = {{Baseline Acoustic Models for Brazilian Portuguese Using Kaldi Tools}},
  year       = {2018},
  booktitle  = {Proc. IberSPEECH 2018},
  pages      = {77--81},
  doi        = {10.21437/IberSPEECH.2018-17},
  url        = {http://dx.doi.org/10.21437/IberSPEECH.2018-17}
}

⚠️ This paper uses the outdated nnet2 recipes, while this repo has been updated to the chain models' recipe via nnet3 scripts. If you really want nnet2 scripts, you may find them on tag nnet2. Try running git tag.
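As a rough sketch of how to get to the nnet2 scripts, the helper below checks out a tag only when it exists in a local clone. The function name `checkout_tag` is hypothetical and not part of this repo; it relies only on standard git commands (`git rev-parse --verify` and `git checkout`).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): check out tag $2 in the
# git clone at $1, but only if that tag actually exists.
checkout_tag() {
  local repo="$1" tag="$2"
  if git -C "$repo" rev-parse -q --verify "refs/tags/$tag" >/dev/null; then
    # Checking out a tag leaves the clone in a detached HEAD state.
    git -C "$repo" checkout -q "$tag"
  else
    echo "tag '$tag' not found in $repo" >&2
    return 1
  fi
}

# Usage (the clone path is a placeholder):
#   git -C /path/to/kaldi-br tag --list
#   checkout_tag /path/to/kaldi-br nnet2
```

To return to the current nnet3 scripts afterwards, check out the branch you were on before.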

Dias A.L., Batista C., Santana D., Neto N. (2020) Towards a Free, Forced Phonetic Aligner for Brazilian Portuguese Using Kaldi Tools. In: Cerri R., Prati R.C. (eds) Intelligent Systems. BRACIS 2020. Lecture Notes in Computer Science, vol 12319. Springer, Cham. https://doi.org/10.1007/978-3-030-61377-8_44

@inproceedings{Dias20,
  author     = {Dias, Ana Larissa and Batista, Cassio and Santana, Daniel and Neto, Nelson},
  editor     = {Cerri, Ricardo and Prati, Ronaldo C.},
  title      = {Towards a Free, Forced Phonetic Aligner for Brazilian Portuguese Using Kaldi Tools},
  booktitle  = {Intelligent Systems},
  year       = {2020},
  publisher  = {Springer International Publishing},
  address    = {Cham},
  pages      = {621--635},
  isbn       = {978-3-030-61377-8}
}

Batista, C., Dias, A.L. & Neto, N. Free resources for forced phonetic alignment in Brazilian Portuguese based on Kaldi toolkit. EURASIP J. Adv. Signal Process. 2022, 11 (2022). https://doi.org/10.1186/s13634-022-00844-9

@article{Batista22,
  author     = {Batista, Cassio and Dias, Ana Larissa and Neto, Nelson},
  title      = {Free resources for forced phonetic alignment in Brazilian Portuguese based on Kaldi toolkit},
  journal    = {EURASIP Journal on Advances in Signal Processing},
  year       = {2022},
  month      = {Feb},
  day        = {19},
  volume     = {2022},
  number     = {1},
  pages      = {11},
  issn       = {1687-6180},
  doi        = {10.1186/s13634-022-00844-9},
  url        = {https://doi.org/10.1186/s13634-022-00844-9}
}

FalaBrasil UFPA

Grupo FalaBrasil (2022) - https://ufpafalabrasil.gitlab.io/
Universidade Federal do Pará (UFPA) - https://portal.ufpa.br/
Cassio Batista - https://cassota.gitlab.io/