GitHub - choiHkk/FastSpeech2-cwt: with alignment learning and continuous wavelet transform

Introduction

Trains quickly using the open-source FastSpeech2 implementation and a Korean dataset (KSS).
Existing open-source implementations train on data preprocessed with MFA. This repository trains with alignment learning instead, and feeds training data directly from data_utils.py to avoid the disk-space problems that preprocessing can cause.
Existing open-source implementations use pitch as is, but the paper converts pitch into a pitch spectrogram via CWT, so that conversion is implemented in data_utils.py.
A conda environment would also work, but this repository provides only a Docker environment. It assumes docker and nvidia-docker are already installed on Ubuntu.
Depending on your GPU and CUDA version, you may need to change the torch base image at the top of the Dockerfile.
The preprocessing step only extracts the transcript and stats needed for training.
No other preprocessing is required.
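
The pitch-to-spectrogram conversion mentioned above can be sketched roughly as follows. This is a minimal NumPy illustration and not the code in data_utils.py: the function names `mexican_hat` and `pitch_to_cwt`, the 10 dyadic scales, the edge padding, and the truncation of the wavelet at 5 scale widths are all assumptions made for the example. It interpolates unvoiced frames, normalizes log-F0, and convolves the result with a Mexican hat wavelet at each scale.

```python
import numpy as np


def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet evaluated at points t."""
    norm = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)
    return norm * (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)


def pitch_to_cwt(f0, num_scales=10, base_scale=1.0):
    """Turn a frame-level F0 contour into a (frames, num_scales) array.

    f0: 1-D sequence of F0 values in Hz, with 0 marking unvoiced frames.
    Returns the CWT coefficients plus the log-F0 mean and std that
    were used for normalization.
    """
    f0 = np.asarray(f0, dtype=np.float64)
    voiced = f0 > 0
    if not voiced.any():
        raise ValueError("f0 contains no voiced frames")

    # Fill unvoiced frames by linear interpolation in the log domain.
    frames = np.arange(len(f0))
    log_f0 = np.interp(frames, frames[voiced], np.log(f0[voiced]))

    # Normalize to zero mean and unit variance per utterance.
    mean, std = log_f0.mean(), log_f0.std()
    normed = (log_f0 - mean) / (std + 1e-8)

    # Dyadic scales in frames: 2, 4, 8, ... (assumed for this sketch).
    scales = base_scale * 2.0 ** np.arange(1, num_scales + 1)

    spec = np.empty((len(f0), num_scales))
    for i, s in enumerate(scales):
        half = int(np.ceil(5 * s))            # truncate wavelet support
        t = np.arange(-half, half + 1) / s
        kernel = mexican_hat(t) / np.sqrt(s)  # 1/sqrt(s) scale normalization
        padded = np.pad(normed, half, mode="edge")
        # The kernel is symmetric, so convolution equals correlation here.
        spec[:, i] = np.convolve(padded, kernel, mode="valid")
    return spec, mean, std
```

Calling `pitch_to_cwt(f0)` on a contour with N frames returns an N x 10 array that a variance predictor could regress, together with the mean and std needed to undo the normalization at synthesis time.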

Dataset

download dataset - https://www.kaggle.com/datasets/bryanpark/korean-single-speaker-speech-dataset
unzip /path/to/the/kss.zip -d /path/to/the/kss
mkdir -p /path/to/the/FastSpeech2-cwt/data/dataset
mv /path/to/the/kss /path/to/the/FastSpeech2-cwt/data/dataset

Docker build

cd /path/to/the/FastSpeech2-cwt
docker build --tag fastspeech2_cwt:latest .

Training

nvidia-docker run -it --name 'FastSpeech2-cwt' -v /path/to/FastSpeech2-cwt:/home/work/FastSpeech2-cwt --ipc=host --privileged fastspeech2_cwt:latest
cd /home/work/FastSpeech2-cwt
cd /home/work/FastSpeech2-cwt/hifigan
unzip generator_universal.pth.tar.zip -d .
cd /home/work/FastSpeech2-cwt
ln -s /home/work/FastSpeech2-cwt/data/dataset/kss
python preprocess.py ./config/kss/preprocess.yaml
python train.py -p ./config/kss/preprocess.yaml -m ./config/kss/model.yaml -t ./config/kss/train.yaml
arguments: -p (preprocess config), -m (model config), -t (train config)
