Bachelor Thesis - Phoneme classification and alignment through recognition on TIMIT

My bachelor thesis on phoneme recognition and alignment on the TIMIT dataset.

Abstract

In this work we explore a hybrid of artificial neural networks (ANNs) and dynamic time warping (DTW) for phoneme alignment on the TIMIT dataset. The idea is to combine the output probabilities of a neural phoneme recognition model with a probability-based DTW in order to align phonemes. For phoneme recognition we achieve a frame error rate (FER) of 18.1%, which is a 4.0% improvement over the state of the art. Our alignment reaches an 86.3% phoneme boundary accuracy with a 20 ms tolerance. Furthermore, we attempt phoneme classification based on recordings of single phonemes, resulting in an accuracy of 66.68%. Finally, we introduce the CyclicPlateauScheduler, a new learning rate scheduler that combines triangular cyclic learning rates with ReduceLROnPlateau.
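To make the alignment idea more concrete, the following is a minimal sketch of a DTW-style monotonic alignment over frame-wise phoneme probabilities. It is not the thesis implementation: the function names `align_frames` and `boundaries`, the negative log probability cost, and the step pattern (stay on the current phoneme or advance to the next one) are assumptions chosen for illustration.

```python
import math

def align_frames(probs, phoneme_seq):
    """Assign every frame to one position in phoneme_seq.

    probs[t][p] is the model's probability of phoneme p at frame t.
    The path is monotonic and may not skip phonemes (assumed step pattern).
    """
    T, N = len(probs), len(phoneme_seq)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per phoneme")
    eps = 1e-12  # avoids log(0)
    # Local cost: negative log probability of the expected phoneme per frame.
    cost = [[-math.log(max(probs[t][p], eps)) for p in phoneme_seq]
            for t in range(T)]
    acc = [[math.inf] * N for _ in range(T)]   # accumulated cost
    back = [[0] * N for _ in range(T)]         # backpointers
    acc[0][0] = cost[0][0]
    for t in range(1, T):
        for n in range(min(t, N - 1) + 1):
            stay = acc[t - 1][n]
            advance = acc[t - 1][n - 1] if n > 0 else math.inf
            if advance < stay:
                acc[t][n], back[t][n] = advance + cost[t][n], n - 1
            else:
                acc[t][n], back[t][n] = stay + cost[t][n], n
    # Backtrack from the last frame and the last phoneme.
    path, n = [0] * T, N - 1
    for t in range(T - 1, -1, -1):
        path[t] = n
        n = back[t][n]
    return path

def boundaries(path):
    """Frame indices at which the aligned phoneme position changes."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]
```

Given the frame hop of the feature extraction, the boundary frame indices can be converted to times and compared with the reference boundaries within a tolerance such as 20 ms.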

CNN experiments

The code for the initial CNN experiments can be found here

Getting Started

Installation

Install dependencies using pip install torch torchaudio pytorch-lightning torchmetrics tensorboard pandas librosa soundfile matplotlib seaborn spafe dtaidistance levenshtein
Split the dataset into training, validation and test sets by navigating to the src folder and running python dataset/divide_dataset.py.

Configuration

You can adjust several global variables in the settings.py file. Specific training parameters and the main code are located in main.py.

Execution

To run the training and testing process, execute main.py. Detailed information about the current training is displayed in the terminal and logged in the Lightning logs directory, which can be viewed using TensorBoard for further analysis.

Main contributions of this work

CyclicPlateau scheduler

This work introduces a scheduler that combines cyclic learning rates with learning rate reduction on plateau (ReduceLROnPlateau) to get the benefits of both techniques. Cyclic learning rates reduce the risk of getting stuck in poor local minima by exploring a wider range of solutions, while reduction on plateau lowers the learning rate when the validation loss stagnates, so that the final phase of training converges with smaller steps.
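The combination can be sketched in a few lines of plain Python. This is an illustrative sketch, not the CyclicPlateauScheduler from the thesis: the class name `CyclicPlateauSketch`, its parameters, and the choice to scale both bounds of the triangle on a plateau are assumptions.

```python
class CyclicPlateauSketch:
    """Triangular cyclic learning rate whose bounds shrink on a plateau."""

    def __init__(self, base_lr, max_lr, step_size, factor=0.5, patience=2):
        self.base_lr = base_lr        # lower bound of the triangle
        self.max_lr = max_lr          # upper bound of the triangle
        self.step_size = step_size    # steps per half cycle
        self.factor = factor          # multiplier applied on a plateau
        self.patience = patience      # tolerated checks without improvement
        self.best = float("inf")
        self.bad_checks = 0

    def lr_at(self, step):
        """Learning rate at a training step (triangular wave)."""
        pos = step % (2 * self.step_size)
        frac = pos / self.step_size
        if frac > 1.0:
            frac = 2.0 - frac         # descending half of the cycle
        return self.base_lr + (self.max_lr - self.base_lr) * frac

    def report(self, val_loss):
        """Record a validation loss; shrink both bounds if it stagnates."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_checks = 0
            return
        self.bad_checks += 1
        if self.bad_checks > self.patience:
            self.base_lr *= self.factor
            self.max_lr *= self.factor
            self.bad_checks = 0
```

In a training loop, `lr_at` would be queried once per optimizer step and `report` once per validation run, so the cycles continue while their amplitude decays whenever validation loss stops improving.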

Phoneme-boundary weighted loss

This work also develops a custom variant of the cross-entropy loss that assigns a higher weight to frames at phoneme boundaries, so that errors near transitions between phonemes are penalized more strongly and the model learns to locate those transitions more precisely.
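A minimal sketch of such a loss on frame-wise labels is shown below. The weighting scheme is an assumption made for illustration (a fixed weight for frames within `radius` frames of a label change, normalized by the sum of weights), and the names `boundary_weights` and `weighted_cross_entropy` are hypothetical; the thesis may weight boundaries differently.

```python
import math

def boundary_weights(labels, boundary_weight=2.0, radius=1):
    """Per-frame weights: boundary_weight near a label change, else 1.0."""
    T = len(labels)
    weights = [1.0] * T
    for t in range(1, T):
        if labels[t] != labels[t - 1]:
            # Weight the frames on both sides of the transition.
            for u in range(max(0, t - radius), min(T, t + radius)):
                weights[u] = boundary_weight
    return weights

def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of the per-frame negative log likelihood."""
    eps = 1e-12  # avoids log(0)
    total = sum(w * -math.log(max(p[y], eps))
                for p, y, w in zip(probs, labels, weights))
    return total / sum(weights)
```

With uniform weights this reduces to the ordinary mean cross-entropy, which makes it easy to compare the weighted and unweighted variants on the same predictions.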
