My bachelor thesis on Phoneme recognition and alignment on the TIMIT dataset

lischilpp/bachelor-thesis-phoneme-recognition-alignment

Bachelor Thesis - Phoneme classification and alignment through recognition on TIMIT

Link to publication

Abstract

In this work, we explore a hybrid of ANNs and DTW for phoneme alignment on the TIMIT dataset. The idea is to use the output probabilities of a neural phoneme recognition model together with a probability-based DTW to align phonemes. For phoneme recognition, we achieve 18.1% FER, a 4.0% improvement over the state of the art. Our alignment reaches an 86.3% phoneme boundary accuracy with a 20 ms tolerance. Furthermore, we attempt phoneme classification on recordings of single phonemes, reaching an accuracy of 66.68%. Finally, we introduce the CyclicPlateauScheduler, a new learning rate scheduler that combines triangular cyclic learning rates with ReduceLROnPlateau.
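To make the alignment idea more concrete, here is a minimal sketch of one common way to run DTW over frame-level probabilities. It assumes the phoneme sequence is already known and that each frame is assigned to exactly one phoneme in order. The function name `align`, the `-log p` cost, and the stay-or-advance step pattern are illustrative assumptions, not necessarily the formulation used in the thesis.

```python
import math


def align(probs, phonemes):
    """Map each frame to an index into `phonemes` (hypothetical helper).

    probs[t][k] is the model's probability of class k at frame t.
    phonemes is the known class sequence, in order.
    """
    T, N = len(probs), len(phonemes)
    if T < N:
        raise ValueError("need at least one frame per phoneme")
    INF = float("inf")
    cost = [[INF] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]

    def frame_cost(t, n):
        # Assumed local cost: negative log probability, clamped to avoid log(0).
        return -math.log(max(probs[t][phonemes[n]], 1e-12))

    cost[0][0] = frame_cost(0, 0)
    for t in range(1, T):
        for n in range(N):
            stay = cost[t - 1][n]                            # same phoneme as previous frame
            advance = cost[t - 1][n - 1] if n > 0 else INF   # move on to the next phoneme
            if stay <= advance:
                best, back[t][n] = stay, n
            else:
                best, back[t][n] = advance, n - 1
            if best < INF:
                cost[t][n] = best + frame_cost(t, n)

    # Trace back from the last frame, which must belong to the last phoneme.
    path = [N - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

A phoneme boundary is then any frame at which the returned index changes; converting frame indices to milliseconds depends on the frame hop used during feature extraction.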

CNN experiments

The code for the initial CNN experiments can be found here

Getting Started

Installation

  1. Install dependencies using pip install torch torchaudio pytorch-lightning torchmetrics tensorboard pandas librosa soundfile matplotlib seaborn spafe dtaidistance levenshtein
  2. Split the dataset into training, validation, and test sets by navigating to the src folder and running python dataset/divide_dataset.py.

Configuration

You can adjust several global variables in the settings.py file. Specific training parameters and the main code are located in main.py.

Execution

To run the training and testing process, execute main.py. Detailed information about the current training is displayed in the terminal and logged in the Lightning logs directory, which can be viewed using TensorBoard for further analysis.

Main contributions of this work

CyclicPlateau scheduler

Introduced a scheduler that combines cyclic learning rates with learning rate reduction on plateau to get the benefits of both techniques. Cyclic learning rates reduce the risk of getting stuck in poor local minima by exploring a wider range of solutions, while reduction on plateau lowers the learning rate when the validation loss stagnates, which helps the model converge more precisely.
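The combination can be sketched in plain Python as follows. This is not the repository's CyclicPlateauScheduler: the class name `CyclicPlateauSketch`, its parameters, and the choice to shrink both bounds of the triangle on a plateau are assumptions made for illustration.

```python
class CyclicPlateauSketch:
    """Triangular cyclic learning rate whose bounds shrink on a plateau."""

    def __init__(self, base_lr, max_lr, step_size, patience=2, factor=0.5):
        self.base_lr = base_lr      # lower bound of the triangle
        self.max_lr = max_lr        # upper bound of the triangle
        self.step_size = step_size  # training steps per half cycle
        self.patience = patience    # non-improving validations tolerated
        self.factor = factor        # multiplier applied to both bounds (assumed)
        self.best_loss = float("inf")
        self.bad_checks = 0
        self.step_count = 0

    def lr(self):
        """Learning rate at the current position in the triangular cycle."""
        pos = self.step_count % (2 * self.step_size)
        frac = 1.0 - abs(pos / self.step_size - 1.0)  # rises 0 -> 1, then falls to 0
        return self.base_lr + (self.max_lr - self.base_lr) * frac

    def step(self):
        """Advance one training step and return the new learning rate."""
        self.step_count += 1
        return self.lr()

    def report_validation_loss(self, loss):
        """Shrink the cycle bounds once the loss has stopped improving."""
        if loss < self.best_loss:
            self.best_loss = loss
            self.bad_checks = 0
            return
        self.bad_checks += 1
        if self.bad_checks > self.patience:
            self.base_lr *= self.factor
            self.max_lr *= self.factor
            self.bad_checks = 0
```

In a training loop, `step()` would be called after every batch and `report_validation_loss()` after every validation pass, so the cycle keeps running while its amplitude decays on plateaus.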

Phoneme-boundary weighted loss

Developed a custom variant of the cross-entropy loss function that assigns higher weight to phoneme boundaries, enhancing the model's ability to accurately detect precise transitions between phonemes.
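A rough illustration of the idea, using only the standard library, is shown below. The helper names, the weight of 2.0, and the one-frame radius around each label change are hypothetical; the weighting scheme actually used in the thesis may differ.

```python
import math


def boundary_weights(labels, boundary_weight=2.0, radius=1):
    """Per-frame weights: 1.0 by default, larger near a label change."""
    weights = [1.0] * len(labels)
    for i in range(1, len(labels)):
        if labels[i] != labels[i - 1]:
            # Frames i - radius .. i + radius - 1 straddle the boundary.
            for j in range(max(0, i - radius), min(len(labels), i + radius)):
                weights[j] = boundary_weight
    return weights


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(correct class) over all frames."""
    total = sum(w * -math.log(p[y]) for p, y, w in zip(probs, labels, weights))
    return total / sum(weights)
```

With frame labels `[0, 0, 0, 1, 1]`, the two frames on either side of the transition receive the higher weight, so errors made there contribute more to the loss than errors in the middle of a phoneme.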
