
Multimodal Pathological Voice Classification

This repository contains the code used in the AI CUP 2023 Spring Multimodal Pathological Voice Classification Competition, where we achieved 8th place in the public ranking and 1st place in the private ranking, with scores of 0.657057 and 0.641098, respectively.

Getting the Code

You can download all the files in this repository by cloning it using the following command:

git clone https://github.com/jwliao1209/Multimodal-Pathological-Voice-Classification.git

Proposed Pipeline

The feature extraction process consists of two key steps:

  • Global Feature Extraction: We apply the Fast Fourier Transform (FFT) to obtain frequency-domain features and summarize them with statistical indicators to construct the global features.
  • Local Feature Extraction: A pre-trained deep learning model extracts local features, which are then reduced with Principal Component Analysis (PCA) to retain the most informative feature combinations (see the sketch after this list).
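
The sketch below illustrates these two steps in Python. The choice of statistics, the number of PCA components, and the function names are illustrative assumptions, not the repository's actual implementation.

import numpy as np
from sklearn.decomposition import PCA

def global_fft_features(waveform):
    # Global features: magnitude spectrum from the FFT, summarized by
    # simple statistical indicators (illustrative choice of statistics).
    spectrum = np.abs(np.fft.rfft(waveform))
    return np.array([
        spectrum.mean(),           # average spectral magnitude
        spectrum.std(),            # spread of the spectrum
        spectrum.max(),            # dominant peak magnitude
        np.median(spectrum),       # robust central tendency
        float(spectrum.argmax()),  # bin index of the dominant frequency
    ])

def local_pca_features(embeddings, n_components=32):
    # Local features: embeddings from a pre-trained model, reduced with PCA
    # to keep the most informative feature combinations.
    pca = PCA(n_components=n_components)
    return pca.fit_transform(embeddings)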

For the model training phase, we apply tree-based machine learning models such as Random Forest and LightGBM, along with TabPFN, a transformer-based deep learning model for tabular data. An ensemble method then combines the predicted probabilities from these models to produce the final output.
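
A hedged sketch of the ensemble step is given below: each model is fitted on the same tabular features and their class probabilities are averaged (soft voting). The synthetic data and hyperparameters are placeholders, not the settings used in the competition.

import numpy as np
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

# Placeholder tabular data standing in for the extracted voice features.
X, y = make_classification(n_samples=200, n_features=32, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = [
    RandomForestClassifier(n_estimators=500, random_state=0),
    LGBMClassifier(n_estimators=500, random_state=0),
    TabPFNClassifier(),
]

# Fit every model and collect its predicted class probabilities.
probas = [model.fit(X_train, y_train).predict_proba(X_test) for model in models]

# Soft-voting ensemble: average the probabilities and take the argmax.
ensemble_proba = np.mean(probas, axis=0)
y_pred = ensemble_proba.argmax(axis=1)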

Figure: overview of the proposed pipeline.

Requirements

To set up the environment, run the following commands:

conda create --name audio python=3.10
conda activate audio
pip install -r requirements.txt

Data Preprocessing

To preprocess the dataset, run the command:

python process_data.py

Training

To start training the models, run the command:

python train.py

Inference

For inference, run the following command:

python inference.py

Operating System and Device

We ran the code in an environment with Ubuntu 22.04.1, a 12th Generation Intel(R) Core(TM) i7-12700 CPU, and a single NVIDIA GeForce RTX 4090 GPU with 24 GB of dedicated memory.

Citation

@misc{multimodal_pathological_voice_classification_2023,
    title  = {Multimodal Pathological Voice Classification},
    author = {Jia-Wei Liao and Chun-Hsien Chen and Shu-Cheng Zheng and Yi-Cheng Hung},
    url    = {https://github.com/jwliao1209/Multimodal-Pathological-Voice-Classification},
    year   = {2024}
}