add --folder arg to resample module
bagustris committed May 29, 2024
1 parent 683975a commit 8d72dd6
Showing 3 changed files with 91 additions and 43 deletions.
92 changes: 55 additions & 37 deletions ini_file.md
@@ -1,30 +1,32 @@
# Overview of options for the nkululeko framework

* To be specified in a .ini file, [config parser syntax](https://zetcode.com/python/configparser/)
* Most values have defaults
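
As a rough illustration of the two points above, the following sketch reads a small configuration with Python's standard `configparser` and falls back to defaults for missing keys. The `config_val` helper here is a hypothetical stand-in written for this example (it only loosely mirrors the `util.config_val(section, key, default)` call visible in the code changes of this commit), and the INI text is made up from option names listed in this document.

```python
import ast
import configparser

# Hypothetical config text; the option names are taken from this document.
INI_TEXT = """
[EXP]
root = ./results/
type = classification

[DATA]
databases = ['emodb', 'timit']
target = emotion
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)


def config_val(section, key, default):
    """Return the raw string for section/key, or the default if it is missing.

    Illustrative helper only, not the nkululeko implementation.
    """
    return config.get(section, key, fallback=default)


# Plain string values come back exactly as written in the file.
exp_type = config_val("EXP", "type", "classification")

# List-like values are stored as text, so they are parsed explicitly here.
databases = ast.literal_eval(config_val("DATA", "databases", "[]"))

# A key that is not present falls back to the supplied default.
no_reuse = config_val("DATA", "no_reuse", "False")

print(exp_type, databases, no_reuse)
```

Note that `configparser` returns every value as a string, so booleans and lists need an explicit conversion step such as `ast.literal_eval` or `ConfigParser.getboolean`.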

## Contents

- [Overview of options for the nkululeko framework](#overview-of-options-for-the-nkululeko-framework)
* [Contents](#contents)
* [Sections](#sections)
* [EXP](#exp)
* [DATA](#data)
* [AUGMENT](#augment)
* [SEGMENT](#segment)
* [FEATS](#feats)
* [MODEL](#model)
* [EXPL](#expl)
* [PREDICT](#predict)
* [EXPORT](#export)
* [CROSSDB](#crossdb)
* [PLOT](#plot)
* [RESAMPLE](#resample)
* [REPORT](#report)

## Sections

### EXP

* **root**: experiment root folder
* root = ./results/
* **type**: the kind of experiment
* type = classification
@@ -53,13 +55,14 @@
* databases = ['emodb', 'timit']

### DATA

* **type**: just a flag now to mark continuous data, so it can be binned to categorical data (using *bins* and *labels*)
* type = continuous
* **databases**: list of databases to be used in the experiment
* databases = ['emodb', 'timit']
* **tests**: Datasets to be used as test data for the given best model. The databases do NOT have to appear in the **databases** field!
* tests = ['emovo']
* **root_folders**: specify an additional configuration specifically for all entries starting with a dataset name, acting as global defaults.
* root_folders = data_roots.ini
* **db_name**: path to the audformat repository for each database listed in *databases*. If this path is not absolute, it is treated as relative to the experiment folder.
* emodb = /home/data/audformat/emodb/
@@ -117,7 +120,7 @@
* target = emotion
* **labels**: for classification experiments: the names of the categories (is also used for regression when binning the values)
* labels = ['anger', 'boredom', 'disgust', 'fear', 'happiness', 'neutral', 'sadness']
* **bins**: array of integers to be used for binning continuous data
* bins = [-100, 40, 50, 60, 70, 100]
* **no_reuse**: don't re-use any tables, but start fresh
* no_reuse = False
@@ -139,6 +142,7 @@
* check_vad = True
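
The *bins* and *labels* options of the DATA section can be pictured with the sketch below. It assumes that each pair of neighbouring bin edges delimits one category and that the left edge is inclusive; how nkululeko actually treats the edges is not stated in this document. The function name and the five label names are hypothetical, chosen so that the number of labels matches the five intervals of the example `bins` value.

```python
from bisect import bisect_right


def bin_values(values, bins, labels):
    """Map continuous values to category labels using sorted bin edges.

    Assumption: interval i covers bins[i] <= v < bins[i + 1]; the last
    interval also includes the upper edge.
    """
    if len(labels) != len(bins) - 1:
        raise ValueError("need exactly one label per interval")
    result = []
    for v in values:
        if not bins[0] <= v <= bins[-1]:
            raise ValueError(f"value {v} lies outside the bin range")
        # bisect_right counts the edges that are <= v.
        idx = min(bisect_right(bins, v) - 1, len(labels) - 1)
        result.append(labels[idx])
    return result


bins = [-100, 40, 50, 60, 70, 100]
labels = ["very_low", "low", "mid", "high", "very_high"]  # hypothetical names
binned = bin_values([35, 40, 55, 99, 100], bins, labels)
print(binned)
```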

### AUGMENT

* **augment**: select the methods to augment: either *traditional* or *random_splice*
* augment = ['traditional', 'random_splice']
* choices are:
@@ -152,18 +156,20 @@
* augmentations = Compose([AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.05),Shift(p=0.5),BandPassFilter(min_center_freq=100.0, max_center_freq=6000),])

### SEGMENT

* **sample_selection**: select the samples to segment: either *train*, *test*, or *all*
* sample_selection = all
* **segment_result**: name of the segmented data table as a result
* segment_target = segmented.csv
* **method**: select the model
* method = [silero](https://github.com/snakers4/silero-vad)
* **min_length**: the minimum length of rest samples (in seconds)
* min_length = 2
* **max_length**: the maximum length of segments (in seconds); longer ones are cut at this length
* max_length = 10

### FEATS

* **type**: a comma-separated list of types of features; they will be column-wise concatenated
* type = ['os']
* possible values:
@@ -200,7 +206,7 @@
* **auddim**: [audEERING emotion model dimensions](https://arxiv.org/abs/2203.07378), wav2vec2.0 model finetuned on [MSPPodcast](https://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-Podcast.html) arousal, dominance, valence
* **agender**: [audEERING age and gender model embeddings](https://arxiv.org/abs/2306.16962), wav2vec2.0 model finetuned on [several age databases](https://github.com/audeering/w2v2-age-gender-how-to), embeddings
* **agender.model** = ./agender/ (*path to the audEERING model folder*)
* **agender_agender**: [audEERING age and gender model age and gender predictions](https://arxiv.org/abs/2306.16962), wav2vec2.0 model finetuned on [several age and gender databases](https://github.com/audeering/w2v2-age-gender-how-to): age, female, male, child
* **clap**: [Laion's Clap embedding](https://github.com/LAION-AI/CLAP)
* **xbow**: [open crossbow](https://github.com/openXBOW) features codebook computed from open smile features
* **xbow.model** = *path to xbow root folder (containing xbow.jar)*
@@ -224,35 +230,37 @@
* **standard**: z-transformation (mean of 0 and std of 1) based on the training set
* **robust**: robust scaler
* **speaker**: like *standard* but based on individual speaker sets (also for the test)
* **bins**: convert feature values into 0, .5 and 1 (for low, mid and high)
* **set**: name of opensmile feature set, e.g. eGeMAPSv02, ComParE_2016, GeMAPSv01a, eGeMAPSv01a
* set = eGeMAPSv02
* **level**: level of opensmile features
* level = functional
* possible values:
* **functional**: aggregated over the whole utterance
* **lld**: low-level descriptor: framewise
* **balancing**: balance the features with respect to [class distribution](https://imbalanced-learn.org/stable/)
* balancing=smote
* possible values:
* **ros**: simply repeat random samples from the minority classes
* **smote**: *invent* new minority samples by little changes from the existing ones
* **adasyn**: similar to smote, but resulting in uneven class distributions
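
Two of the FEATS options above lend themselves to a small sketch: the *standard* scaling, where the mean and standard deviation come from the training set only and are then applied to the test set, and the *ros* balancing, which repeats random minority samples. The code below is a simplified, standard-library illustration under those assumptions; all function names are hypothetical and this is not how nkululeko implements either option.

```python
import random
import statistics
from collections import Counter


def fit_standard_scaler(train_column):
    """Estimate mean and (population) standard deviation on the training data."""
    mean = statistics.fmean(train_column)
    std = statistics.pstdev(train_column)
    # Guard against constant features to avoid a division by zero.
    return mean, (std if std > 0 else 1.0)


def apply_scaler(column, mean, std):
    """z-transform a column with statistics estimated elsewhere."""
    return [(x - mean) / std for x in column]


def random_oversample(features, labels, seed=0):
    """Repeat random minority samples until all classes are equally frequent."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for label, count in counts.items():
        pool = [x for x, y in zip(features, labels) if y == label]
        for _ in range(target - count):
            out_x.append(rng.choice(pool))
            out_y.append(label)
    return out_x, out_y


train = [1.0, 2.0, 3.0]
test = [2.0]
mean, std = fit_standard_scaler(train)
train_scaled = apply_scaler(train, mean, std)
test_scaled = apply_scaler(test, mean, std)  # reuses the training statistics

x_bal, y_bal = random_oversample([10, 11, 12, 20], ["a", "a", "a", "b"])
print(test_scaled, Counter(y_bal))
```

Reusing the training statistics for the test set keeps information about the test data out of the scaler, which is the point of basing the transformation on the training set.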

### MODEL

* **type**: type of classifier
* type = svm
* possible values:
* **bayes**: Naive Bayes classifier
* **gmm**: Gaussian mixture classifier
* GMM_components = 4
* GMM_covariance_type = [full | tied | diag | spherical](https://scikit-learn.org/stable/auto_examples/mixture/plot_gmm_covariances.html)
* **knn**: k nearest neighbor classifier
* K_val = 5
* KNN_weights = uniform | distance
* **knn_reg**: K nearest neighbor regressor
* **tree**: Classification tree classifier
* **tree_reg**: Decision tree regressor
* **svm**: Support Vector Machine
* C_val = 0.001
* kernel = rbf # ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’
* **xgb**: XGBoost
@@ -303,17 +311,20 @@
* device = 0
* **patience**: Number of epochs to wait for the result to improve before training is stopped (early stopping)
* patience = 5
* **pretrained_model**: Base model for finetuning/transfer learning. Variants of wav2vec2, Hubert, and WavLM are tested to work. Default is facebook/wav2vec2-large-robust-ft-swbd-300h.
* pretrained_model = microsoft/wavlm-base
* **push_to_hub**: For finetuning, whether the model will be pushed to the Hugging Face Hub. Default is `False`.
* push_to_hub = True
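
The *patience* option of the MODEL section describes early stopping. A minimal sketch of that idea, assuming a validation score where higher is better and a hypothetical function name, could look like this; it is a generic illustration rather than the training loop used by nkululeko.

```python
def train_with_patience(scores, patience):
    """Walk through per-epoch validation scores and stop early.

    Returns (epochs_run, best_score). Training stops once `patience`
    consecutive epochs have passed without a new best score.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(scores, start=1):
        if score > best:
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best
    return len(scores), best


# Hypothetical validation scores for nine epochs.
scores = [0.5, 0.6, 0.61, 0.6, 0.59, 0.6, 0.58, 0.57, 0.9]
stopped = train_with_patience(scores, patience=5)
print(stopped)
```

With `patience = 5`, the late improvement in the ninth epoch is never reached, which shows the trade-off behind choosing a small value.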

### EXPL

* **model**: Which model to use to estimate feature importance.
* model = ['log_reg'] # can be any model from the [MODEL](#model) section. If several are combined, the mean result is used.
* **max_feats**: Maximal number of important features
* max_feats = 10
* **sample_selection**: Which sample set/split to use for feature importance, sample distribution, spotlight and feature distributions
* sample_selection = all # either all, train or test
* **feature_distributions**: plot distributions for features and analyze importance
* feature_distributions = True
* **permutation**: use [feature permutation](https://scikit-learn.org/stable/modules/permutation_importance.html) to determine the best features. Make sure to test the models before.
* permutation = True
@@ -326,7 +337,7 @@
* **plot_tree**: Plot a decision tree for classification (Requires model = tree)
* plot_tree = False
* **value_counts**: plot distributions of target for the samples and speakers (in the *image_dir*)
* value_counts = [['gender'], ['age'], ['age', 'duration']]
* **column.bin_reals**: Whether a column holding real numbers (instead of categories) should be binned. Applies to any column in *value_counts* as well as to the target variable.
* age.bin_reals = True
* **dist_type**: type of plot for value counts, either histogram or density estimation (kde)
@@ -335,13 +346,16 @@
* spotlight = False
* **shap**: compute [SHAP](https://shap.readthedocs.io/en/latest/) values
* shap = False

### [PREDICT](#predict)

* **targets**: Speaker/speech characteristics to be predicted by some models
* targets = ['gender', 'age', 'snr', 'arousal', 'valence', 'dominance', 'pesq', 'mos']
* **sample_selection**: which split: [train, test, all]
* sample_selection = all

### EXPORT

* **target_root**: New root directory for the database, will be created
* target_root = ./exported_data/
* **orig_root**: Path to folder that is parent to the original audio files
@@ -352,10 +366,12 @@
* segments_as_files = False

### CROSSDB

* **train_extra**: add an additional training partition to all experiments in [the cross database series](http://blog.syntheticspeech.de/2024/01/02/nkululeko-compare-several-databases/). This extra data should be described [in a root_folders file](http://blog.syntheticspeech.de/2022/02/21/specifying-database-disk-location-with-nkululeko/)
* train_extra = ['addtrain_db_1', 'addtrain_db_2']

### PLOT

* **name**: special name as a prefix for all plots (stored in *img_dir*).
* name = my_special_config_within_the_experiment
* **epochs**: whether to make a plot each for every epoch result.
@@ -374,14 +390,16 @@
* format = png

### RESAMPLE

* **sample_selection**: which split: [train, test, all]
* sample_selection = all
* **replace**: whether samples are replaced in place, or copied and a new dataframe returned
* replace = False
* **target**: the name of the new dataframe, if replace==false
* target = data_resampled.csv

### REPORT

* **show**: print the report at the end
* show = False
* **fresh**: start a new report
4 changes: 4 additions & 0 deletions nkululeko/models/model_tuned.py
@@ -222,6 +222,9 @@ def train(self):
         )
         # criterion = torch.nn.CrossEntropyLoss()

+        # set push_to_hub value, default false
+        push = self.util.config_val("MODEL", "push_to_hub", False)
+
         class Trainer(transformers.Trainer):
             def compute_loss(
                 self,
@@ -266,6 +269,7 @@ def compute_loss(
             load_best_model_at_end=True,
             remove_unused_columns=False,
             report_to="none",
+            push_to_hub=push,
         )

         trainer = Trainer(
38 changes: 32 additions & 6 deletions nkululeko/resample.py
@@ -11,22 +11,32 @@

 from nkululeko.constants import VERSION
 from nkululeko.experiment import Experiment
+from nkululeko.utils.files import find_files


 def main(src_dir):
     parser = argparse.ArgumentParser(
-        description="Call the nkululeko RESAMPLE framework.")
+        description="Call the nkululeko RESAMPLE framework."
+    )
     parser.add_argument("--config", default=None,
                         help="The base configuration")
     parser.add_argument("--file", default=None,
                         help="The input audio file to resample")
-    parser.add_argument("--replace", action="store_true",
-                        help="Replace the original audio file")
+    parser.add_argument(
+        "--folder",
+        default=None,
+        help="The input directory containing audio files and subdirectories to resample",
+    )
+    parser.add_argument(
+        "--replace", action="store_true", help="Replace the original audio file"
+    )

     args = parser.parse_args()

-    if args.file is None and args.config is None:
-        print("ERROR: Either --file or --config argument must be provided.")
+    if args.file is None and args.folder is None and args.config is None:
+        print(
+            "ERROR: Either --file, --folder, or --config argument must be provided."
+        )
         exit()

     if args.file is not None:
@@ -42,6 +52,20 @@ def main(src_dir):
         util.debug(f"Resampling audio file: {args.file}")
         rs = Resampler(df_sample, not_testing=True, replace=args.replace)
         rs.resample()
+    elif args.folder is not None:
+        # Load all audio files in the directory and its subdirectories into a DataFrame
+        files = find_files(args.folder, relative=True, ext=["wav"])
+        files = pd.Series(files)
+        df_sample = pd.DataFrame(index=files)
+        df_sample.index = audformat.utils.to_segmented_index(
+            df_sample.index, allow_nat=False
+        )
+
+        # Resample the audio files
+        util = Util("resampler", has_config=False)
+        util.debug(f"Resampling audio files in directory: {args.folder}")
+        rs = Resampler(df_sample, not_testing=True, replace=args.replace)
+        rs.resample()
     else:
         # Existing code for handling INI file
         config_file = args.config
@@ -66,6 +90,7 @@ def main(src_dir):

         if util.config_val("EXP", "no_warnings", False):
             import warnings
+
             warnings.filterwarnings("ignore")

         # Load the data
@@ -74,7 +99,8 @@ def main(src_dir):
         # Split into train and test
         expr.fill_train_and_tests()
         util.debug(
-            f"train shape : {expr.df_train.shape}, test shape:{expr.df_test.shape}")
+            f"train shape : {expr.df_train.shape}, test shape:{expr.df_test.shape}"
+        )

         sample_selection = util.config_val(
             "RESAMPLE", "sample_selection", "all")
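
The new `--folder` branch collects every WAV file below a directory before handing the list to the resampler. The sketch below reproduces only that collection step and the argument check with the standard library. `find_wav_files` is a hypothetical stand-in for `nkululeko.utils.files.find_files`, whose exact behaviour (for example what `relative=True` means) is not shown in this commit; here it is assumed to return paths relative to the given folder.

```python
import argparse
import tempfile
from pathlib import Path


def find_wav_files(folder):
    """Return all .wav files below folder as sorted, folder-relative paths.

    Hypothetical helper; the suffix match is case-sensitive on POSIX systems.
    """
    root = Path(folder)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*.wav") if p.is_file()
    )


def parse_args(argv):
    """Parse the same four options as the resample module in this commit."""
    parser = argparse.ArgumentParser(description="Resample audio (sketch).")
    parser.add_argument("--config", default=None)
    parser.add_argument("--file", default=None)
    parser.add_argument("--folder", default=None)
    parser.add_argument("--replace", action="store_true")
    args = parser.parse_args(argv)
    if args.file is None and args.folder is None and args.config is None:
        # parser.error prints the message and exits with status 2.
        parser.error("Either --file, --folder, or --config must be provided.")
    return args


# Build a throwaway directory tree to demonstrate the recursive search.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.wav").touch()
    (Path(tmp) / "sub" / "b.wav").touch()
    (Path(tmp) / "notes.txt").touch()
    args = parse_args(["--folder", tmp])
    found = find_wav_files(args.folder)

print(found)
```

Using `parser.error` instead of `print` followed by `exit()` is a design choice of this sketch: it yields a non-zero exit status, which the original code does not.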
