This is a fork of the MTG-Jamendo Dataset that aims at fine-tuning PaSST on it.
From the scripts/passt folder, run preprocess.py:
python3 src/preprocess.py --in_folder /path/to/mtg/jamendo/dataset/audio/folders --out-folder /where/to/store/the/preprocessed/files
This will convert the original audio files into 32 kHz, mono, 10 s WAV segments, using all your CPU threads. The --remove
flag deletes each original dataset file as soon as it has been processed.
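As a rough illustration of the segmentation step, the sketch below computes the sample ranges of consecutive 10 s segments at 32 kHz. It is not taken from preprocess.py: the function name segment_bounds is hypothetical, resampling and mono downmixing are left out, and dropping a trailing remainder shorter than 10 s is an assumption (the actual script may pad or keep it instead).

```python
SAMPLE_RATE = 32_000   # target sample rate in Hz
SEGMENT_SECONDS = 10   # target segment length in seconds


def segment_bounds(num_samples, sample_rate=SAMPLE_RATE, seconds=SEGMENT_SECONDS):
    """Return (start, end) sample indices of each full-length segment.

    Assumption: a trailing remainder shorter than one segment is dropped.
    """
    segment_len = sample_rate * seconds
    return [
        (start, start + segment_len)
        for start in range(0, num_samples - segment_len + 1, segment_len)
    ]


# A 25 s track at 32 kHz contains two full 10 s segments.
bounds = segment_bounds(25 * SAMPLE_RATE)
```

Each (start, end) pair could then be used to slice the resampled mono waveform before writing it out as its own WAV file.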
From the scripts/passt folder, run get_dicts.py:
python3 get_dicts.py --in-folder /where/you/stored/the/preprocessed/files
This is a modified version of the original code from the MTG dataset. Here I check whether the audio files exist instead of the mel spectrograms. Audio files that don't exist are skipped, which allows training on a subset. The script creates dicts in .pickle format, inside the splits folders, that are used by the DataLoader. Their keys are indexes, from 0 to N, where N is the number of audio segments. Each value is another dict containing the path to the audio, in the format folder_inside_the_dataset/track_id,
and the tags for that track in one-hot encoding.
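To make the dict layout more concrete, here is a minimal sketch of how such a split dict could be built and pickled. It is not the code from get_dicts.py: the function names, the "path" and "tags" key names, and the .wav file extension are all assumptions made for illustration.

```python
import os
import pickle


def build_split_dict(tracks, audio_root, tag_names):
    """Map consecutive indexes to {"path": ..., "tags": ...} entries.

    `tracks` is a list of (relative_path, tags) pairs, where relative_path
    looks like "folder_inside_the_dataset/track_id". Entries whose audio
    file is missing under `audio_root` are skipped.
    """
    split = {}
    for rel_path, tags in tracks:
        # Assumption: preprocessed audio is stored as <relative_path>.wav.
        if not os.path.exists(os.path.join(audio_root, rel_path + ".wav")):
            continue
        # One entry per known tag: 1 if the track has it, 0 otherwise.
        encoded = [1 if name in tags else 0 for name in tag_names]
        # Indexes stay consecutive even when some tracks are skipped.
        split[len(split)] = {"path": rel_path, "tags": encoded}
    return split


def save_split_dict(split, out_path):
    """Write the split dict to disk in pickle format."""
    with open(out_path, "wb") as f:
        pickle.dump(split, f)
```

Because indexes are assigned only to tracks that were found, a partial download of the dataset still yields a gap-free dict that a DataLoader can index directly.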
Then run main.py, where B is the batch size:
python3 main.py --in-folder /where/you/stored/the/preprocessed/files --batch-size B