- Audio files are loaded with librosa.load() (librosa is a widely used Python library for audio and signal processing).
- We then time-stretch the signal by a factor of 3/4, slowing it down so that transients from rapid successive drums are easier to detect (see the loading/stretching sketch after this list).
- We compute an onset envelope from the Constant-Q Transform (CQT) of the signal and pass it to librosa.onset.onset_detect, which locates transients more accurately than detection on the unaltered signal amplitudes (see the onset-detection sketch below).
- Once the transients have been located, we record the onset frames and convert each frame index to the time in seconds at which the transient occurs.
- We then derive offset_frames from onset_frames, taking each transient to end where the next begins, so the spacing between onset and offset frames yields the time between transient events (the quantity this research cares about); this is covered in the onset-detection sketch as well.
- For each transient, we compute the wavelet transform and the Mel-frequency cepstral coefficients (MFCCs). This lets us group transients by their spectral composition.
- Because some transients encode more frequency information than others, we zero-pad the wavelet coefficients to a common length, then concatenate the MFCC and wavelet coefficients into an aggregated feature vector for each transient (see the feature-extraction sketch below).
- Once we have the spectral features, we standardize them with scikit-learn's StandardScaler() so that KMeans clustering can work with the two different feature types (essentially, this normalization makes the MFCC and wavelet coefficients "worth" the same amount during clustering).
- We then apply Principal Component Analysis (PCA) to reduce the dimensionality of the feature vectors before KMeans clustering (see the scaling/PCA sketch below).
- From here, we fit KMeans for every cluster count from min_cluster to max_cluster. This generalizes better across audio files and eliminates the need to manually count the number of unique sounds, which is generally neither feasible nor efficient.
- We then compare the silhouette scores of the candidate clusterings to identify the cluster count that best matches the number of distinct transient types in the audio signal (see the model-selection sketch below).
- Next, we graph the cluster distributions and replot the waveform with each transient colored according to its KMeans cluster (see the plotting sketch below). In this example, the yellow transients correspond to the woodpecker drums.
- We then locate the cluster that corresponds to the drum (with two clusters this is simply the second label to occur) and record every onset time assigned to that label, along with the time between onsets (see the interval sketch below).
- Finally, we determine the centroid of each cluster and write representative audio to an mp3 for qualitative assessment of what each cluster has captured (see the export sketch below).
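
The sketches below chain together (later ones reuse variables from earlier ones); file names, parameter values, and helper names are illustrative assumptions, not the project's actual code. Loading and stretching come first; note that in librosa.effects.time_stretch a rate below 1 slows the signal down:

```python
import librosa

# Load the recording; librosa resamples to 22,050 Hz mono by default.
# "woodpecker.mp3" is a placeholder file name.
y, sr = librosa.load("woodpecker.mp3")

# rate=0.75 stretches the signal to 4/3 of its original duration,
# spreading out rapid drum hits so their transients separate cleanly.
y_stretched = librosa.effects.time_stretch(y, rate=0.75)
```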
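
For onset detection, librosa.onset.onset_strength accepts a precomputed spectrogram through its S argument, so the dB-scaled CQT can drive the envelope. Treating each transient as ending where the next begins is one reading of the offset_frames construction above:

```python
import numpy as np
import librosa

# Onset envelope built from the dB-scaled Constant-Q Transform.
C = np.abs(librosa.cqt(y=y_stretched, sr=sr))
onset_env = librosa.onset.onset_strength(S=librosa.amplitude_to_db(C, ref=np.max), sr=sr)

# Detect transients on the CQT envelope rather than raw amplitudes,
# then convert frame indices to seconds.
onset_frames = librosa.onset.onset_detect(onset_envelope=onset_env, sr=sr)
onset_times = librosa.frames_to_time(onset_frames, sr=sr)

# Assumption: each transient ends where the next begins; the final
# transient is closed by the last envelope frame.
offset_frames = np.append(onset_frames[1:], len(onset_env) - 1)
offset_times = librosa.frames_to_time(offset_frames, sr=sr)
```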
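
The feature-extraction sketch assumes PyWavelets for the wavelet transform; the db4 mother wavelet and n_mfcc=13 are assumed choices, since the description does not name them:

```python
import numpy as np
import pywt
import librosa

def transient_features(segment, sr, n_mfcc=13):
    # Time-averaged MFCCs summarize the segment's spectral envelope.
    mfcc = librosa.feature.mfcc(y=segment, sr=sr, n_mfcc=n_mfcc).mean(axis=1)
    # Multi-level discrete wavelet decomposition, flattened to one vector.
    wav = np.concatenate(pywt.wavedec(segment, "db4"))
    return mfcc, wav

# Slice the stretched signal between each onset and its offset.
bounds = list(zip(librosa.frames_to_samples(onset_frames),
                  librosa.frames_to_samples(offset_frames)))
raw = [transient_features(y_stretched[a:b], sr) for a, b in bounds]

# Zero-pad the wavelet vectors to a common length (longer transients
# produce more coefficients), then concatenate with the MFCCs.
max_len = max(len(w) for _, w in raw)
features = np.array([np.concatenate([m, np.pad(w, (0, max_len - len(w)))])
                     for m, w in raw])
```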
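
Standardization and PCA are standard scikit-learn calls; keeping ten components is an assumed choice:

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize so MFCC and wavelet dimensions carry equal weight.
X = StandardScaler().fit_transform(features)

# Project onto the leading principal components (10 is an assumed
# target, capped by the data's own dimensions).
X = PCA(n_components=min(10, min(X.shape))).fit_transform(X)
```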
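
The model-selection sketch sweeps the cluster count and keeps whichever k earns the best silhouette score; the search range shown is an assumed example:

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

min_cluster, max_cluster = 2, 8  # assumed search range

best_k, best_score, best_labels = None, -1.0, None
for k in range(min_cluster, max_cluster + 1):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score, best_labels = k, score, labels
```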
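
A minimal version of the plotting step shades each onset-to-offset span in its cluster's color; the project's actual figures may be drawn differently:

```python
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(12, 4))
t = np.arange(len(y_stretched)) / sr
ax.plot(t, y_stretched, color="lightgray", linewidth=0.5)

# One color per cluster; shade each transient's span accordingly.
palette = plt.cm.viridis(np.linspace(0, 1, best_k))
for start, end, label in zip(onset_times, offset_times, best_labels):
    ax.axvspan(start, end, color=palette[label], alpha=0.6)
ax.set(xlabel="Time (s)", ylabel="Amplitude")
plt.show()
```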
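
The interval sketch picks out the drum cluster with the second-label-to-occur heuristic described above and measures the spacing between its onsets:

```python
import numpy as np

# With two clusters, the drum label is simply the second distinct
# label to appear in the label sequence.
drum_label = next(l for l in best_labels if l != best_labels[0])

drum_onsets = onset_times[best_labels == drum_label]
drum_intervals = np.diff(drum_onsets)  # time between successive drums
```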
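
One reading of the export step writes out the transient whose features lie nearest each cluster centroid, so a listener can judge what the cluster captured. soundfile is an assumed dependency; it writes WAV natively, while mp3 output needs an mp3-capable libsndfile build or a converter such as pydub:

```python
import numpy as np
import soundfile as sf

for label in range(best_k):
    members = np.where(best_labels == label)[0]
    centroid = X[members].mean(axis=0)
    # Representative transient: the cluster member nearest the centroid.
    rep = members[np.argmin(np.linalg.norm(X[members] - centroid, axis=1))]
    a, b = bounds[rep]
    sf.write(f"cluster_{label}.wav", y_stretched[a:b], sr)
```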