Home
Welcome to the AudFeature_extraction wiki!
The features extracted from the pitch values are:
- min_pitch
- max_pitch
- mean_pitch
- num_voice_breaks
- percentage_breaks
- speak_rate
- num_pause
- Total_dur_pause
- num_rise (number of rises)
- num_fall (number of falls)
- duration_file (total duration of the audio file)
- play_time

The logic for finding the different features is:
- min_pitch = apply the min function over all values of the NumPy pitch array.
- max_pitch = apply the max function over all values of the NumPy pitch array.
- mean_pitch = apply the mean function over all values of the NumPy pitch array.
- num_voice_breaks = whenever the pitch changes from zero to some value, or from some value to zero, there is a voice break. I count all such occurrences.
- percentage_breaks = the total number of voice breaks divided by the length of the pitch array.
- speak_rate = the speaking rate. I convert the spoken words into text, compute the play time by subtracting pause_time from duration_time, and divide the number of words by the play time. The goal is words per minute, but with the play time in seconds this value is the number of words spoken per second (multiply by 60 to get words per minute).
- num_pause = a pause is wherever the pitch is zero, so the pauses are counted from the zero-pitch values.
- Total_dur_pause = I find the time spans where the pitch is zero and add them all up.
- duration_file = the total number of frames divided by the frame rate.
- play_time = the duration of the audio file minus the pause time.
- num_rise = a rise is counted when the pitch is increasing; pitch depends on the frequency as well as the amplitude.
- num_fall = a fall is counted when the pitch is decreasing.
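The rules above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's code: the function names `wav_duration_seconds`, `speaking_rate`, and `pitch_features`, the `frame_step_s` parameter, and the returned dictionary are all hypothetical. It assumes the pitch track is a 1-D array with one value per frame and 0 for frames without voice. It also makes three choices the list does not spell out: min/max/mean are taken over the non-zero frames only, a pause is a run of consecutive zero-pitch frames, and rises and falls are counted only between two voiced frames.

```python
import wave

import numpy as np


def wav_duration_seconds(path):
    """duration_file: total number of frames divided by the frame rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def speaking_rate(num_words, play_time_s):
    """speak_rate in words per second; multiply by 60 for words per minute."""
    return num_words / play_time_s


def pitch_features(pitch, frame_step_s, duration_s):
    """Derive the pitch-based features from a per-frame pitch track.

    pitch        -- 1-D sequence of pitch values, 0 where no voice was detected
    frame_step_s -- seconds between consecutive pitch values (assumed constant)
    duration_s   -- duration of the audio file in seconds
    """
    pitch = np.asarray(pitch, dtype=float)
    voiced = pitch > 0
    if not voiced.any():
        raise ValueError("pitch track contains no voiced frames")
    unvoiced = ~voiced

    # Voice break: any switch between zero and non-zero pitch.
    num_voice_breaks = int(np.count_nonzero(voiced[1:] != voiced[:-1]))

    # Pause (assumption): a run of consecutive zero-pitch frames.
    # A run starts where a frame is unvoiced and the previous one is not.
    previous_unvoiced = np.concatenate(([False], unvoiced[:-1]))
    num_pause = int(np.count_nonzero(unvoiced & ~previous_unvoiced))

    # Total pause time: every zero-pitch frame contributes one frame step.
    total_dur_pause = float(np.count_nonzero(unvoiced)) * frame_step_s

    # Rises and falls (assumption): compare neighbouring voiced frames only,
    # so the jump into or out of a pause is not counted.
    step = np.diff(pitch)
    both_voiced = voiced[1:] & voiced[:-1]

    return {
        # Assumption: statistics over voiced frames, so pauses do not force
        # the minimum to zero or drag the mean down.
        "min_pitch": float(pitch[voiced].min()),
        "max_pitch": float(pitch[voiced].max()),
        "mean_pitch": float(pitch[voiced].mean()),
        "num_voice_breaks": num_voice_breaks,
        "percentage_breaks": num_voice_breaks / pitch.size,
        "num_pause": num_pause,
        "Total_dur_pause": total_dur_pause,
        "play_time": duration_s - total_dur_pause,
        "num_rise": int(np.count_nonzero((step > 0) & both_voiced)),
        "num_fall": int(np.count_nonzero((step < 0) & both_voiced)),
    }
```

For a WAV file, `duration_s` could come from `wav_duration_seconds(path)`, and the word count for `speaking_rate` from whichever speech-to-text step is used; that step is left out here because the wiki does not name one.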
This file (audio_graph.py) represents the audio in different forms, such as the spectrogram, spectral roll-off, spectral centroid, and MFCC. It uses the Python library librosa. To see the result, run:
foo@bar python audio_graph.py /audio/human.wav
The above command displays some of the important features as graphs:
- spectrogram
- Zero-crossing rate
- Zoomed in views
- Spectral centroid
- Spectral roll off
- MFCC
The spectrogram is important because it can easily be used as an input feature to a neural network, which can then extract further features from it.
This file contains code that extracts a set of measured features from the audio file. These features are the most important ones: together they cover most of what is needed to train a model.
The features it extracts are:
- ZCR
- Energy
- Entropy of energy
- Spectral centroid
- Spectral spread
- Spectral Entropy
- Spectral Flux
- Spectral Roll off
- MFCC
- Chroma vector
- Chroma deviation

The corresponding feature names are: 'zcr', 'energy', 'energy_entropy', 'spectral_centroid', 'spectral_spread', 'spectral_entropy', 'spectral_flux', 'spectral_rolloff', 'mfcc_1', 'mfcc_2', 'mfcc_3', 'mfcc_4', 'mfcc_5', 'mfcc_6', 'mfcc_7', 'mfcc_8', 'mfcc_9', 'mfcc_10', 'mfcc_11', 'mfcc_12', 'mfcc_13', 'chroma_1', 'chroma_2', 'chroma_3', 'chroma_4', 'chroma_5', 'chroma_6', 'chroma_7', 'chroma_8', 'chroma_9', 'chroma_10', 'chroma_11', 'chroma_12', 'chroma_std'. The values are displayed as an array.
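To give a feel for what a few of these values measure, here is a small NumPy sketch that computes four of them (zcr, energy, spectral centroid, spectral roll-off) for each frame of a signal. It uses common textbook definitions and is not taken from the repository: the function names, the Hann window, and the 0.85 roll-off fraction are assumptions, and the actual extraction code may define these features differently.

```python
import numpy as np


def zero_crossing_rate(frame):
    """Fraction of neighbouring sample pairs whose sign differs."""
    positive = frame >= 0
    return np.count_nonzero(positive[1:] != positive[:-1]) / (len(frame) - 1)


def energy(frame):
    """Mean squared amplitude of the frame."""
    return float(np.mean(frame ** 2))


def spectral_centroid_and_rolloff(frame, sample_rate, rolloff_fraction=0.85):
    """Return (centroid_hz, rolloff_hz) of a Hann-windowed frame."""
    magnitude = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    cumulative = np.cumsum(magnitude)
    total = cumulative[-1]
    if total == 0:
        return 0.0, 0.0  # silent frame: no spectral content to describe
    # Centroid: magnitude-weighted mean frequency.
    centroid = float(np.sum(freqs * magnitude) / total)
    # Roll-off: first frequency at which the running magnitude sum reaches
    # the chosen fraction of the total.
    rolloff_bin = int(np.searchsorted(cumulative, rolloff_fraction * total))
    rolloff_bin = min(rolloff_bin, len(freqs) - 1)
    return centroid, float(freqs[rolloff_bin])


def short_term_features(signal, sample_rate, frame_len, hop):
    """Compute one row [zcr, energy, centroid, rolloff] per frame."""
    signal = np.asarray(signal, dtype=float)
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        centroid, rolloff = spectral_centroid_and_rolloff(frame, sample_rate)
        rows.append([zero_crossing_rate(frame), energy(frame), centroid, rolloff])
    return np.array(rows)
```

The result is a 2-D array with one row per frame and one column per feature, which matches the idea of the values being displayed as an array.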