This GUI is designed to make it easy to segment vocalizations across multiple channels from multiple experimental sources.
To install from source:

```
git clone https://github.com/theunissenlab/soundsep.git
cd soundsep
virtualenv env -p python3
source env/bin/activate
pip install -r requirements.txt
```

On Ubuntu, you may also need to install the PortAudio library:

```
sudo apt-get install libportaudio2
```

Alternatively, use an installer found at https://github.com/kevinyu/soundsep/releases.
- If you haven't used them before, you may need to install git, pip, and virtualenv. On Mac, install Homebrew (https://brew.sh/) and then run `brew install git`. Pip can be installed with `sudo -H python -m ensurepip`, and virtualenv can be installed via pip with `pip install virtualenv`.
- You may need to install PortAudio. On Ubuntu: `sudo apt-get install libportaudio2`
- You may need to install Qt5 on Ubuntu: `sudo apt-get install qt5-default`
- If the program crashes, you can reload an autosaved file by going to File > Load Sources and loading the file `outputs/save.npy.autosave.npy`. The program autosaves the sources table to that file every 10 seconds.
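If you ever need to inspect the autosave outside the GUI, here is a minimal sketch, assuming the sources table was written with `np.save` (which is what the `.npy` extension suggests):

```python
import numpy as np

# Load the autosaved sources table; allow_pickle=True is needed if the
# array contains Python objects (e.g., source names as strings).
autosave = np.load("outputs/save.npy.autosave.npy", allow_pickle=True)
print(autosave)
```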
To run the app:

```
source env/bin/activate
fbs run
```
This app reads audio files from a directory (make sure you are opening the containing directory, not a specific file). What the supported formats all have in common is that you open a directory containing some organization of subdirectories, wav files, or other file formats. The code that determines how the data is read is located in `code/app/main.py`. These are the formats:
In this structure, each wav file contains only one microphone channel. The recordings from each channel are separated into different directories, and each channel has the same number of wav files (marked by timestamp). A reading sketch follows the listing below.
```
toplevel/
    ch0/
        [timestamp1].wav
        [timestamp2].wav
        ...
    ch1/
        [timestamp1].wav
        [timestamp2].wav
        ...
    ...
```
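For reference, here is a minimal sketch of how this layout could be read into a single (samples, channels) array. It is purely illustrative, not the app's actual loader; `load_channel_dirs` and the use of `scipy.io.wavfile` are assumptions:

```python
import glob
import os

import numpy as np
from scipy.io import wavfile

def load_channel_dirs(toplevel):
    """Stack per-channel directories of timestamped wavs into (samples, channels)."""
    channels = []
    for ch_dir in sorted(glob.glob(os.path.join(toplevel, "ch*"))):
        # Concatenate this channel's files in timestamp order
        files = sorted(glob.glob(os.path.join(ch_dir, "*.wav")))
        channels.append(np.concatenate([wavfile.read(f)[1] for f in files]))
    return np.column_stack(channels)
```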
In this structure, each wav file contains only one channel of audio data, but each file spans the entire recording (continuous recording). A similar sketch follows the listing.
```
toplevel/
    ch0.wav
    ch1.wav
    ...
```
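A corresponding sketch for this layout (again hypothetical; assumes one mono wav per channel):

```python
import glob
import os

import numpy as np
from scipy.io import wavfile

def load_channel_files(toplevel):
    """Read ch0.wav, ch1.wav, ... into a single (samples, channels) array."""
    files = sorted(glob.glob(os.path.join(toplevel, "ch*.wav")))
    return np.column_stack([wavfile.read(f)[1] for f in files])
```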
In this structure, there can be one or more wav files representing different time periods of audio, and each wav file contains one or more channels of audio data (see the sketch after the listing).
```
toplevel/
    wavs/
        [timestamp0].wav
        [timestamp1].wav
        ...
```
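And a sketch for this layout (illustrative; each file already holds all channels, so files are stitched along the time axis):

```python
import glob
import os

import numpy as np
from scipy.io import wavfile

def load_multichannel_wavs(toplevel):
    """Concatenate timestamped multichannel wavs in time order."""
    files = sorted(glob.glob(os.path.join(toplevel, "wavs", "*.wav")))
    # Each file is (samples, channels); concatenate along the sample axis
    return np.concatenate([wavfile.read(f)[1] for f in files], axis=0)
```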
This format is specific to our songephys data pipeline, where audio data is written to disk as a numpy memmapped array and made accessible through a pickled Python object for reading the data (lazy.npy).
```
toplevel/
    lazy.npy
```
The toplevel directory here looks something like `data/[subject]/sites/[site]/`, and the memmap arrays are located in `data/[subject]/sessions/*/raw/mic.memmap`.
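Loading such a file could look like the following sketch. The structure of the object inside lazy.npy is specific to the songephys pipeline, so everything beyond the `np.load` call is an assumption:

```python
import numpy as np

# lazy.npy stores a pickled Python object rather than a plain array,
# so allow_pickle=True is required. np.save wraps non-array objects in a
# 0-d object array, which .item() unwraps.
loaded = np.load("toplevel/lazy.npy", allow_pickle=True)
lazy = loaded.item() if loaded.shape == () else loaded
```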
The application organizes the vocal period data around the concepts of "sources" (audio sources, which in our case are individual vocalizers) and "channels" (audio channels in the recording). For each source, the user maps it onto a channel that will be displayed for it (i.e., the microphone attached to, or pointed at, that vocalizer). Multiple sources may be assigned to the same channel; they will display the same audio data, and it is up to the user to distinguish the vocalizations of the two subjects. A sketch of this mapping follows.
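Conceptually, the sources table boils down to a name-to-channel mapping, something like this hypothetical structure:

```python
# Each source is a vocalizer; "channel" is the recording channel displayed for it.
sources = [
    {"name": "bird1", "channel": 0},
    {"name": "bird2", "channel": 0},  # two sources may share a channel
    {"name": "bird3", "channel": 1},
]
```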
To add a source, click the orange (+) icon in the left panel. This creates a new row with a prefilled name and channel. Click the edit button to change the name of the source and the assigned channel, and click it again to confirm. The channel can be hidden with the middle button, or the source deleted with the trash can button.
An audio period can be selected by clicking and dragging on the spectrogram window. When the Amp Env view is on, you drag a threshold line across the displayed amplitude envelope; when it is off, you select a region of time-frequency space. This region is used to detect individual vocal intervals (a thresholding sketch follows the troubleshooting list below). Clicking (without dragging) on the spectrogram clears the current selection. When a region is selected, you can right-click for some basic options, or use the keyboard shortcuts described in the following section.
Note: If you are having a hard time getting intervals to be detected, try these:

- Selecting a narrower or wider frequency range or time range in the normal view
- Drawing the threshold directly in the Amp Env view
- If there is a high noise floor, using `Z` in the normal view
- Using `Shift+W` and `Shift+S` to manually increase and decrease the amp env threshold
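To make concrete what the threshold does, here is a minimal sketch of envelope-threshold segmentation. It illustrates the general technique, not the app's exact implementation; `detect_intervals` and its parameters are hypothetical:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def detect_intervals(audio, fs, threshold, smooth_hz=50.0):
    """Threshold a smoothed amplitude envelope into (t_start, t_stop) intervals."""
    # Rectify the signal, then low-pass filter to get a smooth amplitude envelope
    b, a = butter(2, smooth_hz / (fs / 2), btype="low")
    env = filtfilt(b, a, np.abs(audio))
    above = env > threshold
    # Rising and falling edges of the boolean mask mark interval boundaries
    edges = np.diff(above.astype(int))
    starts = np.where(edges == 1)[0] + 1
    stops = np.where(edges == -1)[0] + 1
    if above[0]:
        starts = np.r_[0, starts]
    if above[-1]:
        stops = np.r_[stops, len(above)]
    return [(s / fs, e / fs) for s, e in zip(starts, stops)]
```

Raising the threshold (`Shift+W`) keeps only the louder intervals; lowering it (`Shift+S`) picks up quieter calls at the cost of more noise detections.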
The keyboard shortcuts are designed so that your left hand stays on the keyboard and your right hand on the mouse.
- `A` and `D`: Move earlier and later in time
- `W` and `S`: Automatically segment into calls (both do the same)
- `Q`: Merge selected segments
- `E`: Toggle amplitude envelope mode on/off
- `F`: "Force"-label (manually label) the selected segment (useful if nothing else works)
- `Z`: Segment into calls if you're desperate (works better in broadband noise)
- `X`: Delete calls in the selected time range
- `Shift+W` and `Shift+S`: Increase and decrease (respectively) the amplitude envelope threshold value used in this section
- `M`: Toggle automatic dragging mode (releasing a click-and-drag selection automatically triggers call segmentation)
- `Space`: Play currently selected audio
- `Esc`: Stop playing audio
(This functionality still needs improvement.)
In a long recording, it may be helpful to first detect periods where there might be vocalizations, and then focus your attention on those periods. To do this, click the "Detect Potential Vocal Periods" button in the bottom panel. The application will freeze briefly while it scans the recording, then create buttons that jump you to the relevant periods and highlight them with a green marker.
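One plausible detection approach, sketched purely as an illustration (the function and its parameters are hypothetical, not the app's actual method), is to flag windows whose energy stands out from the baseline:

```python
import numpy as np

def potential_vocal_periods(audio, fs, win_s=1.0, factor=4.0):
    """Flag windows whose RMS energy exceeds `factor` times the median RMS."""
    win = int(win_s * fs)
    n = len(audio) // win
    rms = np.sqrt(np.mean(audio[: n * win].reshape(n, win) ** 2, axis=1))
    # Return the start time (in seconds) of each flagged window
    return np.where(rms > factor * np.median(rms))[0] * win_s
```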
Data is exported as a pandas DataFrame saved to a pickle file or CSV. The table has the columns "source_name", "source_channel", "t_start", and "t_stop" for all labeled intervals. To export the data, go to File > Export CSV or File > Export Pickle.
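The exported table can be read back with pandas; the file names here are examples:

```python
import pandas as pd

# Columns: source_name, source_channel, t_start, t_stop
df = pd.read_csv("export.csv")        # if exported as CSV
# df = pd.read_pickle("export.pkl")   # if exported as a pickle

# Example: total labeled duration per source
df["duration"] = df["t_stop"] - df["t_start"]
print(df.groupby("source_name")["duration"].sum())
```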