Implementation of a CNN for speaker identification in TensorFlow. Trained on the example data of 20 speakers, the network identifies the correct speaker about 90% of the time.
To install the package, run:
pip install git+https://github.com/papeye/speaker-identification.git@setup-package
The example demonstrating the performance of the package can be found in example.py.
The package provides the SpeakerIdentifier class, which initializes a CNN for speaker identification. The class requires a path to a directory containing training audio files in .wav format, where each file is named after the speaker it contains (see the example layout below):
user1 = SpeakerIdentifier(model_name=<model_name>, training_ds_dir=<train_data_dir>)
Note
Only a single file per speaker is currently supported for training.
Important
It is assumed that only one person is speaking in each audio file.
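For example, assuming hypothetical speaker names, a training directory could be laid out like this (one .wav file per speaker, named after the speaker):

```
train_data/
├── alice.wav
├── bob.wav
└── carol.wav
```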
The audio files are then optionally processed with Voice Activity Detection (VAD) to remove silence, and split into 1 s segments that are used for training. Data augmentation by adding noise to the training audio is also available (configurable in the Config file).
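As an illustration of these preprocessing steps (a sketch only, not the package's actual code; the sample rate and noise level are assumed values):

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed sample rate; the package's actual value may differ

def split_into_segments(waveform: np.ndarray, segment_len: int = SAMPLE_RATE) -> list:
    """Split a 1-D waveform into non-overlapping 1 s segments, dropping the remainder."""
    n_segments = len(waveform) // segment_len
    return [waveform[i * segment_len:(i + 1) * segment_len] for i in range(n_segments)]

def add_noise(segment: np.ndarray, noise_level: float = 0.005) -> np.ndarray:
    """Augment a segment with additive Gaussian noise (illustrative noise level)."""
    return segment + noise_level * np.random.randn(len(segment))
```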
Training is done via the train method of SpeakerIdentifier:
user1.train(train_data_dir=<train_example_dir>, with_vad=True)
The trained model can then be used to predict the speaker of an audio file by calling
predictions = user1.predict(test_data_dir=<test_example_dir>, with_vad=True)
which returns a Result object mapping each speaker directory to a dictionary of predicted speaker labels and their normalized probabilities, sorted in descending order of probability. Result exposes a best_prediction getter that returns the most probable speaker.
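A minimal sketch of consuming the returned object (best_prediction is documented above; any other attribute names would be assumptions, so the iteration is shown only as a comment):

```python
predictions = user1.predict(test_data_dir="test_data", with_vad=True)

# Most probable speaker overall, via the documented getter
print(predictions.best_prediction)

# Hypothetical: iterating the per-directory mapping described above might look
# like this, but the attribute name is not part of the documented API.
# for speaker_dir, probabilities in predictions.mapping.items():
#     print(speaker_dir, probabilities)
```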
The number of epochs can be adjusted in the Config file, but on the example audio 90% accuracy is reached after only 5 epochs (the default). The convolutional neural network is defined in nnmodel.py using residual blocks; its size was chosen for identifying 20 speakers, but users are encouraged to experiment with the hyperparameters for the best performance. If available, weights from a previous training run are used to initialize the network, with the output layer replaced.
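For readers unfamiliar with residual blocks, below is a generic Keras sketch of one; it is illustrative only, and the actual architecture in nnmodel.py may differ:

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters: int, kernel_size: int = 3):
    """A generic 1-D residual block: two convolutions plus a skip connection."""
    shortcut = x
    y = layers.Conv1D(filters, kernel_size, padding="same", activation="relu")(x)
    y = layers.Conv1D(filters, kernel_size, padding="same")(y)
    # Project the skip path if the channel counts do not match
    if shortcut.shape[-1] != filters:
        shortcut = layers.Conv1D(filters, 1, padding="same")(shortcut)
    y = layers.add([y, shortcut])
    return layers.Activation("relu")(y)
```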
If the user provides fewer than 20 speakers, base speakers are added to the training dataset to achieve the best consistent performance. If the user provides more than 20 speakers in training_ds_dir, no base speakers are added.
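A minimal sketch of this selection rule, assuming the dataset is padded up to 20 speakers (function and variable names are hypothetical):

```python
N_REFERENCE_SPEAKERS = 20  # the network size is chosen for 20 speakers

def select_training_speakers(user_speakers: list, base_speakers: list) -> list:
    """Pad the user's speakers with base speakers up to 20 (an assumption);
    if 20 or more are provided, no base speakers are added."""
    if len(user_speakers) >= N_REFERENCE_SPEAKERS:
        return list(user_speakers)
    needed = N_REFERENCE_SPEAKERS - len(user_speakers)
    return list(user_speakers) + base_speakers[:needed]
```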