
This project uses a pre-trained diarization pipeline and embedding model to detect and cluster speakers from audio, assigning unique IDs per speaker. Speaker embeddings are clustered with DBSCAN based on cosine similarity, and audio metadata is managed via SQLAlchemy for database interactions.


MrBinit/Speaker-embedding-and-Clustering


Code Overview

Main Components:

  • Diarization Pipeline: Uses the pre-trained pyannote/speaker-diarization-3.1 pipeline to segment the audio by speaker.
  • Embedding Model: Uses the pre-trained pyannote/embedding model to extract speaker embeddings.
  • Clustering: Speaker embeddings are clustered with DBSCAN, using cosine similarity to compare them, so that each cluster maps to one unique speaker ID.
  • Database Interaction: The script uses SQLAlchemy to retrieve and update audio file metadata in a database.
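
To illustrate the clustering step, here is a minimal, dependency-free sketch of DBSCAN over cosine distance (1 minus cosine similarity). It is not the project's code: the function names, the `eps` and `min_samples` values, and the toy 3-dimensional vectors are all assumptions made for illustration. In practice a library implementation such as scikit-learn's `DBSCAN` with `metric="cosine"` would likely be used, and real speaker embeddings have far more dimensions.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def dbscan_cosine(embeddings, eps=0.1, min_samples=2):
    """Cluster embeddings with DBSCAN; returns one label per embedding.

    Labels 0, 1, 2, ... are cluster (speaker) indices and -1 marks noise.
    eps and min_samples are illustrative defaults, not tuned values.
    """
    n = len(embeddings)
    # Neighbourhoods include the point itself, as in scikit-learn.
    neighbours = [
        [j for j in range(n)
         if cosine_distance(embeddings[i], embeddings[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        # i is a core point: start a new cluster and grow it.
        labels[i] = cluster_id
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is also a core point
        cluster_id += 1
    return labels


# Toy embeddings (hypothetical): two pairs of similar vectors and one outlier.
example_embeddings = [
    [1.0, 0.05, 0.0],
    [0.95, 0.1, 0.0],
    [0.0, 1.0, 0.05],
    [0.05, 0.95, 0.0],
    [0.0, 0.0, 1.0],
]
example_labels = dbscan_cosine(example_embeddings, eps=0.1, min_samples=2)
print(example_labels)
```

Because DBSCAN does not need the number of clusters in advance, it suits audio where the speaker count is unknown. Embeddings that fit no cluster receive the label -1 rather than being forced onto a speaker.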
