SCREAM (SpeeCh Recognition and Enhancement for Audio Messages) transcribes and refines spoken content to make digital communication faster and more efficient. As people increasingly rely on messenger services for quick exchanges, SCREAM combines speech recognition with content enhancement algorithms so that the information in audio messages can be exchanged more quickly.
The repository is organized into the following directories:
- datasynth/: Custom dataset synthesis for training and evaluation purposes.
- evaluation/: Scripts for assessing transcription and speech processing performance.
- notebooks/: Miscellaneous Jupyter notebooks.
- telegram_bot/: Telegram bot for transcribing and summarizing audio messages.
- utils/: General utility functions that support various modules of the project.
SCREAM is tailored to spoken content in digital formats such as audio messages. Key functionalities include:
- Automatic Speech Recognition (ASR): SCREAM downloads and transcribes audio messages using state-of-the-art speech recognition models such as OpenAI Whisper and its optimized variants.
- Content Enhancement: Post-processing algorithms refine the transcription by removing filler words, pauses, and other superfluous elements, producing concise and coherent text.
- Audio Segmentation: Long audio messages are split into manageable segments, which makes the content easier to organize and process.
- Evaluation: Quality metrics and tools for measuring the accuracy and reliability of the transcription models.
- Telegram Integration: SCREAM includes a Telegram bot that lets users transcribe and summarize audio messages directly from the messaging platform.
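As an illustration of the Content Enhancement step, the following minimal rule-based sketch strips common filler words from a transcript. It is not SCREAM's actual post-processing: the `FILLERS` list and the `remove_fillers` function are hypothetical, and a real pipeline may instead use a language model or handle pauses at the audio level.

```python
import re

# Hypothetical filler inventory for illustration; SCREAM's real rules may differ.
FILLERS = ["you know", "i mean", "uh", "um", "erm", "hmm"]

# Longest phrases first so multi-word fillers are tried before shorter entries.
_ALTERNATION = "|".join(re.escape(f) for f in sorted(FILLERS, key=len, reverse=True))

# Optional comma + whitespace before the filler, the filler as whole word(s), optional comma after.
_FILLER_RE = re.compile(rf",?\s*\b(?:{_ALTERNATION})\b,?", re.IGNORECASE)


def remove_fillers(text: str) -> str:
    """Remove filler words/phrases and tidy up the remaining whitespace."""
    cleaned = _FILLER_RE.sub("", text)
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    # Re-capitalize in case the sentence originally started with a filler.
    return cleaned[:1].upper() + cleaned[1:]


print(remove_fillers("Um, so I, uh, think we should, you know, meet tomorrow."))
# → So I think we should meet tomorrow.
```

The word-boundary anchors (`\b`) keep words that merely contain a filler, such as "umbrella", intact.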
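The Audio Segmentation idea can be sketched as computing overlapping time windows over a long recording. This assumes fixed 30-second chunks (the window length Whisper processes at once) with a 2-second overlap; `segment_bounds` and both defaults are hypothetical choices, not SCREAM's configuration.

```python
from typing import List, Tuple


def segment_bounds(
    duration_s: float, chunk_s: float = 30.0, overlap_s: float = 2.0
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds covering [0, duration_s].

    Each chunk is at most chunk_s long, and consecutive chunks share
    overlap_s seconds so a word cut at one boundary is likely to appear
    whole in the neighboring chunk.
    """
    if chunk_s <= 0 or not (0 <= overlap_s < chunk_s):
        raise ValueError("require chunk_s > 0 and 0 <= overlap_s < chunk_s")
    step = chunk_s - overlap_s  # distance between the starts of consecutive chunks
    bounds: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        bounds.append((start, end))
        if end >= duration_s:  # last chunk reaches the end of the audio
            break
        start += step
    return bounds


print(segment_bounds(70.0))  # → [(0.0, 30.0), (28.0, 58.0), (56.0, 70.0)]
```

Multiplying the boundaries by the sample rate (Whisper expects 16 kHz audio) gives sample indices for slicing a waveform array. The overlap trades a little duplicated work for fewer words cut in half at a boundary; the duplicated text then has to be merged when the partial transcripts are joined.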
Clone the repository:
git clone <repository-url>
Create and activate a virtual environment for package management:
# macOS/Linux
python -m venv .venv
source .venv/bin/activate
# Windows
python -m venv .venv
.venv\Scripts\activate
Once the virtual environment is activated, install all required packages:
pip install -r requirements.txt
Alternatively, install the packages with uv:
pip install uv
uv sync
Create a .env file in the root directory and add the following environment variables:
TELEGRAM_BOT_API_KEY="your_telegram_bot_api_key"
GEMINI_API_KEY="your_gemini_api_key"
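To catch configuration mistakes early, the variables can be checked before the bot starts. The sketch below is a hypothetical, standard-library-only stand-in (`parse_env` and `missing_keys` are illustrative names); the project may load `.env` with a library such as python-dotenv instead, which handles more of the `.env` syntax.

```python
from typing import Dict, List

# Variable names taken from the .env example above.
REQUIRED_KEYS = ("TELEGRAM_BOT_API_KEY", "GEMINI_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY="value" lines (no variable expansion, no multiline values)."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed entries
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and single or double quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Called as `missing_keys(parse_env(Path(".env").read_text(encoding="utf-8")))` (with `pathlib.Path`), it returns an empty list when both keys are set and names the missing ones otherwise.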
SCREAM transcribes audio messages to text, refines the transcriptions, and evaluates the output for accuracy. The key functionalities can be used as follows:
- Download and Process YouTube Audio: Extract and segment audio content from YouTube videos using SCREAM’s built-in downloading functions.
- Transcription and Refinement: Transcribe audio content into text with speech recognition models, then refine the transcript to remove unnecessary elements such as filler words and pauses.
- Evaluation: Assess the quality of the transcriptions with the evaluation scripts and metrics provided in the evaluation/ directory.
- Telegram Bot Integration: Deploy the Telegram bot to transcribe and summarize audio messages directly from the messaging platform.
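Transcription quality is commonly measured with the word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The self-contained sketch below shows that computation; it is not necessarily the metric implementation in evaluation/ (a library such as jiwer provides WER with configurable text normalization), and it assumes lowercasing plus whitespace tokenization as the only normalization.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # One row of the Levenshtein table: prev[j] is the edit distance between the
    # reference words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` is 1/6: one deletion over six reference words. Because insertions also count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.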