SCREAM: SpeeCh Recognition and Enhancement for Audio Messages

Project Overview

SCREAM (SpeeCh Recognition and Enhancement for Audio Messages) transcribes spoken content and refines the resulting text to make digital communication faster and more efficient. As people increasingly rely on digital messenger services for quick communication, SCREAM combines speech recognition with content enhancement algorithms to speed up the exchange of information.

Project Structure

datasynth/: Custom dataset synthesis for training and evaluation purposes.
evaluation/: Scripts for assessing transcription and speech processing performance.
notebooks/: Miscellaneous Jupyter notebooks.
telegram_bot/: Telegram bot for transcribing and summarizing audio messages.
utils/: General utility functions that support various modules of the project.

Key Functionalities

SCREAM is tailored for processing spoken content, making it ideal for enhancing communication in digital formats like audio messages. Key functionalities include:

Automatic Speech Recognition (ASR): SCREAM downloads and transcribes audio messages using state-of-the-art speech recognition models like OpenAI Whisper and its optimized variants.
Content Enhancement: Post-processing algorithms refine the transcription by removing filler words, pauses, and other superfluous elements, resulting in concise and coherent text.
Audio Segmentation: Long audio messages are segmented into manageable parts, allowing for better organization and easier processing of content.
Evaluation: Provides quality metrics and tools for measuring the accuracy and reliability of the transcription models.
Telegram Integration: SCREAM includes a Telegram bot that allows users to transcribe and summarize audio messages directly from the messaging platform.
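As an illustration of the Content Enhancement step, the following Python sketch strips a fixed list of filler words with a regular expression. It is a minimal example under assumptions: the function name remove_fillers and the FILLER_WORDS list are hypothetical and do not come from the SCREAM codebase, whose actual post-processing may work differently.

```python
import re
from typing import Iterable

# Hypothetical filler list; SCREAM's actual list (if any) is not documented here.
FILLER_WORDS = ("um", "uh", "erm", "hmm", "you know", "i mean")


def remove_fillers(text: str, fillers: Iterable[str] = FILLER_WORDS) -> str:
    """Remove filler words/phrases (whole words, case-insensitive) and tidy spacing."""
    # Try longer phrases first so multi-word fillers win over single words.
    alternatives = sorted((re.escape(f) for f in fillers), key=len, reverse=True)
    # Match a filler, an optional trailing comma, and the whitespace after it.
    pattern = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b,?\s*", re.IGNORECASE)
    cleaned = pattern.sub("", text)
    # Collapse repeated whitespace and drop spaces left in front of punctuation.
    cleaned = re.sub(r"\s+", " ", cleaned)
    cleaned = re.sub(r"\s+([,.!?])", r"\1", cleaned)
    return cleaned.strip()
```

Whole-word matching keeps words such as "umbrella" intact. A plain word list is cheap and predictable, but it cannot tell a filler "like" from a meaningful one, which is why ambiguous words are left out of the list.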
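Audio segmentation can be sketched as splitting a sample sequence into fixed-length windows that overlap slightly, so that speech falling on a boundary is likely to appear intact in at least one segment. The 30-second default mirrors the window length Whisper models process at once; the function name segment_audio and its parameters are hypothetical, and SCREAM's own segmentation (for example, splitting on silence) may differ.

```python
from typing import List, Sequence, Tuple


def segment_audio(
    samples: Sequence[float],
    sample_rate: int,
    segment_seconds: float = 30.0,
    overlap_seconds: float = 1.0,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of fixed-length, overlapping segments."""
    if not 0 <= overlap_seconds < segment_seconds:
        raise ValueError("overlap must be non-negative and shorter than a segment")
    seg_len = int(segment_seconds * sample_rate)
    step = int((segment_seconds - overlap_seconds) * sample_rate)
    if seg_len <= 0 or step <= 0:
        raise ValueError("segment and step must each span at least one sample")
    total = len(samples)
    bounds: List[Tuple[int, int]] = []
    start = 0
    while start < total:
        end = min(start + seg_len, total)
        bounds.append((start, end))
        if end == total:  # last (possibly shorter) segment reached
            break
        start += step  # next window starts overlap_seconds before this one ends
    return bounds
```

Each pair can be used to slice the original samples, e.g. samples[start:end], before passing the segment to the speech recognition model.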

Installation

1. Clone the repository

git clone <repository-url>

2. Set up the virtual environment

Create and activate a virtual environment for package management:

# macOS/Linux
python -m venv .venv
source .venv/bin/activate

# Windows
python -m venv .venv
.venv\Scripts\activate
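To confirm that the virtual environment is active before installing anything, a short Python check such as the one below can be run. It relies on the standard venv behaviour (PEP 405), where sys.prefix points to the environment and sys.base_prefix to the base interpreter; the helper name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv (PEP 405), sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

When the environment is active, the printed prefix should end in .venv.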

3. Install dependencies

Once the virtual environment is activated, install all required packages:

pip install -r requirements.txt

Alternatively, install the packages with uv (uv sync installs the versions pinned in uv.lock):

pip install uv
uv sync

4. Set up the .env file

Create a .env file in the project's root directory with the following environment variables:

TELEGRAM_BOT_API_KEY="your_telegram_bot_api_key"
GEMINI_API_KEY="your_gemini_api_key"
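The variables above can be loaded at startup with a small parser. This is a stdlib-only sketch under assumptions: the helper names parse_env and load_env_file are hypothetical, and the project may instead use a library such as python-dotenv. It handles only simple KEY=value lines (optionally quoted) and never overrides variables that are already set in the environment.

```python
import os
from pathlib import Path
from typing import Dict, List

# Variable names taken from this README; the loader itself is a hypothetical sketch.
REQUIRED_KEYS = ("TELEGRAM_BOT_API_KEY", "GEMINI_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value / KEY="value" lines; ignore blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str = ".env") -> List[str]:
    """Export .env values without overriding existing ones; return missing required keys."""
    env_file = Path(path)
    if env_file.is_file():
        for key, value in parse_env(env_file.read_text(encoding="utf-8")).items():
            os.environ.setdefault(key, value)
    return [key for key in REQUIRED_KEYS if not os.environ.get(key)]
```

Because load_env_file() returns the names of required keys that are still missing, a bot entry point could stop with a clear error message whenever that list is not empty.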

Usage

SCREAM facilitates speech-to-text transcriptions for audio messages, processes and refines the transcriptions, and evaluates the output for accuracy. Here is how you can use the key functionalities:

Download and Process YouTube Audio: Extract and segment audio content from YouTube videos using SCREAM’s built-in downloading functions.
Transcription and Refinement: Transcribe audio content into text with speech recognition models, then refine the transcript to remove unnecessary elements such as filler words and pauses.
Evaluation: Assess the quality of the transcriptions using evaluation scripts and metrics provided in the evaluation/ directory.
Telegram Bot Integration: Deploy the Telegram bot to transcribe and summarize audio messages directly from the messaging platform.
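Word error rate (WER) is the standard metric for transcription quality: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below computes it with dynamic programming; the function name is hypothetical, and the scripts in evaluation/ may use different metrics, text normalization, or a library such as jiwer.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    # Minimal normalization: lowercase and split on whitespace only.
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Rolling rows of the Levenshtein table: prev holds the distances for the
    # previous reference prefix against every hypothesis prefix.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, dropping one word from a six-word reference gives a WER of 1/6, about 0.17. Because insertions also count as errors, WER can exceed 1.0.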

Repository: ilyii/scream

Folders and files

Latest commit

History

Repository files navigation

SCREAM: SpeeCh Recognition and Enhancement for Audio Messages

Project Overview

Project Structure

Key Functionalities

Installation

1. Clone the repository

2. Set up the virtual environment

3. Install dependencies

4. Set up the .env file

Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages