AI-Powered Podcast Generator
A Python-based tool that converts text scripts into realistic audio podcasts using Google's Generative AI API. It uses text-to-speech to create dynamic, multi-speaker conversations with customizable voices.
Features:
- Text-to-speech conversion using Google's Generative AI
- Support for multiple speakers with distinct voices
- Automatic audio file generation and combination
- Customizable voice selection
- Robust error handling and retry mechanisms
Prerequisites:
- Python 3.8 or higher
- FFmpeg installed and accessible in system PATH
- Google API key for Generative AI services
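The prerequisites above can be checked before the first run. The following is a minimal sketch, not part of the project: the helper name check_prerequisites is hypothetical, and it only verifies the Python version and that an ffmpeg executable is discoverable on PATH.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d or higher is required" % min_python)
    # shutil.which returns None when the executable cannot be found on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```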
System Dependencies:
Windows:
- Microsoft Visual C++ 14.0 or greater
- FFmpeg
Linux:
sudo apt-get install portaudio19-dev python3-dev ffmpeg
macOS:
brew install portaudio ffmpeg
Installation:
- Clone the repository:
git clone https://github.com/agituts/gemini-2-tts.git
cd gemini-2-tts
- Create and activate virtual environment:
For Windows:
python -m venv venv
.\venv\Scripts\activate
For Linux/macOS:
python3 -m venv venv
source venv/bin/activate
- Install required Python packages:
pip install -r requirements.txt
- Create a .env file in the project root:
GOOGLE_API_KEY=your_google_api_key_here
VOICE_A=Puck # Optional: Default is Puck; Current options are Puck, Charon, Kore, Fenrir, Aoede
VOICE_B=Kore # Optional: Default is Kore; Current options are Puck, Charon, Kore, Fenrir, Aoede
Note: To deactivate the virtual environment when you're done, simply run:
deactivate
Project Structure:
podcast_script.txt: Contains the conversation script in the format:
Speaker A: Welcome to our podcast! Today we'll be discussing...
Speaker B: Thanks for having me! I'm excited to...
Speaker A: Let's start with...
Speaker B: That's an interesting point...
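To illustrate the script format, here is a small sketch of how such a file could be split into speaker turns. It is an assumption about one reasonable way to parse the format shown above, not the parser used by app.py; the names LINE_PATTERN and parse_script are hypothetical.

```python
import re

# Matches lines such as "Speaker A: Hello"; the label format follows the example script.
LINE_PATTERN = re.compile(r"^(Speaker [A-Z]):\s*(.+)$")


def parse_script(text):
    """Return a list of (speaker, dialogue) tuples, skipping blank or unlabeled lines."""
    turns = []
    for raw_line in text.splitlines():
        match = LINE_PATTERN.match(raw_line.strip())
        if match:
            turns.append((match.group(1), match.group(2).strip()))
    return turns


if __name__ == "__main__":
    with open("podcast_script.txt", encoding="utf-8") as handle:
        for speaker, dialogue in parse_script(handle.read()):
            print(speaker, "says:", dialogue)
```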
system_instructions.txt: Contains system-level instructions for voice generation in the format:
You are a real-time energetic and enthusiastic narrator for a podcast.
The entire podcast script is provided below this instruction.
Your job is to narrate only the specific dialogue line provided to you in subsequent messages, responding immediately as if in real-time, using a natural, friendly, and engaging tone.
When narrating, use the context of the entire podcast script to inform your delivery.
Speak smoothly and conversationally, not like you are reading off a script.
Pause naturally at commas, periods, and question marks.
Vary your pacing slightly as a person would in real conversation.
Do not narrate anything assigned to other speakers or identify which speaker is talking.
Only narrate the specific dialogues provided to you.
Do not introduce yourself or any other speaker; simply speak the dialogues as you receive them, as if they were being spoken in that moment.
The script is designed for a podcast and contains conversational exchanges between speakers.
Do not add any additional information unless asked.
Remember, you must receive and acknowledge the full script first before you begin receiving and narrating individual dialogue lines.
.env: Environment variables configuration
requirements.txt: Python package dependencies
Usage:
- Prepare your conversation script in podcast_script.txt
- Run the generator:
python app.py
- Find the generated podcast as final_podcast.wav
Environment Variables:
Create a .env file with the following variables:
GOOGLE_API_KEY=your_google_api_key_here
VOICE_A=Puck # Optional: Default is Puck; Current options are Puck, Charon, Kore, Fenrir, Aoede
VOICE_B=Kore # Optional: Default is Kore; Current options are Puck, Charon, Kore, Fenrir, Aoede
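As a sketch of how these variables might be read and validated, the snippet below applies the documented defaults (Puck and Kore) and rejects voice names outside the listed options. It assumes the values are already present in the process environment (for example, loaded from .env by a package such as python-dotenv, which is an assumption here); load_config is a hypothetical helper, not the project's own code.

```python
import os

# Voice names listed in this README as the current options.
AVAILABLE_VOICES = ("Puck", "Charon", "Kore", "Fenrir", "Aoede")


def load_config(env=None):
    """Read settings from a mapping (defaults to os.environ) and validate them."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY", "").strip()
    if not api_key:
        raise ValueError("GOOGLE_API_KEY is required")
    # Fall back to the documented defaults when a voice is not set.
    voice_a = env.get("VOICE_A", "Puck").strip()
    voice_b = env.get("VOICE_B", "Kore").strip()
    for name, voice in (("VOICE_A", voice_a), ("VOICE_B", voice_b)):
        if voice not in AVAILABLE_VOICES:
            raise ValueError(f"{name}={voice!r} is not one of {AVAILABLE_VOICES}")
    return {"api_key": api_key, "voice_a": voice_a, "voice_b": voice_b}
```

Passing a plain dictionary instead of os.environ makes the helper easy to exercise without touching the real environment.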
Error Handling:
- The system automatically retries on connection failures
- Maximum retry attempts: 3
- Temporary files are automatically cleaned up
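The retry behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: the helper name with_retries, the fixed delay between attempts, and the choice of ConnectionError as the retried exception are all hypothetical.

```python
import time


def with_retries(operation, max_attempts=3, delay_seconds=1.0,
                 retry_on=(ConnectionError,)):
    """Call operation() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            # Re-raise the last failure once the attempt budget is used up.
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)
```

A caller would wrap the network request in a function or lambda and pass it as operation; exceptions outside retry_on propagate immediately.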
Output:
- Individual speaker audio files are generated temporarily
- Final output is combined into final_podcast.wav
- All temporary files are automatically cleaned up
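To make the combine-and-clean-up step concrete, here is a sketch that concatenates WAV segments with Python's standard wave module and then deletes the temporary files. The project itself lists FFmpeg as a prerequisite and may combine audio differently; the function name combine_wav_files and the requirement that all segments share one audio format are assumptions of this example.

```python
import os
import wave


def combine_wav_files(segment_paths, output_path="final_podcast.wav", cleanup=True):
    """Concatenate WAV segments in order, then optionally delete the segments."""
    if not segment_paths:
        raise ValueError("no segments to combine")
    audio_format = None
    chunks = []
    for path in segment_paths:
        with wave.open(path, "rb") as segment:
            current = (segment.getnchannels(), segment.getsampwidth(),
                       segment.getframerate())
            if audio_format is None:
                audio_format = current
            elif current != audio_format:
                raise ValueError(f"{path} does not match the format of the first segment")
            chunks.append(segment.readframes(segment.getnframes()))
    with wave.open(output_path, "wb") as out:
        out.setnchannels(audio_format[0])
        out.setsampwidth(audio_format[1])
        out.setframerate(audio_format[2])
        for chunk in chunks:
            out.writeframes(chunk)
    if cleanup:
        # Remove the temporary per-speaker files once the final file is written.
        for path in segment_paths:
            os.remove(path)
    return output_path
```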
License:
MIT License
Contributing:
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a new Pull Request