This code converts PDFs into audio podcasts, lectures, summaries, and more. It uses OpenAI's GPT models for text generation and text-to-speech conversion. You can also edit a draft transcript as many times as you like, adding specific comments or overall directives on how it should be adapted or improved.
- Upload multiple PDF files
- Choose from different instruction templates (podcast, lecture, summary, etc.)
- Customize text generation and audio models
- Select different voices for speakers
- Iterate on the draft with specific or general comments, direct edits to the transcript, and targeted feedback to the model
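To illustrate the voice-selection feature, here is a minimal sketch of how each speaker in a draft transcript might be paired with a voice before speech synthesis. The `Line` class and the `assign_voices` helper are hypothetical and are not taken from the PDF2Audio code; the voice names are a subset of those OpenAI's text-to-speech API has offered, so check the current API documentation before relying on them.

```python
from dataclasses import dataclass

# Assumption: a subset of voice names offered by OpenAI's text-to-speech API.
VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]


@dataclass
class Line:
    """One line of a draft transcript (hypothetical structure)."""
    speaker: str
    text: str


def assign_voices(lines, voices=VOICES):
    """Give each distinct speaker a voice, in order of first appearance."""
    mapping = {}
    for line in lines:
        if line.speaker not in mapping:
            # Wrap around if there are more speakers than voices.
            mapping[line.speaker] = voices[len(mapping) % len(voices)]
    return mapping


transcript = [
    Line("Host", "Welcome to the show."),
    Line("Guest", "Thanks for having me."),
    Line("Host", "Let's dive into the paper."),
]
voice_map = assign_voices(transcript)
```

Keeping the speaker-to-voice mapping separate from the transcript text means the transcript can be edited and regenerated without changing which voice reads which speaker.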
Follow these steps to set up PDF2Audio on your local machine using Conda:
- Clone the repository:
  git clone https://github.com/lamm-mit/PDF2Audio.git
  cd PDF2Audio
- Install Miniconda (if you haven't already):
  - Download the installer from the Miniconda website
  - Follow the installation instructions for your operating system
  - Verify the installation:
    conda --version
- Create a new Conda environment:
  conda create -n pdf2audio python=3.9
- Activate the Conda environment:
  conda activate pdf2audio
- Install the required dependencies:
  pip install -r requirements.txt
- Set up your OpenAI API key: create a .env file in the project root directory and add your OpenAI API key:
  OPENAI_API_KEY=your_api_key_here
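Before launching the app, it can help to confirm that the key from the last setup step is actually visible. The following is a minimal sketch using only the standard library; the `read_env_file` and `find_api_key` helpers are hypothetical, and how PDF2Audio itself loads the `.env` file is not shown here, so treat this as a pre-flight check rather than a description of the app's behavior.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def find_api_key(path=".env"):
    """Prefer a key already in the environment, else fall back to the file."""
    return os.environ.get("OPENAI_API_KEY") or read_env_file(path).get("OPENAI_API_KEY")


if __name__ == "__main__":
    print("OPENAI_API_KEY found" if find_api_key() else "OPENAI_API_KEY missing")
```

Run it from the project root so that the relative `.env` path resolves to the file you created.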
To run the PDF2Audio app:
- Ensure you're in the project directory and your Conda environment is activated:
  conda activate pdf2audio
- Run the Python script that launches the Gradio interface:
  python app.py
- Open your web browser and go to the URL provided in the terminal (typically http://127.0.0.1:7860).
- Use the Gradio interface to upload a PDF file and convert it to audio.
- Upload one or more PDF files
- Select the desired instruction template
- Customize the instructions if needed
- Click "Generate Audio" to create your audio content
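If the browser cannot load the page, a quick way to tell whether the server started by `python app.py` is answering is to probe the URL from Python. This is a minimal sketch using only the standard library; the `app_is_up` helper is hypothetical, and the default URL is the typical address mentioned above, which may differ from the one printed in your terminal.

```python
import urllib.error
import urllib.request


def app_is_up(url="http://127.0.0.1:7860", timeout=2.0):
    """Return True if an HTTP server answers with status 200 at url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("App is reachable" if app_is_up() else "App is not reachable")
```

A result of "not reachable" usually means the app is still starting, has exited with an error, or is listening on a different port than the one probed.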