🎙️ `paper-to-audio`

Convert academic papers and PDFs to audio using GROBID and OpenAI's TTS API.

Features:

Encodes chapter markers into the MP3.
Adapts formulas to be more suitable for TTS using OpenAI's GPT-4 LLM.
Generates a cover image using OpenAI's DALL-E.
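
As an illustration of the chapter feature, the sketch below builds an ffmpeg metadata file (the `;FFMETADATA1` format) from a list of section titles and durations. It is a minimal sketch, not necessarily how this tool does it: the `Chapter` type and the `buildFfmetadata` and `escapeMeta` names are hypothetical.

```typescript
// Hypothetical shape: one entry per section of the paper.
interface Chapter {
  title: string;
  durationMs: number;
}

// ffmpeg metadata values must escape '=', ';', '#', '\' and newlines with a backslash.
function escapeMeta(value: string): string {
  return value.replace(/([=;#\\\n])/g, "\\$1");
}

// Build an FFMETADATA1 document with consecutive chapters in millisecond units.
export function buildFfmetadata(chapters: Chapter[]): string {
  const lines: string[] = [";FFMETADATA1"];
  let start = 0;
  for (const chapter of chapters) {
    const end = start + Math.round(chapter.durationMs);
    lines.push(
      "[CHAPTER]",
      "TIMEBASE=1/1000",
      `START=${start}`,
      `END=${end}`,
      `title=${escapeMeta(chapter.title)}`,
    );
    start = end;
  }
  return lines.join("\n") + "\n";
}
```

A file produced this way can typically be passed to ffmpeg as a second input together with `-map_metadata 1`, so that the chapters are written into the output file.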

Important

Requires an OpenAI API key. A 10-page research paper yields roughly 30-60 minutes of audio and costs about $0.50 to $1 to generate.
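
The figures above can be sanity-checked with some rough arithmetic. The sketch below assumes a TTS price of $15 per million input characters and a speaking rate of about 900 characters per minute. Both numbers are assumptions for illustration (check the current OpenAI pricing), the function names are hypothetical, and the GPT-4 and DALL-E calls are not included.

```typescript
// Assumed price: USD per one million input characters sent to the TTS model.
export function estimateTtsCostUsd(charCount: number, usdPerMillionChars = 15): number {
  return (charCount * usdPerMillionChars) / 1e6;
}

// Assumed speaking rate: roughly 150 words per minute at ~6 characters per word.
export function estimateDurationMinutes(charCount: number, charsPerMinute = 900): number {
  return charCount / charsPerMinute;
}
```

Under these assumptions, a paper with 30,000 to 60,000 characters of text comes to about $0.45 to $0.90 and about 33 to 67 minutes of audio, which is in line with the range stated above.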

Preview

The following is a 30-second snippet generated from a PDF (enable audio in the video controls):

preview.mp4

Example of generated cover images:

Screenshot of CLI:

Example of chapters as seen in an audiobook player:

Setup

Requirements:

Node.js (tested with v21) and Yarn v1
ffmpeg installed, with libmp3lame enabled
OpenAI API key

Create a .env file containing your OpenAI API key:

OPENAI_API_KEY=...

Install dependencies:

yarn install

Run with the path to your PDF file:

yarn start ./data/Example/Example.pdf

The MP3 file will be saved in the same directory as the PDF, e.g. ./data/Example/Example.mp3.

Intermediate data used to generate the file is stored in ./intermediate and is reused on subsequent runs.
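
The reuse of intermediate results can be pictured as a simple file-backed cache. The following is only a sketch of that idea: the actual layout of ./intermediate is not documented here, and the `cachedStep` helper and its file naming are hypothetical.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: run `compute` once and persist its JSON result under
// `cacheDir`; later calls with the same `name` read the file instead.
export function cachedStep<T>(cacheDir: string, name: string, compute: () => T): T {
  const file = path.join(cacheDir, `${name}.json`);
  if (fs.existsSync(file)) {
    // Cache hit: skip the (possibly paid) computation entirely.
    return JSON.parse(fs.readFileSync(file, "utf8")) as T;
  }
  const result = compute();
  fs.mkdirSync(cacheDir, { recursive: true });
  fs.writeFileSync(file, JSON.stringify(result));
  return result;
}
```

With a scheme like this, deleting a cached file forces that step to be regenerated on the next run.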

Optional configuration

Currently, parameters are configured using environment variables instead of command-line parameters. In addition to the OpenAI API key, the following environment variables can be set:

GROBID_URL=http://localhost:8070 # Defaults to https://kermitt2-grobid.hf.space
TTS_VOICE=echo # Defaults to a random voice, see options at https://platform.openai.com/docs/guides/text-to-speech/voice-options
TTS_MODEL=tts-1
IMAGE_MODEL=dall-e-3
LLM_MODEL=gpt-4o # Used for adapting formulas
INCLUDE_FIGURES=true # Include figure texts in audio. Defaults to false.
SKIP_CITATIONS=true # Skip in-text citations. Defaults to false.
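
For readers who want to see how such variables might be resolved, here is a minimal sketch that applies the defaults documented above. The `Config` shape and the `loadConfig` name are hypothetical, the list of six voices is an assumption (see the linked page for the current options), and the model variables are left undefined when unset because their defaults are not stated here.

```typescript
type Env = Record<string, string | undefined>;

// Assumed voice list; the linked OpenAI page is the authoritative source.
const VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"] as const;

interface Config {
  apiKey: string;
  grobidUrl: string;
  ttsVoice: string;
  ttsModel: string | undefined;
  imageModel: string | undefined;
  llmModel: string | undefined;
  includeFigures: boolean;
  skipCitations: boolean;
}

// Treat only the literal string "true" (any case) as enabled.
const isTrue = (value: string | undefined): boolean => value?.toLowerCase() === "true";

export function loadConfig(env: Env, random: () => number = Math.random): Config {
  const apiKey = env["OPENAI_API_KEY"];
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY is required");
  }
  // Pick a random voice when TTS_VOICE is not set.
  const randomVoice = VOICES[Math.floor(random() * VOICES.length)] ?? "alloy";
  return {
    apiKey,
    grobidUrl: env["GROBID_URL"] ?? "https://kermitt2-grobid.hf.space",
    ttsVoice: env["TTS_VOICE"] ?? randomVoice,
    ttsModel: env["TTS_MODEL"],
    imageModel: env["IMAGE_MODEL"],
    llmModel: env["LLM_MODEL"],
    includeFigures: isTrue(env["INCLUDE_FIGURES"]),
    skipCitations: isTrue(env["SKIP_CITATIONS"]),
  };
}
```

In a Node.js program this would be called as `loadConfig(process.env)`; passing the environment in as an argument keeps the function easy to test.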
