Res-X

Research paper PDF → video (AI image + text-to-speech)

Res-X (Research Explanation) turns research papers into videos. It combines three types of machine learning models: optical character recognition (Tesseract and LaTeX OCR), text-to-speech, and text-to-image (Stable Diffusion).

Usage

Upload a PDF document and Res-X will generate a video. It works with papers that have LaTeX-generated formulas. (URL input options are coming soon.)

It is made with researchers and students in mind: they are expected to stay up to date with new papers but rarely have enough time to read and parse everything. It may be particularly helpful for papers from a field the user is unfamiliar with, or for people who are visual learners.

Unlike other platforms, Res-X is tailored to research papers: it seeks to compartmentalize the input PDF, and it works for papers with LaTeX-generated formulas (such as math, physics, and computer science papers).
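
One way to picture the "compartmentalize" step is splitting the OCR'd text into sections at common paper headings. This is only a minimal sketch, assuming the text is plain and each heading sits on its own line. The heading list and the `split_sections` name are hypothetical and are not taken from the Res-X code.

```python
import re

# Hypothetical heading list (an assumption); real papers vary widely.
HEADINGS = (
    "abstract", "introduction", "methods", "results",
    "discussion", "conclusion", "references",
)

# Matches a line made up only of an optional section number and a known heading.
HEADING_RE = re.compile(
    r"^\s*(?:\d+\.?\s+)?(" + "|".join(HEADINGS) + r")\s*$",
    re.IGNORECASE,
)


def split_sections(text):
    """Return a list of (heading, body) pairs in document order.

    Text before the first recognized heading is stored under "preamble".
    """
    sections = []
    heading, lines = "preamble", []
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            # Close the section collected so far and start a new one.
            sections.append((heading, "\n".join(lines).strip()))
            heading, lines = match.group(1).lower(), []
        else:
            lines.append(line)
    sections.append((heading, "\n".join(lines).strip()))
    # Drop sections with no body text (for example, an empty preamble).
    return [(h, b) for h, b in sections if b]
```

Each (heading, body) pair could then be narrated and illustrated separately. Headings that do not match the list exactly (for example, "Results and Discussion") stay inside the preceding section in this sketch.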

As a published researcher and visual learner, I definitely find Res-X useful.

Example with my paper's abstract:

resx_vid.mp4

Fu C, Davy A, Holmes S, Sun S, Yadav V, et al. (2021) Dynamic genome plasticity during unisexual reproduction in the human fungal pathogen Cryptococcus deneoformans. PLOS Genetics 17(11): e1009935. https://doi.org/10.1371/journal.pgen.1009935

Requirements

Python packages: scipy, torch, coqui-ai TTS, moviepy, diffusers (Hugging Face), requests, imgkit, pdf2image, fake_useragent, cv2, imutils, numpy, regex, PIL, pandas, pytesseract, pix2tex (LaTeX OCR), IPython, matplotlib, fontTools

System tools: wkhtmltopdf, ffmpeg

Python standard library (no install needed): os, io, sys, re, math, string, statistics, itertools

Reflection

Some of the major roadblocks I faced ultimately determined my approach:

Started off web scraping but could only scrape some sites → limited inputs to PDFs only
PDF parsing libraries were inconsistent and often required perfectly formatted PDFs → used document images and cv2 contour detection to parse instead
cv2 contours helped identify figures in the document image but not modern/small/medium tables → acquiring table bounding boxes was time-intensive
Text-to-image required a GPU (which I don’t have) → relied on Google Colab’s free TPU for that segment
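
The contour-based approach from the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code Res-X uses: the function names, the dilation kernel size, and the size thresholds are all hypothetical and would need tuning per document. The OpenCV calls (`threshold`, `dilate`, `findContours`, `boundingRect`) are standard.

```python
def find_candidate_boxes(image_path):
    """Return ([(x, y, w, h), ...], (page_width, page_height)) for a page image."""
    import cv2  # imported here so filter_figure_boxes works without OpenCV installed

    page = cv2.imread(image_path)
    if page is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
    # Inverted Otsu threshold: ink becomes white (255) on a black background.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Dilate so nearby strokes merge into one blob per figure or text block.
    # The 15x15 kernel is an assumed value.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 15))
    merged = cv2.dilate(binary, kernel, iterations=2)
    # [-2] picks the contour list under both the OpenCV 3 and 4 return conventions.
    contours = cv2.findContours(merged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    height, width = gray.shape
    return [cv2.boundingRect(c) for c in contours], (width, height)


def filter_figure_boxes(boxes, page_size, min_area_frac=0.03, min_height_frac=0.08):
    """Keep boxes large enough to plausibly be figures rather than text lines.

    Both thresholds are assumed values expressed as fractions of the page.
    """
    page_w, page_h = page_size
    page_area = page_w * page_h
    kept = [
        (x, y, w, h)
        for x, y, w, h in boxes
        if w * h >= min_area_frac * page_area and h >= min_height_frac * page_h
    ]
    # Sort top-to-bottom, then left-to-right, to roughly follow reading order.
    return sorted(kept, key=lambda b: (b[1], b[0]))
```

A typical call would be `boxes, size = find_candidate_boxes("page_1.png")` followed by `filter_figure_boxes(boxes, size)`, where `page_1.png` is a placeholder file name. A size filter like this cannot tell a large paragraph from a figure, and it says nothing about table structure, which is consistent with the table problem described above.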
