Haptipedia Extractor

A Python wrapper for extracting metadata, text, section titles, figures, and references from haptic device research papers. It uses PDFFigures2.0 to extract figures and figure captions, and GROBID to extract references, section text, and section titles. It also provides a cross-reference function that finds connections between the input papers: which papers cite each other and how many times, and which authors and references they share.
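The cross-reference idea can be illustrated with a minimal sketch. This is not the project's implementation: the function name `cross_reference` and the dictionary keys `title`, `authors`, and `references` are hypothetical, and the sketch assumes each paper's `references` list holds one entry per in-text citation, so repeated citations can be counted.

```python
from collections import Counter


def cross_reference(papers):
    """Compare every ordered pair of papers (hypothetical data layout).

    Each paper is a dict with the assumed keys:
      'title'      - paper title
      'authors'    - list of author names
      'references' - list of cited titles, one entry per in-text citation
    """
    results = []
    for a in papers:
        # Count how often each title is cited by paper `a`.
        cited = Counter(a["references"])
        for b in papers:
            if a is b:
                continue
            results.append({
                "from": a["title"],
                "to": b["title"],
                # Counter returns 0 for titles that never appear.
                "times_cited": cited[b["title"]],
                "shared_authors": sorted(set(a["authors"]) & set(b["authors"])),
                "shared_references": sorted(
                    set(a["references"]) & set(b["references"])
                ),
            })
    return results
```

Matching by exact title is a simplification; titles and author names extracted from real PDFs usually need normalization before they can be compared reliably.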

For more information: https://haptipediaextractor.readthedocs.io/en/latest/

Usage

Set the input and output directories and any other settings in ConfigPaths.py.
Change directory to src and run main.py.

Dependencies

Prereqs

Python 3.5
subprocess32 package (pip install subprocess32)

Python Libraries

Psycopg2 (for connecting to the database)
Requests Library

Installation

Clone the repository onto the machine that will run the extractor.
Start GROBID as a background service that the extractor can reach.

GROBID

GROBID is used to extract metadata, text, and citations from PDF files. It must be running as a service that the extractor can reach. (See GROBID's GitHub project for complete installation instructions.)
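Before running the extractor, it can help to confirm that the GROBID service is reachable. The sketch below is an illustration, not part of this project: the helper names are hypothetical, the base URL assumes GROBID's default port 8070 on localhost, and it uses only the standard library so it runs without extra installs. The `isalive` and `processFulltextDocument` services are part of GROBID's REST API.

```python
import urllib.request


def grobid_endpoint(base_url, service):
    """Build the URL of a GROBID REST service such as 'isalive'."""
    return base_url.rstrip("/") + "/api/" + service


def is_grobid_alive(base_url="http://localhost:8070", timeout=5):
    """Return True if the GROBID service answers its health check.

    The default base_url is an assumption; use the address where
    your GROBID instance is actually running.
    """
    url = grobid_endpoint(base_url, "isalive")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP errors.
        return False
```

Calling `is_grobid_alive()` before starting a long extraction run gives an early, clear failure if the service is down or the address is wrong.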

PDFFigures2.0

PDFFigures2.0 is used to extract figures, tables, and captions from PDF files. Install it as directed on the pdffigures2 GitHub page. The path to the pdffigures2 binary can be configured in ConfigPaths.py.
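For orientation, the following sketch shows one way pdffigures2 might be invoked from Python. It is not this project's code: the helper names are hypothetical, and it assumes pdffigures2 was built as a stand-alone JAR whose batch command-line interface accepts `-m` (figure image output prefix) and `-d` (figure data output prefix), as described in the upstream README. Check these flags against the installed version before relying on them.

```python
import subprocess


def build_pdffigures2_command(jar_path, pdf_dir, figure_prefix, data_prefix):
    """Assemble the command line for a pdffigures2 batch run.

    Assumes a stand-alone JAR and the upstream batch CLI flags:
      -m  prefix for extracted figure images
      -d  prefix for the JSON figure data
    """
    return [
        "java", "-jar", jar_path,
        pdf_dir,
        "-m", figure_prefix,
        "-d", data_prefix,
    ]


def run_pdffigures2(jar_path, pdf_dir, figure_prefix, data_prefix):
    """Run pdffigures2 and raise CalledProcessError if it fails."""
    command = build_pdffigures2_command(
        jar_path, pdf_dir, figure_prefix, data_prefix
    )
    subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.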
