Layout Aware PDF (LAPDF) Extraction

Installation Instructions

This is a Maven project and should be installed by issuing the following commands:

$ git clone https://github.com/SciKnowEngine/lapdftext/
$ cd lapdftext
$ mvn clean install assembly:assembly

This will build the jar archive file: target/lapdftext-1.8.0-SNAPSHOT-jar-with-dependencies.jar

You can run commands from this library to perform extraction tasks on PDF files.

Executing commands against the assembled jar file takes the form:

java -cp path/to/lapdftext-1.8.0-SNAPSHOT-jar-with-dependencies.jar edu.isi.bmkeg.lapdf.bin.<COMMAND> options

where COMMAND is one of:

Blockify - constructs text blocks from PDF files and outputs them as XML-formatted files.
BlockifyClassify - runs Blockify and also applies rule-based classification to the blocks.
BlockStatistics - provides statistics about each block.
ExtractFigureImagesFromFile - extracts images of figures from PDF-based scientific articles.

Details of each command are described in its usage documentation, which is printed when the command is run without options.
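To avoid retyping the classpath and package name, the invocation pattern above can be wrapped in a small shell function. This is only a sketch and is not part of lapdftext: the function name `lapdf` and the variables `LAPDF_JAR` and `LAPDF_DRY_RUN` are hypothetical names chosen for illustration, and the default jar path assumes you run it from the project root after the build.

```shell
#!/bin/sh
# Hypothetical convenience wrapper, not shipped with lapdftext.
# LAPDF_JAR: path to the assembled jar (assumed default: the build output above).
# LAPDF_DRY_RUN: if non-empty, print the java command line instead of running it.
LAPDF_JAR="${LAPDF_JAR:-target/lapdftext-1.8.0-SNAPSHOT-jar-with-dependencies.jar}"

lapdf() {
  if [ "$#" -lt 1 ]; then
    echo "usage: lapdf COMMAND [options]" >&2
    return 2
  fi
  # The first argument names a class in the edu.isi.bmkeg.lapdf.bin package.
  lapdf_class="edu.isi.bmkeg.lapdf.bin.$1"
  shift
  # Rebuild the argument list as the full java invocation.
  set -- java -cp "$LAPDF_JAR" "$lapdf_class" "$@"
  if [ -n "${LAPDF_DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

After sourcing this file, `lapdf Blockify` with no further arguments runs the Blockify class without options, which should print its usage documentation. Setting `LAPDF_DRY_RUN=1` first shows the exact java command that would be run.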
