akitenkrad/rsrpp

Rust Research Paper Parser (rsrpp)

The rsrpp library provides a set of tools for parsing research papers from PDF into structured pages and sections that can be serialized to JSON.

Quick Start

Prerequisites

  • Poppler: sudo apt install poppler-utils
  • OpenCV: sudo apt install libopencv-dev clang libclang-dev
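Because rsrpp depends on these system packages, it can help to check that they are installed before building or running a project that uses it. The following is a minimal sketch using only the Rust standard library. tool_available is a hypothetical helper, not part of rsrpp, and the sketch assumes that being able to launch pdfinfo and pdftoppm (both shipped with poppler-utils) is a reasonable sign that Poppler is installed. It does not check OpenCV.

```rust
use std::process::{Command, Stdio};

/// Returns true if `program` can be launched from PATH.
/// Hypothetical helper for illustration; not part of the rsrpp API.
fn tool_available(program: &str, version_flag: &str) -> bool {
    Command::new(program)
        .arg(version_flag)
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status() // Err(_) if the program could not be spawned (e.g. not found)
        .is_ok()
}

fn main() {
    // Both binaries ship with poppler-utils and accept -v to print their version.
    for tool in ["pdfinfo", "pdftoppm"] {
        if tool_available(tool, "-v") {
            println!("found {tool}");
        } else {
            println!("missing {tool}: try `sudo apt install poppler-utils`");
        }
    }
}
```

The check only confirms that each program can be launched from PATH; it does not inspect the exit status or the installed version.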

Installation

To start using the rsrpp library, add it to your project's dependencies in the Cargo.toml file:

cargo add rsrpp

Then, import the necessary modules in your code:

use rsrpp::parser;

Examples

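Here is a simple example of how to use the parser module. It parses a paper from an arXiv PDF URL into pages, groups the pages into sections, and serializes the sections to JSON: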

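// Run inside an async context (e.g. an async main driven by a runtime such as Tokio), because parse is awaited.
// ParserConfig, parse, and Section are assumed to come from rsrpp::parser; check the crate docs for exact paths.
// serde_json must also be added as a dependency for the last line.
let mut config = ParserConfig::new();
let url = "https://arxiv.org/pdf/1706.03762";
let pages = parse(url, &mut config).await.unwrap(); // Vec<Page>
let sections = Section::from_pages(&pages); // Vec<Section>
let json = serde_json::to_string(&sections).unwrap(); // String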

Tests

The library includes a set of tests to ensure its functionality. To run the tests, use the following command:

cargo test

License: MIT

Releases

1.0.12
  • Removed an unused println!.
1.0.11
  • Fixed a bug in the XML reading loop so that it terminates when it reaches the end of the file.
1.0.10
  • Added verbose mode.
  • Fixed a bug in page-number extraction.
1.0.9
  • Implemented new errors to handle invalid URLs.
1.0.8
  • Increased the maximum retry time for saving PDF files.
1.0.7
  • Fixed bugs: after converting to PDF, the program now waits until processing is complete.
1.0.4
  • Fixed bugs in get_pdf_info.
  • Made minor improvements.
1.0.3
1.0.2
  • Updated the Section module: content: String was replaced with content: Vec<TextBlock>.
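Code written against versions before 1.0.2, which expected Section content as a single String, can rebuild that string from the block list. This is a minimal sketch with a hypothetical stand-in TextBlock: the text field is an assumption made for illustration, so check the crate documentation for the real field names before adapting it.

```rust
/// Hypothetical stand-in for rsrpp's TextBlock; the `text` field is an assumption.
struct TextBlock {
    text: String,
}

/// Joins the blocks' text with blank lines, recovering a single String
/// similar in shape to the pre-1.0.2 `content: String` field.
fn join_blocks(blocks: &[TextBlock]) -> String {
    blocks
        .iter()
        .map(|block| block.text.as_str())
        .collect::<Vec<&str>>()
        .join("\n\n")
}

fn main() {
    let content = vec![
        TextBlock { text: "First paragraph.".to_string() },
        TextBlock { text: "Second paragraph.".to_string() },
    ];
    // Prints the two paragraphs separated by an empty line.
    println!("{}", join_blocks(&content));
}
```

The blank-line separator is an arbitrary choice; a single space or newline may suit downstream text processing better.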
