How to Compare OCR Tools: Tesseract OCR vs Amazon Textract vs Azure OCR vs Google OCR

https://ricciuti-federico.medium.com/how-to-compare-ocr-tools-tesseract-ocr-vs-amazon-textract-vs-azure-ocr-vs-google-ocr-ba3043b507c1

This repository contains the code to download the FUNSD dataset and to extract from it the dataset used for the OCR comparison described in the article above.

Clone the repository

git clone https://github.com/OCRComparison/dataset.git
cd dataset

Download FUNSD Dataset

wget https://guillaumejaume.github.io/FUNSD/dataset.zip -O dataset.zip

Unzip FUNSD Dataset

unzip dataset.zip -d ./FUNSD/
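On systems where `wget` or `unzip` is not available, the two steps above can be approximated with the Python standard library. This is a minimal sketch, not part of the repository: the function name `download_and_unzip` is hypothetical, and only the URL and paths are taken from the commands above.

```python
import urllib.request
import zipfile
from pathlib import Path
from typing import List


def download_and_unzip(url: str, zip_path: str, out_dir: str) -> List[str]:
    """Save the archive at `url` to `zip_path` and extract it into `out_dir`.

    Hypothetical helper for illustration; returns the archive's member names.
    """
    # Rough equivalent of: wget <url> -O <zip_path>
    urllib.request.urlretrieve(url, zip_path)

    # Rough equivalent of: unzip <zip_path> -d <out_dir>
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
        return archive.namelist()


# Usage with the URL and paths from the commands above (needs network access):
# download_and_unzip(
#     "https://guillaumejaume.github.io/FUNSD/dataset.zip",
#     "dataset.zip",
#     "./FUNSD/",
# )
```

The usage call is left commented out so that importing the snippet does not trigger a download.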

Extract the dataset for the OCR comparison

python extract_dataset.py --input_path ./FUNSD/dataset/ --output_path ./OCRDataset/
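To get a feel for the input that the extraction step works on, the sketch below reads the words of a single FUNSD annotation file. It is not the code of `extract_dataset.py`, whose internals are not shown here. It assumes the publicly documented FUNSD layout, where each annotation JSON holds a `form` list of entities and each entity carries a `words` list with `text` and `box` fields. The function names are hypothetical.

```python
import json
from typing import Dict, List


def read_funsd_words(annotation_path: str) -> List[Dict]:
    """Return every non-empty word (text and bounding box) from one annotation JSON."""
    with open(annotation_path, encoding="utf-8") as f:
        annotation = json.load(f)

    words = []
    # Assumed layout: {"form": [{"words": [{"text": ..., "box": ...}, ...]}, ...]}
    for entity in annotation.get("form", []):
        for word in entity.get("words", []):
            # Skipping whitespace-only words is a choice made for this sketch.
            if word.get("text", "").strip():
                words.append({"text": word["text"], "box": word.get("box")})
    return words


def ground_truth_text(annotation_path: str) -> str:
    """Join the word texts into one space-separated reference string."""
    return " ".join(w["text"] for w in read_funsd_words(annotation_path))
```

A reference string built this way could then be compared against the output of an OCR tool for the matching image. After unzipping, the annotation files are expected under directories such as `./FUNSD/dataset/training_data/annotations/`, though that path is also an assumption about the archive layout.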
