RoundTripOCR

RoundTripOCR: A Data Generation Technique for Enhancing Post-OCR Error Correction in Low-Resource Devanagari Languages.

This repository supports research on post-OCR error correction, with a focus on low-resource languages written in the Devanagari script. The RoundTripOCR technique generates synthetic datasets that replicate real-world OCR errors, which can be used to train and evaluate error correction models.

Paper link - https://aclanthology.org/2024.icon-1.33

Dataset Links

The datasets generated with the RoundTripOCR technique are hosted on Hugging Face.

Features

Synthetic Data Generation: Mimics OCR errors in low-resource Devanagari scripts.
Language Diversity: Covers six low-resource Devanagari-based languages: Marathi, Bodo, Sanskrit, Hindi, Konkani, and Nepali.
High Quality: Carefully designed to capture typical OCR error patterns for training and evaluation purposes.

Usage

Download the datasets from the provided links on Hugging Face.
Use them to train, evaluate, or benchmark OCR error correction models.
Incorporate the data into existing workflows for improving OCR accuracy in Devanagari scripts.
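
As an illustration of the evaluation step, here is a minimal sketch of a corpus-level character error rate (CER), a metric commonly used to score OCR correction output. It is not taken from this repository: the function names `edit_distance` and `cer` are hypothetical, and the sketch assumes each example is available as a (reference text, OCR or corrected text) pair. Errors are counted per Unicode code point after NFC normalization, so a Devanagari vowel sign counts as its own character. Counting per grapheme cluster instead would give different numbers.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(pairs):
    """Corpus-level CER: total edits divided by total reference characters."""
    total_edits = 0
    total_chars = 0
    for ref, hyp in pairs:
        # Normalize so equivalent code point sequences compare as equal.
        ref = unicodedata.normalize("NFC", ref)
        hyp = unicodedata.normalize("NFC", hyp)
        total_edits += edit_distance(ref, hyp)
        total_chars += len(ref)
    return total_edits / total_chars if total_chars else 0.0


if __name__ == "__main__":
    # Made-up example pair, not drawn from the datasets.
    sample = [("भारत", "भारन")]
    print(f"CER: {cer(sample):.2f}")
```

Computing `cer` once on the raw OCR text and once on a model's corrected output, against the same references, gives a simple before/after comparison.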

Citation

If you use the RoundTripOCR datasets in your research or applications, please cite the paper linked above.

For more details, contributions, or support, feel free to reach out!
