Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Updated Apr 8, 2025 - HTML
Read Japanese manga inside browser with selectable text.
A simple and beautiful cross-platform screenshot tool that also supports offline OCR, image translation, stickers, and image pinning.
🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
Simple app to extract text from pictures using Tesseract
Tesseract.js OCR
AIN - The First Arabic Inclusive Large Multimodal Model. It is a versatile bilingual LMM excelling in visual and contextual understanding across diverse domains.
CERberus -- guardian against character errors 🐶🐶🐶
Data Mining Historical Newspaper Metadata (METS/ALTO formats)
node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
Some bits of javascript to transcribe scanned pages using PageXML
An OCR demo application using Tesseract.js and HTML with a smartphone camera.
Documentation for Papermerge DMS - Installation, Help, User Manual, REST API