ScanOCR Script

Automatically runs OCR (Optical Character Recognition) on newly added PDF files. It uses OCRmyPDF and inotify-tools.

Prerequisites

Linux (e.g., Ubuntu)
OCRmyPDF
inotify-tools

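Install the prerequisites on Ubuntu (tesseract-ocr-deu adds German language data for Tesseract; substitute the package for your language):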

sudo apt-get -y install ocrmypdf inotify-tools tesseract-ocr-deu
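To confirm the installation before setting up the service, a small check along these lines may help. The function name missing_tools is hypothetical and not part of this repository; it only reports which of the given commands are not on PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of scanocr.
# Print the name of every given command that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: check the tools this README relies on.
# missing_tools ocrmypdf inotifywait tesseract
```

An empty result means all listed tools were found. Running tesseract --list-langs should then include deu if the German language data was installed.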

Setup

Clone the repository and copy files:

git clone https://github.com/efnats/scanocr.git
cd scanocr
sudo cp ./scanocr.sh /usr/local/bin
sudo chmod +x /usr/local/bin/scanocr.sh
sudo cp ./scanocr.service /etc/systemd/system/

Adjust the file paths in the service file (/etc/systemd/system/scanocr.service) to suit your setup, then set up the service:

sudo systemctl daemon-reload
sudo systemctl enable scanocr.service
sudo systemctl start scanocr.service

Operation

The script monitors the configured directories for new files. Each new file is renamed with a timestamp and run through OCR, the result is moved to the processed directory, and the original is deleted.
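The workflow described above could be sketched roughly as follows. This is an illustrative outline, not the contents of scanocr.sh: the function names (timestamped_name, process_pdf, watch_dir), the timestamp format, the example paths, and the choice of -l deu are all assumptions. It also condenses the rename and move steps by writing the OCR output straight into the processed directory under a timestamped name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real scanocr.sh may differ.

# Build "<timestamp>_<original name>" for a given file path.
timestamped_name() {
    local base
    base="$(basename "$1")"
    printf '%s_%s\n' "$(date +%Y%m%d-%H%M%S)" "$base"
}

# OCR one PDF into the output directory, then delete the original.
process_pdf() {
    local src="$1" out_dir="$2" dest
    dest="$out_dir/$(timestamped_name "$src")"
    # -l deu assumes the tesseract-ocr-deu language data is installed.
    # The original is only removed if ocrmypdf exits successfully.
    if ocrmypdf -l deu "$src" "$dest"; then
        rm -- "$src"
    fi
}

# Watch a directory and process each PDF once it has been fully written
# (close_write) or moved into place (moved_to).
watch_dir() {
    local watch_dir="$1" out_dir="$2" path
    inotifywait -m -q -e close_write -e moved_to --format '%w%f' "$watch_dir" |
    while IFS= read -r path; do
        case "$path" in
            *.pdf|*.PDF) process_pdf "$path" "$out_dir" ;;
        esac
    done
}

# Example with assumed paths (blocks until interrupted):
# watch_dir /srv/scans/incoming /srv/scans/processed
```

Waiting for close_write rather than create avoids starting OCR on a file that a scanner or network copy is still writing.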

Note on the Service File

The service file contains the directive OOMScoreAdjust=-1000, which prevents the Out of Memory (OOM) killer from targeting the scanocr service. This matters most when the service runs in an LXC container with limited RAM (e.g., 500 MB), where OCR can otherwise exhaust memory. If the system disk is fast, consider raising swap to 1 GB to provide additional virtual memory and prevent OOM situations.
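To check that the directive took effect, the adjustment the kernel applies to the running process can be read from /proc. The sketch below assumes a Linux system with systemd; the helper names oom_adj_of_pid and oom_adj_of_unit are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of scanocr.

# Print the kernel's OOM score adjustment for a PID (range -1000 to 1000).
oom_adj_of_pid() {
    cat "/proc/$1/oom_score_adj"
}

# Print the adjustment for the main process of a systemd unit.
oom_adj_of_unit() {
    local pid
    pid="$(systemctl show -p MainPID --value "$1")"
    # systemd reports MainPID=0 when the unit has no running main process.
    if [ "$pid" -gt 0 ]; then
        oom_adj_of_pid "$pid"
    else
        echo "unit $1 is not running" >&2
        return 1
    fi
}

# Example:
# oom_adj_of_unit scanocr.service
```

With the service running and the directive in place, the example call would be expected to print -1000.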
