Installation

1. Download and install Git.
2. Download and install Python. During setup, do not forget to check "Add Python to PATH".
3. Download and install the latest version of Tesseract (the installers are listed at the end of the download page; at the time of writing, the latest version is tesseract-ocr-w64-setup-v5.0.1.20220107.exe).
4. Open CMD and run the following commands:

git clone https://github.com/AmirHosseinCV/PdfSearcher.git
cd PdfSearcher
pip install --prefer-binary -r requirements.txt
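
If one of the commands above fails with a "not recognized" error, the tool is probably not on PATH. The following is a minimal sketch of a check, not part of PdfSearcher: the function name find_missing_tools is hypothetical, and whether the project itself needs tesseract on PATH is an assumption.

```python
import shutil


def find_missing_tools(tools=("git", "python", "tesseract")):
    """Return the names from `tools` that cannot be found on PATH.

    Hypothetical helper for illustration; the default tool list is an
    assumption based on the installation steps above.
    """
    # shutil.which returns None when no matching executable is found.
    return [tool for tool in tools if shutil.which(tool) is None]


# Example use:
# missing = find_missing_tools()
# if missing:
#     print("Not found on PATH:", ", ".join(missing))
```

An empty list means every tool was found; any name that is returned needs to be installed or added to PATH before continuing.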

Usage

Step 1: Prepare a CSV file

You need a CSV file that contains the locations of all words in your PDF file.

To prepare this file, go to the project folder (which contains convert.py) and run the following command. If you don't want to use OCR, change the value of --ocr from 1 to 0:

python convert.py --pdf "[PATH_TO_YOUR_PDF_FILE]" --ocr 1

Use python convert.py --help for more information.
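
If you have several PDF files, the same command can be built programmatically. This is only a sketch: it uses the --pdf and --ocr options shown above, while the function name build_convert_command and the folder name in the usage comment are hypothetical.

```python
import sys


def build_convert_command(pdf_path, use_ocr=True):
    """Return the argument list for the convert.py call shown above.

    Hypothetical helper for illustration; it only assembles the command
    and does not run it.
    """
    return [
        sys.executable,          # the Python interpreter currently in use
        "convert.py",
        "--pdf", str(pdf_path),
        "--ocr", "1" if use_ocr else "0",
    ]


# Example use, run from the project folder (the folder "my_pdfs" is an
# assumption for illustration):
# import subprocess
# from pathlib import Path
# for pdf in sorted(Path("my_pdfs").glob("*.pdf")):
#     subprocess.run(build_convert_command(pdf), check=True)
```

Passing the arguments as a list avoids quoting problems when a path contains spaces.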

Step 2: Run the program

Place the CSV file in the same directory as your PDF file, then run the following command in the project folder (which contains main.py):

python main.py --pdf "[PATH_TO_YOUR_PDF_FILE]"

The PDF and CSV files must be in the same directory and have the same name.
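
A quick way to confirm this requirement before starting the program is sketched below. It assumes "the same name" means the same base name with a .csv extension (for example, book.pdf and book.csv); the function names are hypothetical and not part of PdfSearcher.

```python
from pathlib import Path


def expected_csv_path(pdf_path):
    """Return the CSV path assumed to belong to `pdf_path`.

    Assumption: same directory, same base name, .csv extension.
    """
    return Path(pdf_path).with_suffix(".csv")


def csv_is_in_place(pdf_path):
    """Return True if the expected CSV file exists next to the PDF."""
    return expected_csv_path(pdf_path).is_file()


# Example use:
# if not csv_is_in_place("C:/docs/book.pdf"):
#     print("Missing:", expected_csv_path("C:/docs/book.pdf"))
```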

Replace [PATH_TO_YOUR_PDF_FILE] with the path to your PDF file. Use python main.py --help for more information.

Once the program is running, open "http://localhost:8000/search" in your browser.
