A statement parser which extracts bank statements into a useable data frame.

pc-1827/statement-parser

Statement Parser

This project uses OCR and several Python libraries to extract bank statement data into a usable pandas DataFrame, which can then be used for expenditure analysis and tracking.

Currently, this parser works only for HDFC Bank and Kotak Mahindra Bank statements.

Installation

Prerequisites

Before you can run this project, you need to install some system dependencies and Python libraries.

For Linux (Ubuntu)

  1. Update the package list:

    sudo apt-get update
  2. Install Poppler-utils:

    sudo apt-get install -y poppler-utils
  3. Install Tesseract-OCR:

    sudo apt-get install -y tesseract-ocr
  4. Install the required Python libraries:

    pip install -r requirements.txt

For Windows

  1. Install Poppler:

  2. Install Tesseract-OCR:

  3. Install the required Python libraries:

    pip install -r requirements.txt
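
Once the steps above are done, a quick check that both command-line tools are reachable can save a confusing failure later. The sketch below is not part of this repository. It assumes that Poppler provides the `pdftoppm` executable, that Tesseract-OCR provides `tesseract`, and that both should be discoverable on your PATH. The names `REQUIRED_TOOLS` and `missing_tools` are made up for illustration.

```python
import shutil

# Assumption: these are the executables the install steps provide.
# pdftoppm ships with Poppler; tesseract ships with Tesseract-OCR.
REQUIRED_TOOLS = ("pdftoppm", "tesseract")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

If a tool is reported as missing on Windows, the likely cause is that its install directory has not been added to PATH.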

Setting Tesseract Path in Python

In your Python script, you might need to set the path to the Tesseract executable if it's not in your system PATH.

import pytesseract

# Example for Windows
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

# Example for Linux (if not in default path)
# pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'
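
Rather than hard-coding one path per platform, the executable could be looked up at run time. This is a minimal sketch, not the project's own code: the helper `find_tesseract` and the `DEFAULT_CANDIDATES` tuple are hypothetical, and the fallback locations are simply the two example paths shown above.

```python
import os
import shutil

# Hypothetical fallback locations, taken from the examples above.
# Adjust them to match your own installation.
DEFAULT_CANDIDATES = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    "/usr/bin/tesseract",
)


def find_tesseract(candidates=DEFAULT_CANDIDATES):
    """Return a path to the Tesseract executable, or None if it is not found."""
    # Prefer whatever is already reachable through PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the first candidate that exists on disk.
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract not found; install it or add its location to the candidates.")
    else:
        print(f"Using Tesseract at: {path}")
        # With pytesseract installed, the path could then be applied like this:
        # import pytesseract
        # pytesseract.pytesseract.tesseract_cmd = path
```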

Usage

Set the filePath variable to the path of the bank statement you want to process.

filePath = "samples/Statement April-Aug 2021.pdf"
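
Because the path is edited by hand, a small guard can catch a typo before any OCR work starts. The following is only a sketch under stated assumptions: `check_statement_path` is a hypothetical helper that does not exist in this repository, and it assumes statements are supplied as PDF files, as in the sample path above.

```python
from pathlib import Path


def check_statement_path(file_path):
    """Return file_path as a Path if it looks like an existing PDF, else raise."""
    path = Path(file_path)
    # Assumption: statements are PDFs, like the sample in this README.
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got: {path.name}")
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    return path


if __name__ == "__main__":
    filePath = "samples/Statement April-Aug 2021.pdf"
    try:
        print(f"Ready to process: {check_statement_path(filePath)}")
    except (ValueError, FileNotFoundError) as err:
        print(f"Cannot process statement: {err}")
```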

To generate the output, run the following command:

python3 main.py

Thank You!
