- Receipt information extraction is the process of automatically identifying and extracting relevant information from receipts, such as the date, merchant name, total amount, and individual item prices.
- Receipt information extraction has numerous practical applications, such as in accounting, expense tracking, and financial analysis. By automating the extraction of receipt information, businesses can save time and reduce errors associated with manual data entry.
- In this project, we focus on Vietnamese-language receipts.
- Duong T. Thanh (@duongttr)
- Nguyen N. Doan Hieu (@ndhieunguyen)
- Khoi N. The (@nguyenthekhoig7)
- Hau T. Hoang (@hautran7201)
- Kiet T. Tuan
The input image is first passed through YOLOv8 to detect bounding boxes around text regions. Each detected region is cropped and sent to an OCR engine (Pytesseract in this case) to read its content. The image and the recognized text are then fed together into LayoutLMv3, which assigns a class to each piece of text.
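The hand-off between the OCR step and LayoutLMv3 can be sketched as follows. This is a minimal illustration, not the repository's actual preprocessing: the helper names `normalize_box` and `build_model_inputs` are hypothetical, and the sketch assumes pixel boxes in `(x_min, y_min, x_max, y_max)` order. It relies on the fact that the Hugging Face implementation of LayoutLMv3 expects bounding boxes scaled to a 0-1000 integer grid.

```python
from typing import Dict, List, Sequence, Tuple

# Pixel-space box: (x_min, y_min, x_max, y_max). This ordering is an assumption.
Box = Tuple[float, float, float, float]


def normalize_box(box: Box, width: int, height: int) -> List[int]:
    """Scale a pixel box to the 0-1000 integer grid used by LayoutLM-family models."""
    x0, y0, x1, y1 = box
    scaled = [
        1000 * x0 / width,
        1000 * y0 / height,
        1000 * x1 / width,
        1000 * y1 / height,
    ]
    # Clamp so boxes that spill slightly outside the image stay in range.
    return [max(0, min(1000, int(v))) for v in scaled]


def build_model_inputs(
    words: Sequence[str], boxes: Sequence[Box], width: int, height: int
) -> Dict[str, list]:
    """Pair each OCR'd text with its normalized box (hypothetical helper)."""
    if len(words) != len(boxes):
        raise ValueError("each OCR'd text needs exactly one bounding box")
    return {
        "words": list(words),
        "boxes": [normalize_box(b, width, height) for b in boxes],
    }
```

The resulting words and boxes are the kind of input that would be passed, together with the image, to a LayoutLMv3 processor when OCR is performed outside the processor.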
Best F1 score on the evaluation dataset: 0.93222
See our presentation slides for more details.
Check the Releases section to download the latest models.
Download the dataset and place it in the dataset folder, following this structure:
- images: folder containing the receipt images
- train.json: training annotations
- val.json: validation annotations
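Before starting a training run, it can help to confirm that the folder matches the layout above. The following is a minimal sketch, not part of the repository: the function name `missing_dataset_entries` is hypothetical, and the default root `dataset` is taken from the instructions above.

```python
from pathlib import Path
from typing import List

# Expected layout, as described in the instructions above.
EXPECTED_ENTRIES = {"images": "dir", "train.json": "file", "val.json": "file"}


def missing_dataset_entries(root: str = "dataset") -> List[str]:
    """Return the names of expected entries that are absent under `root`."""
    base = Path(root)
    missing = []
    for name, kind in EXPECTED_ENTRIES.items():
        path = base / name
        present = path.is_dir() if kind == "dir" else path.is_file()
        if not present:
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_dataset_entries()
    if problems:
        print("Missing from dataset folder: " + ", ".join(problems))
    else:
        print("Dataset layout looks complete.")
```

An empty list means all three entries were found; the check does not validate the contents of the annotation files.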
Then run this command:
python train.py --output_dir <output_dir> \
--max_steps 15000 \
--batch_size 4
Check the source code for more parameters.
- Clone the repo
git clone https://github.com/duongttr/vireceipt-information-extraction.git
cd vireceipt-information-extraction
- Install dependencies
conda env create -f environment.yml
or
pip install -r requirements
- Run the app locally
python -m streamlit run VIE_run.py
- Tesseract documentation. Tesseract OCR. (n.d.). Retrieved March 31, 2023, from https://tesseract-ocr.github.io/
- A new state-of-the-art computer vision model. YOLOv8. (n.d.). Retrieved March 31, 2023, from https://yolov8.com/
- Huang, Y., Lv, T., Cui, L., Lu, Y., & Wei, F. (2022). LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking. arXiv. https://arxiv.org/abs/2204.08387