Innovations in Document Analysis: Exploring Keras Segmentation and Graph Convolutional Neural Networks for Structural OCR
Table segmentation dataset (about 1 million tables)
Nearly 1 million tables from scientific articles with bounding box annotations
Link: https://www.kaggle.com/datasets/bsmock/pubtables-1m-structure
This project was originally developed at fire llama as part of my internship, and I am continuing to develop it. It is currently under development as my final-semester project for the course UCF 439 Capstone Project.
- Project Architecture
- Requirements
- Installation
- Usage
- Features
- Documentation
- Credits
- Acknowledgements
- License
To run this project, you'll need the following specific dependencies:
- Python 3.7
- TensorFlow GPU 2.4.1 with CUDA 11.0 and cuDNN 8.0
- Keras 2.4.3
- imgaug
- OpenCV 4.5.9
- Django
You can install the required Python packages using the following command:
pip install <package_name>==<version>
Example
pip install tensorflow_gpu==2.4.1
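After installing, it can help to confirm that the installed versions match the pins listed above. The following is a minimal sketch, not part of the repository: the `PINNED` mapping, the helper names `installed_version` and `check_pins`, and the distribution names are assumptions made for illustration, and only two of the pins are included for brevity.

```python
"""Compare installed package versions against the pins listed above."""
try:
    from importlib import metadata  # available from Python 3.8
except ImportError:
    # On Python 3.7 this needs the importlib_metadata backport (assumed installed).
    import importlib_metadata as metadata

# Pins copied from the requirements list; the distribution names are assumptions.
PINNED = {
    "tensorflow-gpu": "2.4.1",
    "keras": "2.4.3",
}


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pinned, lookup=installed_version):
    """Return {name: (wanted, found)} for every package that does not match its pin."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_pins(PINNED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

An empty result from `check_pins(PINNED)` means every listed package matches its pin.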
There are two ways to get started with the project:
- Simple installation, where you install all dependencies directly on your OS.
- Using a Docker container.
- Clone this repository:
git clone https://github.com/RajKrishna2123/capstone_project
cd capstone_project
pip install -r requirements.txt
- Use a Docker container:
docker build -t project_container:updated1 .
The following command runs the project:
docker run --gpus all -it -v D:/struct_ocr_data:/app -p 8000:8000 project_container:updated1 /bin/bash
Once the container is up and running, you can reconnect to the same container after a lost connection or an accidentally closed terminal with the following command:
docker exec -it <container_id> bash
This project converts single images or bulk batches of images into editable relational tables that preserve the structure shown in the image, all in one pass.
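To make the idea of turning detected structure into a table concrete, here is a minimal sketch of one simple heuristic: group cell bounding boxes into rows by vertical centre, then order each row left to right. This is not the project's model-based method; the `Cell` layout, the function name `cells_to_rows`, and the `row_tolerance` parameter are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical cell record: (x_min, y_min, x_max, y_max, text).
Cell = Tuple[float, float, float, float, str]


def cells_to_rows(cells: List[Cell], row_tolerance: float = 10.0) -> List[List[str]]:
    """Group cells into rows by vertical centre, then order each row left to right."""
    rows = []  # each entry is [reference_y, [cells in that row]]
    for cell in sorted(cells, key=lambda c: (c[1] + c[3]) / 2):
        y_centre = (cell[1] + cell[3]) / 2
        # A cell joins the current row if its centre is close to the row's first cell.
        if rows and abs(y_centre - rows[-1][0]) <= row_tolerance:
            rows[-1][1].append(cell)
        else:
            rows.append([y_centre, [cell]])
    # Within each row, sort by the left edge and keep only the text.
    return [[c[4] for c in sorted(group, key=lambda c: c[0])] for _, group in rows]


example = [
    (110, 12, 200, 30, "Age"),
    (10, 10, 100, 30, "Name"),
    (10, 50, 100, 70, "Ada"),
    (110, 52, 200, 70, "36"),
]
print(cells_to_rows(example))  # [['Name', 'Age'], ['Ada', '36']]
```

A fixed pixel tolerance is the simplest choice here; real documents with skew or merged cells would need something more robust.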
- Extensive Training Data: The AI model is trained on an extensive dataset of 1 million high-resolution images, which supports the system's robustness and accuracy in document structure identification.
- MLOps Integration: Our implementation adheres to MLOps practices, ensuring a seamless and automated end-to-end workflow. Continuous integration and delivery pipelines will be established for efficient model deployment and updates.
- Containerization: The system will be containerized for deployment as a web app and API service. This promotes scalability and ease of integration into various applications.
- Google Drive Integration: Users can process bulk data simply by providing Google Drive links.
- Flexible Data Outputs: The system supports versatile data outputs, including CSV, MySQL databases, and XLSX, catering to diverse data management preferences.
- Integrated API Service: Integrated API capabilities will give other developers easy access to Structural OCR functionality for use in their own applications, enhancing overall system accessibility.
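As an illustration of the CSV output listed among the features, the sketch below serialises extracted table rows with Python's standard `csv` module. It assumes the table is already available as a list of rows of strings; the helper names `rows_to_csv` and `save_csv` are hypothetical and do not come from the repository. XLSX and MySQL output would need third-party libraries and are not shown.

```python
import csv
import io
from typing import List


def rows_to_csv(rows: List[List[str]]) -> str:
    """Serialise table rows (header row first) to CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def save_csv(rows: List[List[str]], path: str) -> None:
    """Write the CSV text to a file at the given path."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write(rows_to_csv(rows))


table = [["Name", "Age"], ["Ada", "36"]]
print(rows_to_csv(table), end="")
```

The `csv` writer quotes fields that contain commas, so cell text extracted from a document does not break the column layout.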
Link to external documentation or detailed guides.
Special thanks to Rajeev Ratan sir for their awesome repository, which supported this project a lot.
This project is licensed under the MIT License - see the LICENSE file for details.