In today's fast-paced world, managing and interacting with various documents can be a daunting task. Imagine having a tool that not only simplifies this process but also enhances it using the power of Artificial Intelligence. Enter DocuChat, an innovative solution designed to revolutionize the way we handle documents. DocuChat is a document-based chatbot that leverages advanced NLP models to provide intelligent responses based on the content of uploaded documents. This project consists of a PHP frontend and a Python backend.
DocuChat is a groundbreaking project that leverages the capabilities of Generative AI to analyze, interact with, and retrieve information from various types of documents, including PDFs, DOCX, PPT, and more. Whether you're a hobbyist, a beginner, or a professional, DocuChat offers a seamless and efficient way to manage your document needs.
Are you new to the world of Natural Language Processing (NLP) and AI?
DocuChat is the perfect starting point for you. With a user-friendly interface and straightforward setup, you can quickly dive into the exciting world of AI and document management. Here's what you can expect:
-
Easy Setup: Follow our simple installation guide to get started.
-
Interactive Learning: Experiment with different document types and see how AI analyzes and retrieves information.
-
Community Support: Join our growing community of hobbyists and beginners to share experiences, ask questions, and learn together.
If you're a professional looking for a robust and reliable document management solution, DocuChat has got you covered. With advanced features and customizable options, you can tailor the tool to meet your specific needs. Here’s how DocuChat can benefit you:
-
Efficiency: Save time by letting AI handle the heavy lifting of document analysis and information retrieval.
-
Accuracy: Ensure precise and relevant information extraction with our fine-tuned models.
-
Scalability: Easily integrate DocuChat into your existing workflows and scale it as your document management needs grow.
We invite contributors and sponsors to join us in enhancing DocuChat. Your support and contributions can help us bring even more exciting features and improvements to the project. Whether you're a developer, a researcher, or a sponsor, there's a place for you in the DocuChat community.
The project includes scripts and configurations to:
- Upload documents
- Extract text from various document formats
- Generate embeddings using a BERT-based model
- Fine-tune the model on specific tasks
backend/: Contains the backend code and scripts for the project.
-
app.py: Main Flask app for handling requests.
-
download_models.py: Script to download models.
-
fine_tune_model.py: Script to fine-tune the model.
-
models/: Directory to store models.
- bert-base-multilingual-cased/: Directory for the bert-base-multilingual-cased model.
-
project_embeddings/: Directory to store project embeddings.
- YourProjectName/: Your Own Projects Embeddings will be created here
-
results/: Directory to store fine-tuning results.
-
static/: Directory for static files.
- embeddings/: Directory to store generated embeddings.
-
YourProjectName/: Your Own Projects Embeddings will be created here
- uploads/: Directory to store uploaded files.
- YourProjectName/: Your Own Projects Embeddings will be created here
-
templates/: HTML templates.
-
pycache/: Python cache files.
frontend/: Contains the frontend code and assets.
- public/: Public assets for the frontend.
- css/: Custom styles.
- img/: Images for the frontend.
- types/: File type icons.
- js/: JavaScript files.
- src/: Source files for the frontend.
- views/: Views for the frontend.
vendor/: Contains third-party libraries and frameworks.
- bootstrap/: Bootstrap CSS and JS.
- font-awesome/: FontAwesome CSS and JS.
- jquery/: jQuery library.
- Python 3.11
- Flask
- Flask-CORS
- pdfminer.six
- python-docx
- openpyxl
- python-pptx
- transformers
- faiss
- langdetect
- torch
- PHP 7.4+
- Bootstrap
- FontAwesome
- jQuery
- Clone the repository:
git clone https://github.com/LebToki/DocuChat.git
cd DocuChat/backend
Set up a virtual environment and install dependencies:
python -m venv .venv
source .venv/bin/activate # On Windows, use `.venv\\Scripts\\activate`
pip install -r requirements.txt
Download the model:
python download_models.py
Run the Flask app:
python app.py
The api is now available at http://localhost:5000/
or if you are using a pretty url then use the local domain template and browse to the correct project domain using your browser.
https://localhost/DocuChat/frontend/
- Create a virtual environment and activate it:
python -m venv .venv source .venv/bin/activate # On Windows, use `.venv\Scripts\activate`
pip install -r requirements.txt
python backend/download_models.py
python backend/app.py
Ensure you have a local server setup (e.g., XAMPP, Laragon).
Place the PHP files in the appropriate directory of your server. Open the project in your browser.
- Upload documents through the frontend interface.
- Ask questions related to the uploaded documents.
- The backend will process the documents, generate embeddings, and provide relevant responses based on the content.
To fine-tune the model, run the following script:
python backend/fine_tune_model.py
Contributions are welcome! Please submit a pull request or open an issue for any improvements or bugs.