Authors:
- Shivansh Gupta (21d070067@iitb.ac.in)
- Dhatri Mehta (210070027@iitb.ac.in)
- Devesh Soni (21d070025@iitb.ac.in)
College websites are often cluttered, making information retrieval difficult, especially during critical periods like admissions. Our solution, SmartBot, is a conversational AI chatbot designed specifically for IIT Bombay. It simplifies access to college-related information by understanding user queries and retrieving relevant information directly from web resources.
This project is part of the course EE782 - Advanced Topics in Machine Learning and demonstrates our understanding of conversational AI and deep learning methodologies.
- Conversational AI: Handles queries naturally without requiring specific formats.
- Web Scraping: Extracts information from IITB websites like Gymkhana and SMP using tools like BeautifulSoup and Requests.
- Pretrained LLMs: Employs Meta's Llama-2-13B model fine-tuned for robust and context-aware responses.
- Efficient Embedding: Utilizes instruction-finetuned embeddings for improved language understanding.
- GPTQ Quantization: Ensures efficient and high-quality performance through advanced quantization techniques.
- Data Collection: Web scraping gathers relevant information from target websites.
- Preprocessing: Data is cleaned and structured for compatibility with AI models.
- Language Modeling: Integrates Llama-2-13B for generating human-like responses.
- Embedding Layer: Enhances comprehension of user queries and context through optimized embeddings.
- Response Generation: Dynamically generates responses using advanced prompt engineering and streaming techniques.
- Dynamic Website Structures: Adapted to varied HTML structures across websites.
- Data Consistency: Ensured structured and error-free data for improved chatbot performance.
- Response Latency: Optimized output speed through model tuning and chunk size adjustments.
- Evaluation Metrics: Developed a custom evaluation metric focusing on relevance, coherence, and grammatical correctness.
User: What is SMP?
Bot: SMP stands for Student Mentor Programme, a program within IIT Bombay that aims to provide constructive and positive interaction, guidance, and mentorship to junior students by senior students.
For more examples, refer to the Colab Notebook.
- Hugging Face Transformers
- BeautifulSoup and Requests (Web scraping)
- PyPDFDirectoryLoader (PDF handling)
- TextStreamer (Streaming text generation)
Model | Parameters | Features | Speed | Human Feedback |
---|---|---|---|---|
Zephyr 7B | 7B | DPO | 5 min | Vague |
Falcon 7B | 7B | FA | 30 min | Vague |
Mistral 7B | 7B | SWA | 30 sec | Good (~50%) |
Llama2 13B | 13B | GPTQ | 10 sec | Good (~70%) |
- Integrating more datasets for broader coverage.
- Enhancing multilingual capabilities.
- Refining the evaluation metrics for automated feedback.