The Databricks FAQ Chatbot is an AI-based system for answering user questions about the Databricks platform. It uses Large Language Models (LLMs) and Natural Language Processing (NLP) techniques to generate human-like responses.
- Project Overview
- Getting Started
- System Workflow
- How to Use
- Future Enhancements
- Contributing
- License
- Contact
This project grew out of the need to automate responses to repetitive customer queries. The result is an FAQ chatbot that learns from the Databricks FAQ dataset, uses a fine-tuned Wizard-Vicuna model, and is designed to integrate with customer service platforms.
The following instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
- Clone the repo
git clone https://github.com/sumitsahaykoantek/koantekDatabricksHackathon.git
- Install the necessary libraries
pip install -r requirements.txt
Our development process involves the following steps:
- Data Collection: Extract FAQ data from the Databricks website.
- Topic Modeling: Use ChatGPT for topic modeling on the collected dataset.
- Data Augmentation: Paraphrase the questions with paraphrase models from Hugging Face.
- Data Classification: Classify the generated questions using the Databricks AutoML system.
- Embeddings Creation: Create embeddings for questions, answers, and categories using Hugging Face's Sentence Transformers.
- Data Storage: Store these embeddings in a database such as ChromaDB for easy retrieval.
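The embedding and storage steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the project's actual code: `embed` is a toy hashed bag-of-words function standing in for a Sentence Transformers model, `FaqStore` is a hypothetical in-memory class standing in for a ChromaDB collection, and the sample FAQ entries are made up for the example.

```python
import hashlib
import math
import re

DIM = 64  # hypothetical vector size; real sentence-embedding models use far more dimensions


def embed(text: str) -> list[float]:
    """Toy stand-in for a sentence-embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # A stable hash keeps token-to-bucket assignments identical across runs.
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; embed() returns unit-length (or all-zero) vectors, so a dot product suffices."""
    return sum(x * y for x, y in zip(a, b))


class FaqStore:
    """In-memory stand-in for a vector database collection (hypothetical helper)."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def add(self, question: str, answer: str, category: str) -> None:
        # Embed all three fields, mirroring the "questions, answers, and categories" step.
        self.records.append({
            "question": question,
            "answer": answer,
            "category": category,
            "embeddings": {
                "question": embed(question),
                "answer": embed(answer),
                "category": embed(category),
            },
        })

    def query(self, text: str, n_results: int = 2) -> list[dict]:
        """Return the stored records whose question embedding is closest to the text."""
        target = embed(text)
        ranked = sorted(
            self.records,
            key=lambda r: cosine(target, r["embeddings"]["question"]),
            reverse=True,
        )
        return ranked[:n_results]


# Illustrative entries only; these are not taken from the actual FAQ dataset.
store = FaqStore()
store.add("What is a Databricks cluster?", "A cluster is a set of compute resources.", "compute")
store.add("How do I create a notebook?", "Use the New menu in the workspace.", "workspace")

for hit in store.query("What is a cluster in Databricks?", n_results=1):
    print(hit["question"], "->", hit["answer"])
```

In a real deployment the toy `embed` function would be replaced by a trained embedding model and `FaqStore` by a persistent vector database; only the overall shape (embed each field, store it with its metadata, rank by similarity) is meant to carry over.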
Real-time inferencing involves the following steps:
- Input Processing: Create embeddings for user-input questions.
- Query Classification: Classify the question with the AutoML model.
- Similarity Check: Compare the input question with stored questions in the database.
- Answer Retrieval: If an exact match is found, display the corresponding answer. If not, retrieve similar questions and their answers.
- Response Generation: Feed the input question, similar questions, categories, and answers into the fine-tuned Wizard-Vicuna language model.
- Answer Comparison: Employ fuzzy matching to compare the model's answer with the original answers in the database.
- Answer Display: Display the best-matched answer to the user.
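The retrieval and answer-comparison logic in the steps above might look something like the sketch below. It makes several assumptions: the FAQ entries are invented for illustration, `stub_model` is a hypothetical placeholder for the fine-tuned language model, and `difflib.get_close_matches` on the question text stands in for the embedding-based similarity lookup. Query classification is omitted. Fuzzy matching here uses `difflib.SequenceMatcher` from the standard library, which may differ from the library the project actually uses.

```python
from difflib import SequenceMatcher, get_close_matches

# Made-up sample data keyed by normalized question text; not the real FAQ dataset.
FAQ = {
    "what is a databricks cluster?": "A cluster is a set of compute resources for running notebooks and jobs.",
    "how do i create a notebook?": "Click New in the workspace sidebar and select Notebook.",
    "what is delta lake?": "Delta Lake is a storage layer that adds transactions to data lakes.",
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different inputs still match exactly."""
    return " ".join(text.lower().split())


def fuzzy_score(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def answer_question(question, faq, generate, n_similar=3):
    """Return a stored answer: exact match first, otherwise the stored answer closest to the model's draft."""
    key = normalize(question)

    # Answer Retrieval: an exact match short-circuits the rest of the pipeline.
    if key in faq:
        return faq[key]

    # Similarity Check: stand-in for the embedding lookup (string similarity on questions).
    similar = get_close_matches(key, list(faq), n=n_similar, cutoff=0.0)
    candidates = [(q, faq[q]) for q in similar]

    # Response Generation: the model sees the question plus the retrieved context.
    draft = generate(question, candidates)
    if not candidates:
        return draft

    # Answer Comparison: pick the stored answer that best matches the draft.
    _, best_answer = max(candidates, key=lambda qa: fuzzy_score(draft, qa[1]))
    return best_answer


def stub_model(question, candidates):
    """Hypothetical placeholder for the language model: echoes the top candidate's answer."""
    return candidates[0][1] if candidates else ""


print(answer_question("How can I make a new notebook?", FAQ, stub_model))
```

Returning a stored answer rather than the raw draft keeps the displayed text tied to vetted FAQ content, which appears to be the intent of the final comparison step.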
Detailed instructions for usage can be found in our User Guide. It explains how to input a question, interpret the output, and handle errors or issues.
We plan to keep expanding the training dataset so that it covers a wider range of customer queries. We also intend to explore live-streaming human-in-the-loop QA systems to correct and improve the chatbot's responses in near-real time.
We welcome contributions to improve this project. Please refer to CONTRIBUTING.md for details on our code of conduct.