Databricks FAQ Chatbot

The Databricks FAQ Chatbot is a comprehensive AI-based system for answering user queries related to the Databricks platform. It leverages Large Language Models (LLMs) and advanced Natural Language Processing (NLP) techniques to generate human-like responses.

Project Overview

Our project started with the need to automate responses to repetitive customer queries. The end product is an FAQ chatbot that learns from the Databricks FAQ dataset, uses a fine-tuned Wizard-Vicuna model, and is designed to integrate with customer service platforms.

Getting Started

The following instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

Installing

Clone the repo

git clone https://github.com/sumitsahaykoantek/koantekDatabricksHackathon.git

Install necessary libraries

pip install -r requirements.txt

System Workflow

Our development process involves the following steps:

Data Collection: Extract FAQ data from the Databricks website.
Topic Modeling: Use OpenAI's ChatGPT model for topic modeling on the collected dataset.
Data Augmentation: Paraphrase the questions with paraphrase models from Hugging Face.
Data Classification: Classify the generated questions using the Databricks AutoML system.
Embeddings Creation: Create embeddings for questions, answers, and categories using Hugging Face's Sentence Transformers.
Data Storage: Store these embeddings in a database such as ChromaDB for easy retrieval.
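
The embedding and storage steps above can be sketched roughly as follows. This is a minimal, dependency-free illustration under stated assumptions, not the project's actual code: `embed` is a hypothetical bag-of-words stand-in for a Sentence Transformers model, `FaqStore` is a hypothetical in-memory stand-in for a vector database such as ChromaDB, and the FAQ entries in the usage lines are made-up placeholders.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a real sentence-embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class FaqStore:
    """Hypothetical in-memory stand-in for a vector database."""
    records: list = field(default_factory=list)

    def add(self, question: str, answer: str, category: str) -> None:
        # Store the text fields together with the question embedding.
        self.records.append({
            "question": question,
            "answer": answer,
            "category": category,
            "embedding": embed(question),
        })

    def query(self, question: str, top_k: int = 2) -> list:
        # Rank stored records by similarity to the query embedding.
        query_vec = embed(question)
        scored = [(cosine(query_vec, r["embedding"]), r) for r in self.records]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


# Usage with made-up placeholder entries.
store = FaqStore()
store.add("What is a Databricks cluster?", "Placeholder answer A.", "compute")
store.add("How do I create a notebook?", "Placeholder answer B.", "workspace")
top_score, top_record = store.query("what is a cluster", top_k=1)[0]
print(top_record["question"], round(top_score, 3))
```

In a real deployment the toy `embed` function would be replaced by a trained embedding model and `FaqStore` by a persistent vector store; the add-then-query shape of the interface is the part this sketch is meant to convey.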

Real-time inferencing involves the following steps:

Input Processing: Create embeddings for user-input questions.
Query Classification: Classify the question with the AutoML model.
Similarity Check: Compare the input question with stored questions in the database.
Answer Retrieval: If an exact match is found, display the corresponding answer. If not, retrieve similar questions and their answers.
Response Generation: Feed the input question, similar questions, categories, and answers into the fine-tuned Wizard-Vicuna language model.
Answer Comparison: Employ fuzzy matching to compare the model's answer with the original answers in the database.
Answer Display: Display the best-matched answer to the user.
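
The inference steps above can be sketched end to end as follows. This is a simplified illustration under stated assumptions rather than the project's implementation: `generate_answer` is a hypothetical placeholder for the fine-tuned language model call, string similarity from Python's standard `difflib` module stands in for both embedding similarity and the fuzzy-matching step (the source does not name the fuzzy-matching library used), the query classification step is omitted, and the FAQ records are made up.

```python
from difflib import SequenceMatcher

# Made-up FAQ records for illustration only.
FAQ = [
    {"question": "how do i reset my password",
     "answer": "Placeholder answer about password resets."},
    {"question": "how do i create a notebook",
     "answer": "Placeholder answer about creating notebooks."},
]


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] (assumption: stands in for embedding similarity)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def generate_answer(question: str, context: list) -> str:
    """Hypothetical placeholder for the fine-tuned language model call.

    Here it only rewords the answer of the most similar retrieved record.
    """
    return context[0]["answer"].replace("Placeholder", "Generated")


def answer(question: str, top_k: int = 2) -> str:
    normalized = question.strip().lower().rstrip("?")

    # Answer retrieval: an exact match short-circuits the pipeline.
    for record in FAQ:
        if record["question"] == normalized:
            return record["answer"]

    # Similarity check: otherwise retrieve the most similar stored questions.
    ranked = sorted(FAQ, key=lambda r: similarity(normalized, r["question"]),
                    reverse=True)
    context = ranked[:top_k]

    # Response generation: pass the question and retrieved context to the model.
    draft = generate_answer(question, context)

    # Answer comparison: fuzzy-match the draft against the stored answers
    # and return the closest stored answer.
    best = max(FAQ, key=lambda r: similarity(draft, r["answer"]))
    return best["answer"]


print(answer("How do I reset my password?"))
print(answer("how can I reset a password"))
```

Returning the closest stored answer rather than the raw model output means the user only ever sees vetted FAQ text; the generated draft is used purely to select among the stored answers.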

How to Use

Detailed instructions for usage can be found in our User Guide. It explains how to input a question, interpret the output, and handle errors or issues.

Future Enhancements

We plan to expand the training dataset continually to cover a wider range of customer queries. We also intend to explore live-streaming, human-in-the-loop QA systems to correct and improve the chatbot's responses in near-real time.

Contributing

We welcome contributions to improve this project. Please refer to CONTRIBUTING.md for details on our code of conduct.
