Click Here! 👉

🚀 Baemax T&C: Your Friendly Terms of Service Summarizer

This repository contains the code for Baemax T&C, an LLM application designed to simplify complex Terms of Service (ToS) agreements. We use Retrieval Augmented Generation (RAG) and large language models (LLMs) to provide clear and concise explanations of legal jargon, making ToS documents easier to understand.

ℹ️ About Baemax T&C

Baemax T&C combines an LLM with a document store of ToS agreements. When you ask a question about a specific ToS, the application retrieves the most relevant passages from the document store (the retrieval step of RAG) and passes them to the LLM, which translates the legal language into plain English and generates a concise explanation.
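The retrieve-then-generate flow described above can be illustrated with a minimal sketch. It is not the project's code: a toy bag-of-words vector stands in for real embeddings, a brute-force cosine search stands in for the vector store, and the function names (`embed`, `retrieve`, `build_prompt`) and the prompt wording are assumptions made for illustration. The final LLM call is left out; the sketch stops at the prompt that would be sent.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the prompt that would be sent to the LLM (hypothetical wording)."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Explain the following Terms of Service excerpts in plain English.\n"
        f"Excerpts:\n{joined}\n"
        f"Question: {question}"
    )


chunks = [
    "The company may terminate your account at any time without notice.",
    "You grant us a license to use content you upload.",
    "Fees are billed monthly and are non-refundable.",
]
question = "Can the company terminate my account?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Keeping retrieval and prompt assembly as separate functions makes it easy to swap the toy pieces for a real embedding model and vector index without touching the prompt logic.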

Who can use Baemax T&C? Anyone who needs to understand online terms and conditions: consumers, employees, students, and more.

⚙️ Key Features

User-Friendly Interface: Simple and intuitive.
Comprehensive Coverage: Currently covers ToS agreements from HireVue, Llama 2, Mettler Toledo, OpenAI, Pentair, Qdoba Mexican Eats, Bank of America, Ulta Beauty, Verizon, and Truist, with more being added.
Accurate and Reliable: Explanations are generated by an OpenAI LLM from passages retrieved from the ToS documents themselves.
Customizable: Users can adjust the level of detail in the explanations.

🎯 Addressing Common ToS Problems

Baemax T&C tackles the challenges of:

Legal jargon: Complex legal language is translated into plain English.
Length and complexity: Long and complicated ToS documents are summarized.
Lack of clarity: Important terms and concepts are clearly explained.

🚀 RAG Evaluation and Implementation

This repository also contains a system for building and evaluating Retrieval Augmented Generation (RAG) pipelines, built on LangChain, OpenAI, FAISS, and deepeval.

📁 Project Overview

This project offers a complete workflow for creating a question-answering system that leverages external documents. The key components are:

📄 Document Processing (helper_functions.py): Handles the ingestion and processing of source documents. This includes:
- Text Encoding: Processes both PDFs and text strings.
- Chunking: Splits text into overlapping chunks for efficient embedding.
- Embedding: Generates OpenAI embeddings for each chunk.
- Vector Store: Stores embeddings in a FAISS vector database for fast similarity search. This allows for efficient retrieval of relevant information.
⚙️ RAG Pipeline (simple_rag.ipynb): Implements the core RAG pipeline using OpenAI:
- Document Loading: Loads documents from specified folders and creates a FAISS index.
- Retrieval: Uses FAISS for efficient retrieval of relevant document chunks. Includes a fallback BM25 method.
- Answer Generation: Uses an OpenAI LLM (specified in the notebook) to generate answers based on the retrieved context.
🤖 RAG Evaluation (evaluate_rag.py): Provides a thorough evaluation of the RAG system using the deepeval library. Metrics include:
- Correctness (GEval): How factually accurate are the answers?
- Faithfulness: How well do answers align with the information in the source document?
- Answer Relevancy: How well do the answers address the specific question and its context within the source document?
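Two of the steps listed above, overlapping chunking and the BM25 fallback, lend themselves to a short standalone sketch. The code below is an illustration under stated assumptions, not the contents of helper_functions.py or simple_rag.ipynb: the function names, the character-based chunk sizes, the whitespace tokenizer, and the BM25 parameters `k1=1.5` and `b=0.75` are all chosen here for the example.

```python
import math
from collections import Counter


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Each chunk starts (chunk_size - overlap) characters after the previous one,
    so consecutive chunks share `overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    if not tokenized:
        return []
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs or 1.0
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


pieces = chunk_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, overlap=3)
docs = [
    "fees are billed monthly",
    "you may cancel your subscription anytime",
    "we may change these terms",
]
scores = bm25_scores("cancel subscription", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(pieces, docs[best])
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap. BM25 matches on exact terms, which is why it is a useful fallback when embedding similarity returns weak matches.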

💪 Technical Details

This project uses:

Python 3.11: Our runtime environment.
OpenAI API: Currently used for LLM processing.
FAISS: For efficient vector search within the document store.
deepeval: For evaluating model performance (using Correctness, Faithfulness, and Answer Relevancy metrics).
Dependencies listed in requirements.txt

🛠️ Code Quality Tools

We utilize:

Black: For code formatting.
isort: For import sorting.
Bandit: For security analysis.
Pre-commit: To automate code quality checks before each commit.

🦾 What's Under The Hood

🐍 Setup

Create and activate a conda environment:

conda create -n your_env python=3.11
conda activate your_env

Before running the code, install the required libraries with pip:

pip install -r requirements.txt

Add your OpenAI API key:

Create a .env file in the root directory (the repository's .env_example can serve as a template) and add your key:

OPENAI_API_KEY=your_api_key_here
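As a quick sanity check before launching the app, a script can confirm that the .env file contains a real key. The sketch below uses only the standard library and is an assumption for illustration: the project may load the key differently (for example through a dedicated dotenv loader), and the helper names `load_env_file` and `require_api_key` are hypothetical.

```python
import tempfile
from pathlib import Path


def load_env_file(path: str) -> dict[str, str]:
    """Parse KEY=VALUE lines from a .env-style file, skipping blanks and comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(env: dict[str, str]) -> str:
    """Return the OpenAI key, or fail loudly if it is absent or still the placeholder."""
    key = env.get("OPENAI_API_KEY", "")
    if not key or key == "your_api_key_here":
        raise RuntimeError("OPENAI_API_KEY is missing or still set to the placeholder")
    return key


# Demonstration against a temporary file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text("# local secrets\nOPENAI_API_KEY=sk-example\n", encoding="utf-8")
    loaded = load_env_file(str(env_path))
    api_key = require_api_key(loaded)
```

Failing early with a clear message is easier to debug than an authentication error raised later from inside the pipeline.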

Launch the app from the terminal:

cd frontend/

streamlit run app.py

🤪 Quickstart Guide

🤖 Usage

Ask Away!

🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request.

📝 License

Copyright of the Baemax T&C Team. See the LICENSE file for details.

Happy RAG-ing! 🎉

