Merge pull request #296 from ajithvcoder/feature_rag_tutorial
Add rag basic tutuorials
Showing 9 changed files with 1,858 additions and 0 deletions.

@@ -0,0 +1,322 @@

.. raw:: html

   <div style="display: flex; justify-content: flex-start; align-items: center; margin-bottom: 20px;">
       <a href="https://colab.research.google.com/github/SylphAI-Inc/AdalFlow/blob/main/notebooks/tutorials/adalflow_rag_documents.ipynb" target="_blank" style="margin-right: 20px;">
           <img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" style="height: 20px;">
       </a>

       <a href="https://github.com/SylphAI-Inc/AdalFlow/tree/main/tutorials/adalflow_rag_documents.py" target="_blank" style="display: flex; align-items: center;">
           <img src="https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png" alt="GitHub" style="height: 20px; width: 20px; margin-right: 5px;">
           <span style="vertical-align: middle;"> Open Source Code</span>
       </a>
   </div>

RAG for documents
=============================

Overview
--------

This implementation showcases an end-to-end RAG system capable of handling large-scale text files and
generating context-aware responses. It is both modular and extensible, making it adaptable to various
use cases and LLM APIs.

**Imports**

- **SentenceTransformer**: Creates dense vector embeddings for textual data.
- **FAISS**: Provides efficient similarity search using vector indexing.
- **tiktoken**: Aligns text preprocessing with the tokenization requirements of the underlying language models, keeping the pipeline robust and efficient.
- **GroqAPIClient and OpenAIClient**: Custom classes for interacting with different LLM providers.
- **ModelType**: Enum for specifying the model type.

.. code-block:: python

    import os
    import tiktoken
    from typing import List, Dict, Tuple

    import numpy as np
    from sentence_transformers import SentenceTransformer
    from faiss import IndexFlatL2

    from adalflow.components.model_client import GroqAPIClient, OpenAIClient
    from adalflow.core.types import ModelType
    from adalflow.utils import setup_env

The ``AdalflowRAGPipeline`` class sets up the Retrieval-Augmented Generation (RAG) pipeline. Its ``__init__`` method initializes key components:

- An embedding model (``all-MiniLM-L6-v2``) is loaded using ``SentenceTransformer`` to convert text into dense vector embeddings with a dimensionality of 384.
- A FAISS index (``IndexFlatL2``) is created for similarity-based document retrieval.
- Parameters such as ``top_k_retrieval`` (number of documents to retrieve) and ``max_context_tokens`` (limit on the token count in the context) are configured.
- A tokenizer (``tiktoken``) ensures precise token counting, crucial for staying within the context limits of large language models (LLMs).

The method also initializes storage for the documents, their embeddings, and associated metadata for efficient management and retrieval.

The remaining methods build on this setup. ``load_text_file`` splits a large text file into chunks of a fixed number of lines, which makes embedding and storage easier. To handle multiple files, ``add_documents_from_directory`` iterates over the text files in a directory, embeds each chunk, and stores the embeddings in the FAISS index along with metadata. ``count_tokens`` uses the tokenizer to determine the exact number of tokens in a given text. ``retrieve_and_truncate_context`` fetches the most relevant documents from the FAISS index based on the query embedding and truncates the collected context so it stays within the token limit. Finally, ``generate_response`` constructs a prompt that combines the retrieved context with the query, invokes the provided model client, and parses the result into a readable format. Together, these methods integrate text retrieval and generation to handle large-scale document queries effectively.

.. code-block:: python

    class AdalflowRAGPipeline:
        def __init__(self,
                     model_client=None,
                     model_kwargs=None,
                     embedding_model='all-MiniLM-L6-v2',
                     vector_dim=384,
                     top_k_retrieval=3,
                     max_context_tokens=800):
            """
            Initialize RAG Pipeline for handling large text files

            Args:
                embedding_model (str): Sentence transformer model for embeddings
                vector_dim (int): Dimension of embedding vectors
                top_k_retrieval (int): Number of documents to retrieve
                max_context_tokens (int): Maximum tokens to send to LLM
            """
            # Initialize model client for generation
            self.model_client = model_client

            # Initialize tokenizer for precise token counting
            self.tokenizer = tiktoken.get_encoding("cl100k_base")

            # Initialize embedding model
            self.embedding_model = SentenceTransformer(embedding_model)

            # Initialize FAISS index for vector similarity search
            self.index = IndexFlatL2(vector_dim)

            # Store document texts, embeddings, and metadata
            self.documents = []
            self.document_embeddings = []
            self.document_metadata = []

            # Retrieval and context management parameters
            self.top_k_retrieval = top_k_retrieval
            self.max_context_tokens = max_context_tokens

            # Model generation parameters
            self.model_kwargs = model_kwargs

        def load_text_file(self, file_path: str) -> List[str]:
            """
            Load a large text file and split into manageable chunks

            Args:
                file_path (str): Path to the text file

            Returns:
                List[str]: List of document chunks
            """
            with open(file_path, 'r', encoding='utf-8') as file:
                # Read entire file
                content = file.read()

            # Split content into chunks (e.g., 10 lines per chunk)
            lines = content.split('\n')
            chunks = []
            chunk_size = 10  # Adjust based on your file structure
            for i in range(0, len(lines), chunk_size):
                chunk = '\n'.join(lines[i:i + chunk_size])
                chunks.append(chunk)
            return chunks

        def add_documents_from_directory(self, directory_path: str):
            """
            Add documents from all text files in a directory

            Args:
                directory_path (str): Path to directory containing text files
            """
            for filename in os.listdir(directory_path):
                if filename.endswith('.txt'):
                    file_path = os.path.join(directory_path, filename)
                    document_chunks = self.load_text_file(file_path)
                    for chunk in document_chunks:
                        # Embed document chunk
                        embedding = self.embedding_model.encode(chunk)

                        # Add to index and document store
                        self.index.add(np.array([embedding]))
                        self.documents.append(chunk)
                        self.document_embeddings.append(embedding)
                        self.document_metadata.append({
                            'filename': filename,
                            'chunk_index': len(self.document_metadata)
                        })

        def count_tokens(self, text: str) -> int:
            """
            Count tokens in a given text

            Args:
                text (str): Input text

            Returns:
                int: Number of tokens
            """
            return len(self.tokenizer.encode(text))

        def retrieve_and_truncate_context(self, query: str) -> str:
            """
            Retrieve relevant documents and truncate to fit token limit

            Args:
                query (str): Input query

            Returns:
                str: Concatenated context within token limit
            """
            # Retrieve relevant documents
            query_embedding = self.embedding_model.encode(query)
            distances, indices = self.index.search(
                np.array([query_embedding]),
                self.top_k_retrieval
            )

            # Collect and truncate context
            context = []
            current_tokens = 0
            for idx in indices[0]:
                doc = self.documents[idx]
                doc_tokens = self.count_tokens(doc)

                # Check if adding this document would exceed token limit
                if current_tokens + doc_tokens <= self.max_context_tokens:
                    context.append(doc)
                    current_tokens += doc_tokens
                else:
                    break
            return "\n\n".join(context)

        def generate_response(self, query: str) -> str:
            """
            Generate a response using retrieval-augmented generation

            Args:
                query (str): User's input query

            Returns:
                str: Generated response incorporating retrieved context
            """
            # Retrieve and truncate context
            retrieved_context = self.retrieve_and_truncate_context(query)

            # Construct context-aware prompt
            full_prompt = f"""
            Context Documents:
            {retrieved_context}

            Query: {query}

            Generate a comprehensive response that:
            1. Directly answers the query
            2. Incorporates relevant information from the context documents
            3. Provides clear and concise information
            """

            # Prepare API arguments
            api_kwargs = self.model_client.convert_inputs_to_api_kwargs(
                input=full_prompt,
                model_kwargs=self.model_kwargs,
                model_type=ModelType.LLM
            )

            # Call API and parse response
            response = self.model_client.call(
                api_kwargs=api_kwargs,
                model_type=ModelType.LLM
            )
            response_text = self.model_client.parse_chat_completion(response)
            return response_text

The ``run_rag_pipeline`` function demonstrates how to use the ``AdalflowRAGPipeline``. It initializes the pipeline,
adds documents from a directory, and generates responses for a list of user queries. The function is generic
and can accommodate various LLM API clients, such as ``GroqAPIClient`` or ``OpenAIClient``, highlighting the pipeline's
flexibility and modularity.

.. code-block:: python

    def run_rag_pipeline(model_client, model_kwargs, documents, queries):
        # Example usage of the RAG pipeline
        rag_pipeline = AdalflowRAGPipeline(
            model_client=model_client,
            model_kwargs=model_kwargs,
            top_k_retrieval=1,      # Retrieve the single most relevant chunk
            max_context_tokens=800  # Limit context to 800 tokens
        )

        # Add documents from a directory of text files
        rag_pipeline.add_documents_from_directory(documents)

        # Generate responses
        for query in queries:
            print(f"\nQuery: {query}")
            response = rag_pipeline.generate_response(query)
            print(f"Response: {response}")

This block provides an example of running the pipeline with different models and queries. It specifies:

- The document directory containing the text files (a minimal stand-in corpus is sketched after this list if you do not have the tutorial assets).
- Example queries about topics such as the "Crystal Cavern" and "rare trees in Elmsworth."
- Configuration of the Groq and OpenAI model parameters, including the model name, temperature, and token limit.

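If the ``../../tutorials/assets/documents`` directory from the repository is not available, a small stand-in corpus can be created on the fly. The sketch below is only an illustration: the directory name, file name, and file contents are made up, so the example queries will only be answerable once the pipeline points at real documents.

.. code-block:: python

    import os

    # Hypothetical stand-in corpus: directory and file names are illustrative only
    os.makedirs("sample_documents", exist_ok=True)
    with open(os.path.join("sample_documents", "notes.txt"), "w", encoding="utf-8") as f:
        # Replace the placeholder text with your own source material
        f.write("Put the text you want to query here, one topic per file.\n")

    # Point the pipeline at this directory instead of the tutorial assets path used below
    documents = "sample_documents"
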
.. code-block:: python

    documents = '../../tutorials/assets/documents'

    queries = [
        "What year was the Crystal Cavern discovered?",
        "What is the name of the rare tree in Elmsworth?",
        "What do local legends claim surrounds the Lunaflits?"
    ]

    groq_model_kwargs = {
        "model": "llama-3.2-1b-preview",
        "temperature": 0.1,
        "max_tokens": 800,
    }

    openai_model_kwargs = {
        "model": "gpt-3.5-turbo",
        "temperature": 0.1,
        "max_tokens": 800,
    }

    # The example below shows that AdalFlow can be used in a generic manner with any API provider,
    # without worrying about prompt construction or response parsing
    run_rag_pipeline(GroqAPIClient(), groq_model_kwargs, documents, queries)
    run_rag_pipeline(OpenAIClient(), openai_model_kwargs, documents, queries)

The example emphasizes that ``AdalflowRAGPipeline`` can interact seamlessly with multiple API providers,
enabling integration with diverse LLMs without modifying the core logic for prompt construction or
response parsing.

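Both clients expect their API keys to be available in the environment. A minimal sketch of how this might be wired up is shown below, assuming the keys live in a local ``.env`` file and that the clients read the conventional ``GROQ_API_KEY`` and ``OPENAI_API_KEY`` variables; adjust the variable names to whatever your provider accounts require.

.. code-block:: python

    from adalflow.utils import setup_env

    # Load API keys from a local .env file before creating any client
    # (assumed variable names: GROQ_API_KEY and OPENAI_API_KEY).
    setup_env()

    # The same pipeline runs unchanged against either provider;
    # only the client instance and its model kwargs differ.
    providers = [
        (GroqAPIClient(), groq_model_kwargs),
        (OpenAIClient(), openai_model_kwargs),
    ]
    for client, kwargs in providers:
        run_rag_pipeline(client, kwargs, documents, queries)
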
.. admonition:: API reference
   :class: highlight

   - :class:`utils.setup_env`
   - :class:`core.types.ModelType`
   - :class:`components.model_client.OpenAIClient`
   - :class:`components.model_client.GroqAPIClient`