基于 LlamaIndex 构建个人 RAG 知识库

1. 构建RAG过程

1.1 配环境：安装很多包

创建conda环境：conda create -n llamaindex python=3.10
激活这个环境：conda activate llamaindex

在这个环境，疯狂地安装超级多依赖（这个过程会有点慢）

pip install einops==0.7.0 protobuf==5.26.1
pip install llama-index==0.10.38 llama-index-llms-huggingface==0.2.0 "transformers[torch]==4.41.1" "huggingface_hub[inference]==0.23.1" huggingface_hub==0.23.1 sentence-transformers==2.7.0 sentencepiece==0.2.0
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia

通过教程给的脚本，下载 Sentence Transformer 模型
下载 NLTK 相关资源
- 为什么要这样做？由于访问 NLTK 官方仓库可能存在不稳定性，预先从国内镜像（如 Gitee）下载并配置所需的 NLTK 资源，可以确保项目的顺利进行，避免因网络问题导致的资源下载失败。
- NLTK是怎么找到这个路径的？当您使用 NLTK 的某个功能（如分词、词性标注等）时，NLTK 会按照一定的顺序在预设的多个路径中查找所需的资源文件。默认情况下，nltk.data.path 包含以下路径（优先级从高到低）：
  - 当前工作目录的 nltk_data 文件夹。
  - 用户主目录下的 nltk_data 文件夹（如 ~/nltk_data）。
  - 系统级的 NLTK 数据目录（可能因操作系统不同而异）。如果 NLTK 在这些路径中找不到所需的资源，它会尝试从在线仓库下载。
- 我们的 nltk_data 文件夹就在 ~/nltk_data
- 我们将资源预先下载并解压到正确的目录，确保 NLTK 可以直接访问这些资源，避免触发在线下载

1.2 使用 llama_index 库中的 HuggingFaceLLM 类直接推理

作为对照组，展示RAG之前的模型原生的效果。

这个推理LLM的脚本代码，和我们常见的基于transformers库的LLM推理代码长得很不一样：

from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core.llms import ChatMessage
llm = HuggingFaceLLM(
    model_name="/root/model/internlm2-chat-1_8b",
    tokenizer_name="/root/model/internlm2-chat-1_8b",
    model_kwargs={"trust_remote_code":True},
    tokenizer_kwargs={"trust_remote_code":True}
)
rsp = llm.chat(messages=[ChatMessage(content="xtuner是什么？")])
print(rsp)

LlamaIndex库提供了比 transformers 库更高层次的抽象和接口，从而简化了与大语言模型（LLM）的交互过程。

运行结果：

武汉大学的超算怎么连接登陆？

可见关于这个私域问题，模型无法直接回答。

1.3 通过LlamaIndex构建RAG

核心代码 llamaindex_RAG.py 的理解：

query_engine.query("xtuner是什么?") 这一行代码背后同时完成了检索（Retrieval）和生成（Generation）两个过程。

在执行 query_engine.query("xtuner是什么?") 之前，代码已经完成了以下准备工作：

嵌入模型初始化：使用 HuggingFaceEmbedding 将文本转换为向量表示。
语言模型初始化：使用 HuggingFaceLLM 作为生成模型。
文档加载和向量索引构建：读取文档并通过 VectorStoreIndex 将其向量化存储，以支持高效的相似度检索。

query_engine.query("xtuner是什么?") 实际上是一个高度封装的函数，它整合了所有步骤。通过调用 query_engine.query("xtuner是什么?")，你实际上触发了一个完整的 RAG 流程，包括：

将查询转换为向量。
在向量索引中检索相关文档。
将检索到的文档内容提供给生成模型。
生成并返回最终的回答。

运行结果：

武汉大学的超算怎么连接登陆？

1.4 构建Web UI

安装streamlit，然后运行 app.py 即可。

app.py 的代码逻辑：

2. 结果验证

目标：解答私域问题：

武汉大学的超算怎么连接登陆？

该问题在使用 LlamaIndex 之前InternLM2-Chat-1.8B模型不会回答。

通过构建 武大超算的使用手册知识库 ，借助 LlamaIndex 后，InternLM2-Chat-1.8B 模型具备回答目标问题的能力：

附录：关于L1 G4000文档的修改小建议

关于这个教程文档，我有些修改的小建议。

文档中的配置环境的顺序是这样的：

# 1. 创建并激活环境
conda create -n llamaindex python=3.10
conda activate llamaindex

# 2. 首先安装特定版本的 PyTorch，并确保使用 CUDA 11.7
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia

# 3. 安装基础依赖
pip install einops==0.7.0 protobuf==5.26.1

# 4. 安装 LLamaIndex 相关
pip install llama-index==0.10.38 
pip install llama-index-llms-huggingface==0.2.0
pip install "transformers[torch]==4.41.1"
pip install "huggingface_hub[inference]==0.23.1" huggingface_hub==0.23.1
pip install sentence-transformers==2.7.0 
pip install sentencepiece==0.2.0

# 5. 安装向量依赖包
pip install llama-index-embeddings-huggingface==0.2.0 
pip install llama-index-embeddings-instructor==0.1.3

存在的问题是：匹配cuda11.7的torch安装在前，但是其后安装 LLamaIndex 相关的指令会稳定触发torch的升级到最新版本，而新版本的PyTorch (2.5.1) 默认使用CUDA 12.4

pip install llama-index==0.10.38 llama-index-llms-huggingface==0.2.0 "transformers[torch]==4.41.1" "huggingface_hub[inference]==0.23.1" huggingface_hub==0.23.1 sentence-transformers==2.7.0 sentencepiece==0.2.0

pip install llama-index-embeddings-huggingface==0.2.0 llama-index-embeddings-instructor==0.1.3

这样，当进行到模型推理的步骤时，就会有

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)

报错，也就是说安装教程的顺序走稳定会在模型推理的的时候遇到cuda报错。

必须再次重新安装正确的torch：

conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia

也许可以这样修改：

# 1. 创建并激活环境
conda create -n llamaindex python=3.10
conda activate llamaindex

# 2. 安装基础依赖
pip install einops==0.7.0 protobuf==5.26.1

# 3. 先安装 LlamaIndex 和相关包,让它自动处理依赖关系
# 注意：此时 pip 会安装与你当前环境不同的 torch 版本。
pip install llama-index==0.10.38 llama-index-llms-huggingface==0.2.0 "transformers[torch]==4.41.1" "huggingface_hub[inference]==0.23.1" huggingface_hub==0.23.1 sentence-transformers==2.7.0 sentencepiece==0.2.0

# 4. 安装向量依赖包
pip install llama-index-embeddings-huggingface==0.2.0 llama-index-embeddings-instructor==0.1.3

# 5. 强制重装特定版本的 PyTorch (这会覆盖/降级之前自动安装的版本)
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia

安装完成后，验证 torch 是否正确安装并使用了指定的 CUDA 版本：

import torch
print(torch.__version__)        # 应输出类似 '2.0.1'
print(torch.version.cuda)       # 应输出 '11.7'
print(torch.cuda.is_available())# 应输出 True

已经提交相关issue

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

L1 G4000.md

L1 G4000.md

基于 LlamaIndex 构建个人 RAG 知识库

1. 构建RAG过程

1.1 配环境：安装很多包

1.2 使用 llama_index 库中的 HuggingFaceLLM 类直接推理

1.3 通过LlamaIndex构建RAG

1.4 构建Web UI

2. 结果验证

附录：关于L1 G4000文档的修改小建议

Files

L1 G4000.md

Latest commit

History

L1 G4000.md

File metadata and controls

基于 LlamaIndex 构建个人 RAG 知识库

1. 构建RAG过程

1.1 配环境：安装很多包

1.2 使用 llama_index 库中的 HuggingFaceLLM 类直接推理

1.3 通过LlamaIndex构建RAG

1.4 构建Web UI

2. 结果验证

附录：关于L1 G4000文档的修改小建议