Merge pull request #1346 from Jacksonxhx/JacksonX
Update format and add colab warning
zc277584121 authored May 30, 2024
2 parents 093c03f + 362fd22 commit 231a24c
Showing 5 changed files with 110 additions and 55 deletions.
6 changes: 6 additions & 0 deletions bootcamp/tutorials/integration/milvus_and_DSPy.ipynb
@@ -103,6 +103,12 @@
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"cell_type": "markdown",
"source": "> If you are using Google Colab, to enable dependencies just installed, you may need to **restart the runtime**.",
"id": "bbf27b3225a33dae"
},
{
"cell_type": "markdown",
"source": [
46 changes: 11 additions & 35 deletions bootcamp/tutorials/integration/milvus_with_Jina.ipynb
@@ -64,44 +64,20 @@
"cell_type": "code",
"source": [
"!pip install -U pymilvus\n",
"!pip install \"milvus[model]\""
"!pip install \"pymilvus[model]\""
],
"metadata": {
"id": "f748781570cc911f",
"ExecuteTime": {
"end_time": "2024-05-30T07:34:20.652135Z",
"start_time": "2024-05-30T07:34:12.092107Z"
}
"id": "f748781570cc911f"
},
"id": "f748781570cc911f",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: pymilvus in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (2.4.3)\r\n",
"Requirement already satisfied: setuptools>=67 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pymilvus) (70.0.0)\r\n",
"Requirement already satisfied: grpcio<=1.63.0,>=1.49.1 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pymilvus) (1.63.0)\r\n",
"Requirement already satisfied: protobuf>=3.20.0 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pymilvus) (3.20.2)\r\n",
"Requirement already satisfied: environs<=9.5.0 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pymilvus) (9.5.0)\r\n",
"Requirement already satisfied: ujson>=2.0.0 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pymilvus) (5.10.0)\r\n",
"Requirement already satisfied: pandas>=1.2.4 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pymilvus) (2.2.2)\r\n",
"Requirement already satisfied: milvus-lite<2.5.0,>=2.4.0 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pymilvus) (2.4.5)\r\n",
"Requirement already satisfied: marshmallow>=3.0.0 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from environs<=9.5.0->pymilvus) (3.21.2)\r\n",
"Requirement already satisfied: python-dotenv in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from environs<=9.5.0->pymilvus) (1.0.1)\r\n",
"Requirement already satisfied: numpy>=1.26.0 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pandas>=1.2.4->pymilvus) (1.26.4)\r\n",
"Requirement already satisfied: python-dateutil>=2.8.2 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pandas>=1.2.4->pymilvus) (2.9.0.post0)\r\n",
"Requirement already satisfied: pytz>=2020.1 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pandas>=1.2.4->pymilvus) (2024.1)\r\n",
"Requirement already satisfied: tzdata>=2022.7 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from pandas>=1.2.4->pymilvus) (2024.1)\r\n",
"Requirement already satisfied: packaging>=17.0 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from marshmallow>=3.0.0->environs<=9.5.0->pymilvus) (24.0)\r\n",
"Requirement already satisfied: six>=1.5 in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (from python-dateutil>=2.8.2->pandas>=1.2.4->pymilvus) (1.16.0)\r\n",
"Requirement already satisfied: milvus[model] in /Users/zilliz/Library/Caches/pypoetry/virtualenvs/bootcamp-zTGGOKDG-py3.12/lib/python3.12/site-packages (2.3.5)\r\n",
"\u001b[33mWARNING: milvus 2.3.5 does not provide the extra 'model'\u001b[0m\u001b[33m\r\n",
"\u001b[0m"
]
}
],
"execution_count": 1
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"cell_type": "markdown",
"source": "> If you are using Google Colab, to enable dependencies just installed, you may need to **restart the runtime**.",
"id": "a20be817dcf2d3f1"
},
{
"cell_type": "markdown",
@@ -233,7 +209,7 @@
"cell_type": "markdown",
"source": [
"## Semantic Search with Jina & Milvus\n",
"With the strong embedding function, we can combine the embeddings retrieved by utilizing Jina AI models with Milvus Lite vector database to perform semantic search."
"With the powerful vector embedding function, we can combine the embeddings retrieved by utilizing Jina AI models with Milvus Lite vector database to perform semantic search."
],
"id": "3fb7ecc7c0bb19ef"
},
93 changes: 77 additions & 16 deletions bootcamp/tutorials/integration/rag_with_milvus_and_bentoml.ipynb
@@ -41,6 +41,12 @@
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"cell_type": "markdown",
"source": "> If you are using Google Colab, to enable dependencies just installed, you may need to **restart the runtime**.",
"id": "86ed1bfa3dcba484"
},
{
"metadata": {},
"cell_type": "markdown",
@@ -87,10 +93,13 @@
"id": "bc9fc4a83cb30651"
},
{
"metadata": {},
"metadata": {
"ExecuteTime": {
"end_time": "2024-05-30T09:00:45.231255Z",
"start_time": "2024-05-30T09:00:45.228138Z"
}
},
"cell_type": "code",
"outputs": [],
"execution_count": null,
"source": [
"# naively chunk on newlines\n",
"def chunk_text(filename: str) -> list:\n",
@@ -99,22 +108,67 @@
" sentences = text.split(\"\\n\")\n",
" return sentences"
],
"id": "c875c865b4f03cbf"
"id": "c875c865b4f03cbf",
"outputs": [],
"execution_count": 6
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Next, we process each of the files we have. ",
"source": "First we need to download the city data.",
"id": "56cf89a31307dd9"
},
{
"metadata": {},
"metadata": {
"ExecuteTime": {
"end_time": "2024-05-30T09:03:04.829125Z",
"start_time": "2024-05-30T09:03:02.749073Z"
}
},
"cell_type": "code",
"outputs": [],
"execution_count": null,
"source": [
"import os\n",
"import requests\n",
"import urllib.request\n",
"\n",
"# set up the data source\n",
"repo = \"ytang07/bento_octo_milvus_RAG\"\n",
"directory = \"data\"\n",
"save_dir = \"./city_data\"\n",
"api_url = f\"https://api.github.com/repos/{repo}/contents/{directory}\"\n",
"\n",
"\n",
"response = requests.get(api_url)\n",
"data = response.json()\n",
"\n",
"if not os.path.exists(save_dir):\n",
" os.makedirs(save_dir)\n",
"\n",
"for item in data:\n",
" if item[\"type\"] == \"file\":\n",
" file_url = item[\"download_url\"]\n",
" file_path = os.path.join(save_dir, item[\"name\"])\n",
" urllib.request.urlretrieve(file_url, file_path)"
],
"id": "22279ff9d4181675",
"outputs": [],
"execution_count": 10
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Next, we process each of the files we have.",
"id": "c9c1bd74212091c4"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2024-05-30T09:03:08.342067Z",
"start_time": "2024-05-30T09:03:08.330758Z"
}
},
"cell_type": "code",
"source": [
"# please upload your data directory under this file's folder\n",
"cities = os.listdir(\"city_data\")\n",
"# store chunked text for each of the cities in a list of dicts\n",
@@ -128,7 +182,9 @@
" mapped = {\"city_name\": city.split(\".\")[0], \"chunks\": cleaned}\n",
" city_chunks.append(mapped)"
],
"id": "616ee4d005a73a32"
"id": "616ee4d005a73a32",
"outputs": [],
"execution_count": 11
},
{
"metadata": {},
@@ -137,10 +193,13 @@
"id": "19a39247b3a6d144"
},
{
"metadata": {},
"metadata": {
"ExecuteTime": {
"end_time": "2024-05-30T09:00:57.547822Z",
"start_time": "2024-05-30T09:00:57.543888Z"
}
},
"cell_type": "code",
"outputs": [],
"execution_count": null,
"source": [
"def get_embeddings(texts: list) -> list:\n",
" if len(texts) > 25:\n",
@@ -154,7 +213,9 @@
" sentences=texts,\n",
" )"
],
"id": "9585e8a71f9582a7"
"id": "9585e8a71f9582a7",
"outputs": [],
"execution_count": 8
},
{
"metadata": {},
@@ -165,8 +226,6 @@
{
"metadata": {},
"cell_type": "code",
"outputs": [],
"execution_count": null,
"source": [
"entries = []\n",
"for city_dict in city_chunks:\n",
@@ -184,7 +243,9 @@
" entries.append(entry)\n",
" print(entries)"
],
"id": "70e248dd9b053db3"
"id": "70e248dd9b053db3",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
Original file line number Diff line number Diff line change
@@ -49,6 +49,12 @@
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"cell_type": "markdown",
"source": "> If you are using Google Colab, to enable dependencies just installed, you may need to **restart the runtime**.",
"id": "78a3a6d4452080fe"
},
{
"metadata": {},
"cell_type": "markdown",
Original file line number Diff line number Diff line change
@@ -43,6 +43,12 @@
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"cell_type": "markdown",
"source": "> If you are using Google Colab, to enable dependencies just installed, you may need to **restart the runtime**.",
"id": "11d0ddab29c7ee2e"
},
{
"metadata": {},
"cell_type": "markdown",
@@ -140,8 +146,8 @@
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2024-05-29T08:17:20.142215Z",
"start_time": "2024-05-29T08:17:19.460511Z"
"end_time": "2024-05-30T09:11:31.175432Z",
"start_time": "2024-05-30T09:11:30.503896Z"
}
},
"id": "40d0cd0218d44802",
@@ -151,13 +157,13 @@
"output_type": "stream",
"text": [
"Query: When was artificial intelligence founded?\n",
"[{'id': 0, 'distance': 0.7196218371391296, 'entity': {'text': 'Artificial intelligence was founded as an academic discipline in 1956.', 'subject': 'history'}}]\n",
"[{'id': 0, 'distance': 0.7196218371391296, 'entity': {'text': 'Artificial intelligence was founded as an academic discipline in 1956.', 'subject': 'history'}}, {'id': 1, 'distance': 0.6297335028648376, 'entity': {'text': 'Alan Turing was the first person to conduct substantial research in AI.', 'subject': 'history'}}]\n",
"\n",
"\n"
]
}
],
"execution_count": 4
"execution_count": 3
}
],
"metadata": {
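
The download cell added to rag_with_milvus_and_bentoml.ipynb walks a GitHub contents listing and saves every entry whose `type` is `file`. The following is a minimal sketch of that step, not the notebook's own code: the helper names `download_listed_files` and `fetch_over_http` are hypothetical, and the fetch function is passed in as a parameter so the filtering and saving logic can be exercised without network access. The only listing keys it relies on (`type`, `name`, `download_url`) are the ones the notebook cell already reads.

```python
import os
import urllib.request
from typing import Callable, Iterable, List


def fetch_over_http(url: str) -> bytes:
    """Fetch a URL and return its body; urlopen raises on HTTP error statuses."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def download_listed_files(
    listing: Iterable[dict],
    save_dir: str,
    fetch: Callable[[str], bytes] = fetch_over_http,
) -> List[str]:
    """Save every entry of type "file" from a contents listing into save_dir.

    `listing` is assumed to be a list of dicts carrying the "type", "name"
    and "download_url" keys that the notebook cell reads.
    """
    # exist_ok=True replaces the separate os.path.exists check
    os.makedirs(save_dir, exist_ok=True)
    saved = []
    for item in listing:
        # Skip directories and any entry without a download URL
        if item.get("type") != "file" or not item.get("download_url"):
            continue
        # basename guards against a name that contains path separators
        file_path = os.path.join(save_dir, os.path.basename(item["name"]))
        with open(file_path, "wb") as fh:
            fh.write(fetch(item["download_url"]))
        saved.append(file_path)
    return saved
```

In the notebook's setting, the listing would come from the JSON response of the `api_url` request and `save_dir` would be `./city_data`. Injecting `fetch` is a design choice made for this sketch; it keeps the helper testable offline.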