Implement Llama 3.2 (#383)
rasbt authored Oct 5, 2024
1 parent a5405c2 commit b44096a
Showing 5 changed files with 8,874 additions and 6 deletions.
8 changes: 6 additions & 2 deletions ch05/07_gpt_to_llama/README.md
@@ -2,6 +2,10 @@



This folder contains code for converting the GPT implementation from chapters 4 and 5 to Meta AI's Llama architecture:
This folder contains code for converting the GPT implementation from chapters 4 and 5 to Meta AI's Llama architecture in the following recommended reading order:

- [converting-gpt-to-llama2.ipynb](converting-gpt-to-llama2.ipynb): contains code to convert GPT to Llama 2 7B step by step and loads pretrained weights from Meta AI
- [converting-llama2-to-llama3.ipynb](converting-llama2-to-llama3.ipynb): contains code to convert the Llama 2 model to Llama 3, Llama 3.1, and Llama 3.2
- [standalone-llama32.ipynb](standalone-llama32.ipynb): a standalone notebook implementing Llama 3.2

<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/gpt-to-llama/gpt-and-all-llamas.webp">
42 changes: 40 additions & 2 deletions ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb
@@ -108,6 +108,7 @@
"id": "UJJneXpTEg4W"
},
"source": [
"&nbsp;\n",
"# 1. Convert the GPT model implementation step by step"
]
},
@@ -129,6 +130,7 @@
"id": "979c7b6d-1370-4da1-8bfb-a2b27537bf2f"
},
"source": [
"&nbsp;\n",
"## 1.1 Replace LayerNorm with RMSNorm layer"
]
},
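
For reference, the RMSNorm layer that replaces GPT's LayerNorm normalizes by the root mean square of the activations and keeps a single learnable scale (no shift or mean subtraction). A minimal PyTorch sketch; names are illustrative and need not match the notebook's exact code:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, emb_dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(emb_dim))  # learnable scale, no shift

    def forward(self, x):
        # Normalize by the root mean square over the feature dimension
        means = x.pow(2).mean(dim=-1, keepdim=True)
        x_normed = x * torch.rsqrt(means + self.eps)
        return (x_normed * self.weight).to(dtype=x.dtype)
```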
@@ -228,6 +230,7 @@
"id": "5eb81f83-c38c-46a4-b763-aa630a32e357"
},
"source": [
"&nbsp;\n",
"## 1.2 Replace GELU with SiLU activation"
]
},
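
SiLU (also called swish) is simply `x * sigmoid(x)`; a one-line sketch of the activation this section swaps in for GELU:

```python
import torch
import torch.nn as nn

class SiLU(nn.Module):
    def forward(self, x):
        # silu(x) = x * sigmoid(x); equivalent to torch.nn.functional.silu(x)
        return x * torch.sigmoid(x)
```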
@@ -300,6 +303,7 @@
"id": "4f9b5167-1da9-46c8-9964-8036b3b1deb9"
},
"source": [
"&nbsp;\n",
"## 1.3 Update the FeedForward module"
]
},
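
The updated FeedForward module is a SwiGLU-style gated unit: two parallel projections, one passed through SiLU and multiplied elementwise with the other, followed by a down projection. A sketch; the layer names `fc1`/`fc2`/`fc3` are illustrative:

```python
import torch.nn as nn
import torch.nn.functional as F

class FeedForward(nn.Module):
    def __init__(self, emb_dim, hidden_dim):
        super().__init__()
        # Three bias-free projections instead of GPT's two: gate, up, and down
        self.fc1 = nn.Linear(emb_dim, hidden_dim, bias=False)  # gate
        self.fc2 = nn.Linear(emb_dim, hidden_dim, bias=False)  # up
        self.fc3 = nn.Linear(hidden_dim, emb_dim, bias=False)  # down

    def forward(self, x):
        # SwiGLU: silu(gate(x)) elementwise-times up(x), then project back down
        return self.fc3(F.silu(self.fc1(x)) * self.fc2(x))
```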
@@ -388,6 +392,7 @@
"id": "f6b7bf4f-99d0-42c1-807c-5074d2cc1949"
},
"source": [
"&nbsp;\n",
"## 1.4 Implement RoPE"
]
},
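
Rotary position embeddings (RoPE) encode position by rotating pairs of query/key dimensions by position-dependent angles instead of adding a positional-embedding vector. A minimal sketch under the usual conventions (base theta of 10,000 and the "rotate half" pairing); the notebook's exact helper signatures may differ:

```python
import torch

def precompute_rope_params(head_dim, theta_base=10_000, context_length=4096):
    # One inverse frequency per dimension pair
    inv_freq = 1.0 / (theta_base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(context_length)
    angles = positions[:, None] * inv_freq[None, :]  # (context_length, head_dim // 2)
    angles = torch.cat([angles, angles], dim=-1)     # (context_length, head_dim)
    return torch.cos(angles), torch.sin(angles)

def compute_rope(x, cos, sin):
    # x: (batch, num_heads, seq_len, head_dim)
    seq_len, head_dim = x.shape[2], x.shape[3]
    x1, x2 = x[..., : head_dim // 2], x[..., head_dim // 2 :]
    rotated = torch.cat((-x2, x1), dim=-1)           # the "rotate half" trick
    cos = cos[:seq_len, :][None, None, :, :]         # broadcast over batch and heads
    sin = sin[:seq_len, :][None, None, :, :]
    return x * cos + rotated * sin
```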
@@ -503,6 +508,7 @@
"id": "f78127b0-dda2-4c5a-98dd-bae8f5fe8297"
},
"source": [
"&nbsp;\n",
"## 1.5 Add RoPE to MultiHeadAttention module"
]
},
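
Putting the pieces together, the attention module applies RoPE to the queries and keys (not the values) right after the head split, and Llama also drops the bias terms GPT-2 uses. A sketch reusing the RoPE helpers from above:

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_in, d_out, context_length, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        # Llama drops the bias terms that GPT-2 uses
        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)
        self.out_proj = nn.Linear(d_out, d_out, bias=False)
        cos, sin = precompute_rope_params(self.head_dim, context_length=context_length)
        self.register_buffer("cos", cos)
        self.register_buffer("sin", sin)

    def forward(self, x):
        b, seq_len, _ = x.shape
        q = self.W_query(x).view(b, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.W_key(x).view(b, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.W_value(x).view(b, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        # RoPE rotates queries and keys (not values) before the attention scores
        q = compute_rope(q, self.cos, self.sin)
        k = compute_rope(k, self.cos, self.sin)
        scores = q @ k.transpose(2, 3) / self.head_dim**0.5
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        context = (torch.softmax(scores, dim=-1) @ v).transpose(1, 2).reshape(b, seq_len, -1)
        return self.out_proj(context)
```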
@@ -652,6 +658,7 @@
"id": "e5a1a272-a038-4b8f-aaaa-f4b241e7f23f"
},
"source": [
"&nbsp;\n",
"## 1.6 Update the TransformerBlock module"
]
},
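
The TransformerBlock keeps GPT's pre-norm residual layout but swaps in RMSNorm and the gated FeedForward, and drops dropout. A sketch; the `cfg` keys are illustrative:

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.att = MultiHeadAttention(
            cfg["emb_dim"], cfg["emb_dim"], cfg["context_length"], cfg["n_heads"])
        self.ff = FeedForward(cfg["emb_dim"], cfg["hidden_dim"])
        self.norm1 = RMSNorm(cfg["emb_dim"])  # RMSNorm replaces LayerNorm
        self.norm2 = RMSNorm(cfg["emb_dim"])

    def forward(self, x):
        x = x + self.att(self.norm1(x))  # pre-norm attention with residual
        x = x + self.ff(self.norm2(x))   # pre-norm feed forward with residual
        return x
```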
@@ -727,6 +734,7 @@
"id": "ada953bc-e2c0-4432-a32d-3f7efa3f6e0f"
},
"source": [
"&nbsp;\n",
"## 1.7 Update the model class"
]
},
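
The model class mirrors GPT's, minus the absolute positional-embedding table, since position information now enters through RoPE inside attention. A sketch built from the pieces above:

```python
import torch.nn as nn

class Llama2Model(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.tok_emb = nn.Embedding(cfg["vocab_size"], cfg["emb_dim"])
        # No absolute positional-embedding table: positions enter via RoPE
        self.trf_blocks = nn.Sequential(
            *[TransformerBlock(cfg) for _ in range(cfg["n_layers"])])
        self.final_norm = RMSNorm(cfg["emb_dim"])
        self.out_head = nn.Linear(cfg["emb_dim"], cfg["vocab_size"], bias=False)

    def forward(self, in_idx):
        x = self.tok_emb(in_idx)
        x = self.trf_blocks(x)
        return self.out_head(self.final_norm(x))
```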
@@ -791,6 +799,7 @@
"id": "4bc94940-aaeb-45b9-9399-3a69b8043e60"
},
"source": [
"&nbsp;\n",
"## 2. Initialize model"
]
},
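
For orientation, Llama 2 7B's published hyperparameters look roughly like this; the dictionary keys are illustrative, chosen to match the sketches above:

```python
LLAMA2_CONFIG_7B = {
    "vocab_size": 32_000,    # SentencePiece vocabulary size
    "context_length": 4096,  # maximum sequence length
    "emb_dim": 4096,         # embedding dimension
    "n_heads": 32,           # attention heads
    "n_layers": 32,          # transformer blocks
    "hidden_dim": 11_008,    # FeedForward intermediate size
}

model = Llama2Model(LLAMA2_CONFIG_7B)
```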
@@ -1029,6 +1038,7 @@
"id": "5dc64a06-27dc-46ec-9e6d-1700a8227d34"
},
"source": [
"&nbsp;\n",
"## 3. Load tokenizer"
]
},
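
Llama 2 uses a SentencePiece tokenizer rather than GPT-2's BPE via tiktoken. A hedged sketch of loading it; the gated `meta-llama/Llama-2-7b` repo requires requesting access and authenticating with a Hugging Face token first:

```python
# pip install sentencepiece huggingface_hub
from huggingface_hub import hf_hub_download
import sentencepiece as spm

tokenizer_file = hf_hub_download(
    repo_id="meta-llama/Llama-2-7b",  # gated repo: requires accepting Meta's license
    filename="tokenizer.model",
)
sp = spm.SentencePieceProcessor(model_file=tokenizer_file)

ids = sp.encode("Hello, world!")
print(sp.decode(ids))
```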
@@ -1288,6 +1298,7 @@
"id": "f63cc248-1d27-4eb6-aa50-173b436652f8"
},
"source": [
"&nbsp;\n",
"## 4. Load pretrained weights"
]
},
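
Loading the pretrained weights amounts to downloading Meta's checkpoint and copying each tensor into the corresponding parameter of the model above. A sketch with one example assignment; the checkpoint key shown follows Meta's naming but should be verified against the actual file:

```python
import torch
from huggingface_hub import hf_hub_download

weights_file = hf_hub_download(
    repo_id="meta-llama/Llama-2-7b",
    filename="consolidated.00.pth",  # Meta's original checkpoint
)
weights = torch.load(weights_file, weights_only=True)

# Copy each checkpoint tensor into the matching model parameter, e.g.:
with torch.no_grad():
    model.tok_emb.weight.copy_(weights["tok_embeddings.weight"])  # key per Meta's naming
```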
@@ -1544,14 +1555,23 @@
"print(\"Output text:\\n\", token_ids_to_text(token_ids, tokenizer))"
]
},
{
"cell_type": "markdown",
"id": "d72ed949-b6c0-4966-922f-eb0da732c404",
"metadata": {},
"source": [
"&nbsp;\n",
"## 5. Using the instruction-finetuned model"
]
},
{
"cell_type": "markdown",
"id": "akyo7WNyF_YL",
"metadata": {
"id": "akyo7WNyF_YL"
},
"source": [
"- Tip: as mentioned earlier, this is the pretrained base model; if you want to use a model capable of following instructions, use the `\"meta-llama/Llama-2-7b-chat\"` model instead"
"- As mentioned earlier, above we used the pretrained base model; if you want to use a model capable of following instructions, use the `\"meta-llama/Llama-2-7b-chat\"` model instead, as shown below"
]
},
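
For context, the chat variant was instruction-tuned with a specific prompt template, so prompts should be wrapped accordingly before tokenizing and generating. A minimal sketch of the single-turn format; the official template also supports a `<<SYS>> ... <</SYS>>` system block, omitted here:

```python
# Minimal single-turn Llama 2 chat prompt
prompt = "What do llamas eat?"
formatted = f"[INST] {prompt} [/INST]"
```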
{
@@ -1630,6 +1650,24 @@
"\n",
"print(\"Output text:\\n\", token_ids_to_text(token_ids, tokenizer))"
]
},
{
"cell_type": "markdown",
"id": "0f693da1-a07c-4e1d-af5a-c3923525f1e2",
"metadata": {},
"source": [
"&nbsp;\n",
"# What's next?"
]
},
{
"cell_type": "markdown",
"id": "fae93739-ca12-46ba-8ca7-7c07c59f669b",
"metadata": {},
"source": [
"- This notebook converted the original GPT-2 architecture into a Llama 2 model\n",
"- If you are interested in how to convert Llama 2 into Llama 3, Llama 3.1, and Llama 3.2, check out the [converting-llama2-to-llama3.ipynb](converting-llama2-to-llama3.ipynb) notebook"
]
}
],
"metadata": {
@@ -1653,7 +1691,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.11.4"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
