Implement Llama 3.2 (#383)
rasbt authored Oct 5, 2024
1 parent a5405c2 commit b44096a
Showing 5 changed files with 8,874 additions and 6 deletions.
8 changes: 6 additions & 2 deletions ch05/07_gpt_to_llama/README.md
@@ -2,6 +2,10 @@



This folder contains code for converting the GPT implementation from chapters 4 and 5 to Meta AI's Llama architecture:
This folder contains code for converting the GPT implementation from chapters 4 and 5 to Meta AI's Llama architecture in the following recommended reading order:

- [converting-gpt-to-llama2.ipynb](converting-gpt-to-llama2.ipynb): contains code to convert GPT to Llama 2 7B step by step and loads pretrained weights from Meta AI
- [converting-llama2-to-llama3.ipynb](converting-llama2-to-llama3.ipynb): contains code to convert the Llama 2 model to Llama 3, Llama 3.1, and Llama 3.2
- [standalone-llama32.ipynb](standalone-llama32.ipynb): a standalone notebook implementing Llama 3.2

<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/gpt-to-llama/gpt-and-all-llamas.webp">
42 changes: 40 additions & 2 deletions ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb
@@ -108,6 +108,7 @@
"id": "UJJneXpTEg4W"
},
"source": [
"&nbsp;\n",
"# 1. Convert the GPT model implementation step by step"
]
},
@@ -129,6 +130,7 @@
"id": "979c7b6d-1370-4da1-8bfb-a2b27537bf2f"
},
"source": [
"&nbsp;\n",
"## 1.1 Replace LayerNorm with RMSNorm layer"
]
},
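
For reference, the RMSNorm layer that replaces GPT's LayerNorm normalizes by the root mean square of the activations and keeps a single learnable scale (no shift or mean subtraction). A minimal PyTorch sketch; names are illustrative and need not match the notebook's exact code:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, emb_dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(emb_dim))  # learnable scale, no shift

    def forward(self, x):
        # Normalize by the root mean square over the feature dimension
        means = x.pow(2).mean(dim=-1, keepdim=True)
        x_normed = x * torch.rsqrt(means + self.eps)
        return (x_normed * self.weight).to(dtype=x.dtype)
```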
@@ -228,6 +230,7 @@
"id": "5eb81f83-c38c-46a4-b763-aa630a32e357"
},
"source": [
"&nbsp;\n",
"## 1.2 Replace GELU with SiLU activation"
]
},
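
SiLU (also called swish) is simply `x * sigmoid(x)`; a one-line sketch of the activation this section swaps in for GELU:

```python
import torch
import torch.nn as nn

class SiLU(nn.Module):
    def forward(self, x):
        # silu(x) = x * sigmoid(x); equivalent to torch.nn.functional.silu(x)
        return x * torch.sigmoid(x)
```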
@@ -300,6 +303,7 @@
"id": "4f9b5167-1da9-46c8-9964-8036b3b1deb9"
},
"source": [
"&nbsp;\n",
"## 1.3 Update the FeedForward module"
]
},
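
The updated FeedForward module is a SwiGLU-style gated unit: two parallel projections, one passed through SiLU and multiplied elementwise with the other, followed by a down projection. A sketch; the layer names `fc1`/`fc2`/`fc3` are illustrative:

```python
import torch.nn as nn
import torch.nn.functional as F

class FeedForward(nn.Module):
    def __init__(self, emb_dim, hidden_dim):
        super().__init__()
        # Three bias-free projections instead of GPT's two: gate, up, and down
        self.fc1 = nn.Linear(emb_dim, hidden_dim, bias=False)  # gate
        self.fc2 = nn.Linear(emb_dim, hidden_dim, bias=False)  # up
        self.fc3 = nn.Linear(hidden_dim, emb_dim, bias=False)  # down

    def forward(self, x):
        # SwiGLU: silu(gate(x)) elementwise-times up(x), then project back down
        return self.fc3(F.silu(self.fc1(x)) * self.fc2(x))
```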
@@ -388,6 +392,7 @@
"id": "f6b7bf4f-99d0-42c1-807c-5074d2cc1949"
},
"source": [
"&nbsp;\n",
"## 1.4 Implement RoPE"
]
},
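
Rotary position embeddings (RoPE) encode position by rotating pairs of query/key dimensions by position-dependent angles instead of adding a positional-embedding vector. A minimal sketch under the usual conventions (base theta of 10,000 and the "rotate half" pairing); the notebook's exact helper signatures may differ:

```python
import torch

def precompute_rope_params(head_dim, theta_base=10_000, context_length=4096):
    # One inverse frequency per dimension pair
    inv_freq = 1.0 / (theta_base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(context_length)
    angles = positions[:, None] * inv_freq[None, :]  # (context_length, head_dim // 2)
    angles = torch.cat([angles, angles], dim=-1)     # (context_length, head_dim)
    return torch.cos(angles), torch.sin(angles)

def compute_rope(x, cos, sin):
    # x: (batch, num_heads, seq_len, head_dim)
    seq_len, head_dim = x.shape[2], x.shape[3]
    x1, x2 = x[..., : head_dim // 2], x[..., head_dim // 2 :]
    rotated = torch.cat((-x2, x1), dim=-1)           # the "rotate half" trick
    cos = cos[:seq_len, :][None, None, :, :]         # broadcast over batch and heads
    sin = sin[:seq_len, :][None, None, :, :]
    return x * cos + rotated * sin
```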
@@ -503,6 +508,7 @@
"id": "f78127b0-dda2-4c5a-98dd-bae8f5fe8297"
},
"source": [
"&nbsp;\n",
"## 1.5 Add RoPE to MultiHeadAttention module"
]
},
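
Putting the pieces together, the attention module applies RoPE to the queries and keys (not the values) right after the head split, and Llama also drops the bias terms GPT-2 uses. A sketch reusing the RoPE helpers from above:

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_in, d_out, context_length, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        # Llama drops the bias terms that GPT-2 uses
        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)
        self.out_proj = nn.Linear(d_out, d_out, bias=False)
        cos, sin = precompute_rope_params(self.head_dim, context_length=context_length)
        self.register_buffer("cos", cos)
        self.register_buffer("sin", sin)

    def forward(self, x):
        b, seq_len, _ = x.shape
        q = self.W_query(x).view(b, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.W_key(x).view(b, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.W_value(x).view(b, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        # RoPE rotates queries and keys (not values) before the attention scores
        q = compute_rope(q, self.cos, self.sin)
        k = compute_rope(k, self.cos, self.sin)
        scores = q @ k.transpose(2, 3) / self.head_dim**0.5
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        context = (torch.softmax(scores, dim=-1) @ v).transpose(1, 2).reshape(b, seq_len, -1)
        return self.out_proj(context)
```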
@@ -652,6 +658,7 @@
"id": "e5a1a272-a038-4b8f-aaaa-f4b241e7f23f"
},
"source": [
"&nbsp;\n",
"## 1.6 Update the TransformerBlock module"
]
},
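
The TransformerBlock keeps GPT's pre-norm residual layout but swaps in RMSNorm and the gated FeedForward, and drops dropout. A sketch; the `cfg` keys are illustrative:

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.att = MultiHeadAttention(
            cfg["emb_dim"], cfg["emb_dim"], cfg["context_length"], cfg["n_heads"])
        self.ff = FeedForward(cfg["emb_dim"], cfg["hidden_dim"])
        self.norm1 = RMSNorm(cfg["emb_dim"])  # RMSNorm replaces LayerNorm
        self.norm2 = RMSNorm(cfg["emb_dim"])

    def forward(self, x):
        x = x + self.att(self.norm1(x))  # pre-norm attention with residual
        x = x + self.ff(self.norm2(x))   # pre-norm feed forward with residual
        return x
```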
@@ -727,6 +734,7 @@
"id": "ada953bc-e2c0-4432-a32d-3f7efa3f6e0f"
},
"source": [
"&nbsp;\n",
"## 1.7 Update the model class"
]
},
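
The model class mirrors GPT's, minus the absolute positional-embedding table, since position information now enters through RoPE inside attention. A sketch built from the pieces above:

```python
import torch.nn as nn

class Llama2Model(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.tok_emb = nn.Embedding(cfg["vocab_size"], cfg["emb_dim"])
        # No absolute positional-embedding table: positions enter via RoPE
        self.trf_blocks = nn.Sequential(
            *[TransformerBlock(cfg) for _ in range(cfg["n_layers"])])
        self.final_norm = RMSNorm(cfg["emb_dim"])
        self.out_head = nn.Linear(cfg["emb_dim"], cfg["vocab_size"], bias=False)

    def forward(self, in_idx):
        x = self.tok_emb(in_idx)
        x = self.trf_blocks(x)
        return self.out_head(self.final_norm(x))
```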
@@ -791,6 +799,7 @@
"id": "4bc94940-aaeb-45b9-9399-3a69b8043e60"
},
"source": [
"&nbsp;\n",
"## 2. Initialize model"
]
},
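
For orientation, Llama 2 7B's published hyperparameters look roughly like this; the dictionary keys are illustrative, chosen to match the sketches above:

```python
LLAMA2_CONFIG_7B = {
    "vocab_size": 32_000,    # SentencePiece vocabulary size
    "context_length": 4096,  # maximum sequence length
    "emb_dim": 4096,         # embedding dimension
    "n_heads": 32,           # attention heads
    "n_layers": 32,          # transformer blocks
    "hidden_dim": 11_008,    # FeedForward intermediate size
}

model = Llama2Model(LLAMA2_CONFIG_7B)
```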
@@ -1029,6 +1038,7 @@
"id": "5dc64a06-27dc-46ec-9e6d-1700a8227d34"
},
"source": [
"&nbsp;\n",
"## 3. Load tokenizer"
]
},
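
Llama 2 uses a SentencePiece tokenizer rather than GPT-2's BPE via tiktoken. A hedged sketch of loading it; the gated `meta-llama/Llama-2-7b` repo requires requesting access and authenticating with a Hugging Face token first:

```python
# pip install sentencepiece huggingface_hub
from huggingface_hub import hf_hub_download
import sentencepiece as spm

tokenizer_file = hf_hub_download(
    repo_id="meta-llama/Llama-2-7b",  # gated repo: requires accepting Meta's license
    filename="tokenizer.model",
)
sp = spm.SentencePieceProcessor(model_file=tokenizer_file)

ids = sp.encode("Hello, world!")
print(sp.decode(ids))
```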
@@ -1288,6 +1298,7 @@
"id": "f63cc248-1d27-4eb6-aa50-173b436652f8"
},
"source": [
"&nbsp;\n",
"## 4. Load pretrained weights"
]
},
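
Loading the pretrained weights amounts to downloading Meta's checkpoint and copying each tensor into the corresponding parameter of the model above. A sketch with one example assignment; the checkpoint key shown follows Meta's naming but should be verified against the actual file:

```python
import torch
from huggingface_hub import hf_hub_download

weights_file = hf_hub_download(
    repo_id="meta-llama/Llama-2-7b",
    filename="consolidated.00.pth",  # Meta's original checkpoint
)
weights = torch.load(weights_file, weights_only=True)

# Copy each checkpoint tensor into the matching model parameter, e.g.:
with torch.no_grad():
    model.tok_emb.weight.copy_(weights["tok_embeddings.weight"])  # key per Meta's naming
```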
@@ -1544,14 +1555,23 @@
"print(\"Output text:\\n\", token_ids_to_text(token_ids, tokenizer))"
]
},
{
"cell_type": "markdown",
"id": "d72ed949-b6c0-4966-922f-eb0da732c404",
"metadata": {},
"source": [
"&nbsp;\n",
"## 5. Using the instruction-finetuned model"
]
},
{
"cell_type": "markdown",
"id": "akyo7WNyF_YL",
"metadata": {
"id": "akyo7WNyF_YL"
},
"source": [
"- Tip: as mentioned earlier, this is the pretrained base model; if you want to use a model capable of following instructions, use the `\"meta-llama/Llama-2-7b-chat\"` model instead"
"- As mentioned earlier, above we used the pretrained base model; if you want to use a model capable of following instructions, use the `\"meta-llama/Llama-2-7b-chat\"` model instead, as shown below"
]
},
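
For context, the chat variant was instruction-tuned with a specific prompt template, so prompts should be wrapped accordingly before tokenizing and generating. A minimal sketch of the single-turn format; the official template also supports a `<<SYS>> ... <</SYS>>` system block, omitted here:

```python
# Minimal single-turn Llama 2 chat prompt
prompt = "What do llamas eat?"
formatted = f"[INST] {prompt} [/INST]"
```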
{
@@ -1630,6 +1650,24 @@
"\n",
"print(\"Output text:\\n\", token_ids_to_text(token_ids, tokenizer))"
]
},
{
"cell_type": "markdown",
"id": "0f693da1-a07c-4e1d-af5a-c3923525f1e2",
"metadata": {},
"source": [
"&nbsp;\n",
"# What's next?"
]
},
{
"cell_type": "markdown",
"id": "fae93739-ca12-46ba-8ca7-7c07c59f669b",
"metadata": {},
"source": [
"- This notebook converted the original GPT-2 architecture into a Llama 2 model\n",
"- If you are interested in how to convert Llama 2 into Llama 3, Llama 3.1, and Llama 3.2, check out the [converting-llama2-to-llama3.ipynb](converting-llama2-to-llama3.ipynb) notebook"
]
}
],
"metadata": {
@@ -1653,7 +1691,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.11.4"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
