update

wenerme · May 19, 2024 · 973a206
1 parent 3e836a9
commit 973a206
Showing 48 changed files with 1,316 additions and 187 deletions.
diff --git a/notes/ai/ai-glossary.md b/notes/ai/ai-glossary.md
@@ -5,17 +5,29 @@ tags:
 
 # AI Glossary
 
-| abbr. | for                                          | cn                   |
-| ----- | -------------------------------------------- | -------------------- |
-| GPT   | Generative Pre-trained Transformer           | 生成型预训练变换模型 |
-| LLM   | Large Language Model                         | 大语言模型           |
-| LoRA  | Language of Rules and Actions                | 语言规则与行动语言   |
-| LLaMa | Large Language Model for Machine Translation | 机器翻译的大语言模型 |
-| RLHF  | Reinforcement Learning from Human Feedback   | 人类反馈强化学习     |
-| SFT   | Supervised Fine-tuning                       | 监督微调             |
-| RM    | Reward / preference modeling                 | 奖励/偏好建模        |
-| SDXL  | Stable Diffusion XL                          |
-| ERP   | erotic role playing                          | 情色角色扮演         |
+| abbr. | for                                             | cn                   |
+| ----- | ----------------------------------------------- | -------------------- |
+| AI    | Artificial Intelligence                         | 人工智能             |
+| ERP   | erotic role playing                             | 情色角色扮演         |
+| GELAN | Generalized Efficient Layer Aggregation Network | 通用高效层聚合网络   |
+| GPT   | Generative Pre-trained Transformer              | 生成型预训练变换模型 |
+| LLaMA | Large Language Model Meta AI                    | Meta AI 大语言模型 |
+| LLM   | Large Language Model                            | 大语言模型           |
+| LoRA  | Low-Rank Adaptation                             | 低秩适配             |
+| PGI   | Programmable Gradient Information               | 可编程梯度信息       |
+| RLHF  | Reinforcement Learning from Human Feedback      | 人类反馈强化学习     |
+| RM    | Reward / preference modeling                    | 奖励/偏好建模        |
+| SDXL  | Stable Diffusion XL                             | 稳定扩散 XL          |
+| SFT   | Supervised Fine-tuning                          | 监督微调             |
+| SOTA  | State of the Art                                | 最新技术             |
+| YOLO  | You Only Look Once                              |                      |
+
+| en               | cn       |
+| ---------------- | -------- |
+| Stable Diffusion | 稳定扩散 |
+
+
+## LLM Parameters
 
 - temperature
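   - Controls the randomness of token selection. Lower temperatures suit prompts that expect factual or correct responses, while higher temperatures may produce more diverse or unexpected results.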

diff --git a/notes/ai/llm/llama.cpp.md b/notes/ai/llm/llama.cpp.md
@@ -0,0 +1,28 @@
+---
+title: llama.cpp
+---
+
+# llama.cpp
+
+- [ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp)
+  - MIT, C++
+  - LLM inference in C/C++
+
+```bash
+# Alpine Linux: Python and build dependencies for ML
+apk add \
+  gcc g++ python3 py3-pip musl-dev cmake make pkgconf build-base \
+  git openssh-client binutils coreutils util-linux findutils sed grep tar wget curl neofetch \
+  rust cargo python3-dev openssl-dev linux-headers
+
+# llama.cpp
+# =========
+git clone https://github.com/ggerganov/llama.cpp.git
+cd llama.cpp
+make -j
+
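+# One-shot completion: generate up to 512 tokens from the prompt
+./main -m ./models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512
+# Instruction mode: 2048-token context, keep the whole initial prompt (--keep -1)
+./main -m ./models/7B/ggml-model-q4_0.bin --file prompts/alpaca.txt --instruct --ctx_size 2048 --keep -1
+
+# Alpaca instruction mode: colored output, batch size 256, low temperature, 7 threads
+./main -m ./models/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7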
+```
diff --git a/notes/ai/llm/llm-agent.md b/notes/ai/llm/llm-agent.md
@@ -0,0 +1,26 @@
+---
+tags:
+  - Automachine
+---
+
+# Agent
+
+- Components
+  - Tools
+  - Agent Core
+  - Planning
+    - with Feedback
+    - without Feedback
+  - Memory
+    - short
+    - long
+    - hybrid
+- Use cases
+  - Conversational
+  - Task Oriented
+  - Creative
+  - Collaborative
+
+---
+
+- https://www.truefoundry.com/blog/llm-agents
diff --git a/notes/ai/llm/llm-glossary.md b/notes/ai/llm/llm-glossary.md
@@ -1,8 +1,11 @@
 ---
 tags:
-- Glossary
+  - Glossary
 ---
 
 # LLM Glossary
 
-
+| en   | for                          | cn                | notes           |
+| ---- | ---------------------------- | ----------------- | --------------- |
+| GGML | GPT-Generated Model Language |                   | Georgi Gerganov |
+| GGUF | GPT-Generated Unified Format | GPT生成的统一格式 |                 |
diff --git a/notes/ai/llm/llm-models.md b/notes/ai/llm/llm-models.md
@@ -5,25 +5,51 @@ tags:
 
 # LLM Models
 
-| model   | year | params | note               |
-| ------- | ---- | ------ | ------------------ |
-| GPT-1   | 2018 | 0.12B  |
-| GPT-2   | 2019 | 1.5B   |
-| GPT-3   | 2020 | 175B   |
-| GPT-3.5 | 2022 |        | ChatGPT,570GB Text |
-| GPT-4   | 2023 |
-| GPT-4V  | 2023 |
+**Proprietary Models**
+
+| model         | date | notes               |
+| ------------- | ---- | ------------------- |
+| GPT-3.5-turbo | 2022 | 4K                  |
+| GPT-3.5-16k   | 2022 | 16K                 |
+| GPT-3.5       | 2022 | ChatGPT, 570GB text |
+| GPT-4         | 2023 |                     |
+| GPT-4-32k     | 2023 |                     |
+| GPT-4V        | 2023 |                     |
+| GPT-4o        | 2024 |                     |
+
+**Open Source/Weight Models**
+
+| model   | date | ctx | notes              |
+| ------- | ---- | --- | ------------------ |
+| GPT-1   | 2018 |     | 0.12B              |
+| GPT-2   | 2019 |     | 1.5B               |
+| GPT-3   | 2020 | 2K  | 175B               |
+| LLAMA2  | 2023 | 4K  | by Meta            |
+| LLAMA3  | 2024 | 8K  | by Meta            |
+| phi3    | 2024 |     | by Microsoft       |
+| gemma   | 2024 |     | by Google DeepMind |
+| mistral | 2024 |     | by Mistral AI      |
 
 - https://ollama.com/library
 - 7B - 8GB memory
 - 13B - 16GB memory
 - 70B - 32GB memory
+- A small context window is adequate for RAG
+- Context Window
+  - Llama-3 8B, 8K-1M: https://ollama.com/library/llama3-gradient
+    - 256k context window requires at least 64GB of memory
+    - 1M+ context window requires significantly more (100GB+)
 
 ---
 
+- Leaderboards
+  - https://huggingface.co/open-llm-leaderboard
+  - https://chat.lmsys.org/?leaderboard
+  - https://www.vellum.ai/llm-leaderboard
 - [google-deepmind/gemma](https://github.com/google-deepmind/gemma)
   - Apache-2.0, Flax, JAX
   - by Google DeepMind
+  - Gemini (the proprietary sibling family) tiers: Ultra, Pro, Flash, Nano
   - 2B, 7B
 - llama2
   - 7B, 13B, 70B

diff --git a/notes/ai/llm/ollama.md b/notes/ai/llm/ollama.md
@@ -6,6 +6,7 @@ title: ollama
 
 - [ollama/ollama](https://github.com/ollama/ollama)
   - MIT, Golang
+  - Wraps llama.cpp
 - References
   - [ollama/ollama-js](https://github.com/ollama/ollama-js)
     - MIT, TS

diff --git a/notes/ai/ml/README.md b/notes/ai/ml/README.md
@@ -1,7 +1,11 @@
-# Machine Learning
+---
+title: Machine Learning
+---
 
-## Tips
+# Machine Learning
 
+- [Training](./traning.md)
+- [Labeling](./labeling.md)
 - [Comparing Deep Learning Frameworks](https://www.infoq.com/presentations/comparison-deep-learning-frameworks)
 
 | -                 | [tiny-cnn](https://github.com/nyanp/tiny-cnn) | [caffe](https://github.com/BVLC/caffe)                                        | [Theano](https://github.com/Theano/Theano)       | [TensorFlow](https://www.tensorflow.org/) |

diff --git a/notes/ai/ml/dataset.md b/notes/ai/ml/dataset.md
@@ -0,0 +1,15 @@
+---
+title: Dataset
+---
+
+# Dataset
+
+- https://roboflow.com/formats
+- https://github.com/ultralytics/yolov5/blob/master/data/coco128.yaml
+- coco128
+  - YOLOv5 Tutorial Dataset
+  - https://www.kaggle.com/datasets/ultralytics/coco128
+  - https://github.com/ultralytics/yolov5/blob/master/data/coco128.yaml
+    - https://ultralytics.com/assets/coco128.zip
+- [ultralytics/JSON2YOLO](https://github.com/ultralytics/JSON2YOLO)
+  - Convert JSON annotations into YOLO format
diff --git a/notes/ai/ml/label-studio.md b/notes/ai/ml/label-studio.md
@@ -0,0 +1,99 @@
+---
+title: Label Studio
+---
+
+# Label Studio
+
+- [HumanSignal/label-studio](https://github.com/HumanSignal/label-studio)
+  - Apache-2.0
+  - Database: SQLite, PostgreSQL
+  - Storage: S3
+- telemetry
+  - COLLECT_ANALYTICS
+- References
+  - https://labelstud.io/
+  - Frontend: https://github.com/HumanSignal/label-studio/tree/develop/web/libs/editor
+
+```bash
+pip install -U label-studio
+label-studio
+
+# https://hub.docker.com/r/heartexlabs/label-studio
+# https://github.com/HumanSignal/label-studio/blob/develop/docker-compose.yml
+docker run --rm -it \
+  -p 8080:8080 \
+  -v "$PWD/data:/label-studio/data" \
+  --name label-studio heartexlabs/label-studio
+
+# label-studio --log-level DEBUG
+
+# Keep project data in ./data and serve local files from ./files
+LABEL_STUDIO_BASE_DATA_DIR="$PWD/data" \
+LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true \
+LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT="$PWD/files" \
+  label-studio start
+```
+
+| env                                      | flags                         | default                |
+| ---------------------------------------- | ----------------------------- | ---------------------- |
+| LABEL_STUDIO_DATABASE                    | -db,--database                | label_studio.sqlite3   |
+| LABEL_STUDIO_BASE_DATA_DIR               | --data-dir                    |                        |
+| CONFIG_PATH                              | -c,--config                   | default_config.json    |
+| LABEL_STUDIO_LABEL_CONFIG                | -l,--label-config             | None                   |
+| LABEL_STUDIO_PORT                        | -p,--port                     | 8080                   |
+| LABEL_STUDIO_HOST                        | --host                        |                        |
+| LABEL_STUDIO_PROJECT_DESC                | --initial-project-description |                        |
+| LABEL_STUDIO_PASSWORD                    | --password                    |                        |
+| LABEL_STUDIO_USERNAME                    | --username                    | default_user@localhost |
+| LABEL_STUDIO_USER_TOKEN                  | --user-token                  |                        |
+| LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED |                               | False                  |
+| LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT   |                               | /                      |
+
+## tags
+
+
+```xml
+<View>
+  <TimeSeriesLabels name="label" toName="ts">
+    <Label value="Run"/>
+    <Label value="Walk"/>
+  </TimeSeriesLabels>
+  <HyperText name="video" value="$video" inline="true"/>
+  <TimeSeries name="ts" value="$csv" valueType="url" timeColumn="time_column">
+    <Channel column="first_column"/>
+  </TimeSeries>
+</View>
+
+<!-- {
+    "csv": "/samples/time-series.csv?time=time_column&values=first_column",
+    "video": "<video src='/static/samples/opossum_snow.mp4' width='100%' controls onloadeddata=\"setTimeout(function(){ts=Htx.annotationStore.selected.names.get('ts');t=ts.data.time_column;v=document.getElementsByTagName('video')[0];w=parseInt(t.length*(5/v.duration));l=t.length-w;ts.updateTR([t[0], t[w]], 1.001);r=$=>ts.brushRange.map(n=>(+n).toFixed(2));_=r();setInterval($=>r().some((n,i)=>n!==_[i])&&(_=r())&&(v.currentTime=v.duration*(r()[0]-t[0])/(t.slice(-1)[0]-t[0]-(r()[1]-r()[0]))),300); console.log('video is loaded, starting to sync with time series')}, 3000); \" />"
+  } -->
+```
+
+- Video + TimeSeries
+  - https://github.com/HumanSignal/label-studio/issues/4827
+- https://labelstud.io/tags/
+- https://github.com/google-research-datasets/Video-Timeline-Tags-ViTT
+
+## structs
+
+```ts
+interface Obj {
+  id: string;
+
+  data: any;
+  value: any;
+
+  from_name: string;
+  to_name: string;
+  type: string;
+}
+```
+
+# FAQ
+
+## video frameCount
+
+- framerate defaults to 24
+- https://github.com/HumanSignal/label-studio/issues/3315
+- https://labelstud.io/tags/video
diff --git a/notes/ai/ml/labelImg.md b/notes/ai/ml/labelImg.md
@@ -0,0 +1,9 @@
+---
+title: LabelImg
+---
+
+# LabelImg
+
+- ~~[HumanSignal/labelImg](https://github.com/HumanSignal/labelImg)~~
+  - MIT, Python
+  - -> [Label Studio](./label-studio.md)
diff --git a/notes/ai/ml/labeling.md b/notes/ai/ml/labeling.md
@@ -0,0 +1,79 @@
+---
+title: Labeling
+---
+
+# Labeling
+
+- VOC - Visual Object Classes
+  - Pascal VOC
+  - XML
+    - object, name, bndbox
+- COCO
+  - Common Objects in Context
+  - JSON
+    - images, annotations, categories
+    - bbox - `[x, y, w, h]`
+- YOLO - You Only Look Once
+  - `<class_index> <x_center> <y_center> <width> <height>`
+- References
+  - https://github.com/KKKSQJ/DeepLearning/tree/master/others/label_convert
+
+## VOC
+
+```xml
+<annotation>
+    <folder>VOC2012</folder>
+    <filename>image1.jpg</filename>
+    <size>
+        <width>800</width>
+        <height>600</height>
+        <depth>3</depth>
+    </size>
+    <object>
+        <name>dog</name>
+        <pose>Unspecified</pose>
+        <truncated>0</truncated>
+        <difficult>0</difficult>
+        <bndbox>
+            <xmin>48</xmin>
+            <ymin>240</ymin>
+            <xmax>195</xmax>
+            <ymax>371</ymax>
+        </bndbox>
+    </object>
+</annotation>
+
+```
+
+## COCO
+
+```json
+{
+  "images": [
+    {
+      "id": 1,
+      "file_name": "image1.jpg",
+      "width": 800,
+      "height": 600
+    }
+  ],
+  "annotations": [
+    {
+      "id": 1,
+      "image_id": 1,
+      "category_id": 18,
+      "bbox": [48, 240, 147, 131],
+      "segmentation": [],
+      "area": 19257,
+      "iscrowd": 0
+    }
+  ],
+  "categories": [
+    {
+      "id": 18,
+      "name": "dog",
+      "supercategory": "animal"
+    }
+  ]
+}
+```
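
Note on `notes/ai/ml/labeling.md`: the VOC and COCO samples above describe the same box, so the three bbox conventions listed in that file can be cross-checked with a small conversion. The following is a minimal sketch, not part of the commit; the function names and the class index `0` are hypothetical, and the image size (800x600) is taken from the samples.

```python
from typing import List, Sequence, Tuple


def voc_to_coco(xmin: float, ymin: float, xmax: float, ymax: float) -> List[float]:
    """Convert VOC corner coordinates to a COCO [x, y, w, h] box (top-left origin)."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def coco_to_yolo(bbox: Sequence[float], img_w: float, img_h: float) -> Tuple[float, float, float, float]:
    """Convert a COCO [x, y, w, h] box to YOLO (x_center, y_center, width, height),
    normalized by the image size."""
    x, y, w, h = bbox
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)


def yolo_line(class_index: int, box: Sequence[float]) -> str:
    """Format one line: <class_index> <x_center> <y_center> <width> <height>."""
    return f"{class_index} " + " ".join(f"{v:.6f}" for v in box)


# Values from the VOC sample: xmin=48, ymin=240, xmax=195, ymax=371 on an 800x600 image.
coco_bbox = voc_to_coco(48, 240, 195, 371)
yolo_box = coco_to_yolo(coco_bbox, 800, 600)

print(coco_bbox)               # matches "bbox" in the COCO sample
print(coco_bbox[2] * coco_bbox[3])  # matches "area" in the COCO sample
print(yolo_line(0, yolo_box))  # class index 0 is a placeholder
```

The YOLO class index depends on the dataset's own class list, so it does not map directly from COCO's `category_id`.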