update
wenerme committed May 19, 2024
1 parent 3e836a9 commit 973a206
Showing 48 changed files with 1,316 additions and 187 deletions.
34 changes: 23 additions & 11 deletions notes/ai/ai-glossary.md
@@ -5,17 +5,29 @@ tags:

# AI Glossary

| abbr. | for | cn |
| ----- | -------------------------------------------- | -------------------- |
| GPT | Generative Pre-trained Transformer | 生成型预训练变换模型 |
| LLM | Large Language Model | 大语言模型 |
| LoRA | Low-Rank Adaptation | 低秩适配 |
| LLaMA | Large Language Model Meta AI | Meta AI 大语言模型 |
| RLHF | Reinforcement Learning from Human Feedback | 人类反馈强化学习 |
| SFT | Supervised Fine-tuning | 监督微调 |
| RM | Reward / preference modeling | 奖励/偏好建模 |
| SDXL | Stable Diffusion XL | |
| ERP | erotic role playing | 情色角色扮演 |
| abbr. | for | cn |
| ----- | ----------------------------------------------- | -------------------- |
| AI | Artificial Intelligence | 人工智能 |
| ERP | erotic role playing | 情色角色扮演 |
| GELAN | Generalized Efficient Layer Aggregation Network | 通用高效层聚合网络 |
| GPT | Generative Pre-trained Transformer | 生成型预训练变换模型 |
| LLaMA | Large Language Model Meta AI | Meta AI 大语言模型 |
| LLM | Large Language Model | 大语言模型 |
| LoRA | Low-Rank Adaptation | 低秩适配 |
| PGI | Programmable Gradient Information | 可编程梯度信息 |
| RLHF | Reinforcement Learning from Human Feedback | 人类反馈强化学习 |
| RM | Reward / preference modeling | 奖励/偏好建模 |
| SDXL | Stable Diffusion XL | 稳定扩散 XL |
| SFT | Supervised Fine-tuning | 监督微调 |
| SOTA | State of the Art | 最新技术 |
| YOLO | You Only Look Once | |

| en | cn |
| ---------------- | -------- |
| Stable Diffusion | 稳定扩散 |


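## LLM Parameters

- temperature
- Controls the randomness of token selection. Lower temperatures suit prompts that call for factual or correct responses, while higher temperatures can produce more diverse or unexpected results.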
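
The effect of temperature can be sketched with a temperature-scaled softmax. This is a minimal illustration, not how any particular LLM runtime implements sampling; the logits and the function names `softmax_with_temperature` and `sample_token` are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng=random):
    """Pick a token index at random, weighted by the scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)  # sharply peaked on the top token
hot = softmax_with_temperature(logits, 2.0)   # flatter, more varied choices
```

With a low temperature almost all probability mass lands on the highest-scoring token, while a high temperature spreads it across the alternatives.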
28 changes: 28 additions & 0 deletions notes/ai/llm/llama.cpp.md
@@ -0,0 +1,28 @@
---
title: llama.cpp
---

# llama.cpp

- [ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp)
- MIT, C++
- LLM inference in C/C++

```bash
# Alpine Linux: build tools and Python packages for ML work
apk add \
  gcc g++ python3 py3-pip musl-dev cmake make pkgconf build-base \
  git openssh-client binutils coreutils util-linux findutils sed grep tar wget curl neofetch \
  rust cargo python3-dev openssl-dev linux-headers

# llama.cpp
# =========
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make -j

# One-shot completion: generate up to 512 tokens from the prompt
./main -m ./models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512
# Instruction mode with a 2048-token context; --keep -1 keeps the whole initial prompt
./main -m ./models/7B/ggml-model-q4_0.bin --file prompts/alpaca.txt --instruct --ctx_size 2048 --keep -1

# Alpaca instruction mode: colored output, batch size 256, low temperature, 7 threads
./main -m ./models/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7
```
26 changes: 26 additions & 0 deletions notes/ai/llm/llm-agent.md
@@ -0,0 +1,26 @@
---
tags:
- Automachine
---

# Agent

- Components
- Tools
- Agent Core
- Planning
- with Feedback
- without Feedback
- Memory
- short
- long
- hybrid
- usecase
- Conversational
- Task Oriented
- Creative
- Collaborative

---

- https://www.truefoundry.com/blog/llm-agents
7 changes: 5 additions & 2 deletions notes/ai/llm/llm-glossary.md
@@ -1,8 +1,11 @@
---
tags:
- Glossary
- Glossary
---

# LLM Glossary


| en | for | cn | notes |
| ---- | ---------------------------- | ----------------- | --------------- |
| GGML | Georgi Gerganov Machine Learning | | named after its author, Georgi Gerganov |
| GGUF | GPT-Generated Unified Format | GPT生成的统一格式 | |
42 changes: 34 additions & 8 deletions notes/ai/llm/llm-models.md
@@ -5,25 +5,51 @@ tags:

# LLM Models

| model | year | params | note |
| ------- | ---- | ------ | ------------------ |
| GPT-1 | 2018 | 0.12B | |
| GPT-2 | 2019 | 1.5B | |
| GPT-3 | 2020 | 175B | |
| GPT-3.5 | 2022 | | ChatGPT, 570GB text |
| GPT-4 | 2023 | | |
| GPT-4V | 2023 | | |
**Proprietary Models**

| model | date | notes |
| ------------- | ---- | ------------------ |
| GPT-3.5-turbo | 2022 | 4K context |
| GPT-3.5-16k | 2022 | 16K context |
| GPT-3.5 | 2022 | ChatGPT, 570GB text |
| GPT-4 | 2023 | |
| GPT-4-32k | 2023 | |
| GPT-4V | 2023 | |
| GPT-4o | 2024 | |

**Open Source/Weight Models**

| model | date | ctx | notes |
| ------- | ---- | --- | ------------------ |
| GPT-1 | 2018 | | 0.12B |
| GPT-2 | 2019 | | 1.5B |
| GPT-3 | 2020 | 2k | 175B; weights not released |
| LLAMA2 | 2023 | 4K | by Meta |
| LLAMA3 | 2024 | 8K | by Meta |
| phi3 | 2024 | | by Microsoft |
| gemma | 2024 | | by Google DeepMind |
| mistral | 2023 | | by Mistral AI |

- https://ollama.com/library
- 7B - 8GB RAM
- 13B - 16GB RAM
- 33B - 32GB RAM
- A small context window is suitable for RAG
- Context Window
- LLama-3 8B 8K-1M https://ollama.com/library/llama3-gradient
- 256k context window requires at least 64GB of memory
- 1M+ context window requires significantly more (100GB+)
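
As a rough cross-check of RAM figures like these, the memory needed for the weights alone can be estimated from the parameter count and the bits stored per weight. This is a back-of-the-envelope sketch: the function name `weight_memory_gb` is hypothetical, 4-bit quantization is an assumption, and the KV cache, activations, and runtime overhead (which grow with the context window) are ignored.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate size of the weights alone, in GB (10^9 bytes).

    params_billion * 1e9 weights * bits_per_weight / 8 bytes, divided by 1e9.
    """
    return params_billion * bits_per_weight / 8


# Assumed 4-bit quantized weights for three common model sizes.
q4_sizes = {size: weight_memory_gb(size, 4) for size in (7, 13, 33)}

# The same 7B model stored as 16-bit floats is four times larger.
fp16_7b = weight_memory_gb(7, 16)
```

Under these assumptions a 7B model needs about 3.5 GB for 4-bit weights and about 14 GB at 16 bits, which is why the recommended RAM leaves headroom above the weight size.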

---

- Leaderboards
- https://huggingface.co/open-llm-leaderboard
- https://chat.lmsys.org/?leaderboard
- https://www.vellum.ai/llm-leaderboard
- [google-deepmind/gemma](https://github.com/google-deepmind/gemma)
- Apache-2.0, Flax, JAX
- by Google DeepMind
- Gemini tiers: Ultra, Pro, Flash, Nano
- Gemma sizes: 2B, 7B
- llama2
- 7B, 13B, 70B
1 change: 1 addition & 0 deletions notes/ai/llm/ollama.md
@@ -6,6 +6,7 @@ title: ollama

- [ollama/ollama](https://github.com/ollama/ollama)
- MIT, Golang
- Wraps llama.cpp
- References
- [ollama/ollama-js](https://github.com/ollama/ollama-js)
- MIT, TS
8 changes: 6 additions & 2 deletions notes/ai/ml/README.md
@@ -1,7 +1,11 @@
# Machine Learning
---
title: Machine Learning
---

## Tips
# Machine Learning

- [Training](./traning.md)
- [Labeling](./labeling.md)
- [Comparing Deep Learning Frameworks](https://www.infoq.com/presentations/comparison-deep-learning-frameworks)

| - | [tiny-cnn](https://github.com/nyanp/tiny-cnn) | [caffe](https://github.com/BVLC/caffe) | [Theano](https://github.com/Theano/Theano) | [TensorFlow](https://www.tensorflow.org/) |
15 changes: 15 additions & 0 deletions notes/ai/ml/dataset.md
@@ -0,0 +1,15 @@
---
title: Dataset
---

# Dataset

- https://roboflow.com/formats
- https://github.com/ultralytics/yolov5/blob/master/data/coco128.yaml
- coco128
- YOLOv5 Tutorial Dataset
- https://www.kaggle.com/datasets/ultralytics/coco128
- https://github.com/ultralytics/yolov5/blob/master/data/coco128.yaml
- https://ultralytics.com/assets/coco128.zip
- [ultralytics/JSON2YOLO](https://github.com/ultralytics/JSON2YOLO)
- Convert JSON annotations into YOLO format
99 changes: 99 additions & 0 deletions notes/ai/ml/label-studio.md
@@ -0,0 +1,99 @@
---
title: Label Studio
---

# Label-studio

- [HumanSignal/label-studio](https://github.com/HumanSignal/label-studio)
- Apache-2.0
- Database: SQLite, PostgreSQL
- Storage: S3
- telemetry
- COLLECT_ANALYTICS
- References
- https://labelstud.io/
- Frontend https://github.com/HumanSignal/label-studio/tree/develop/web/libs/editor

```bash
# Install and start with pip
pip install -U label-studio
label-studio

# Or run the Docker image, persisting data to ./data
# https://hub.docker.com/r/heartexlabs/label-studio
# https://github.com/HumanSignal/label-studio/blob/develop/docker-compose.yml
docker run --rm -it \
  -p 8080:8080 \
  -v "$PWD/data:/label-studio/data" \
  --name label-studio heartexlabs/label-studio

# label-studio --log-level DEBUG

# Start with a custom data directory and local file serving enabled
LABEL_STUDIO_BASE_DATA_DIR=$PWD/data \
LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true \
LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=$PWD/files \
label-studio start
```

| env | flags | default |
| ---------------------------------------- | ----------------------------- | ---------------------- |
| LABEL_STUDIO_DATABASE | -db,--database | label_studio.sqlite3 |
| LABEL_STUDIO_BASE_DATA_DIR | --data-dir | |
| CONFIG_PATH | -c,--config | default_config.json |
| LABEL_STUDIO_LABEL_CONFIG | -l,--label-config | None |
| LABEL_STUDIO_PORT | -p,--port | 8080 |
| LABEL_STUDIO_HOST | --host | |
| LABEL_STUDIO_PROJECT_DESC | --initial-project-description | |
| LABEL_STUDIO_PASSWORD | --password | |
| LABEL_STUDIO_USERNAME | --username | default_user@localhost |
| LABEL_STUDIO_USER_TOKEN | --user-token | |
| LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED | | False |
| LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT | | / |

## tags


```xml
<View>
  <TimeSeriesLabels name="label" toName="ts">
    <Label value="Run"/>
    <Label value="Walk"/>
  </TimeSeriesLabels>
  <HyperText name="video" value="$video" inline="true"/>
  <TimeSeries name="ts" value="$csv" valueType="url" timeColumn="time_column">
    <Channel column="first_column"/>
  </TimeSeries>
</View>

<!-- {
"csv": "/samples/time-series.csv?time=time_column&values=first_column",
"video": "<video src='/static/samples/opossum_snow.mp4' width='100%' controls onloadeddata=\"setTimeout(function(){ts=Htx.annotationStore.selected.names.get('ts');t=ts.data.time_column;v=document.getElementsByTagName('video')[0];w=parseInt(t.length*(5/v.duration));l=t.length-w;ts.updateTR([t[0], t[w]], 1.001);r=$=>
ts.brushRange.map(n=>(+n).toFixed(2));_=r();setInterval($=>r().some((n,i)=>n!==_[i])&&(_=r())&&(v.currentTime=v.duration*(r()[0]-t[0])/(t.slice(-1)[0]-t[0]-(r()[1]-r()[0]))),300); console.log('video is loaded, starting to sync with time series')}, 3000); \" />"
} -->
```

- Video + TimeSeries
- https://github.com/HumanSignal/label-studio/issues/4827
- https://labelstud.io/tags/
- https://github.com/google-research-datasets/Video-Timeline-Tags-ViTT

## structs

```ts
interface Obj {
  id: string;

  data: any;
  value: any;

  from_name: string;
  to_name: string;
  type: string;
}
```
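
As an illustration of how such a result item might be consumed, the sketch below converts a rectangle annotation into a YOLO-style tuple. It rests on assumptions not stated in these notes: that `type` is `rectanglelabels`, that `value` holds `x`, `y`, `width`, and `height` as percentages (0-100) of the image size with `(x, y)` at the top-left corner, and that the label list sits under `value["rectanglelabels"]`. The sample item and the function name `rectangle_to_yolo` are hypothetical.

```python
def rectangle_to_yolo(item, class_names):
    """Convert one rectangle result item to (class_index, xc, yc, w, h).

    Assumes percentage coordinates (0-100) with (x, y) at the top-left
    corner; the output is normalized to the 0-1 range used by YOLO labels.
    """
    value = item["value"]
    label = value["rectanglelabels"][0]
    width = value["width"] / 100.0
    height = value["height"] / 100.0
    x_center = value["x"] / 100.0 + width / 2
    y_center = value["y"] / 100.0 + height / 2
    return (class_names.index(label), x_center, y_center, width, height)


# Hypothetical result item following the Obj shape above.
item = {
    "id": "abc123",
    "from_name": "label",
    "to_name": "image",
    "type": "rectanglelabels",
    "value": {
        "x": 10.0,
        "y": 20.0,
        "width": 50.0,
        "height": 40.0,
        "rectanglelabels": ["dog"],
    },
}

yolo_row = rectangle_to_yolo(item, ["dog", "cat"])
```

Check the field names against an actual export from your project before relying on this layout.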

# FAQ

## video frameCount

- framerate defaults to 24
- https://github.com/HumanSignal/label-studio/issues/3315
- https://labelstud.io/tags/video
9 changes: 9 additions & 0 deletions notes/ai/ml/labelImg.md
@@ -0,0 +1,9 @@
---
title: LabelImg
---

# LabelImg

- ~~[HumanSignal/labelImg](https://github.com/HumanSignal/labelImg)~~
- MIT, Python
- -> [Label Studio](./label-studio.md)
79 changes: 79 additions & 0 deletions notes/ai/ml/labeling.md
@@ -0,0 +1,79 @@
---
title: Labeling
---

# Labeling

- VOC - Visual Object Classes
- Pascal VOC
- XML
- object, name, bndbox
- COCO
- Common Objects in Context
- JSON
- images, annotations, categories
- bbox - `[x, y, w, h]`
- YOLO - You Only Look Once
- `<class_index> <x_center> <y_center> <width> <height>`
- References
- https://github.com/KKKSQJ/DeepLearning/tree/master/others/label_convert

## VOC

```xml
<annotation>
  <folder>VOC2012</folder>
  <filename>image1.jpg</filename>
  <size>
    <width>800</width>
    <height>600</height>
    <depth>3</depth>
  </size>
  <object>
    <name>dog</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>48</xmin>
      <ymin>240</ymin>
      <xmax>195</xmax>
      <ymax>371</ymax>
    </bndbox>
  </object>
</annotation>
```

## COCO

```json
{
  "images": [
    {
      "id": 1,
      "file_name": "image1.jpg",
      "width": 800,
      "height": 600
    }
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 18,
      "bbox": [48, 240, 147, 131],
      "segmentation": [],
      "area": 19257,
      "iscrowd": 0
    }
  ],
  "categories": [
    {
      "id": 18,
      "name": "dog",
      "supercategory": "animal"
    }
  ]
}
```
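
The VOC and COCO examples above describe the same box, so the three formats can be related with a little arithmetic. The sketch below is a minimal illustration with hypothetical function names; it assumes width and height are plain differences of the VOC corners (matching the numbers above) and that YOLO values are normalized by the image size.

```python
def voc_to_coco(xmin, ymin, xmax, ymax):
    """VOC corner box -> COCO [x, y, w, h] in pixels (top-left origin)."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def coco_to_yolo(bbox, img_w, img_h):
    """COCO [x, y, w, h] in pixels -> YOLO (x_center, y_center, w, h) in 0-1."""
    x, y, w, h = bbox
    return (
        (x + w / 2) / img_w,
        (y + h / 2) / img_h,
        w / img_w,
        h / img_h,
    )


# Values taken from the VOC example: an 800x600 image with one "dog" box.
coco_bbox = voc_to_coco(48, 240, 195, 371)
area = coco_bbox[2] * coco_bbox[3]
yolo_box = coco_to_yolo(coco_bbox, 800, 600)

# A YOLO label line is "<class_index> <x_center> <y_center> <width> <height>";
# class index 0 is an arbitrary choice for this illustration.
yolo_line = "0 " + " ".join(f"{v:.6f}" for v in yolo_box)
```

Running this reproduces the COCO `bbox` and `area` shown above, which is a quick way to sanity-check a converter before applying it to a whole dataset.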