Commit

update readne
Yongtae723 committed Sep 30, 2023
1 parent ffd0337 commit 6388901
Showing 1 changed file with 123 additions and 69 deletions.
192 changes: 123 additions & 69 deletions README.md
@@ -1,19 +1,24 @@
# LLMFlowOptimizer (WIP)
# LLMFlowOptimizer

![Python 3.9](https://img.shields.io/badge/python-3.9-blue.svg)
![Python 3.10](https://img.shields.io/badge/python-3.10-blue.svg)
[![python](https://img.shields.io/badge/-Python_3.9_%7C_3.10-blue?logo=python&logoColor=white)](https://github.com/pre-commit/pre-commit)
[![hydra](https://img.shields.io/badge/Config-Hydra_1.3-89b8cd)](https://hydra.cc/)
[![Optuna](https://img.shields.io/badge/Optimize-Optuna-blue)](https://github.com/optuna/optuna)
[![black](https://img.shields.io/badge/Code%20Style-Black-black.svg?labelColor=gray)](https://black.readthedocs.io/en/stable/)
[![isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)

## 📌 Introduction

In recent years, various LLMs, embedding models, and LLM flows utilizing them have been proposed, making it difficult to manually verify which flow or component is optimal.

This repository aims to treat LLMs and embeddings as hyperparameters, with the goal of automatically searching for the optimal hyperparameters of the LLM flow.

**The image below illustrates the concept of this repository (image from [Flowise](https://github.com/FlowiseAI/Flowise)). LangChain components such as an LLM or an embedding model can be treated as hyperparameters. You can find the optimal component among various candidates.**
![concept_image](documents/concept.png)
**The image below illustrates the concept of this repository (the image is taken from [Flowise](https://github.com/FlowiseAI/Flowise) and slightly modified). LangChain components such as an LLM or an embedding model can be treated as hyperparameters. You can find, among various candidates, the components that optimize the score.**
![concept_image](documents/image/concept.png)

This repository is strongly inspired by [lightning-hydra-template](https://github.com/ashleve/lightning-hydra-template)🎉

Any feedback, bug reports, and suggestions are appreciated!

## 🔧 Main Technologies

- [LangChain](https://github.com/langchain-ai/langchain) : LangChain is a framework for developing applications powered by large language models. It can be divided conceptually into components (such as the LLM component and the embedding component), which are well abstracted and easy to switch. This is why we can treat each component as a hyperparameter.
@@ -26,29 +31,37 @@ This repository is strongly inspired by [lightning-hydra-template](https://githu

- [ragas](https://github.com/explodinggradients/ragas) : Ragas is an evaluation framework for Retrieval Augmented Generation (RAG) pipelines that provides tools based on the latest research for evaluating LLM-generated text to give insights about the RAG pipeline.

# How to use
# 🚀 How to use

This section explains how to use this repository, using question answering as an example. The example uses:

- **LangChain** : for model
- **Ragas** : for evaluation
- **LangSmith** : for evaluation and monitoring

## Step 0 : Environment Setup

Please click [<kbd>Use this template</kbd>](https://github.com/Yongtae723/LLMFlowOptimizer/generate) to use this repository as a template.

Clone your repository and install dependencies with the following commands.
- Clone your repository and install dependencies with the following commands.

```bash
git clone git@github.com:<YOUR_GITHUB_ID>/<YOUR_REPOSITORY_NAME>.git
cd <YOUR_REPOSITORY_NAME>
```
```bash
git clone git@github.com:<YOUR_GITHUB_ID>/<YOUR_REPOSITORY_NAME>.git
cd <YOUR_REPOSITORY_NAME>
```

## setup
- We use Poetry for package management. You can install Poetry and the Python packages with the following commands.

We use Poetry for package management. You can install Poetry and the Python packages with the following commands.
```bash
pip install poetry
poetry install
```

```bash
pip install poetry
poetry install
```
- Rename `.env.example` to `.env` and set your environment variables to use external APIs such as OpenAI ChatGPT.

## define model architecture and config.
## Step 1 : Define model architecture and config.

1. Define the model architecture like [llmflowoptimizer/model/sample_qa.py](llmflowoptimizer/model/sample_qa.py).
1. Define the model architecture like [llmflowoptimizer/component/model/sample_qa.py](llmflowoptimizer/component/model/sample_qa.py).

The arguments of `__init__()` can be used as hyperparameters and can therefore be optimized.

@@ -57,38 +70,72 @@ poetry install
Example model config:

```yaml
defaults: # you can use yaml file on this directly.
- _self_
- embedding: OpenAI
- text_splitter: RecursiveCharacter
- llm_for_answer: ChatOpenAI
defaults:
- _self_
- embedding: OpenAI
- text_splitter: RecursiveCharacter
- llm: OpenAI
_target_: llmflowoptimizer.component.model.sample_qa.SampleQA # the class we defined in llmflowoptimizer/component/model/sample_qa.py
data_path: ${paths.reference_data_dir}/nyc_wikipedia.txt
```
_target_: llmflowoptimizer.models.sample_qa.SampleQA # the class we defined in llmflowoptimizer/models/sample_qa.py
3. Then you can check your model and config with the following command.
data_path: ${paths.reference_data_dir}/nyc_wikipedia.txt
```bash
poetry run python llmflowoptimizer/run.py extras.evaluation=false
```
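
As a rough illustration of how `__init__()` arguments become hyperparameters, the sketch below uses only the standard library. `ToySplitter`, `ToyQA`, `REGISTRY`, and `instantiate` are hypothetical stand-ins, not the repository's `SampleQA` or Hydra's own instantiation routine. The sketch only mimics the idea that a `_target_` key names a class and the remaining keys become constructor arguments.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ToySplitter:
    """Hypothetical text splitter; `chunk_size` is a tunable hyperparameter."""

    chunk_size: int = 1000

    def split(self, text: str) -> List[str]:
        # Cut the text into consecutive pieces of at most `chunk_size` characters.
        return [text[i:i + self.chunk_size] for i in range(0, len(text), self.chunk_size)]


class ToyQA:
    """Hypothetical model: every __init__ argument can be swapped from config."""

    def __init__(self, text_splitter: ToySplitter, llm: str, data: str) -> None:
        self.llm = llm
        self.chunks = text_splitter.split(data)


# Registry standing in for importable dotted paths (an assumption made for brevity).
REGISTRY = {"toy.ToySplitter": ToySplitter, "toy.ToyQA": ToyQA}


def instantiate(config: Any) -> Any:
    """Recursively build objects from dicts that carry a `_target_` key."""
    if not isinstance(config, dict):
        return config
    kwargs = {key: instantiate(value) for key, value in config.items() if key != "_target_"}
    if "_target_" not in config:
        return kwargs
    return REGISTRY[config["_target_"]](**kwargs)


config: Dict[str, Any] = {
    "_target_": "toy.ToyQA",
    "text_splitter": {"_target_": "toy.ToySplitter", "chunk_size": 4},
    "llm": "OpenAI",
    "data": "New York City",
}
model = instantiate(config)
print(model.llm, model.chunks)
```

Changing `chunk_size` or `llm` in the config changes the object that gets built, without touching the class itself. That is what lets a search tool treat those values as hyperparameters.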

3. Then you can check your model and config with the following command.
## Step 2 : Prepare dataset and define evaluation system

```bash
poetry run python llmflowoptimizer/run.py extras.evaluation=false extras.print_config=true
```
- Register data on LangSmith (if you want to use LangSmith).

## define evaluation system
In our example, we use [LangSmith](https://docs.smith.langchain.com/) for evaluation and monitoring. You can register data on LangSmith by following the [example notebook](notebooks/register_data_langsmith.ipynb).

1. Define the evaluation system like [llmflowoptimizer/model/evaluation.py](llmflowoptimizer/model/evaluation.py), and set its arguments in [configs/evaluation](configs/model).
- Define the evaluation system like [llmflowoptimizer/model/evaluation.py](llmflowoptimizer/model/evaluation.py), and set its arguments in [configs/evaluation](configs/model).

Optuna optimizes the components based on the return value of this evaluation system.
Optuna optimizes the components based on the return value of this evaluation system.

2. You can check your evaluation system with the following command.
- You can check your evaluation system with the following command.

```bash
poetry run python llmflowoptimizer/run.py
```
```bash
poetry run python llmflowoptimizer/run.py
```

- Since we use Hydra, you can change each LLM flow component with command-line arguments. For more details, see the [Override](#override) section.
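
To make the contract of Step 2 concrete, here is a minimal sketch of an evaluation system that returns a single number for the optimizer to use. `ToyEvaluation`, its exact-match metric, and the `toy_model` callable are hypothetical; the repository's example relies on Ragas and LangSmith instead.

```python
from typing import Callable, List, Tuple


class ToyEvaluation:
    """Hypothetical evaluator: scores a question-answering callable on a small dataset."""

    def __init__(self, dataset: List[Tuple[str, str]]) -> None:
        # Each item is a (question, expected_answer) pair.
        self.dataset = dataset

    def evaluate(self, predict: Callable[[str], str]) -> float:
        """Return the fraction of answers that match exactly (case-insensitive)."""
        if not self.dataset:
            return 0.0
        hits = sum(
            predict(question).strip().lower() == expected.strip().lower()
            for question, expected in self.dataset
        )
        return hits / len(self.dataset)


def toy_model(question: str) -> str:
    # Stand-in for an LLM flow: answers one kind of question, gives up on the rest.
    return "Albany" if "capital" in question else "unknown"


evaluation = ToyEvaluation([
    ("What is the capital of New York State?", "albany"),
    ("How many boroughs does New York City have?", "five"),
])
score = evaluation.evaluate(toy_model)
print(score)
```

Returning one float keeps the evaluation easy to plug into a search loop: a higher value means a better flow, whatever metric is computed inside.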

## Step 3 : Hyperparameter search

- Define the search space in [configs/hparams_search](configs/hparams_search), as in the example below.

```yaml
model/text_splitter: choice(RecursiveCharacter, CharacterTextSplitter)
model.text_splitter.chunk_size: range(500, 1500, 100)
model/llm: choice(OpenAI, GPTTurbo, GPT4)
```
This example is part of [configs/hparams_search/optuna.yaml](configs/hparams_search/optuna.yaml). It means that the system searches for the best hyperparameters among `RecursiveCharacter` and `CharacterTextSplitter` for the `model.text_splitter` component, a `chunk_size` between 500 and 1500, and `OpenAI`, `GPTTurbo`, and `GPT4` for the `model.llm` component.

More complicated search spaces can also be defined in Python, as in [configs/hparams_search/custom-search-space-objective.py](configs/hparams_search/custom-search-space-objective.py).

- You can start the hyperparameter search with the following command.

```bash
poetry run python llmflowoptimizer/run.py hparams_search=optuna
```

Then you can see the best parameters in `logs/{task_name}/multirruns/{timestamp}/optimization_results.yaml`.

## dataset
You can also see the details of each trial on LangSmith, as shown below.

You can update the sample data by following [this notebook](notebooks/register_data_langsmith.ipynb).
![langsmith_each_trial](documents/image/each_trial.png)

This figure shows the result of each run.

![langsmith_each_input_output](documents/image/each_input_output.png)

This figure shows the input and output of each question.
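
The search space from Step 3 can be sketched in plain Python as follows. This is a simple random search used as a stand-in for Optuna's sampler, not what the repository actually runs. `toy_score` is a made-up objective, and treating the upper bound of `range(500, 1500, 100)` as inclusive is an assumption.

```python
import random
from typing import Any, Dict, Tuple

# Mirrors the YAML example: two categorical choices and one stepped integer range.
SEARCH_SPACE = {
    "model/text_splitter": ["RecursiveCharacter", "CharacterTextSplitter"],
    "model.text_splitter.chunk_size": list(range(500, 1500 + 1, 100)),  # upper bound assumed inclusive
    "model/llm": ["OpenAI", "GPTTurbo", "GPT4"],
}


def sample(rng: random.Random) -> Dict[str, Any]:
    """Draw one candidate configuration uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def toy_score(params: Dict[str, Any]) -> float:
    """Made-up objective: prefers chunk sizes near 1000 and the 'GPT4' entry."""
    closeness = 1.0 - abs(params["model.text_splitter.chunk_size"] - 1000) / 500
    bonus = 0.5 if params["model/llm"] == "GPT4" else 0.0
    return closeness + bonus


def random_search(n_trials: int, seed: int = 0) -> Tuple[Dict[str, Any], float]:
    """Evaluate n_trials random candidates and return the best one with its score."""
    rng = random.Random(seed)
    candidates = [sample(rng) for _ in range(n_trials)]
    best = max(candidates, key=toy_score)
    return best, toy_score(best)


best_params, best_score = random_search(n_trials=20)
print(best_params, best_score)
```

In the real workflow the score would come from the evaluation system of Step 2, and Optuna would choose the next candidate based on earlier trials instead of sampling blindly.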

# 📚 Appendix

## override

@@ -100,7 +147,7 @@ When you want to change single parameter, you have to use `.` between parameter
Example:

```bash
poetry run python llmflowoptimizer/run.py model.llm_for_answer.model_name="gpt-4"
poetry run python llmflowoptimizer/run.py model.text_splitter.chunk_size=1000
```

By doing this, you change the `chunk_size` parameter of `text_splitter` to 1000.
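
A dotted override can be pictured as a path into a nested config. The sketch below is a standard-library illustration with a hypothetical `apply_override` helper and a made-up starting config. Hydra's real override grammar handles many more cases than this.

```python
from typing import Any, Dict


def apply_override(config: Dict[str, Any], override: str) -> None:
    """Apply one `a.b.c=value` override to a nested dict in place."""
    dotted_key, raw_value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node[name]  # raises KeyError if the path does not exist
    if leaf not in node:
        raise KeyError(f"unknown parameter: {dotted_key}")
    # Naive cast that reuses the type of the existing value (int stays int, str stays str).
    node[leaf] = type(node[leaf])(raw_value)


# Made-up starting config for illustration only.
config = {"model": {"text_splitter": {"chunk_size": 500, "chunk_overlap": 0}}}
apply_override(config, "model.text_splitter.chunk_size=1000")
print(config["model"]["text_splitter"])
```

Only the addressed leaf changes; sibling parameters such as `chunk_overlap` keep their previous values.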
@@ -113,7 +160,7 @@ when you want to change parameter in component scale, you have to define compone
poetry run python llmflowoptimizer/run.py model/llm_for_answer=OpenAI
```

By doing this, the LLM flow uses the [`OpenAI.yaml`](configs/model/llm_for_answer/OpenAI.yaml) model instead of the [`ChatOpenAI.yaml`](configs/model/llm_for_answer/ChatOpenAI.yaml) model.
By doing this, the LLM flow uses the [`OpenAI.yaml`](configs/model/llm_for_answer/OpenAI.yaml) model instead of the [`ChatOpenAI.yaml`](configs/model/llm_for_answer/GPT4.yaml) model.

### Experiment config

@@ -125,18 +172,7 @@ after you save config [configs/experiment](configs/experiment), you can override
poetry run python llmflowoptimizer/run.py experiment=example
```
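
Swapping a whole component, as with `model/llm_for_answer=OpenAI`, replaces one sub-config with another named option. The following is a minimal sketch, assuming a hypothetical in-memory `CONFIG_GROUPS` table in place of the YAML files under `configs/model/`. The option contents are invented for illustration.

```python
from copy import deepcopy
from typing import Any, Dict

# Each group maps an option name to the config that option would load (contents invented).
CONFIG_GROUPS: Dict[str, Dict[str, Dict[str, Any]]] = {
    "model/llm_for_answer": {
        "OpenAI": {"kind": "completion", "temperature": 0.0},
        "ChatOpenAI": {"kind": "chat", "temperature": 0.0},
    },
}


def select_component(config: Dict[str, Any], override: str) -> Dict[str, Any]:
    """Return a copy of `config` with one component replaced, e.g. `model/llm_for_answer=OpenAI`."""
    group, option = override.split("=", 1)
    options = CONFIG_GROUPS[group]
    if option not in options:
        raise ValueError(f"{option!r} is not an option of {group!r}; choose from {sorted(options)}")
    *parents, leaf = group.split("/")
    new_config = deepcopy(config)
    node = new_config
    for name in parents:
        node = node.setdefault(name, {})
    # Replace the whole sub-config, not just one of its parameters.
    node[leaf] = deepcopy(options[option])
    return new_config


base = {"model": {"llm_for_answer": deepcopy(CONFIG_GROUPS["model/llm_for_answer"]["ChatOpenAI"])}}
swapped = select_component(base, "model/llm_for_answer=OpenAI")
print(swapped["model"]["llm_for_answer"])
```

The `/` form selects among named options for a component, while the `.` form edits one parameter inside whichever option is currently selected.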

## Hyperparameter search

WIP

## managing experimental results

WIP
maybe use langflow?

## Appendix

### Test
## Test

Before you open a PR, test your code with the following command.
The sample test only checks that the model and evaluation classes can be initialized.
@@ -145,7 +181,7 @@ In sample test, we only check model and evaluation class can be initialized.
make test
```

### Auto code formatting
## Auto code formatting

If you want automatic code formatting, you can install it with

@@ -161,18 +197,36 @@ Also you can format code manually by
make fix-lint
```

# TODO

- \[ \] explain .env
- [ ] Hyperparameter search by optuna
- [ ] manage experiment config (langsmith? WandB?)
- [ ] actual experiment result (ask specialist)
- [ ] write blog
- [ ] add test function
- [ ] reduce data and execute e2e test
- [ ] git action
- \[ \]lint-check
- [ ] test
- [ ] model can be build from GUI langchain builder
- [ ] langflow
- [ ] Flowise
<br>
<br>
<br>
<br>

**DELETE EVERYTHING ABOVE FOR YOUR PROJECT**

______________________________________________________________________

<div align="center">

# Your Project Name

[![python](https://img.shields.io/badge/-Python_3.9_%7C_3.10-blue?logo=python&logoColor=white)](https://github.com/pre-commit/pre-commit)
[![hydra](https://img.shields.io/badge/Config-Hydra_1.3-89b8cd)](https://hydra.cc/)
[![Optuna](https://img.shields.io/badge/Optimize-Optuna-blue)](https://github.com/optuna/optuna)
[![black](https://img.shields.io/badge/Code%20Style-Black-black.svg?labelColor=gray)](https://black.readthedocs.io/en/stable/)
[![isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
<a href="https://github.com/Yongtae723/LLM-Flow-Optimizer"><img alt="Template" src="https://img.shields.io/badge/-LLMFlowOptimizer-blue?style=flat&logo=github&labelColor=gray"></a><br>

</div>

## Description

What it does and what the purpose of the project is.

## Installation

How to install the project.

## How to run

How to run the project.
