Bird for Chat2Query

Test dataset score is 60.98.

Bird

Below are the steps to run the Bird evaluation.

Step 1: Create a new Chat2Query App in TiDB Cloud

Log in to TiDB Cloud and create a Chat2Query Data App.

Create Chat2Query App Step 1

Create Chat2Query App Step 2

Create Chat2Query App Step 3

Chat2Query Base URL

Save the Base URL; it is used in Step 5.

Step 2: Create Chat2Query API Key

Create Admin API Key

Save the public key and private key; they are used in Step 5.

Step 3: Clone the repository and install dependencies

$ git clone https://github.com/tidbcloud/tiinsight
$ cd tiinsight/chat2query_benchmark
$ cd bird
$ pip install -r requirements.txt

Download the Bird dataset from https://bird-bench.oss-cn-beijing.aliyuncs.com/dev.zip and unzip it into the benchmark_bird/data folder. Make sure the folder is named data, not dev, and rename dev.sql to dev_gold.sql.
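The rename steps above can be sketched as a small shell helper. The function name `normalize_bird_data` is hypothetical and not part of the repository, and the sketch assumes the archive unpacks to a folder named dev that contains dev.sql; adjust the paths if the archive layout differs.

```shell
# Hypothetical helper (not part of the repository): rename an unzipped
# Bird folder to the target name and rename dev.sql to dev_gold.sql.
# Assumption: the unzipped folder contains dev.sql at its top level.
normalize_bird_data() {
    src="$1"    # unzipped folder, e.g. dev
    dest="$2"   # target folder, e.g. data

    # Rename the folder only if the target does not exist yet.
    if [ -d "$src" ] && [ ! -e "$dest" ]; then
        mv "$src" "$dest"
    fi

    # Rename the gold SQL file if it still has its original name.
    if [ -f "$dest/dev.sql" ]; then
        mv "$dest/dev.sql" "$dest/dev_gold.sql"
    fi
}

# Example usage (requires network access and the unzip tool):
#   curl -L -o dev.zip https://bird-bench.oss-cn-beijing.aliyuncs.com/dev.zip
#   unzip dev.zip
#   normalize_bird_data dev data
```

The helper is safe to run twice: if the folder is already named data and the file is already named dev_gold.sql, it does nothing.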

The file structure should look like:

$ tree -L 1 data/
data/
├── dev_databases
├── dev_gold.sql
└── dev.json

1 directory, 2 files
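As a quick sanity check, the layout shown above can be verified before running anything. This is a minimal sketch; `check_bird_layout` is a hypothetical name, and it only tests for the three entries listed in the tree output.

```shell
# Hypothetical helper: verify that a folder has the three entries
# shown in the tree listing. Prints each missing entry to stderr and
# returns non-zero if anything is absent.
check_bird_layout() {
    dir="${1:-data}"
    missing=0
    [ -d "$dir/dev_databases" ] || { echo "missing: $dir/dev_databases" >&2; missing=1; }
    [ -f "$dir/dev_gold.sql" ]  || { echo "missing: $dir/dev_gold.sql" >&2; missing=1; }
    [ -f "$dir/dev.json" ]      || { echo "missing: $dir/dev.json" >&2; missing=1; }
    return "$missing"
}
```

For example, `check_bird_layout data && echo "layout OK"` prints the confirmation only when all three entries are present.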

Step 4 (Optional): Customize the Bird evaluation parameters

NOTE: By default, the Bird evaluation runs with the gpt-4o-mini model. To use the gpt-4 model, send your org_id by email to [email protected], and we will enable the settings API for you.

You can customize the Bird evaluation parameters by calling the settings API, for example:

export PUBLIC_KEY="<Your Public Key>"
export PRIVATE_KEY="<Your Private Key>"
export BASE_URL="<Your data app endpoint url>"

curl --digest --user "${PUBLIC_KEY}:${PRIVATE_KEY}" --request PUT "${BASE_URL}" \
 --header 'content-type: application/json' \
 --data-raw '{
    "openai_api_key": "<Your Secret OpenAI API Key>",
    "language": "English",
    "ai_model": "gpt-4"
 }'

Step 5: Edit the runbird.sh script

Set the BASE_URL, PUBLIC_KEY, and PRIVATE_KEY variables in runbird.sh to the values saved in Steps 1 and 2.
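A pre-flight check can catch an empty value before the evaluation starts. The sketch below is an illustration only: `check_bird_env` is a hypothetical name, and it assumes the three values are set in the current shell (as in the export lines of Step 4) and does not inspect runbird.sh itself.

```shell
# Hypothetical pre-flight check: confirm the three values are non-empty
# in the current shell before running the evaluation.
check_bird_env() {
    status=0
    for name in BASE_URL PUBLIC_KEY PRIVATE_KEY; do
        # Indirectly read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not set: $name" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_bird_env && ./runbird.sh` runs the script only when all three values are present.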

Step 6: Run the script to generate SQL and run the evaluation

To run the evaluation with the gpt-4 model, first make sure you have sent us your OpenAI API Key and Public Key. Once we have enabled the gpt-4 model for you, run the script.

$ ./runbird.sh