Define functions with code, data, or natural language description


fun is a Go package providing a high-level abstraction to define functions with code (the usual way), with data (examples of inputs and expected outputs, which are then used with an AI model), or with a natural language description. It is a simple but powerful way to use large language models (LLMs) in Go.

Features:

  • A common interface to support code-defined, data-defined, and description-defined functions.
  • Functions are strongly typed so inputs and outputs can be Go structs and values.
  • Provides unofficial OpenAI, Groq, Anthropic and Ollama integrations for AI (LLM) models.
  • Support for tool calling which transparently calls into Go functions with Go structs and values as inputs and outputs. Recursion is possible.
  • Uses adaptive rate limiting to maximize throughput of API calls made to integrated AI models.
  • Provides a CLI tool fun which makes it easy to run data-defined and description-defined functions on files.

Installation

This is a Go package. You can add it to your project using go get:

go get gitlab.com/tozd/go/fun

It requires Go 1.23 or newer.

The releases page contains a list of stable versions of the fun tool. Each release includes:

  • Statically compiled binaries.
  • Docker images.

You should download and use the latest one.

The tool is implemented in Go. You can also use go install to install the latest stable (released) version:

go install gitlab.com/tozd/go/fun/cmd/go/fun@latest

To install the latest development version (main branch):

go install gitlab.com/tozd/go/fun/cmd/go/fun@main

Usage

As a package

See full package documentation with examples on pkg.go.dev.
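As a rough sketch only (the authoritative API is on pkg.go.dev; the Text and InputOutput types and the Init/Call methods below are taken from the package documentation and should be verified there), a data-defined function could look like:

```go
package main

import (
	"context"
	"fmt"
	"os"

	"gitlab.com/tozd/go/fun"
)

func main() {
	ctx := context.Background()

	// A data-defined function: examples of inputs and expected outputs
	// stand in for an implementation and are used with the AI model.
	f := fun.Text[string, string]{
		Provider: &fun.AnthropicTextProvider{
			APIKey: os.Getenv("ANTHROPIC_API_KEY"),
			Model:  "claude-3-haiku-20240307",
		},
		Data: []fun.InputOutput[string, string]{
			{Input: []string{"1 + 2"}, Output: "3"},
			{Input: []string{"10 + 4"}, Output: "14"},
		},
	}
	errE := f.Init(ctx)
	if errE != nil {
		panic(errE)
	}

	// Inputs and outputs are strongly typed Go values.
	output, errE := f.Call(ctx, "7 + 5")
	if errE != nil {
		panic(errE)
	}
	fmt.Println(output)
}
```

This requires an Anthropic API key and network access to run; see the package documentation for code-defined and description-defined variants.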

fun tool

The fun tool calls a function on files. You provide:

  • Examples of inputs and expected outputs as files (pairs of files with the same basename but different file extensions).
  • Natural language description of the function, a prompt.
  • Input files on which to run the function.
  • Files with input and output JSON Schemas to validate inputs and outputs, respectively.

You must provide example inputs and outputs, a prompt, or both.

fun has two sub-commands:

  • extract supports extracting parts of one JSON file into multiple files using a GJSON query. Because fun calls the function on files, this is useful for preprocessing a large JSON file into individual files on which to then call the function.
    • The query should return an array of objects with ID and data fields (by default named id and data).
  • call then calls the function on files in the input directory and writes results into files in the output directory.
    • Corresponding output files have the same basename as input files but with the (configurable) output file extension, so it is safe to use the same directory for both input and output files.
    • fun calls the function only for files which do not yet exist in the output directory, so it is safe to run fun multiple times if a previous run had issues or was interrupted.
    • fun supports splitting input files into batches so that one run of fun operates only on a particular batch. This is useful if you want to distribute execution across multiple machines.
    • If an output fails to validate against the JSON Schema, it is stored in a file with the additional suffix .invalid. If calling the function fails for another reason, the error is stored in a file with the additional suffix .error.
  • combine combines multiple input directories into one output directory, keeping only those files which are equal in all input directories.
    • The provided input directories should be outputs from different models or different configurations, all run on the same input files.
    • This decreases false positives at the expense of having fewer outputs overall.
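The selection rule combine applies can be illustrated in plain Go (this is only an illustration of the rule, not the tool's implementation; here each map stands in for one input directory):

```go
package main

import "fmt"

// keepAgreeing keeps a file only if it exists in every input directory
// and has identical contents in all of them, mirroring what "fun combine"
// does across its input directories. Each map is filename -> contents.
func keepAgreeing(dirs []map[string]string) map[string]string {
	out := map[string]string{}
	if len(dirs) == 0 {
		return out
	}
	for name, content := range dirs[0] {
		agree := true
		for _, dir := range dirs[1:] {
			if other, ok := dir[name]; !ok || other != content {
				agree = false
				break
			}
		}
		if agree {
			out[name] = content
		}
	}
	return out
}

func main() {
	modelA := map[string]string{"1.txt": "3", "2.txt": "7"}
	modelB := map[string]string{"1.txt": "3", "2.txt": "8"}
	// Only 1.txt has the same contents in both runs, so only it is kept.
	fmt.Println(keepAgreeing([]map[string]string{modelA, modelB}))
}
```

Requiring agreement between independent runs trades coverage for precision: disagreements are dropped rather than guessed at.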

For details on all available CLI arguments, run fun --help:

fun --help

If you have Go available, you can run it without installation:

go run gitlab.com/tozd/go/fun/cmd/go/fun@latest --help

Or with Docker:

docker run -i registry.gitlab.com/tozd/go/fun/branch/main:latest --help

The above command runs the latest development version (main branch). See releases page for a Docker image for the latest stable version.

Example

If you have a large JSON file with the following structure:

{
  "exercises": [
    {
      "serial": 1,
      "text": "Ariel was playing basketball. 1 of her shots went in the hoop. 2 of the shots did not go in the hoop. How many shots were there in total?"
    },
    // ...
  ]
}

To create, for each exercise, a .txt file in the data output directory, with its filename based on the serial field (e.g., 1.txt) and its contents based on the text field, you could run:

fun extract --input exercises.json --output data --out=.txt 'exercises.#.{id:serial,data:text}'
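The transformation this particular query performs can be sketched in plain Go (an illustration of what the GJSON query produces, not how fun is implemented):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pair mirrors the objects the query 'exercises.#.{id:serial,data:text}'
// produces: {id: serial, data: text}. Each pair would become one file,
// e.g. 1.txt containing the exercise text.
type pair struct {
	ID   int    `json:"id"`
	Data string `json:"data"`
}

// extractPairs pulls (serial, text) out of the JSON structure shown above.
func extractPairs(doc []byte) ([]pair, error) {
	var parsed struct {
		Exercises []struct {
			Serial int    `json:"serial"`
			Text   string `json:"text"`
		} `json:"exercises"`
	}
	if err := json.Unmarshal(doc, &parsed); err != nil {
		return nil, err
	}
	pairs := make([]pair, 0, len(parsed.Exercises))
	for _, e := range parsed.Exercises {
		pairs = append(pairs, pair{ID: e.Serial, Data: e.Text})
	}
	return pairs, nil
}

func main() {
	doc := []byte(`{"exercises": [{"serial": 1, "text": "How many shots?"}]}`)
	pairs, err := extractPairs(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(pairs[0].ID, pairs[0].Data)
}
```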

To solve all exercises, you can then run:

export ANTHROPIC_API_KEY='...'
echo "You MUST output only final number, nothing more." > prompt.txt
fun call --input data --output results --provider anthropic --model claude-3-haiku-20240307 --in .txt --out .txt --prompt prompt.txt

For the data/1.txt input file you should now get a results/1.txt output file with the contents 3.

Sadly, the function might sometimes output more than just the number. We can detect those cases by validating outputs against a JSON Schema, here one that requires the output to be an integer. Warnings are shown for outputs which do not validate, and the corresponding output files are not created.

echo '{"type": "integer"}' > schema.json
fun call --input data --output results --provider anthropic --model claude-3-haiku-20240307 --in .txt --out .txt --prompt prompt.txt --output-schema schema.json

We can also use a JSON Schema to validate that the output is a string matching a regex:

echo '{"type": "string", "pattern": "^[0-9]+$"}' > schema.json
fun call --input data --output results --provider anthropic --model claude-3-haiku-20240307 --in .txt --out .txt --prompt prompt.txt --output-schema schema.json
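What this schema accepts can be checked with the same pattern in Go's stdlib regexp package (fun uses a JSON Schema validator for this; the snippet only demonstrates which outputs the pattern admits):

```go
package main

import (
	"fmt"
	"regexp"
)

// digitsOnly applies the same check as the JSON Schema
// {"type": "string", "pattern": "^[0-9]+$"}: the model's output
// must consist solely of digits.
var digitsOnly = regexp.MustCompile(`^[0-9]+$`)

func main() {
	fmt.Println(digitsOnly.MatchString("3"))               // true: output validates
	fmt.Println(digitsOnly.MatchString("The answer is 3")) // false: stored with suffix .invalid
}
```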

GitHub mirror

There is also a read-only GitHub mirror available, if you need to fork the project there.

Acknowledgements

The project gratefully acknowledges the HPC RIVR consortium and EuroHPC JU for funding this project by providing computing resources of the HPC system Vega at the Institute of Information Science.

Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or European Commission. Neither the European Union nor the granting authority can be held responsible for them. Funded within the framework of the NGI Search project under grant agreement No 101069364.
