
Support rate limiting #37

Open
Morsey187 opened this issue Nov 29, 2023 · 2 comments

Comments

@Morsey187 (Collaborator)
Add support for raising custom Wagtail AI rate limit exceptions.

I'm not aware of any existing support for rate limiting within Wagtail, and I'm unsure which library would be preferable to use here, so I can't suggest an approach. However, I'd imagine we'd want to support limiting not only requests but also tokens per user account, allowing developers to configure the package so that individual editors' activity doesn't affect one another, e.g. editor 1 reaching the usage limit for the whole organisation account and thereby preventing editor 2 from using the AI tools.
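To make the request concrete, a hypothetical configuration for such limits might look like the sketch below; none of these setting names exist in wagtail-ai today, they only illustrate the kind of per-editor limits being asked for.

# Hypothetical Django settings sketch -- invented names, not part of wagtail-ai.
WAGTAIL_AI_RATE_LIMITS = {
    "REQUESTS_PER_HOUR": 100,   # request limit per editor
    "TOKENS_PER_HOUR": 50_000,  # token budget per editor
    "SCOPE": "user",            # limit each editor individually, not the whole organisation
}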

@tm-kn (Member) commented Nov 29, 2023

We'd need to investigate whether we can catch those in the AI backend implementation.

It looks like those exceptions would need to be implemented in https://github.com/simonw/llm directly, and then we could catch the "llm" package's exceptions; alternatively, if there's an HTTP response returned, we could use the status code to figure that out.
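For the status-code route, a minimal sketch might look like this (the "response" attribute and the 429 check are assumptions about what the underlying client exposes, e.g. openai.APIStatusError carries the HTTP response):

def is_rate_limit_error(exc: Exception) -> bool:
    # Assumes the backend exception exposes the underlying HTTP response;
    # 429 is the standard "Too Many Requests" status used for rate limiting.
    response = getattr(exc, "response", None)
    return getattr(response, "status_code", None) == 429

A caught exception could then be re-raised as a wagtail-ai specific error, similar to the sketch further down.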

We can't guarantee that our local environment will have all the optional dependencies installed.

Another way might be something like this, which is still not ideal but a good trade-off if the user experience matters:

from typing import Generator, Type


def get_rate_limiting_exceptions() -> Generator[Type[Exception], None, None]:
    # Yield the rate limit exception classes of whichever optional
    # dependencies happen to be installed.
    try:
        import openai
    except ImportError:
        pass
    else:
        yield openai.RateLimitError

    try:
        import another_package
    except ImportError:
        pass
    else:
        yield another_package.RateLimitException


def handle(prompt, context):
    try:
        return backend.prompt_with_context(prompt, context)
    except Exception as e:
        rate_limit_exception_classes = tuple(get_rate_limiting_exceptions())
        if rate_limit_exception_classes and isinstance(e, rate_limit_exception_classes):
            raise WagtailAiRateLimitError from e
        raise
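As a usage sketch (assuming a hypothetical WagtailAiRateLimitError and a view that proxies editor prompts, neither of which is the package's actual API), the calling view could translate the exception into a friendly 429 response:

from django.http import JsonResponse

def process_prompt_view(request):
    try:
        result = handle(request.POST["prompt"], request.POST.get("context", ""))
    except WagtailAiRateLimitError:
        # Surface a clear message to the editor instead of a generic server error.
        return JsonResponse(
            {"error": "The AI service is currently rate limited. Please try again shortly."},
            status=429,
        )
    return JsonResponse({"message": result})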

@ishaan-jaff

@Morsey187 @tm-kn

I'm the maintainer of LiteLLM. We provide an open-source proxy for load balancing Azure + OpenAI + any LiteLLM-supported LLM, and it can process 500+ requests/second.

From this thread it looks like you're trying to handle rate limits and load balance between OpenAI instances; I hope our solution makes it easier for you (I'd love feedback if you're trying to do this).

Here's the quick start:

Doc: https://docs.litellm.ai/docs/simple_proxy#load-balancing---multiple-instances-of-1-model

Step 1: Create a config.yaml

model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
      api_version: "2023-05-15"
      api_key: 
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key: 
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key: 
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/

Step 2: Start the litellm proxy:

litellm --config /path/to/config.yaml

Step 3: Make a request to the LiteLLM proxy:

curl --location 'http://0.0.0.0:8000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
      "model": "gpt-4",
      "messages": [
        {
          "role": "user",
          "content": "what llm are you"
        }
      ]
    }'
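In the wagtail-ai context, the proxy could then be used by pointing the OpenAI client at its base URL instead of api.openai.com. A minimal sketch, assuming the openai Python package (v1.x) and a dummy key unless the proxy is configured to require a real one:

from openai import OpenAI

# Point the client at the LiteLLM proxy; the proxy handles load balancing
# across the deployments defined in config.yaml.
client = OpenAI(base_url="http://0.0.0.0:8000", api_key="anything")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)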
