
Support rate limiting #37

Open
Morsey187 opened this issue Nov 29, 2023 · 2 comments

Comments

@Morsey187 (Collaborator)
Add support for raising custom Wagtail AI rate limit exceptions.

I'm not aware of any existing support for rate limiting within Wagtail, and I'm unsure which library would be preferable to use here, so I can't suggest an approach. However, I'd imagine we'd want to support limiting not only requests but also tokens per user account, allowing developers to configure the package so that individual editors' activity doesn't affect one another, e.g. editor 1 reaching the usage limit for the whole organisation account and thereby preventing editor 2 from using the AI tools.
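To make the request concrete, a hypothetical configuration for such limits might look like the sketch below; none of these setting names exist in wagtail-ai today, they only illustrate the kind of per-editor limits being asked for.

# Hypothetical Django settings sketch -- invented names, not part of wagtail-ai.
WAGTAIL_AI_RATE_LIMITS = {
    "REQUESTS_PER_HOUR": 100,   # request limit per editor
    "TOKENS_PER_HOUR": 50_000,  # token budget per editor
    "SCOPE": "user",            # limit each editor individually, not the whole organisation
}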

@tm-kn (Member) commented Nov 29, 2023

We'd need to investigate whether we can catch those in the AI backend implementation.

It looks like those exceptions would need to be implemented in https://github.com/simonw/llm directly, and then we could catch the "llm" package's exceptions; alternatively, if there's an HTTP response returned, we could use the status code to figure that out.
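For the status-code route, a minimal sketch might look like this (the "response" attribute and the 429 check are assumptions about what the underlying client exposes, e.g. openai.APIStatusError carries the HTTP response):

def is_rate_limit_error(exc: Exception) -> bool:
    # Assumes the backend exception exposes the underlying HTTP response;
    # 429 is the standard "Too Many Requests" status used for rate limiting.
    response = getattr(exc, "response", None)
    return getattr(response, "status_code", None) == 429

A caught exception could then be re-raised as a wagtail-ai specific error, similar to the sketch further down.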

We can't guarantee that our local environment will have all the optional dependencies installed.

Another way might be something like this, which is still not ideal but a good trade-off if the user experience matters:

from typing import Generator, Type


def get_rate_limiting_exceptions() -> Generator[Type[Exception], None, None]:
    # Yield the rate limit exception classes of whichever optional
    # dependencies happen to be installed.
    try:
        import openai
    except ImportError:
        pass
    else:
        yield openai.RateLimitError

    try:
        import another_package
    except ImportError:
        pass
    else:
        yield another_package.RateLimitException


def handle(prompt, context):
    try:
        return backend.prompt_with_context(prompt, context)
    except Exception as e:
        rate_limit_exception_classes = tuple(get_rate_limiting_exceptions())
        if rate_limit_exception_classes and isinstance(e, rate_limit_exception_classes):
            raise WagtailAiRateLimitError from e
        raise
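As a usage sketch (assuming a hypothetical WagtailAiRateLimitError and a view that proxies editor prompts, neither of which is the package's actual API), the calling view could translate the exception into a friendly 429 response:

from django.http import JsonResponse

def process_prompt_view(request):
    try:
        result = handle(request.POST["prompt"], request.POST.get("context", ""))
    except WagtailAiRateLimitError:
        # Surface a clear message to the editor instead of a generic server error.
        return JsonResponse(
            {"error": "The AI service is currently rate limited. Please try again shortly."},
            status=429,
        )
    return JsonResponse({"message": result})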

@ishaan-jaff

@Morsey187 @tm-kn

I'm the maintainer of LiteLLM. We provide an open-source proxy for load balancing Azure + OpenAI + any LiteLLM-supported LLM, and it can process 500+ requests/second.

From this thread it looks like you're trying to handle rate limits and load balance between OpenAI instances; I hope our solution makes it easier for you (I'd love feedback if you're trying to do this).

Here's the quick start:

Doc: https://docs.litellm.ai/docs/simple_proxy#load-balancing---multiple-instances-of-1-model

Step 1: Create a config.yaml

model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
      api_version: "2023-05-15"
      api_key: 
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key: 
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key: 
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/

Step 2: Start the litellm proxy:

litellm --config /path/to/config.yaml

Step 3: Make a request to the LiteLLM proxy:

curl --location 'http://0.0.0.0:8000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
      "model": "gpt-4",
      "messages": [
        {
          "role": "user",
          "content": "what llm are you"
        }
      ]
    }'
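In the wagtail-ai context, the proxy could then be used by pointing the OpenAI client at its base URL instead of api.openai.com. A minimal sketch, assuming the openai Python package (v1.x) and a dummy key unless the proxy is configured to require a real one:

from openai import OpenAI

# Point the client at the LiteLLM proxy; the proxy handles load balancing
# across the deployments defined in config.yaml.
client = OpenAI(base_url="http://0.0.0.0:8000", api_key="anything")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)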
