Support rate limiting #37
We'd need to investigate whether we can catch those in the AI backend implementation. It looks like they would need to be implemented in https://github.com/simonw/llm directly, and then we could catch the "llm" package's exceptions; alternatively, if an HTTP response is returned, we could use the status code to figure that out (see the sketch after the example below). We can't guarantee that our local environment will have all the optional dependencies installed. Another approach might be something like the following, which is still not ideal but a reasonable trade-off if the user experience matters.

```python
from collections.abc import Generator


def get_rate_limiting_exceptions() -> Generator[type[Exception], None, None]:
    # Only yield the exception classes for providers that are actually installed.
    try:
        import openai
    except ImportError:
        pass
    else:
        yield openai.RateLimitError

    try:
        import another_package
    except ImportError:
        pass
    else:
        yield another_package.RateLimitException


def handle(prompt, context):
    try:
        backend.prompt_with_context(prompt, context)
    except Exception as e:
        rate_limit_exception_classes = tuple(get_rate_limiting_exceptions())
        if rate_limit_exception_classes and isinstance(e, rate_limit_exception_classes):
            raise WagtailAiRateLimitError from e
        raise
```
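For the HTTP status code route mentioned above, a minimal sketch could look like this, assuming the raised exception exposes the underlying HTTP response (the `response` attribute and the helper name here are hypothetical, not part of any backend's API):

```python
# Hypothetical helper: inspect the underlying HTTP response, if any, that
# the backend attached to the raised exception.
def is_rate_limit_response(exc: Exception) -> bool:
    response = getattr(exc, "response", None)
    status_code = getattr(response, "status_code", None)
    # 429 Too Many Requests is the standard rate-limiting status code.
    return status_code == 429
```

This avoids importing any provider package at all, but it only helps with backends that actually attach the HTTP response to the exception they raise.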
I'm the maintainer of LiteLLM. We provide an open-source proxy for load balancing Azure + OpenAI + any LiteLLM-supported LLM. From this thread it looks like you're trying to handle rate limits and load balance between OpenAI instances - I hope our solution makes it easier for you (I'd love feedback if you're trying to do this). Here's the quick start.

Doc: https://docs.litellm.ai/docs/simple_proxy#load-balancing---multiple-instances-of-1-model

Step 1: Create a config.yaml

```yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
      api_version: "2023-05-15"
      api_key:
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key:
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key:
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/
```

Step 2: Start the LiteLLM proxy.

Step 3: Make a request to the LiteLLM proxy.
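The original commands for Steps 2 and 3 didn't survive the formatting above. As a rough sketch of Step 3 only, assuming the proxy is already running locally and listening on its default port (8000 here, which may differ between LiteLLM versions), a request through the OpenAI Python client (v1+) could look like this:

```python
# Sketch only: assumes a running LiteLLM proxy on http://0.0.0.0:8000 and
# the openai>=1.0 client; the port and prompt are illustrative.
import openai

client = openai.OpenAI(
    base_url="http://0.0.0.0:8000",  # point the client at the proxy, not api.openai.com
    api_key="anything",  # the proxy holds the real provider keys from config.yaml
)

response = client.chat.completions.create(
    model="gpt-4",  # matches a model_name entry in config.yaml
    messages=[{"role": "user", "content": "Hello, which deployment am I talking to?"}],
)
print(response.choices[0].message.content)
```

Because all three config entries share the `model_name` "gpt-4", the proxy load-balances requests for that name across the listed Azure deployments.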
Add support for raising a custom Wagtail AI rate limit exception.
I'm not aware of any existing support for rate limiting within Wagtail, and I'm unsure which library would be preferable here, so I can't suggest an approach. However, I'd imagine we'd want to support limiting not only requests but also tokens per user account, allowing developers to configure the package so that individual editors' activity doesn't affect one another, i.e. editor 1 reaching the usage limit for the whole organisation account and thus preventing editor 2 from using AI tools.
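As a rough illustration of the per-editor idea (not an existing wagtail-ai feature; the function name, cache key prefix and limits below are made up), a request counter backed by Django's cache framework could look like this:

```python
# Hypothetical per-editor request limit using Django's cache framework.
# The key prefix, default limits and function name are illustrative only.
from django.core.cache import cache


def within_rate_limit(user_id: int, max_requests: int = 30, window_seconds: int = 60) -> bool:
    """Return True if this editor still has request budget left in the current window."""
    key = f"wagtail_ai_rate_limit:{user_id}"
    current = cache.get_or_set(key, 0, timeout=window_seconds)
    if current >= max_requests:
        return False
    try:
        cache.incr(key)
    except ValueError:
        # The key expired between get_or_set and incr; start a fresh window.
        cache.set(key, 1, timeout=window_seconds)
    return True
```

A token budget could be tracked the same way by incrementing by each response's token count instead of 1. The check isn't strictly atomic, but that's usually acceptable for a soft per-editor limit.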