
[Issue]: Are rate limiting parameters observed? #1500

Open · 3 tasks done
simra opened this issue Dec 11, 2024 · 2 comments
Labels
awaiting_response  Maintainers or community have suggested solutions or requested info, awaiting filer response
triage  Default label assignment, indicates new issue needs reviewed by a maintainer

Comments

simra commented Dec 11, 2024

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the issue

I am running 0.6.0 and encountering throttling issues with Azure OpenAI.
The rate limiter seems to default to 0 seconds and tries to parse the recommendation from the error message, but always returns 0:

graphrag.llm.base.rate_limiting_llm WARNING extract-continuation-0 failed to invoke LLM 10/10 attempts. Cause: rate limit exceeded, will retry. Recommended sleep for 0 seconds. Follow recommendation? True

I don't see where the max_retry_wait parameter is used, nor any logic implementing smarter backoff. Looking at 0.9.0, it seems the rate-limiting LLM class was removed. How does the current system handle throttling?

Steps to reproduce

Run indexing on a throttled AOAI endpoint.

GraphRAG Config Used

# Paste your config here
### This config file contains required core defaults that must be set, along with a handful of common optional settings.
### For a full list of available settings, see https://microsoft.github.io/graphrag/config/yaml/

### LLM settings ###
## There are a number of settings to tune the threading and token limits for LLM calls - check the docs.

encoding_model: cl100k_base # this needs to be matched to your model!

llm:
  # api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file
  type: azure_openai_chat # or openai_chat
  #model: gpt-4o  #-turbo-preview
  model: gpt-35-turbo
  model_supports_json: false # recommended if this is available for your model.
  audience: "https://cognitiveservices.azure.com/.default"
  api_base: <redacted>
  api_version: 2024-08-01-preview
  # organization: <organization_id>
  #deployment_name: gpt-4o
  deployment_name: gpt-35-turbo

parallelization:
  stagger: 0.3
  # num_threads: 50

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  vector_store:
    type: lancedb
    db_uri: 'output/lancedb'
    container_name: default
    overwrite: true
  llm:
    # api_key: ${GRAPHRAG_API_KEY}
    type: azure_openai_embedding # or openai_embedding
    model: text-embedding-3-small
    api_base: <redacted>
    api_version: "2023-05-15"  # 2024-02-15-preview
    audience: "https://cognitiveservices.azure.com/.default"
    deployment_name: text-embedding-3-small

### Input settings ###

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]

### Storage settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided

cache:
  type: file # or blob
  base_dir: "cache"

reporting:
  type: file # or console, blob
  base_dir: "logs"

storage:
  type: file # or blob
  base_dir: "output"

## only turn this on if running `graphrag index` with custom settings
## we normally use `graphrag update` with the defaults
update_index_storage:
  # type: file # or blob
  # base_dir: "update_output"

### Workflow settings ###

skip_workflows: []

entity_extraction:
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  enabled: false
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: true # if true, will generate node2vec embeddings for nodes

umap:
  enabled: true # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: true
  raw_entities: true
  top_level_nodes: false
  embeddings: true
  transient: false

### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

local_search:
  prompt: "prompts/local_search_system_prompt.txt"

global_search:
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  prompt: "prompts/drift_search_system_prompt.txt"

Logs and screenshots

log snippet (thousands of entries):

12/11/2024 18:55:42 - INFO - __main__ - 18:52:59,275 graphrag.llm.base.rate_limiting_llm WARNING extract-continuation-0 failed to invoke LLM 3/10 attempts. Cause: rate limit exceeded, will retry. Recommended sleep for 0 seconds. Follow recommendation? True
12/11/2024 18:55:42 - INFO - __main__ - 18:52:59,846 httpx INFO HTTP Request: POST https://pai-aoai.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 429 Too Many Requests"
12/11/2024 18:55:42 - INFO - __main__ - 18:52:59,847 graphrag.llm.base.rate_limiting_llm WARNING extract-continuation-0 failed to invoke LLM 2/10 attempts. Cause: rate limit exceeded, will retry. Recommended sleep for 0 seconds. Follow recommendation? True
12/11/2024 18:55:42 - INFO - __main__ - 18:53:00,603 httpx INFO HTTP Request: POST https://pai-aoai.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 429 Too Many Requests"
12/11/2024 18:55:42 - INFO - __main__ - 18:53:00,604 graphrag.llm.base.rate_limiting_llm WARNING extract-continuation-0 failed to invoke LLM 3/10 attempts. Cause: rate limit exceeded, will retry. Recommended sleep for 0 seconds. Follow recommendation? True
12/11/2024 18:55:42 - INFO - __main__ - 18:53:01,708 httpx INFO HTTP Request: POST https://pai-aoai.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 429 Too Many Requests"
12/11/2024 18:55:42 - INFO - __main__ - 18:53:01,709 graphrag.llm.base.rate_limiting_llm WARNING extract-continuation-0 failed to invoke LLM 4/10 attempts. Cause: rate limit exceeded, will retry. Recommended sleep for 0 seconds. Follow recommendation? True
12/11/2024 18:55:42 - INFO - __main__ - 18:53:01,939 httpx INFO HTTP Request: POST https://pai-aoai.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 429 Too Many Requests"
12/11/2024 18:55:42 - INFO - __main__ - 18:53:01,940 graphrag.llm.base.rate_limiting_llm WARNING extract-continuation-0 failed to invoke LLM 4/10 attempts. Cause: rate limit exceeded, will retry. Recommended sleep for 0 seconds. Follow recommendation? True
12/11/2024 18:55:42 - INFO - __main__ - 18:53:02,87 httpx INFO HTTP Request: POST https://pai-aoai.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 429 Too Many Requests"
12/11/2024 18:55:42 - INFO - __main__ - 18:53:02,88 graphrag.llm.base.rate_limiting_llm WARNING extract-continuation-0 failed to invoke LLM 4/10 attempts. Cause: rate limit exceeded, will retry. Recommended sleep for 0 seconds. Follow recommendation? True
12/11/2024 18:55:42 - INFO - __main__ - 18:53:02,129 httpx INFO HTTP Request: POST https://pai-aoai.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 429 Too Many Requests"
12/11/2024 18:55:42 - INFO - __main__ - 18:53:02,130 graphrag.llm.base.rate_limiting_llm WARNING extract-continuation-0 failed to invoke LLM 4/10 attempts. Cause: rate limit exceeded, will retry. Recommended sleep for 0 seconds. Follow recommendation? True
12/11/2024 18:55:42 - INFO - __main__ - 18:53:02,146 httpx INFO HTTP Request: POST https://pai-aoai.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 429 Too Many Requests"
12/11/2024 18:55:42 - INFO - __main__ - 18:53:02,147 graphrag.llm.base.rate_limiting_llm WARNING extract-continuation-0 failed to invoke LLM 4/10 attempts. Cause: rate limit exceeded, will retry. Recommended sleep for 0 seconds. Follow recommendation? True
12/11/2024 18:55:42 - INFO - __main__ - 18:53:02,256 httpx INFO HTTP Request: POST https://pai-aoai.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 429 Too Many Requests"
12/11/2024 18:55:42 - INFO - __main__ - 18:53:02,257 graphrag.llm.base.rate_limiting_llm WARNING extract-continuation-0 failed to invoke LLM 4/10 attempts. Cause: rate limit exceeded, will retry. Recommended sleep for 0 seconds. Follow recommendation? True
12/11/2024 18:55:42 - INFO - __main__ - 18:53:02,357 httpx INFO HTTP Request: POST https://pai-aoai.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 429 Too Many Requests"
12/11/2024 18:55:42 - INFO - __main__ - 18:53:02,358 graphrag.llm.base.rate_limiting_llm WARNING extract-continuation-0 failed to invoke LLM 3/10 attempts. Cause: rate limit exceeded, will retry. Recommended sleep for 0 seconds. Follow recommendation? True
12/11/2024 18:55:42 - INFO - __main__ - 18:53:02,373 httpx INFO HTTP Request: POST https://pai-aoai.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 429 Too Many Requests"
12/11/2024 18:55:42 - INFO - __main__ - 18:53:02,374 graphrag.llm.base.rate_limiting_llm WARNING extract-continuation-0 failed to invoke LLM 4/10 attempts. Cause: rate limit exceeded, will retry. Recommended sleep for 0 seconds. Follow recommendation? True
12/11/2024 18:55:42 - INFO - __main__ - 18:53:02,644 httpx INFO HTTP Request: POST https://pai-aoai.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 429 Too Many Requests"
12/11/2024 18:55:42 - INFO - __main__ - 18:53:02,644 graphrag.llm.base.rate_limiting_llm WARNING extract-continuation-0 failed to invoke LLM 4/10 attempts. Cause: rate limit exceeded, will retry. Recommended sleep for 0 seconds. Follow recommendation? True
12/11/2024 18:55:42 - INFO - __main__ - 18:53:02,660 httpx INFO HTTP Request: POST https://pai-aoai.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 429 Too Many Requests"
12/11/2024 18:55:42 - INFO - __main__ - 18:53:02,661 graphrag.llm.base.rate_limiting_llm WARNING extract-continuation-0 failed to invoke LLM 4/10 attempts. Cause: rate limit exceeded, will retry. Recommended sleep for 0 seconds. Follow recommendation? True
12/11/2024 18:55:42 - INFO - __main__ - 18:53:02,797 httpx INFO HTTP Request: POST https://pai-aoai.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 429 Too Many Requests"
12/11/2024 18:55:42 - INFO - __main__ - 18:53:02,798 graphrag.llm.base.rate_limiting_llm WARNING extract-continuation-0 failed to invoke LLM 4/10 attempts. Cause: rate limit exceeded, will retry. Recommended sleep for 0 seconds. Follow recommendation? True
12/11/2024 18:55:42 - INFO - __main__ - 18:53:02,809 httpx INFO HTTP Request: POST https://pai-aoai.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 429 Too Many Requests"
12/11/2024 18:55:42 - INFO - __main__ - 18:53:02,810 graphrag.llm.base.rate_limiting_llm WARNING extract-continuation-0 failed to invoke LLM 4/10 attempts. Cause: rate limit exceeded, will retry. Recommended sleep for 0 seconds. Follow recommendation? True
12/11/2024 18:55:42 - INFO - __main__ - 18:53:02,908 httpx INFO HTTP Request: POST https://pai-aoai.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2024-08-01-preview "HTTP/1.1 429 Too Many Requests"

After too many retries:

12/11/2024 18:55:42 - INFO - __main__ - File "/azureml-envs/azureml_d8bf7ce4d1728bb74a5a480413947462/lib/python3.10/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 151, in do_attempt
12/11/2024 18:55:42 - INFO - __main__ - await sleep_for(sleep_time)
12/11/2024 18:55:42 - INFO - __main__ - File "/azureml-envs/azureml_d8bf7ce4d1728bb74a5a480413947462/lib/python3.10/site-packages/graphrag/llm/base/rate_limiting_llm.py", line 147, in do_attempt
12/11/2024 18:55:42 - INFO - __main__ - return await self._delegate(input, **kwargs)
12/11/2024 18:55:42 - INFO - __main__ - File "/azureml-envs/azureml_d8bf7ce4d1728bb74a5a480413947462/lib/python3.10/site-packages/graphrag/llm/base/base_llm.py", line 50, in __call__
12/11/2024 18:55:42 - INFO - __main__ - return await self._invoke(input, **kwargs)
12/11/2024 18:55:42 - INFO - __main__ - File "/azureml-envs/azureml_d8bf7ce4d1728bb74a5a480413947462/lib/python3.10/site-packages/graphrag/llm/base/base_llm.py", line 54, in _invoke
12/11/2024 18:55:42 - INFO - __main__ - output = await self._execute_llm(input, **kwargs)
12/11/2024 18:55:42 - INFO - __main__ - File "/azureml-envs/azureml_d8bf7ce4d1728bb74a5a480413947462/lib/python3.10/site-packages/graphrag/llm/openai/openai_chat_llm.py", line 53, in _execute_llm
12/11/2024 18:55:42 - INFO - __main__ - completion = await self.client.chat.completions.create(
12/11/2024 18:55:42 - INFO - __main__ - File "/azureml-envs/azureml_d8bf7ce4d1728bb74a5a480413947462/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 1661, in create
12/11/2024 18:55:42 - INFO - __main__ - return await self._post(
12/11/2024 18:55:42 - INFO - __main__ - File "/azureml-envs/azureml_d8bf7ce4d1728bb74a5a480413947462/lib/python3.10/site-packages/openai/_base_client.py", line 1843, in post
12/11/2024 18:55:42 - INFO - __main__ - return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
12/11/2024 18:55:42 - INFO - __main__ - File "/azureml-envs/azureml_d8bf7ce4d1728bb74a5a480413947462/lib/python3.10/site-packages/openai/_base_client.py", line 1537, in request
12/11/2024 18:55:42 - INFO - __main__ - return await self._request(
12/11/2024 18:55:42 - INFO - __main__ - File "/azureml-envs/azureml_d8bf7ce4d1728bb74a5a480413947462/lib/python3.10/site-packages/openai/_base_client.py", line 1638, in _request
12/11/2024 18:55:42 - INFO - __main__ - raise self._make_status_error_from_response(err.response) from None
12/11/2024 18:55:42 - INFO - __main__ - openai.RateLimitError: Error code: 429 - {'error': {'code': '429', 'message': 'Rate limit is exceeded. Try again in 25 seconds.'}}
12/11/2024 18:55:42 - INFO - __main__ - 18:53:03,128 graphrag.callbacks.file_workflow_callbacks INFO Entity Extraction Error 
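The final 429 body above does carry a usable hint ("Try again in 25 seconds."), so a sleep recommendation of 0 suggests the parser isn't matching this Azure message format. A minimal sketch of extracting the hint (the regex and function name are illustrative, not GraphRAG's actual implementation):

```python
import re


def recommended_sleep(message: str, default: float = 0.0) -> float:
    """Extract an Azure-style 'Try again in N seconds.' hint from a 429 body."""
    match = re.search(r"[Tt]ry again in (\d+) seconds?", message)
    return float(match.group(1)) if match else default


# Example message taken from the log above.
msg = "Rate limit is exceeded. Try again in 25 seconds."
print(recommended_sleep(msg))  # 25.0
```

A parser keyed to a different message shape (or only to the Retry-After header) would fall through to the 0-second default, which matches the behavior in the log.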

Additional Information

  • GraphRAG Version: 0.6.0
  • Operating System: linux
  • Python Version: 3.10
  • Related Issues:
@simra simra added the triage Default label assignment, indicates new issue needs reviewed by a maintainer label Dec 11, 2024

simra commented Dec 12, 2024

As an update, I migrated to 0.9.0 and still see lots of 429 errors, but the retry wait time is no longer logged, so I'm not sure what the client does on a 429. I do see some retries reaching max_retries, at which point the exception is logged.

12/11/2024 19:52:16 - INFO - __main__ - 19:34:45,78 graphrag.index.graph.extractors.graph.graph_extractor ERROR error extracting graph
12/11/2024 19:52:16 - INFO - __main__ - Traceback (most recent call last):
12/11/2024 19:52:16 - INFO - __main__ - File "/azureml-envs/azureml_f2ccdda275a8f9f1edd283b03e88e55a/lib/python3.10/site-packages/graphrag/index/graph/extractors/graph/graph_extractor.py", line 127, in __call__
12/11/2024 19:52:16 - INFO - __main__ - result = await self._process_document(text, prompt_variables)
12/11/2024 19:52:16 - INFO - __main__ - File "/azureml-envs/azureml_f2ccdda275a8f9f1edd283b03e88e55a/lib/python3.10/site-packages/graphrag/index/graph/extractors/graph/graph_extractor.py", line 165, in _process_document
12/11/2024 19:52:16 - INFO - __main__ - response = await self._llm(
12/11/2024 19:52:16 - INFO - __main__ - File "/azureml-envs/azureml_f2ccdda275a8f9f1edd283b03e88e55a/lib/python3.10/site-packages/fnllm/openai/llm/chat.py", line 83, in __call__
12/11/2024 19:52:16 - INFO - __main__ - return await self._text_chat_llm(prompt, **kwargs)
12/11/2024 19:52:16 - INFO - __main__ - File "/azureml-envs/azureml_f2ccdda275a8f9f1edd283b03e88e55a/lib/python3.10/site-packages/fnllm/openai/llm/features/tools_parsing.py", line 120, in __call__
12/11/2024 19:52:16 - INFO - __main__ - return await self._delegate(prompt, **kwargs)
12/11/2024 19:52:16 - INFO - __main__ - File "/azureml-envs/azureml_f2ccdda275a8f9f1edd283b03e88e55a/lib/python3.10/site-packages/fnllm/base/base.py", line 112, in __call__
12/11/2024 19:52:16 - INFO - __main__ - return await self._invoke(prompt, **kwargs)
12/11/2024 19:52:16 - INFO - __main__ - File "/azureml-envs/azureml_f2ccdda275a8f9f1edd283b03e88e55a/lib/python3.10/site-packages/fnllm/base/base.py", line 128, in _invoke
12/11/2024 19:52:16 - INFO - __main__ - return await self._decorated_target(prompt, **kwargs)
12/11/2024 19:52:16 - INFO - __main__ - File "/azureml-envs/azureml_f2ccdda275a8f9f1edd283b03e88e55a/lib/python3.10/site-packages/fnllm/services/json.py", line 71, in invoke
12/11/2024 19:52:16 - INFO - __main__ - return await delegate(prompt, **kwargs)
12/11/2024 19:52:16 - INFO - __main__ - File "/azureml-envs/azureml_f2ccdda275a8f9f1edd283b03e88e55a/lib/python3.10/site-packages/fnllm/services/retryer.py", line 109, in invoke
12/11/2024 19:52:16 - INFO - __main__ - result = await execute_with_retry()
12/11/2024 19:52:16 - INFO - __main__ - File "/azureml-envs/azureml_f2ccdda275a8f9f1edd283b03e88e55a/lib/python3.10/site-packages/fnllm/services/retryer.py", line 106, in execute_with_retry
12/11/2024 19:52:16 - INFO - __main__ - raise RetriesExhaustedError(name, self._max_retries)
12/11/2024 19:52:16 - INFO - __main__ - fnllm.services.errors.RetriesExhaustedError: Operation 'extract-continuation-0' failed - 10 retries exhausted.
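For reference, the behavior being asked for is exponential backoff with jitter rather than immediate re-attempts. A generic sketch of that pattern, assuming nothing about GraphRAG/fnllm internals beyond a throttling error exposing a `status_code` attribute:

```python
import asyncio
import random


async def call_with_backoff(fn, *, max_retries=10, base=1.0, cap=60.0):
    """Retry an async callable on 429s with exponential backoff and full jitter.

    Generic sketch: assumes throttling errors expose `status_code == 429`.
    """
    for attempt in range(max_retries):
        try:
            return await fn()
        except Exception as err:
            if getattr(err, "status_code", None) != 429 or attempt == max_retries - 1:
                raise
            # Full jitter: sleep a random amount up to the capped exponential step.
            await asyncio.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

With immediate retries (the log shows sub-second gaps between 429s), the ten attempts burn out in a few seconds; a schedule like the above spreads them over minutes and gives the quota window time to reset.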

natoverse (Collaborator) commented Dec 18, 2024
We believe this was a bug introduced during our adoption of fnllm as the underlying LLM library. We just pushed out a 1.0.1 patch today; please let us know if the problem still exists with that version.

@natoverse natoverse added the awaiting_response Maintainers or community have suggested solutions or requested info, awaiting filer response label Dec 18, 2024