I have searched the existing issues and this bug is not already filed.
My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.
Describe the bug
After indexing (training) one text, I can query its entity records normally, but after indexing a new text the entity records of the previous text can no longer be queried. Is this because the parquet files produced for the previous text are overwritten?
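A quick way to check the overwrite hypothesis is to inspect the entities parquet directly. This is only a minimal sketch, assuming pandas/pyarrow are installed, that the artifacts are written to the storage base_dir configured below (input/artifacts), and that the entity name column is title (older outputs use name):

    import pandas as pd

    # Path assumes storage.base_dir: "input/artifacts" from the config below.
    entities = pd.read_parquet("input/artifacts/create_final_entities.parquet")
    name_col = "title" if "title" in entities.columns else "name"  # column name differs between releases
    print(len(entities), "entities")
    print(sorted(entities[name_col])[:20])

Running this once after indexing the first text and again after indexing the second text shows whether the first text's entities are still present or the file was replaced.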
Steps to reproduce
No response
Expected Behavior
No response
GraphRAG Config Used
### This config file contains required core defaults that must be set, along with a handful of common optional settings.
### For a full list of available settings, see https://microsoft.github.io/graphrag/config/yaml/

### LLM settings ###
## There are a number of settings to tune the threading and token limits for LLM calls - check the docs.

encoding_model: cl100k_base # this needs to be matched to your model!

llm:
  # exllamav2
  api_key: xxx
  type: openai_chat # or azure_openai_chat
  model: Rombos-Coder-V2.5-Qwen-32b-exl2_5.0bpw
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 16000
  api_base: http://127.0.0.1:5001/v1
  requests_per_minute: 5_000 # set a leaky bucket throttle
  max_retries: 10
  max_retry_wait: 0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made
  temperature: 0.5 # temperature for sampling
  # top_p: 1 # top-p sampling
  # n: 1 # Number of completions to generate
  # audience: "https://cognitiveservices.azure.com/.default"
  # api_base: https://<instance>.openai.azure.com
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>

parallelization:
  stagger: 0.3
  # num_threads: 50

async_mode: asyncio # or asyncio

embeddings:
  async_mode: asyncio # or asyncio
  vector_store:
    type: lancedb
    db_uri: 'output\lancedb'
    container_name: default
    overwrite: true
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: bge-m3:Q4
    api_base: http://localhost:11434/v1
    max_tokens: 8192
    # api_version: 2024-02-15-preview
    # audience: "https://cognitiveservices.azure.com/.default"
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>

### Input settings ###

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]

### Storage settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided

cache:
  type: file # or blob
  base_dir: "cache"

reporting:
  type: file # or console, blob
  base_dir: "input/reports"

storage:
  type: file # or blob
  base_dir: "input/artifacts"

## only turn this on if running `graphrag index` with custom settings
## we normally use `graphrag update` with the defaults
update_index_storage:
  # type: file # or blob
  # base_dir: "update_output"

### Workflow settings ###

skip_workflows: []

entity_extraction:
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization, person, geo, event]
  max_gleanings: 1

summarize_descriptions:
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  embeddings: false
  transient: false

### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

local_search:
  prompt: "prompts/local_search_system_prompt.txt"

global_search:
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  prompt: "prompts/drift_search_system_prompt.txt"
Logs and screenshots
No response
Additional Information
GraphRAG Version: 1.0
Operating System:
Python Version:
Related Issues:
xldistance added the bug and triage labels on Dec 15, 2024
global_search only uses the files create_final_communities.parquet, create_final_community_reports.parquet, and create_final_entities.parquet. Each time a new text is indexed (trained), these files are overwritten, so the data from the previously indexed text is lost.
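A rough way to confirm this (a sketch under the same assumptions as above: pandas/pyarrow installed, artifacts in input/artifacts per the config) is to compare the row counts of the three files global_search reads before and after indexing the second text:

    from pathlib import Path
    import pandas as pd

    artifacts = Path("input/artifacts")  # storage.base_dir from the config above
    for name in ("create_final_communities", "create_final_community_reports", "create_final_entities"):
        df = pd.read_parquet(artifacts / f"{name}.parquet")
        print(f"{name}: {len(df)} rows")

If the counts after the second run reflect only the newest text, the index outputs were replaced rather than merged; per the comments in the config above, `graphrag update` together with the update_index_storage section is the path intended for incremental runs.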