
[Bug]: Global (and local) encoding_model option is being ignored #1499

Open
3 tasks done
nreinartz opened this issue Dec 11, 2024 · 1 comment
Labels: awaiting_response · bug · triage

Comments

@nreinartz
Contributor

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

The global encoding_model option is ignored when constructing the LLMParameters for the root llm and all sub-llm options. In fact, the option is ignored even when it is set explicitly for each section. This leads to the wrong tokenizer being used during indexing.

Possible fix: the create_graphrag_config function needs to resolve every encoding_model option with local > global > default precedence when initializing the LLMParameters models.
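A minimal sketch of the proposed fallback, assuming plain dicts for the parsed settings; the function and variable names here are illustrative, not graphrag's actual API:

```python
# Hypothetical sketch of the local > global > default resolution that
# create_graphrag_config should apply to each llm section. Names are
# illustrative; graphrag's real config objects are Pydantic models.

DEFAULT_ENCODING_MODEL = "cl100k_base"

def resolve_encoding_model(section: dict, root: dict) -> str:
    """Return the encoding model for one llm section, preferring the
    section-local value, then the global one, then the default."""
    return (
        section.get("encoding_model")   # local override wins
        or root.get("encoding_model")   # then the global setting
        or DEFAULT_ENCODING_MODEL       # finally the library default
    )

root_config = {"encoding_model": "o200k_base"}
entity_extraction = {}                               # no local override
community_reports = {"encoding_model": "cl100k_base"}

print(resolve_encoding_model(entity_extraction, root_config))  # o200k_base
print(resolve_encoding_model(community_reports, root_config))  # cl100k_base
```

With this precedence, setting only the global option in settings.yaml would propagate to every section that does not override it locally.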

Steps to reproduce

Initialize a new project with the latest graphrag version (0.9.0), set the global encoding_model option in settings.yaml, and run the index command. Then look at the config dump in the console: all encoding_model options except the global one will still hold the default "cl100k_base" value.

Expected Behavior

The global option should override the default when set.

GraphRAG Config Used

encoding_model: o200k_base

...

Logs and screenshots

No response

Additional Information

  • GraphRAG Version: 0.9.0
  • Operating System: Windows 10, Ubuntu 24.04
  • Python Version: 3.12
  • Related Issues: None
@nreinartz added the bug and triage labels on Dec 11, 2024
@natoverse
Collaborator

We believe this was a bug introduced during our adoption of fnllm as the underlying LLM library. We just pushed out a 1.0.1 patch today; please let us know if your problem still exists with that version.

@natoverse added the awaiting_response label on Dec 18, 2024