Token count error in semantic functions #39

Closed · LuisM000 opened this issue Jan 9, 2024 · 0 comments · Fixed by #43
Labels: bug (Something isn't working)

LuisM000 (Contributor) commented Jan 9, 2024

I'm using the GetSemanticFunctionUsedTokensAsync function to calculate the tokens for a prompt. The issue is that while it accurately counts the tokens of the prompt itself, it does not capture the total number of tokens actually sent when a semantic function is invoked. This discrepancy may lead to exceeding the maximum token limit.

In the Semantic Kernel version used by Enmarcha (1.0.0-beta8), executing a semantic function builds the call to the OpenAI SDK with two messages (https://github.com/microsoft/semantic-kernel/blob/dotnet-1.0.0-beta8/dotnet/src/Connectors/Connectors.AI.OpenAI/AzureSdk/ClientBase.cs#L317):

assistant: Assistant is a large language model.
user: Prompt sent

The assistant message is typically the default message, and the user message is the prompt being sent. Counting only the prompt therefore underestimates the total number of tokens sent, potentially causing us to exceed the maximum token limit.
According to the OpenAI documentation (https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb), the calculation for models such as ["gpt-3.5-turbo-0613", "gpt-3.5-turbo-16k-0613", "gpt-4-0314", "gpt-4-32k-0314", "gpt-4-0613", "gpt-4-32k-0613"] would be:

3 extra tokens per message + 3 extra tokens for the output.

In the above scenario, the calculation would be:

assistant: Assistant is a large language model. => 3 tokens for the message + 7 tokens for the content
user: Hello                                     => 3 tokens for the message + 1 token for the content
                                                => 3 extra output tokens
                                                => Total: 17 tokens
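
For reference, a minimal Python sketch of this calculation, following the counting scheme from the cited cookbook and using the tiktoken library the documentation itself uses (illustrative only, not Enmarcha code; the `name`-key handling from the cookbook is omitted for brevity):

```python
# Illustrative sketch of the OpenAI cookbook's num_tokens_from_messages
# counting scheme for the models listed above.
import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    encoding = tiktoken.encoding_for_model(model)
    tokens_per_message = 3  # framing tokens added per message
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for value in message.values():  # encodes both the role and the content
            num_tokens += len(encoding.encode(value))
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens

messages = [
    {"role": "assistant", "content": "Assistant is a large language model."},
    {"role": "user", "content": "Hello"},
]
print(num_tokens_from_messages(messages))  # 19 (17 + 1 token per role string)
```

Note that the cookbook function also encodes each message's role string ("assistant" and "user" are 1 token each), which the hand calculation above omits; run this way, it returns 19 for the example and 18 with an empty user message, which appears to account for the differences observed in the tests below.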

In tests conducted with a prompt containing the text 'Hello', the token usage returned by the request is 19, a 2-token difference from the calculation above.
With an empty prompt, the documentation predicts 16 tokens; in tests, the result is 18 tokens.
The test figures are the usage values the OpenAI client returns in the response.

Proposed Solution

While it may not be the optimal solution, to ensure accuracy, we could consider always adding 25 tokens (as a parameter with a default value) to the GetSemanticFunctionUsedTokensAsync function.
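
As a sketch of what that could look like (hypothetical names, continuing the Python illustration above; Enmarcha itself is .NET and the actual change would go in GetSemanticFunctionUsedTokensAsync):

```python
# Hypothetical sketch of the proposal: the prompt-only token count plus a
# fixed safety margin, exposed as a parameter with a default value.
import tiktoken

_encoding = tiktoken.encoding_for_model("gpt-3.5-turbo-0613")
DEFAULT_EXTRA_TOKENS = 25  # margin for the extra message and framing tokens

def get_semantic_function_used_tokens(prompt: str,
                                      extra_tokens: int = DEFAULT_EXTRA_TOKENS) -> int:
    # Tokens of the rendered prompt itself, as counted today, plus the margin.
    return len(_encoding.encode(prompt)) + extra_tokens

print(get_semantic_function_used_tokens("Hello"))  # 1 + 25 = 26
```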

Considerations

  • In the current version of Semantic Kernel (v1.0.1 https://github.com/microsoft/semantic-kernel/tree/dotnet-1.0.1), this behavior has changed, and only a system message is sent when executing a semantic function.
  • This applies only when using the Chat Completion service as a Text Completion service with OpenAI.
  • It's important to note that this modification is specific to OpenAI and does not affect other connectors like Hugging Face.
@LuisM000 added the bug label Jan 9, 2024
@rliberoff linked a pull request Jan 17, 2024 that will close this issue