Token count error in semantic functions #39

Closed · LuisM000 opened this issue Jan 9, 2024 · 0 comments · Fixed by #43
Labels: bug (Something isn't working)

LuisM000 (Contributor) commented Jan 9, 2024

I'm using the GetSemanticFunctionUsedTokensAsync function to calculate the tokens for a prompt. The issue is that while it accurately counts the tokens of the prompt itself, it does not capture the total number of tokens actually sent when a semantic function is invoked. This discrepancy may lead to exceeding the maximum token limit.

In the Semantic Kernel version used by Enmarcha (1.0.0-beta8), executing a semantic function builds the call to the OpenAI SDK with two messages (https://github.com/microsoft/semantic-kernel/blob/dotnet-1.0.0-beta8/dotnet/src/Connectors/Connectors.AI.OpenAI/AzureSdk/ClientBase.cs#L317):

assistant: Assistant is a large language model.
user: Prompt sent

The assistant message is typically the default message, and the user message is the prompt being sent. Counting only the prompt therefore underestimates the total number of tokens sent, potentially causing us to exceed the maximum token limit.
According to the OpenAI documentation (https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb), the calculation for models such as ["gpt-3.5-turbo-0613", "gpt-3.5-turbo-16k-0613", "gpt-4-0314", "gpt-4-32k-0314", "gpt-4-0613", "gpt-4-32k-0613"] would be:

3 extra tokens per message + 3 extra tokens for the output.

In the above scenario, the calculation would be:

assistant: Assistant is a large language model. => 3 tokens for the message + 7 tokens for the content
user: Hello                                     => 3 tokens for the message + 1 token for the content
                                                => 3 extra output tokens
                                                => Total: 17 tokens
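
For reference, a minimal Python sketch of this calculation, following the counting scheme from the cited cookbook and using the tiktoken library the documentation itself uses (illustrative only, not Enmarcha code; the `name`-key handling from the cookbook is omitted for brevity):

```python
# Illustrative sketch of the OpenAI cookbook's num_tokens_from_messages
# counting scheme for the models listed above.
import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    encoding = tiktoken.encoding_for_model(model)
    tokens_per_message = 3  # framing tokens added per message
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for value in message.values():  # encodes both the role and the content
            num_tokens += len(encoding.encode(value))
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens

messages = [
    {"role": "assistant", "content": "Assistant is a large language model."},
    {"role": "user", "content": "Hello"},
]
print(num_tokens_from_messages(messages))  # 19 (17 + 1 token per role string)
```

Note that the cookbook function also encodes each message's role string ("assistant" and "user" are 1 token each), which the hand calculation above omits; run this way, it returns 19 for the example and 18 with an empty user message, which appears to account for the differences observed in the tests below.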

In tests conducted with a prompt containing the text 'Hello', the token usage returned by the request is 19, a 2-token difference from the calculation above.
With an empty prompt, the documentation predicts 16 tokens; in tests, the result is 18 tokens.
The test figures are the usage values the OpenAI client returns in the response.

Proposed Solution

While it may not be the optimal solution, to ensure accuracy, we could consider always adding 25 tokens (as a parameter with a default value) to the GetSemanticFunctionUsedTokensAsync function.
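
As a sketch of what that could look like (hypothetical names, continuing the Python illustration above; Enmarcha itself is .NET and the actual change would go in GetSemanticFunctionUsedTokensAsync):

```python
# Hypothetical sketch of the proposal: the prompt-only token count plus a
# fixed safety margin, exposed as a parameter with a default value.
import tiktoken

_encoding = tiktoken.encoding_for_model("gpt-3.5-turbo-0613")
DEFAULT_EXTRA_TOKENS = 25  # margin for the extra message and framing tokens

def get_semantic_function_used_tokens(prompt: str,
                                      extra_tokens: int = DEFAULT_EXTRA_TOKENS) -> int:
    # Tokens of the rendered prompt itself, as counted today, plus the margin.
    return len(_encoding.encode(prompt)) + extra_tokens

print(get_semantic_function_used_tokens("Hello"))  # 1 + 25 = 26
```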

Considerations

  • In the current version of Semantic Kernel (v1.0.1 https://github.com/microsoft/semantic-kernel/tree/dotnet-1.0.1), this behavior has changed, and only a system message is sent when executing a semantic function.
  • This applies only when using the Chat Completion service as a Text Completion service with OpenAI.
  • It's important to note that this modification is specific to OpenAI and does not affect other connectors like Hugging Face.
@LuisM000 added the bug label Jan 9, 2024
@rliberoff linked a pull request Jan 17, 2024 that will close this issue