Merge pull request #30 from Encamina/@lmarcos/issue-27#-token-counting-functions-are-slow

@lmarcos/issue 27# token counting functions are slow
LuisM000 authored Dec 14, 2023
2 parents a6f0d68 + ae06cbf commit 8393567
Showing 3 changed files with 77 additions and 4 deletions.
48 changes: 47 additions & 1 deletion CHANGELOG.md
@@ -12,10 +12,56 @@ Each version and revision is followed by a date of change, specially for under d
- **Major Changes**: big improvements in the code, like adding or enabling features, or bug fixes.
- **Minor Changes**: small changes that have little impact, like spell checks in an API's documentation, adding or removing comments, etc.

Also, any bug fix must start with the prefix «Bug fix:» followed by the description of the changes _per se_.
Also, any bug fix must start with the prefix Bug fix: followed by the description of the changes _per se_.

Previous classification is not required if changes are simple or all belong to the same category.

## [8.0.1]

### **Major Changes**
- In `Encamina.Enmarcha.SemanticKernel.Abstractions.ILengthFunctions`, `GptEncoding` is now cached and reused to improve performance. [(#30)](https://github.com/Encamina/enmarcha/pull/30)

### Minor Changes
- Changes from version 8.0.0 have been added to the `CHANGELOG.md` file.

## [8.0.0]

### Major Changes

- Changed the .NET version from 6 to 8, which closes issue `Everything ready for ENMARCHA 8.0.0 #7`.
- Updated the following .NET libraries to their newest version (8.0.0):
  - Microsoft.AspNetCore.Authentication.JwtBearer
  - Microsoft.AspNetCore.Authentication.OpenIdConnect
  - Microsoft.EntityFrameworkCore
  - Microsoft.EntityFrameworkCore.SqlServer
  - Microsoft.Extensions.Caching.Abstractions
  - Microsoft.Extensions.Configuration.Abstractions
  - Microsoft.Extensions.DependencyInjection.Abstractions
  - Microsoft.Extensions.Hosting
  - Microsoft.Extensions.Http
  - Microsoft.Extensions.Logging.Abstractions
  - Microsoft.Extensions.Options
  - Microsoft.Extensions.Options.ConfigurationExtensions
  - Microsoft.Extensions.Options.DataAnnotations
  - System.Net.Http.Json
  - System.Text.Json
- Updated library Azure.Data.Tables from 12.8.1 to 12.8.2.
- Updated library Microsoft.Azure.Cosmos from 3.36.0 to 3.37.0.
- Updated Bot Framework related libraries from version 4.21.1 to 4.21.2. These libraries are:
  - Microsoft.Bot.Builder.Azure
  - Microsoft.Bot.Builder.Azure.Blobs
  - Microsoft.Bot.Builder.Dialogs
  - Microsoft.Bot.Builder.Integration.ApplicationInsights.Core
  - Microsoft.Bot.Builder.Integration.AspNet.Core
- Updated library Moq from 4.20.69 to 4.20.70.
- Updated library xunit from 2.6.1 to 2.6.2.
- Updated library xunit.analyzers from 1.5.0 to 1.6.0.
- Updated library xunit.extensibility.core from 2.6.1 to 2.6.2.
- Updated library xunit.runner.visualstudio from 2.5.3 to 2.5.4.

### Minor Changes
- Some minor tweaks.

## [6.0.4]

### Important
2 changes: 1 addition & 1 deletion Directory.Build.props
@@ -16,7 +16,7 @@
  </PropertyGroup>

  <PropertyGroup>
    <VersionPrefix>8.0.0</VersionPrefix>
    <VersionPrefix>8.0.1</VersionPrefix>
    <VersionSuffix></VersionSuffix>
  </PropertyGroup>

31 changes: 29 additions & 2 deletions src/Encamina.Enmarcha.SemanticKernel.Abstractions/ILengthFunctions.cs
@@ -5,18 +5,45 @@ namespace Encamina.Enmarcha.SemanticKernel.Abstractions;
/// <inheritdoc/>
public interface ILengthFunctions : AI.Abstractions.ILengthFunctions
{
    /// <summary>
    /// Gets the default <see cref="GptEncoding">encoding</see> for models like `GPT-3.5-Turbo` and `GPT-4` from OpenAI.
    /// </summary>
    public static readonly GptEncoding DefaultGptEncoding = GptEncoding.GetEncoding("cl100k_base");

    /// <summary>
    /// Dictionary to cache GptEncoding instances based on encoding names.
    /// </summary>
    private static readonly Dictionary<string, GptEncoding> EncodingCache = [];

Check warning on line 16 in src/Encamina.Enmarcha.SemanticKernel.Abstractions/ILengthFunctions.cs (GitHub Actions / CI): Opening square brackets should not be preceded by a space (https://github.com/DotNetAnalyzers/StyleCopAnalyzers/blob/master/documentation/SA1010.md)

    /// <summary>
    /// Gets the number of tokens using encodings for models like `GPT-3.5-Turbo` and `GPT-4` from OpenAI on the specified text.
    /// If the text is <see langword="null"/> or empty (i.e., <see cref="string.Empty"/>), returns zero (<c>0</c>).
    /// </summary>
    /// <seealso href="https://platform.openai.com/tokenizer"/>
    /// <seealso href="https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb"/>
    public static Func<string, int> LengthByTokenCount => (text) => string.IsNullOrEmpty(text) ? 0 : GptEncoding.GetEncoding("cl100k_base").Encode(text).Count;
    public static Func<string, int> LengthByTokenCount => (text) => string.IsNullOrEmpty(text) ? 0 : DefaultGptEncoding.Encode(text).Count;

    /// <summary>
    /// Gets the number of tokens using a given encoding on the specified text.
    /// If the text is <see langword="null"/> or empty (i.e., <see cref="string.Empty"/>), returns zero (<c>0</c>).
    /// </summary>
    /// <seealso href="https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb"/>
    public static Func<string, string, int> LengthByTokenCountUsingEncoding => (encoding, text) => string.IsNullOrEmpty(text) ? 0 : GptEncoding.GetEncoding(encoding).Encode(text).Count;
    public static Func<string, string, int> LengthByTokenCountUsingEncoding => (encoding, text) => string.IsNullOrEmpty(text) ? 0 : GetCachedEncoding(encoding).Encode(text).Count;

    /// <summary>
    /// Gets the GptEncoding instance based on the specified encoding name, caching it for future use.
    /// </summary>
    /// <param name="encoding">The name of the GptEncoding.</param>
    /// <returns>The GptEncoding instance.</returns>
    private static GptEncoding GetCachedEncoding(string encoding)
    {
        if (EncodingCache.TryGetValue(encoding, out var gptEncoding))
        {
            return gptEncoding;
        }

        gptEncoding = GptEncoding.GetEncoding(encoding);
        EncodingCache[encoding] = gptEncoding;
        return gptEncoding;
    }
}
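
Review note: `GetCachedEncoding` above stores instances in a plain `Dictionary<string, GptEncoding>`, which .NET does not guarantee to be safe when several threads write to it at once. If these length functions can be called concurrently, one possible alternative is a `ConcurrentDictionary` holding `Lazy<T>` values, so each encoding is built at most once. The sketch below is not the code from this commit: `StubEncoding` and `EncodingCacheSketch` are hypothetical names, and `StubEncoding` is a toy stand-in for `GptEncoding` (it splits on spaces instead of running a real tokenizer) so that the example runs without extra packages.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an expensive-to-build encoder such as GptEncoding.
// It records how many instances have been constructed.
public sealed class StubEncoding
{
    private static int created;

    public StubEncoding(string name)
    {
        Name = name;
        Interlocked.Increment(ref created);
    }

    public static int Created => Volatile.Read(ref created);

    public string Name { get; }

    // Toy "tokenizer": counts space-separated words, not real tokens.
    public int CountTokens(string text) =>
        string.IsNullOrEmpty(text)
            ? 0
            : text.Split(' ', StringSplitOptions.RemoveEmptyEntries).Length;
}

// Hypothetical thread-safe cache keyed by encoding name.
public static class EncodingCacheSketch
{
    private static readonly ConcurrentDictionary<string, Lazy<StubEncoding>> Cache = new();

    // GetOrAdd may run the factory more than once under contention, but only
    // one Lazy<T> is stored, and Lazy<T> (default mode) builds its value once.
    public static StubEncoding Get(string name) =>
        Cache.GetOrAdd(name, key => new Lazy<StubEncoding>(() => new StubEncoding(key))).Value;
}

public static class Program
{
    public static void Main()
    {
        // Many concurrent lookups of the same encoding name.
        Parallel.For(0, 1000, _ =>
        {
            EncodingCacheSketch.Get("cl100k_base").CountTokens("token counting is slow");
        });

        // The stub encoder was constructed a single time.
        Console.WriteLine(StubEncoding.Created);
    }
}
```

Wrapping the value in `Lazy<T>` matters only when construction is costly or must not be repeated; if building a duplicate instance under contention is acceptable, `GetOrAdd` with the encoder itself as the value would be simpler. Whether the real `GptEncoding` instances can be shared safely across threads is a separate assumption that should be checked against the tokenizer library's documentation.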
