Catch token count issue while streaming with customized models
If llama, llava, phi, or other custom models are used for streaming (with stream=True), the current design crashes after the response is fetched, because count_token raises for model names it does not recognize.

A warning is enough in this case, just as in the non-streaming code path.
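
For illustration, here is a minimal sketch of the failure this guards against and the fallback it introduces. It is not the code in autogen/oai/client.py; the model name is a placeholder and the count_token import path is assumed to be autogen's token_count_utils module.

import logging

from autogen.token_count_utils import count_token

logger = logging.getLogger(__name__)

messages = [{"role": "user", "content": "Hello"}]

try:
    # count_token may raise for model names it has no accounting rules for,
    # e.g. locally served llama/llava/phi builds.
    prompt_tokens = count_token(messages, "llama-3-8b-local")
except Exception as e:
    # Downgrade the failure to a warning and fall back to 0 tokens,
    # mirroring the non-streaming behavior.
    logger.warning(str(e))
    prompt_tokens = 0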
BeibinLi authored Jul 28, 2024
1 parent 61b9e8b commit a40aa56
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion autogen/oai/client.py
@@ -272,7 +272,12 @@ def create(self, params: Dict[str, Any]) -> ChatCompletion:
 
     # Prepare the final ChatCompletion object based on the accumulated data
     model = chunk.model.replace("gpt-35", "gpt-3.5")  # hack for Azure API
-    prompt_tokens = count_token(params["messages"], model)
+    try:
+        prompt_tokens = count_token(params["messages"], model)
+    except Exception as e:
+        # Catch token calculation error if streaming with customized models.
+        logger.warning(str(e))
+        prompt_tokens = 0
     response = ChatCompletion(
         id=chunk.id,
         model=chunk.model,
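
For context, a hedged sketch of the kind of call that previously crashed: streaming from a custom model served behind an OpenAI-compatible endpoint via autogen's OpenAIWrapper. The endpoint URL, API key, and model name below are placeholders, not values taken from this commit.

from autogen import OpenAIWrapper

# Placeholder config for a locally served model whose name count_token
# does not recognize.
client = OpenAIWrapper(
    config_list=[
        {
            "model": "llama-3-8b-local",
            "base_url": "http://localhost:8000/v1",
            "api_key": "NotRequired",
        }
    ]
)

# Before this commit, the streaming path raised after the response was fully
# received, when prompt tokens were counted for the final ChatCompletion
# object; with the patch, a warning is logged and prompt_tokens falls back to 0.
response = client.create(
    messages=[{"role": "user", "content": "Say hi."}],
    stream=True,
)
print(client.extract_text_or_completion_object(response))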
