
Total token usage and latency metrics should be reflected in TaskResult and Response #4719

Open
ekzhu opened this issue Dec 16, 2024 · 7 comments

@ekzhu
Collaborator

ekzhu commented Dec 16, 2024

The current autogen_agentchat.base.TaskResult and autogen_agentchat.base.Response should contain the following additional fields:

  1. a total token usage field using the autogen_core.models.RequestUsage type.
  2. total latency in seconds
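
A minimal sketch of what the extended `TaskResult` could look like. The field names `total_usage` and `duration_seconds` are illustrative, not a final API, and `RequestUsage` is mirrored here as a plain dataclass rather than imported from `autogen_core.models`:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RequestUsage:
    # Stand-in mirroring autogen_core.models.RequestUsage.
    prompt_tokens: int = 0
    completion_tokens: int = 0


@dataclass
class TaskResult:
    # Existing fields (simplified).
    messages: List[object] = field(default_factory=list)
    stop_reason: Optional[str] = None
    # Proposed additions (hypothetical names):
    total_usage: RequestUsage = field(default_factory=RequestUsage)
    duration_seconds: float = 0.0
```

`Response` would gain the same two fields, so callers can read aggregate usage without walking inner messages.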

Console currently uses agents' inner messages to keep track of total token usage -- this is inaccurate, because agents may not emit inner messages, and SelectorGroupChat's model client usage is not reflected. We need to fix these as well.

Related: #4172

@ekzhu ekzhu added this to the 0.4.0 milestone Dec 16, 2024
@ekzhu ekzhu assigned ekzhu and gziz and unassigned ekzhu Dec 16, 2024
@gziz
Contributor

gziz commented Dec 18, 2024

I see the first part as extending both TaskResult and Response to include the RequestUsage and latency. Since various teams use TaskResult and Response, I’m assuming we would need to modify each of them to add the usage information to the RequestUsage field of TaskResult and Response. Is that correct?

Then there’s the SelectorGroupChat problem, where we are not capturing the tokens spent to select the next speaker. We should find a way to fix that.

However, what do you mean by the Console using the agent’s inner messages to keep track of total token usage? I have run the Console with a couple of teams, and it seems like it’s not only keeping track of inner messages. I could debug this further, but I’m wondering what I’m missing.

@ekzhu
Collaborator Author

ekzhu commented Dec 18, 2024

> I see the first part as extending both TaskResult and Response to include the RequestUsage and latency. Since various teams use TaskResult and Response, I’m assuming we would need to modify each of them to add the usage information to the RequestUsage field of TaskResult and Response. Is that correct?

Yes

> However, what do you mean by the Console using the agent’s inner messages to keep track of total token usage? I have run the Console with a couple of teams, and it seems like it’s not only keeping track of inner messages. I could debug this further, but I’m wondering what I’m missing.

You can take a look at this example in the doc: https://microsoft.github.io/autogen/dev/user-guide/agentchat-user-guide/tutorial/custom-agents.html#arithmeticagent

The total token usage reported is 0, which is false, because the selector group chat uses a model client to choose the next agent.

@gziz
Contributor

gziz commented Dec 18, 2024

Thanks.
In that case, does the problem you are referring to with Console occur only when it's used with SelectorGroupChat?

@husseinmozannar
Contributor

Note the model usage of the M1 orchestrator is also not tracked.

@ekzhu
Collaborator Author

ekzhu commented Dec 18, 2024

> Thanks. In that case, does the problem you are referring to with Console occur only when it's used with SelectorGroupChat?

Right now, any time inner messages that incur token usage are not emitted, that usage is not tracked. We would like to make sure that even when no inner message is emitted, we can still track the token usage by using the Response.
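
The accumulation idea could be sketched as follows. This assumes, as in agentchat, that each chat message carries an optional per-call usage record; `ChatMessage` and `accumulate` here are simplified stand-ins, not the exact API:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestUsage:
    # Stand-in mirroring autogen_core.models.RequestUsage.
    prompt_tokens: int = 0
    completion_tokens: int = 0


@dataclass
class ChatMessage:
    content: str
    # Set by the agent when a model call produced this message; None otherwise.
    models_usage: Optional[RequestUsage] = None


def accumulate(total: RequestUsage, msg: ChatMessage) -> None:
    # Add a message's usage into the running total, tolerating messages
    # that carry no usage (e.g. handcrafted or relayed messages).
    if msg.models_usage is not None:
        total.prompt_tokens += msg.models_usage.prompt_tokens
        total.completion_tokens += msg.models_usage.completion_tokens


total = RequestUsage()
accumulate(total, ChatMessage("hi", RequestUsage(10, 5)))
accumulate(total, ChatMessage("no usage attached"))  # skipped safely
```

Accumulating from the final `Response` (rather than only from streamed inner messages) is what would let Console report usage even when agents emit no inner messages.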

@gziz
Contributor

gziz commented Dec 20, 2024

Hi @ekzhu ,

> We would like to make sure even when no inner message is emitted, we can still track the token usage by using the Response.

Hmm, I couldn't find a way to do this using Response.
Here's the flow I noticed:

  1. SelectorGroupChat.select_speaker makes the model client call, which costs tokens.
  2. SelectorGroupChat.select_speaker returns the next speaker as a string inside BaseGroupChatManager.handle_agent_response (code). The way handle_agent_response communicates who's next is by publishing a GroupChatRequestPublish message to the next speaker (code).

Recap: the token usage comes from the model client call inside SelectorGroupChat.select_speaker, which is called from BaseGroupChatManager.handle_agent_response. But that last function doesn't return a Message or Response to anyone; it only communicates via self.publish_message(GroupChatRequestPublish, ...). So I cannot find a way to use Response to share the token usage.

I have created a PR with a proposal where we can discuss some alternatives. Alternatively, can you help me see how to use Response to share the token usage? Thanks!!

Additionally, I noticed that the ultimate source of truth for token usage would be the model_client: every call to the LLM goes through this model_client. However, I also noticed that the way we are currently tracking the tokens is not working; see this PR for a fix.

@ekzhu
Collaborator Author

ekzhu commented Dec 22, 2024

I think you can create a new internal event type, GroupChatSelectSpeaker, which contains the token usage information.

In the select_speaker method of SelectorGroupChatManager, you can publish this event to self._output_topic_type after making the model inference:

response = await self._model_client.create(messages=select_speaker_messages)
assert isinstance(response.content, str)

In BaseGroupChat, in the collect_output_messages closure, you can add GroupChatSelectSpeaker as a message type this closure listens to, and accumulate usage inside the closure via a private member variable of the BaseGroupChat class, resetting this variable after each run, just like the termination condition.
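
A rough sketch of that approach. The event name follows the suggestion above; the topic-based publish/collect machinery of the runtime is replaced here by a plain callback, so the types are simplified stand-ins, not the real BaseGroupChat internals:

```python
from dataclasses import dataclass


@dataclass
class RequestUsage:
    # Stand-in mirroring autogen_core.models.RequestUsage.
    prompt_tokens: int = 0
    completion_tokens: int = 0


@dataclass
class GroupChatSelectSpeaker:
    # Internal event published by the group chat manager after the
    # speaker-selection model call, carrying that call's usage.
    speaker: str
    usage: RequestUsage


class UsageCollector:
    # Plays the role of the collect_output_messages closure plus the
    # private accumulator member on BaseGroupChat.
    def __init__(self) -> None:
        self._total = RequestUsage()

    def on_event(self, event: object) -> None:
        if isinstance(event, GroupChatSelectSpeaker):
            self._total.prompt_tokens += event.usage.prompt_tokens
            self._total.completion_tokens += event.usage.completion_tokens

    def reset(self) -> None:
        # Reset after each run, like the termination condition.
        self._total = RequestUsage()

    @property
    def total(self) -> RequestUsage:
        return self._total


collector = UsageCollector()
collector.on_event(GroupChatSelectSpeaker("writer", RequestUsage(120, 8)))
collector.on_event(GroupChatSelectSpeaker("critic", RequestUsage(150, 7)))
```

The accumulated total would then be copied into the run's TaskResult before the collector is reset for the next run.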
