
Total token usage and latency metrics should be reflected in TaskResult and Response #4719

Open
ekzhu opened this issue Dec 16, 2024 · 7 comments

@ekzhu
Collaborator

ekzhu commented Dec 16, 2024

The current autogen_agentchat.base.TaskResult and autogen_agentchat.base.Response should contain the following additional fields:

  1. a total token usage field using the autogen_core.models.RequestUsage type.
  2. total latency in seconds
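
A minimal sketch of what the extended `TaskResult` could look like. The field names `total_usage` and `duration_seconds` are illustrative, not a final API, and `RequestUsage` is mirrored here as a plain dataclass rather than imported from `autogen_core.models`:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RequestUsage:
    # Stand-in mirroring autogen_core.models.RequestUsage.
    prompt_tokens: int = 0
    completion_tokens: int = 0


@dataclass
class TaskResult:
    # Existing fields (simplified).
    messages: List[object] = field(default_factory=list)
    stop_reason: Optional[str] = None
    # Proposed additions (hypothetical names):
    total_usage: RequestUsage = field(default_factory=RequestUsage)
    duration_seconds: float = 0.0
```

`Response` would gain the same two fields, so callers can read aggregate usage without walking inner messages.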

Console currently uses agents' inner messages to keep track of total token usage -- this is inaccurate, because agents may not emit inner messages, and SelectorGroupChat's model client usage is not reflected. We need to fix these as well.

Related: #4172

@ekzhu ekzhu added this to the 0.4.0 milestone Dec 16, 2024
@ekzhu ekzhu assigned ekzhu and gziz and unassigned ekzhu Dec 16, 2024
@gziz
Contributor

gziz commented Dec 18, 2024

I see the first part as extending both TaskResult and Response to include the RequestUsage and latency. Since various teams use TaskResult and Response, I’m assuming we would need to modify each of them to add the usage information to the RequestUsage field of TaskResult and Response. Is that correct?

Then there’s the SelectorGroupChat problem, where we are not capturing the tokens spent to select the next speaker. We should find a way to fix that.

However, what do you mean by the Console using the agent’s inner messages to keep track of total token usage? I have run the Console with a couple of teams, and it seems like it’s not only keeping track of inner messages. I could debug this further, but I’m wondering what I’m missing.

@ekzhu
Collaborator Author

ekzhu commented Dec 18, 2024

> I see the first part as extending both TaskResult and Response to include the RequestUsage and latency. Since various teams use TaskResult and Response, I’m assuming we would need to modify each of them to add the usage information to the RequestUsage field of TaskResult and Response. Is that correct?

Yes

> However, what do you mean by the Console using the agent’s inner messages to keep track of total token usage? I have run the Console with a couple of teams, and it seems like it’s not only keeping track of inner messages. I could debug this further, but I’m wondering what I’m missing.

You can take a look at this example in the doc: https://microsoft.github.io/autogen/dev/user-guide/agentchat-user-guide/tutorial/custom-agents.html#arithmeticagent

The total token usage reported is 0, which is false, because the selector group chat uses a model client to choose the next agent.

@gziz
Contributor

gziz commented Dec 18, 2024

Thanks.
In that case, does the problem you are referring to with Console occur only when it's used with SelectorGroupChat?

@husseinmozannar
Contributor

Note the model usage of the M1 orchestrator is also not tracked.

@ekzhu
Collaborator Author

ekzhu commented Dec 18, 2024

> Thanks. In that case, does the problem you are referring to with Console occur only when it's used with SelectorGroupChat?

Right now, any time inner messages that incur token usage are not emitted, that usage is not tracked. We would like to make sure that even when no inner message is emitted, we can still track the token usage by using the Response.
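
The accumulation idea could be sketched as follows. This assumes, as in agentchat, that each chat message carries an optional per-call usage record; `ChatMessage` and `accumulate` here are simplified stand-ins, not the exact API:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestUsage:
    # Stand-in mirroring autogen_core.models.RequestUsage.
    prompt_tokens: int = 0
    completion_tokens: int = 0


@dataclass
class ChatMessage:
    content: str
    # Set by the agent when a model call produced this message; None otherwise.
    models_usage: Optional[RequestUsage] = None


def accumulate(total: RequestUsage, msg: ChatMessage) -> None:
    # Add a message's usage into the running total, tolerating messages
    # that carry no usage (e.g. handcrafted or relayed messages).
    if msg.models_usage is not None:
        total.prompt_tokens += msg.models_usage.prompt_tokens
        total.completion_tokens += msg.models_usage.completion_tokens


total = RequestUsage()
accumulate(total, ChatMessage("hi", RequestUsage(10, 5)))
accumulate(total, ChatMessage("no usage attached"))  # skipped safely
```

Accumulating from the final `Response` (rather than only from streamed inner messages) is what would let Console report usage even when agents emit no inner messages.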

@gziz
Contributor

gziz commented Dec 20, 2024

Hi @ekzhu ,

> We would like to make sure even when no inner message is emitted, we can still track the token usage by using the Response.

Hmm, I couldn't find a way to do this using Response.
Here's the flow I noticed:

  1. SelectorGroupChat.select_speaker makes the model client call, which costs tokens.
  2. SelectorGroupChat.select_speaker returns the next speaker as a string inside BaseGroupChatManager.handle_agent_response (code). The way handle_agent_response communicates who's next is by publishing a GroupChatRequestPublish message to the next speaker (code).

Recap: the token usage comes from the model client call inside SelectorGroupChat.select_speaker, which is called from BaseGroupChatManager.handle_agent_response. But that last function doesn't return a Message or Response to anyone; it only communicates via self.publish_message(GroupChatRequestPublish, ...). So I cannot find a way to use Response to share the token usage.

I have created a PR with a proposal where we can discuss some alternatives. Alternatively, can you help me see how to use Response to share the token usage? Thanks!!

Additionally, I noticed that the ultimate source of truth for token usage would be the model_client: every call to the LLM goes through this model_client. However, I also noticed that the way we are currently tracking the tokens is not working; see this PR for a fix.

@ekzhu
Collaborator Author

ekzhu commented Dec 22, 2024

I think you can create a new internal event type, GroupChatSelectSpeaker, which contains the token usage information.

In the select_speaker method of SelectorGroupChatManager, you can publish this event to self._output_topic_type after making the model inference:

response = await self._model_client.create(messages=select_speaker_messages)
assert isinstance(response.content, str)

In BaseGroupChat, in the collect_output_messages closure, you can add GroupChatSelectSpeaker as a message type this closure listens to, and accumulate usage inside the closure via a private member variable of the BaseGroupChat class, resetting this variable after each run, just like the termination condition.
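
A rough sketch of that approach. The event name follows the suggestion above; the topic-based publish/collect machinery of the runtime is replaced here by a plain callback, so the types are simplified stand-ins, not the real BaseGroupChat internals:

```python
from dataclasses import dataclass


@dataclass
class RequestUsage:
    # Stand-in mirroring autogen_core.models.RequestUsage.
    prompt_tokens: int = 0
    completion_tokens: int = 0


@dataclass
class GroupChatSelectSpeaker:
    # Internal event published by the group chat manager after the
    # speaker-selection model call, carrying that call's usage.
    speaker: str
    usage: RequestUsage


class UsageCollector:
    # Plays the role of the collect_output_messages closure plus the
    # private accumulator member on BaseGroupChat.
    def __init__(self) -> None:
        self._total = RequestUsage()

    def on_event(self, event: object) -> None:
        if isinstance(event, GroupChatSelectSpeaker):
            self._total.prompt_tokens += event.usage.prompt_tokens
            self._total.completion_tokens += event.usage.completion_tokens

    def reset(self) -> None:
        # Reset after each run, like the termination condition.
        self._total = RequestUsage()

    @property
    def total(self) -> RequestUsage:
        return self._total


collector = UsageCollector()
collector.on_event(GroupChatSelectSpeaker("writer", RequestUsage(120, 8)))
collector.on_event(GroupChatSelectSpeaker("critic", RequestUsage(150, 7)))
```

The accumulated total would then be copied into the run's TaskResult before the collector is reset for the next run.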
