[BUG] [0.x] Sum streamed usage across tool-call steps in all provider gateways #698

Open
mo-inkhan wants to merge 9 commits into laravel:0.x from mo-inkhan:fix-streamed-usage-accumulation

@mo-inkhan

The bug

For streamed multi-step (tool-calling) turns, the final StreamEnd event reports only the last inference step's usage. In each affected provider's HandlesTextStreaming concern, each step's usage is captured into $usage, but when a step ends with pending tool calls the method delegates to handleStreamingToolCalls() and returns without emitting a StreamEnd, so that step's usage is discarded. Only the deepest (final, text-only) recursion emits a StreamEnd, and it carries just its own usage. StreamedAgentResponse sums StreamEnd events via StreamEnd::combineUsage(), but with only one StreamEnd per run there is nothing to sum; when max steps are exhausted, that single StreamEnd carries Usage(0, 0).
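
To make the failure mode concrete, here is a minimal, self-contained sketch of the recursion shape described above. It is not the package's code: the Usage class, the streamStep() function, the array-based events, and the token counts are all hypothetical stand-ins chosen for illustration. It assumes PHP 8.1 or later.

```php
<?php

// Hypothetical stand-in for the package's Usage value object.
final class Usage
{
    public function __construct(
        public readonly int $promptTokens = 0,
        public readonly int $completionTokens = 0,
    ) {}
}

/**
 * Hypothetical model of one streamed step.
 * Each entry in $steps is ['usage' => Usage, 'toolCalls' => bool].
 */
function streamStep(array $steps, int $index = 0): Generator
{
    // The step's usage is captured locally...
    $usage = $steps[$index]['usage'];

    if ($steps[$index]['toolCalls'] && isset($steps[$index + 1])) {
        // ...but the tool-call branch delegates to the next step and
        // returns without emitting a StreamEnd, so $usage is dropped.
        yield from streamStep($steps, $index + 1);

        return;
    }

    // Only the final, text-only step reaches this point.
    yield ['type' => 'StreamEnd', 'usage' => $usage];
}

// Made-up token counts, for illustration only.
$steps = [
    ['usage' => new Usage(100, 40), 'toolCalls' => true],
    ['usage' => new Usage(180, 25), 'toolCalls' => false],
];

// preserve_keys must be false: "yield from" reuses the inner generator's keys.
$events = iterator_to_array(streamStep($steps), false);

echo count($events), PHP_EOL;                         // 1
echo $events[0]['usage']->completionTokens, PHP_EOL;  // 25, not 65
```

In this sketch the consumer sees a single StreamEnd whose usage is that of the second step alone; the first step's 40 completion tokens never reach it.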

Anything consuming this usage — StreamedAgentResponse->usage, the AgentStreamed event, the conversation store's usage column — under-reports multi-step turns. On a real-world agent run with ~12 tool-calling steps we measured a reported completion_tokens of 565 for a turn that generated tens of thousands of output tokens across its steps; anyone metering or billing on these numbers under-counts severalfold.

The Bedrock gateway already handles this correctly ($totalUsage = $totalUsage->add($stepUsage) across steps); this PR brings the other gateways in line with it. The non-streamed path is also already correct (ParsesTextResponses returns combineUsage($steps)).

The fix

Thread an accumulated Usage carry through the step recursion in each gateway: when a step ends in tool calls, fold its usage into the carry; pass the carry into the continuation stream; report carry + final step on the final StreamEnd. Max-steps-exhausted StreamEnds report the carry instead of Usage(0, 0). Anthropic's pause_turn resume path is threaded the same way. The new parameters are optional and last, so no call sites change. One StreamEnd is still emitted per run, so the public event stream is unchanged — only the usage totals are now complete.
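
The carry-threading idea can be sketched as follows. This is an illustrative model under stated assumptions, not the diff itself: Usage, its add() method, streamStepWithCarry(), the $maxSteps parameter, and the token counts are hypothetical names and values (add() mirrors the $totalUsage->add($stepUsage) call quoted from the Bedrock gateway, but its real signature is not shown here). It assumes PHP 8.1 or later.

```php
<?php

// Hypothetical stand-in for the package's Usage value object.
final class Usage
{
    public function __construct(
        public readonly int $promptTokens = 0,
        public readonly int $completionTokens = 0,
    ) {}

    // Returns a new Usage holding the field-wise sum.
    public function add(Usage $other): Usage
    {
        return new Usage(
            $this->promptTokens + $other->promptTokens,
            $this->completionTokens + $other->completionTokens,
        );
    }
}

/**
 * Hypothetical model of one streamed step with an accumulated carry.
 * The carry parameter is optional and last, so older calls still work.
 */
function streamStepWithCarry(
    array $steps,
    int $index = 0,
    int $maxSteps = 10,
    ?Usage $carry = null,
): Generator {
    $carry ??= new Usage();

    if ($index >= $maxSteps || ! isset($steps[$index])) {
        // Max steps exhausted: report what was accumulated, not Usage(0, 0).
        yield ['type' => 'StreamEnd', 'usage' => $carry];

        return;
    }

    $stepUsage = $steps[$index]['usage'];

    if ($steps[$index]['toolCalls']) {
        // Fold this step's usage into the carry and pass it to the continuation.
        yield from streamStepWithCarry($steps, $index + 1, $maxSteps, $carry->add($stepUsage));

        return;
    }

    // Final text-only step: report carry + final step on the single StreamEnd.
    yield ['type' => 'StreamEnd', 'usage' => $carry->add($stepUsage)];
}

// Made-up token counts, for illustration only.
$steps = [
    ['usage' => new Usage(100, 40), 'toolCalls' => true],
    ['usage' => new Usage(180, 25), 'toolCalls' => false],
];
$events = iterator_to_array(streamStepWithCarry($steps), false);

echo $events[0]['usage']->promptTokens, PHP_EOL;      // 280
echo $events[0]['usage']->completionTokens, PHP_EOL;  // 65

// Every step asks for tools and the step budget is 2.
$looping = [
    ['usage' => new Usage(100, 40), 'toolCalls' => true],
    ['usage' => new Usage(120, 30), 'toolCalls' => true],
    ['usage' => new Usage(140, 20), 'toolCalls' => true],
];
$exhausted = iterator_to_array(streamStepWithCarry($looping, 0, 2), false);

echo $exhausted[0]['usage']->completionTokens, PHP_EOL; // 70
```

Passing the carry as an argument keeps each generator free of shared mutable state, and the stream still ends with exactly one StreamEnd in both the normal and the max-steps case.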

Covered gateways: OpenAI (incl. Azure OpenAI via the shared concern), Anthropic, Gemini, DeepSeek, Groq, Mistral, Ollama, OpenRouter, xAI. One commit per provider.

Tests

Each provider's StreamingTest gains a "streaming sums usage across tool call steps" test (10 in total, including Azure OpenAI). Each test runs a two-step tool-call stream and asserts that the single StreamEnd carries the summed prompt, completion, and cached tokens. All of them fail on current 0.x and pass with this change.

@mo-inkhan mo-inkhan changed the title [0.x] Sum streamed usage across tool-call steps in all provider gateways [BUG] [0.x] Sum streamed usage across tool-call steps in all provider gateways Jun 10, 2026