[BUG] [0.x] Sum streamed usage across tool-call steps in all provider gateways #698
Open
mo-inkhan wants to merge 9 commits into
Covers Azure OpenAI as well, which reuses the OpenAI streaming concern; a regression test is included for both providers.
The bug
For streamed multi-step (tool-calling) turns, the final `StreamEnd` event reports only the last inference step's usage. In every provider's `HandlesTextStreaming` concern, each step's usage is captured into `$usage`, but when a step ends with pending tool calls the method delegates to `handleStreamingToolCalls()` and returns without emitting a `StreamEnd`, so that step's usage is discarded. Only the deepest (final, text-only) recursion emits a `StreamEnd`, carrying just its own usage. `StreamedAgentResponse` sums `StreamEnd` events via `StreamEnd::combineUsage()`, but there is only ever one per run (and a `Usage(0, 0)` one when max steps are exhausted).
Anything consuming this usage (`StreamedAgentResponse->usage`, the `AgentStreamed` event, the conversation store's `usage` column) under-reports multi-step turns. On a real-world agent run with ~12 tool-calling steps we measured a reported `completion_tokens` of 565 for a turn that generated tens of thousands of output tokens across its steps; anyone metering or billing on these numbers under-counts severalfold.
The Bedrock gateway already handles this correctly (`$totalUsage = $totalUsage->add($stepUsage)` across steps); this PR brings the other gateways in line with it. The non-streamed path is also already correct (`ParsesTextResponses` returns `combineUsage($steps)`).
The fix
Thread an accumulated `Usage` carry through the step recursion in each gateway: when a step ends in tool calls, fold its usage into the carry; pass the carry into the continuation stream; report carry + final step on the final `StreamEnd`. Max-steps-exhausted `StreamEnd`s report the carry instead of `Usage(0, 0)`. Anthropic's `pause_turn` resume path is threaded the same way. The new parameters are optional and last, so no call sites change. One `StreamEnd` is still emitted per run, so the public event stream is unchanged; only the usage totals are now complete.
Covered gateways: OpenAI (incl. Azure OpenAI via the shared concern), Anthropic, Gemini, DeepSeek, Groq, Mistral, Ollama, OpenRouter, xAI. One commit per provider.
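To make the carry mechanics concrete, here is a minimal standalone sketch of the pattern (PHP 8.1+). It is not the gateway code: the `Usage` and `StreamEnd` classes below are simplified stand-ins, and `streamSteps()`, its scripted `$steps` array, and the `$depth`/`$maxSteps` parameters are hypothetical names chosen for illustration. The max-steps check is also an assumption about how exhaustion is detected.

```php
<?php

// Simplified stand-in for the library's Usage value object (assumption: two counters only).
final class Usage
{
    public function __construct(
        public readonly int $promptTokens = 0,
        public readonly int $completionTokens = 0,
    ) {}

    // Returns a new Usage holding the sum of both operands.
    public function add(Usage $other): self
    {
        return new self(
            $this->promptTokens + $other->promptTokens,
            $this->completionTokens + $other->completionTokens,
        );
    }
}

// Simplified stand-in for the StreamEnd event.
final class StreamEnd
{
    public function __construct(public readonly Usage $usage) {}
}

/**
 * Hypothetical step recursion. Each scripted step is
 * ['usage' => Usage, 'toolCalls' => bool]. The carry parameter is
 * optional and last, so callers that do not pass it keep working.
 */
function streamSteps(array $steps, int $depth = 0, int $maxSteps = 10, ?Usage $carry = null): Generator
{
    $carry ??= new Usage();

    // Max steps exhausted: report what was accumulated so far, not Usage(0, 0).
    if ($depth >= $maxSteps || ! isset($steps[$depth])) {
        yield new StreamEnd($carry);

        return;
    }

    $step = $steps[$depth];

    if ($step['toolCalls']) {
        // Step ended in tool calls: fold its usage into the carry and continue.
        yield from streamSteps($steps, $depth + 1, $maxSteps, $carry->add($step['usage']));

        return;
    }

    // Final text-only step: report carry + this step's usage.
    yield new StreamEnd($carry->add($step['usage']));
}

// Two-step run: one tool-calling step followed by a final text step.
$events = iterator_to_array(streamSteps([
    ['usage' => new Usage(100, 40), 'toolCalls' => true],
    ['usage' => new Usage(180, 25), 'toolCalls' => false],
]), false);

echo count($events), ' ', $events[0]->usage->promptTokens, ' ', $events[0]->usage->completionTokens, PHP_EOL;
// prints "1 280 65"
```

Passing `false` as the second argument to `iterator_to_array()` matters here: `yield from` preserves the inner generator's keys, so collecting with keys preserved could overwrite events that share a key.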
Tests
Each provider's `StreamingTest` gains a `streaming sums usage across tool call steps` test (10 in total, including Azure OpenAI): a two-step tool-call stream asserting that the single `StreamEnd` carries the summed prompt/completion/cached tokens. All fail on current `0.x` and pass with this change.
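The shape of such a regression check can be sketched without the library or a test framework (PHP 8.0+). Everything below is hypothetical: `fakeStreamEnds()` is an illustrative stand-in for a gateway, usage is modelled as a plain array, the token numbers are made up, and the recursion is flattened into a loop for brevity. The `$carryUsage` flag switches between the behaviour described under "The bug" and the fixed behaviour.

```php
<?php

/**
 * Hypothetical scripted gateway: returns the usage arrays of the StreamEnd
 * events a run would emit. $carryUsage = false mimics the old behaviour,
 * true mimics the fix.
 */
function fakeStreamEnds(array $steps, bool $carryUsage): array
{
    $carry = ['prompt' => 0, 'completion' => 0, 'cached' => 0];
    $ends = [];

    foreach ($steps as $step) {
        if ($step['toolCalls']) {
            if ($carryUsage) {
                // Fold the tool-calling step's usage into the carry.
                foreach ($step['usage'] as $key => $tokens) {
                    $carry[$key] += $tokens;
                }
            }

            continue; // a step that ends in tool calls emits no StreamEnd
        }

        // Final step: its own usage plus whatever was carried.
        $final = $step['usage'];
        foreach ($carry as $key => $tokens) {
            $final[$key] += $tokens;
        }
        $ends[] = $final;
    }

    return $ends;
}

// Two-step tool-call stream, as in the tests described above (numbers are illustrative).
$steps = [
    ['usage' => ['prompt' => 120, 'completion' => 30, 'cached' => 10], 'toolCalls' => true],
    ['usage' => ['prompt' => 200, 'completion' => 45, 'cached' => 60], 'toolCalls' => false],
];

$expected = ['prompt' => 320, 'completion' => 75, 'cached' => 70];

$before = fakeStreamEnds($steps, carryUsage: false);
$after = fakeStreamEnds($steps, carryUsage: true);

var_dump(count($after) === 1);      // bool(true): still a single StreamEnd per run
var_dump($before[0] == $expected);  // bool(false): old behaviour reports only the last step
var_dump($after[0] == $expected);   // bool(true): summed prompt/completion/cached tokens
```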