fix(renderer-client): wrap tools in OpenAI envelope to match training distribution#1307
Open
Conversation
fix(renderer-client): wrap tools in OpenAI envelope to match training distribution
`RendererClient.to_native_tool` returned bare-form `{name, description,
parameters}`, while every other client (`OpenAIChatCompletionsClient`,
`OpenAIResponsesClient`) wraps tools in the OpenAI envelope
`{"type": "function", "function": {...}}`. Modern function-calling
models (Qwen3 family, GLM, Kimi, ...) saw the envelope at training
time — TITO/MITO send the envelope server-side via apply_chat_template,
so the model recognises the tools list and emits `<tool_call>` blocks.
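For concreteness, the two shapes side by side (the tool content below is illustrative):

```python
# Bare form previously returned by RendererClient.to_native_tool:
bare = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
}

# OpenAI envelope emitted by the other clients, and the shape
# function-calling models saw at training time:
envelope = {"type": "function", "function": bare}
```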
Under `use_renderer=true`, the client-side renderer was given bare-form
tools and produced an out-of-distribution prompt: the chat templates
serialise whatever they receive verbatim (Qwen3's `tool | tojson` is a
passthrough), so the model saw a tool list it had never been trained on
and reliably failed to emit `<tool_call>` — producing zero rewards on
multi-turn ToolEnv rollouts and tripping the zero_advantage filter at
step 0.
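A quick way to see the passthrough, using a simplified Jinja fragment in the spirit of Qwen3's template rather than a verbatim copy:

```python
from jinja2 import Template

# Qwen3-style templates serialise each tool with `tool | tojson`, so
# whatever dict the client hands over lands in the prompt verbatim:
# bare form stays bare, envelope stays enveloped.
fragment = Template("{% for tool in tools %}{{ tool | tojson }}\n{% endfor %}")

bare = {"name": "get_weather", "description": "...", "parameters": {}}
print(fragment.render(tools=[bare]))
# -> the bare dict serialised as-is: no {"type": "function"} wrapper
```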
The bug has existed since `RendererClient` was introduced (#1068, the
"feat: add renderers package" PR) but only fires with a multi-turn tool
env under use_renderer=true. It went undetected because reverse-text — the
only smoke test exercised against the renderer client — has no tools.
Cast to `ToolSpec` keeps type-check parity with the currently-pinned
renderers release (which still types `ToolSpec` as bare); the cast
becomes redundant once the renderers package publishes an
envelope-shaped `ToolSpec`. Two regression tests pin the contract.
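A sketch of the resulting function under those constraints; `Tool` and `ToolSpec` are stubbed here, and the real diff may differ in detail:

```python
from typing import Any, cast

from pydantic import BaseModel


class Tool(BaseModel):  # stand-in for the project's Tool model
    name: str
    description: str
    parameters: dict[str, Any]


ToolSpec = dict[str, Any]  # the pinned renderers release still types this as bare


def to_native_tool(tool: Tool) -> ToolSpec:
    # Wrap in the OpenAI function-calling envelope, matching
    # OpenAIChatCompletionsClient; the cast becomes redundant once an
    # envelope-shaped ToolSpec is published.
    return cast(
        ToolSpec,
        {
            "type": "function",
            "function": {
                "name": tool.name,
                "description": tool.description,
                "parameters": tool.parameters,
            },
        },
    )
```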
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The function now mirrors the OpenAIChatCompletionsClient.to_native_tool shape — the envelope contract is self-documenting.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tool.description and Tool.parameters are non-optional in the Pydantic model — the fallbacks were dead code carried over from the prior bare-form implementation. The function now mirrors OpenAIChatCompletionsClient.to_native_tool exactly, modulo the typed-model vs dict-literal construction style.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
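Illustratively, the kind of fallback being removed (the exact prior code may have differed; the point is that `or` branches on required fields can never fire):

```python
# Reusing the Tool stub from the sketch above:
tool = Tool(name="get_weather", description="Look up weather.", parameters={})

# Before (hypothetical): fallbacks carried over from the bare-form code.
# Dead, because Tool.description and Tool.parameters are required fields.
before = {"description": tool.description or "", "parameters": tool.parameters or {}}

# After: pass the required fields through directly.
after = {"description": tool.description, "parameters": tool.parameters}
```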
vllm.SamplingParams.max_tokens defaults to 16. /v1/chat/completions
masks this server-side via get_max_tokens(), so TITO/MITO callers that
omit max_tokens get the full remaining context. /inference/v1/generate
(the new disagg endpoint this client talks to) hands SamplingParams to
the engine verbatim and skips that defaulting — the 16-token default
leaks through and silently caps every generation at 16 tokens. That's
exactly long enough to start a `<tool_call>\n{"name": "...", "arguments":
{"` envelope but not long enough to close one, so the JSON parse fails,
tool_calls comes back None, no_tools_called fires, and every tool-using
rollout produces reward 0.
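The default is easy to confirm against the vLLM version in question:

```python
from vllm import SamplingParams

# Constructing SamplingParams without max_tokens yields the dataclass
# default of 16 -- the value that leaks through /inference/v1/generate.
print(SamplingParams().max_tokens)  # 16
```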
Until vLLM patches serve_tokens to apply the same defaulting, query
max_model_len once per (server, model) from /v1/models and default to
max_model_len - len(prompt_ids) when the caller omitted max_tokens.
Cache forever per process — max_model_len is fixed at server startup.
TODO markers on the cache and the call site point to the upstream fix
(vllm/entrypoints/serve/disagg/serving.py::serve_tokens) so the dead
code is easy to reap once that lands.
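A sketch of the workaround's shape; the helper names and the httpx usage are assumptions, not the actual client code:

```python
from functools import lru_cache

import httpx


# TODO(upstream): remove once vllm/entrypoints/serve/disagg/serving.py::
# serve_tokens applies get_max_tokens()-style defaulting like other endpoints.
@lru_cache(maxsize=None)  # cache forever: max_model_len is fixed at server startup
def _max_model_len(base_url: str, model: str) -> int:
    # vLLM's /v1/models includes max_model_len in each model card.
    data = httpx.get(f"{base_url}/v1/models").json()["data"]
    return next(m["max_model_len"] for m in data if m["id"] == model)


def effective_max_tokens(
    base_url: str, model: str, prompt_ids: list[int], max_tokens: int | None
) -> int:
    if max_tokens is not None:
        return max_tokens  # the caller's explicit value always wins
    return _max_model_len(base_url, model) - len(prompt_ids)
```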
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 831f8bc.
…rompt_len"

This reverts commit 831f8bc. The fix moved server-side: prime-rl's PrimeRlServingTokens now applies get_max_tokens() defaulting in serve_tokens (PrimeIntellect-ai/prime-rl#2408, commit 913cc4ca), matching every other vLLM endpoint. The client-side workaround was always a band-aid and is no longer needed for prime-rl deployments. Other vLLM 0.20 deployments hitting /inference/v1/generate still need the upstream fix, or can apply the prime-rl override locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
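In spirit, the server-side fix applies the same defaulting before the params reach the engine. Everything below is a stand-in illustration, not prime-rl code:

```python
# Illustration only -- see PrimeIntellect-ai/prime-rl#2408 for the real change.
def default_max_tokens(raw_params: dict, max_model_len: int, prompt_len: int) -> dict:
    params = dict(raw_params)
    if "max_tokens" not in params:
        # Default to the remaining context instead of letting vLLM's
        # SamplingParams default of 16 leak through to the engine.
        params["max_tokens"] = max_model_len - prompt_len
    return params


print(default_max_tokens({}, max_model_len=4096, prompt_len=1000))
# -> {'max_tokens': 3096}
```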

Summary
`RendererClient.to_native_tool` was returning bare-form `{name, description, parameters}` while every other client (`OpenAIChatCompletionsClient`, `OpenAIResponsesClient`) wraps tools in the OpenAI envelope `{"type": "function", "function": {...}}`. TITO/MITO send the envelope server-side via `apply_chat_template`, so the model recognises the tools list and emits `<tool_call>` blocks. Under `use_renderer=true`, the renderer was given bare-form tools and produced an out-of-distribution prompt — Qwen3's `tool | tojson` is a passthrough, so the model saw a tool list it had never been trained on and reliably failed to emit `<tool_call>`. Multi-turn ToolEnv rollouts hit zero rewards and tripped the `zero_advantage` filter at step 0.

The bug has existed since `RendererClient` was introduced (feat: add renderers package #1068). Single-turn / no-tool smoke tests (reverse-text) never exercised this path, so it slipped past CI for ~7 days until a real ToolEnv rollout hit it.

Test plan

- `uv run pytest tests/test_renderer_client.py` (32 existing + 2 new envelope-contract tests pass)
- `uv run pytest tests/test_renderer_client.py tests/test_renderer_e2e.py` (42 total pass)
- Run with `use_renderer=true` against `Qwen/Qwen3-4B-Instruct-2507` and confirm non-zero rewards on the first step

🤖 Generated with Claude Code
Note
Medium Risk

Changes the wire format of tools passed to the renderer to match OpenAI's `{"type": "function", "function": {...}}` contract, which can affect tool-calling behavior across models and any code expecting the previous bare `{name, description, parameters}` shape.

Overview

Ensures `RendererClient.to_native_tool` emits tools in the OpenAI function-calling envelope (`{"type": "function", "function": {...}}`) instead of the prior bare `{name, description, parameters}` dict, aligning renderer-mode prompts with other OpenAI clients. Propagates `Tool.strict` into the inner `function` object when set, and adds regression tests asserting the envelope shape and strict-flag behavior.

Reviewed by Cursor Bugbot for commit 2eb1a38.
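For completeness, the strict-flag propagation the overview describes, restated as a standalone function (names and construction are illustrative):

```python
from typing import Any


def envelope_with_strict(
    name: str,
    description: str,
    parameters: dict[str, Any],
    strict: bool | None = None,
) -> dict[str, Any]:
    # Only include strict inside the inner function object when it is set.
    function: dict[str, Any] = {
        "name": name,
        "description": description,
        "parameters": parameters,
    }
    if strict is not None:
        function["strict"] = strict
    return {"type": "function", "function": function}


print(envelope_with_strict("get_weather", "Look up weather.", {}, strict=True))
# {'type': 'function', 'function': {..., 'strict': True}}
```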