
fix(renderer-client): wrap tools in OpenAI envelope to match training distribution #1307

Open
hallerite wants to merge 5 commits into main from fix/renderer-client-tool-envelope

Conversation


@hallerite (Member) commented May 7, 2026

Summary

  • RendererClient.to_native_tool was returning bare-form {name, description, parameters} while every other client (OpenAIChatCompletionsClient, OpenAIResponsesClient) wraps tools in the OpenAI envelope {type: function, function: {...}}; see the shape sketch after this list.
  • Modern function-calling models (Qwen3 family, GLM, Kimi, ...) saw the envelope at training time. TITO/MITO send envelope server-side via apply_chat_template, so the model recognises the tools list and emits <tool_call> blocks. Under use_renderer=true, the renderer was given bare-form tools and produced an out-of-distribution prompt — Qwen3's tool | tojson is a passthrough, so the model saw a tool list it had never been trained on and reliably failed to emit <tool_call>. Multi-turn ToolEnv rollouts hit zero rewards and tripped the zero_advantage filter at step 0.
  • The bug has existed since RendererClient was introduced (#1068, the "feat: add renderers package" PR). Single-turn / no-tool smoke tests (reverse-text) never exercised this path, so it slipped past CI for ~7 days until a real ToolEnv rollout hit it.
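
For reference, a minimal sketch of the two wire shapes contrasted above; the `get_weather` tool and its schema are invented for illustration and are not from the repo:

```python
# Illustration only: the bare form RendererClient used to emit vs. the OpenAI
# envelope that the other clients (and the chat templates) expect.
bare_form = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

openai_envelope = {
    "type": "function",
    "function": bare_form,  # same fields, nested under "function"
}
```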

Test plan

  • uv run pytest tests/test_renderer_client.py (32 existing + 2 new envelope-contract tests pass; see the sketch after this list)
  • uv run pytest tests/test_renderer_client.py tests/test_renderer_e2e.py (42 total pass)
  • Re-run a multi-turn ToolEnv rollout under use_renderer=true against Qwen/Qwen3-4B-Instruct-2507 and confirm non-zero rewards on first step
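
A self-contained sketch of the contract the two new envelope tests pin down; `wrap_tool` below is a stand-in for `RendererClient.to_native_tool`, not the repo's actual code or test file:

```python
# Stand-in for RendererClient.to_native_tool, for illustration only.
def wrap_tool(name: str, description: str, parameters: dict) -> dict:
    return {
        "type": "function",
        "function": {"name": name, "description": description, "parameters": parameters},
    }


def test_envelope_contract():
    native = wrap_tool(
        "get_weather",
        "Look up the current weather for a city.",
        {"type": "object", "properties": {"city": {"type": "string"}}},
    )
    # The outer envelope is what function-calling chat templates saw in training.
    assert native["type"] == "function"
    assert set(native["function"]) == {"name", "description", "parameters"}
```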

🤖 Generated with Claude Code


Note

Medium Risk
Changes the wire format of tools passed to the renderer to match OpenAI’s {type:"function", function:{...}} contract, which can affect tool-calling behavior across models and any code expecting the previous bare {name, description, parameters} shape.

Overview
Ensures RendererClient.to_native_tool emits tools in the OpenAI function-calling envelope ({"type":"function","function":{...}}) instead of the prior bare {name, description, parameters} dict, aligning renderer-mode prompts with other OpenAI clients.

Propagates Tool.strict into the inner function object when set, and adds regression tests asserting the envelope shape and strict-flag behavior.

Reviewed by Cursor Bugbot for commit 2eb1a38.

hallerite and others added 4 commits May 7, 2026 22:24
fix(renderer-client): wrap tools in OpenAI envelope to match training distribution

`RendererClient.to_native_tool` returned bare-form `{name, description,
parameters}`, while every other client (`OpenAIChatCompletionsClient`,
`OpenAIResponsesClient`) wraps tools in the OpenAI envelope
`{"type": "function", "function": {...}}`. Modern function-calling
models (Qwen3 family, GLM, Kimi, ...) saw the envelope at training
time — TITO/MITO send envelope server-side via apply_chat_template,
so the model recognises the tools list and emits `<tool_call>` blocks.

Under `use_renderer=true`, the client-side renderer was given bare-form
tools and produced an out-of-distribution prompt: the chat templates
serialise whatever they receive verbatim (Qwen3's `tool | tojson` is a
passthrough), so the model saw a tool list it had never been trained on
and reliably failed to emit `<tool_call>` — producing zero rewards on
multi-turn ToolEnv rollouts and tripping the zero_advantage filter at
step 0.

The bug has existed since `RendererClient` was introduced (#1068, the
"feat: add renderers package" PR) but only fires with a multi-turn tool
env and `use_renderer=true`. It went undetected because reverse-text —
the only smoke test exercised against the renderer client — has no tools.

Cast to `ToolSpec` keeps type-check parity with the currently-pinned
renderers release (which still types `ToolSpec` as bare); the cast
becomes redundant once the renderers package publishes an
envelope-shaped `ToolSpec`. Two regression tests pin the contract.
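
A rough sketch of what the fixed method plausibly looks like after this commit, assuming a Pydantic `Tool` with `name`, `description`, `parameters`, and an optional `strict` field; the field names, the `ToolSpec` import, and the construction details are assumptions based on the description above, not the merged diff:

```python
from typing import cast

# Sketch only: Tool and ToolSpec are assumed to come from the repo and the
# pinned renderers release; their definitions are not reproduced here.
def to_native_tool(tool: "Tool") -> "ToolSpec":
    function: dict = {
        "name": tool.name,
        "description": tool.description,
        "parameters": tool.parameters,
    }
    if getattr(tool, "strict", None) is not None:
        function["strict"] = tool.strict  # propagate strict only when set
    envelope = {"type": "function", "function": function}
    # The pinned renderers release still types ToolSpec as the bare form, so the
    # cast keeps type checkers quiet until an envelope-shaped ToolSpec ships.
    return cast("ToolSpec", envelope)
```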

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The function now mirrors OpenAIChatCompletionsClient.to_native_tool
shape — the envelope contract is self-documenting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tool.description and Tool.parameters are non-optional in the Pydantic
model — the fallbacks were dead code carried over from the prior
bare-form implementation. Function now mirrors
OpenAIChatCompletionsClient.to_native_tool exactly modulo the typed
model vs dict-literal construction style.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vllm.SamplingParams.max_tokens defaults to 16. /v1/chat/completions
masks this server-side via get_max_tokens(), so TITO/MITO callers that
omit max_tokens get the full remaining context. /inference/v1/generate
(the new disagg endpoint this client talks to) hands SamplingParams to
the engine verbatim and skips that defaulting — the 16-token default
leaks through and silently caps every generation at 16 tokens. That's
exactly long enough to start a `<tool_call>\n{"name": "...", "arguments":
{"` envelope but not long enough to close one, so the JSON parse fails,
tool_calls comes back None, no_tools_called fires, and every tool-using
rollout produces reward 0.

Until vLLM patches serve_tokens to apply the same defaulting, query
max_model_len once per (server, model) from /v1/models and default to
max_model_len - len(prompt_ids) when the caller omitted max_tokens.
Cache forever per process — max_model_len is fixed at server startup.

TODO markers on the cache and the call site point to the upstream fix
(vllm/entrypoints/serve/disagg/serving.py::serve_tokens) so the dead
code is easy to reap once that lands.
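
A sketch of the client-side defaulting this commit describes (later reverted in favor of the server-side fix, per the last commit below); the httpx usage, the `max_model_len` field in the `/v1/models` response, and the function names are assumptions for illustration:

```python
from functools import lru_cache

import httpx


@lru_cache(maxsize=None)
def get_max_model_len(base_url: str, model: str) -> int:
    # max_model_len is fixed at server startup, so caching per (base_url, model)
    # for the life of the process is safe. The model-card field name is assumed.
    resp = httpx.get(f"{base_url}/v1/models", timeout=10.0)
    resp.raise_for_status()
    cards = {card["id"]: card for card in resp.json()["data"]}
    return cards[model]["max_model_len"]


def resolve_max_tokens(base_url: str, model: str, prompt_ids: list[int],
                       max_tokens: int | None) -> int:
    # Mirror the defaulting /v1/chat/completions applies server-side: when the
    # caller omits max_tokens, allow the full remaining context rather than
    # letting vLLM's SamplingParams default of 16 leak through.
    if max_tokens is not None:
        return max_tokens
    return get_max_model_len(base_url, model) - len(prompt_ids)
```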

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 831f8bc.

Comment thread on verifiers/clients/renderer_client.py (outdated)
…rompt_len"

This reverts commit 831f8bc.

The fix moved server-side: prime-rl's PrimeRlServingTokens now applies
get_max_tokens() defaulting in serve_tokens (PrimeIntellect-ai/prime-rl#2408,
commit 913cc4ca), matching every other vLLM endpoint. The client-side
workaround was always a band-aid and is no longer needed for prime-rl
deployments. Other vLLM 0.20 deployments hitting /inference/v1/generate
still need the upstream fix or to apply the prime-rl override locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>