
fix(renderer-client): wrap tools in OpenAI envelope to match training distribution #1307

Open
hallerite wants to merge 5 commits into main from fix/renderer-client-tool-envelope

Conversation


@hallerite (Member) commented May 7, 2026

Summary

  • RendererClient.to_native_tool was returning bare-form {name, description, parameters} while every other client (OpenAIChatCompletionsClient, OpenAIResponsesClient) wraps tools in the OpenAI envelope {type: function, function: {...}}; see the shape sketch after this list.
  • Modern function-calling models (Qwen3 family, GLM, Kimi, ...) saw the envelope at training time. TITO/MITO send envelope server-side via apply_chat_template, so the model recognises the tools list and emits <tool_call> blocks. Under use_renderer=true, the renderer was given bare-form tools and produced an out-of-distribution prompt — Qwen3's tool | tojson is a passthrough, so the model saw a tool list it had never been trained on and reliably failed to emit <tool_call>. Multi-turn ToolEnv rollouts hit zero rewards and tripped the zero_advantage filter at step 0.
  • The bug has existed since RendererClient was introduced (#1068, the "feat: add renderers package" PR). Single-turn / no-tool smoke tests (reverse-text) never exercised this path, so it slipped past CI for ~7 days until a real ToolEnv rollout hit it.
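
For reference, a minimal sketch of the two wire shapes contrasted above; the `get_weather` tool and its schema are invented for illustration and are not from the repo:

```python
# Illustration only: the bare form RendererClient used to emit vs. the OpenAI
# envelope that the other clients (and the chat templates) expect.
bare_form = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

openai_envelope = {
    "type": "function",
    "function": bare_form,  # same fields, nested under "function"
}
```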

Test plan

  • uv run pytest tests/test_renderer_client.py (32 existing + 2 new envelope-contract tests pass; see the sketch after this list)
  • uv run pytest tests/test_renderer_client.py tests/test_renderer_e2e.py (42 total pass)
  • Re-run a multi-turn ToolEnv rollout under use_renderer=true against Qwen/Qwen3-4B-Instruct-2507 and confirm non-zero rewards on first step
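
A self-contained sketch of the contract the two new envelope tests pin down; `wrap_tool` below is a stand-in for `RendererClient.to_native_tool`, not the repo's actual code or test file:

```python
# Stand-in for RendererClient.to_native_tool, for illustration only.
def wrap_tool(name: str, description: str, parameters: dict) -> dict:
    return {
        "type": "function",
        "function": {"name": name, "description": description, "parameters": parameters},
    }


def test_envelope_contract():
    native = wrap_tool(
        "get_weather",
        "Look up the current weather for a city.",
        {"type": "object", "properties": {"city": {"type": "string"}}},
    )
    # The outer envelope is what function-calling chat templates saw in training.
    assert native["type"] == "function"
    assert set(native["function"]) == {"name", "description", "parameters"}
```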

🤖 Generated with Claude Code


Note

Medium Risk
Changes the wire format of tools passed to the renderer to match OpenAI’s {type:"function", function:{...}} contract, which can affect tool-calling behavior across models and any code expecting the previous bare {name, description, parameters} shape.

Overview
Ensures RendererClient.to_native_tool emits tools in the OpenAI function-calling envelope ({"type":"function","function":{...}}) instead of the prior bare {name, description, parameters} dict, aligning renderer-mode prompts with other OpenAI clients.

Propagates Tool.strict into the inner function object when set, and adds regression tests asserting the envelope shape and strict-flag behavior.

Reviewed by Cursor Bugbot for commit 2eb1a38.

hallerite and others added 4 commits May 7, 2026 22:24
fix(renderer-client): wrap tools in OpenAI envelope to match training distribution

`RendererClient.to_native_tool` returned bare-form `{name, description,
parameters}`, while every other client (`OpenAIChatCompletionsClient`,
`OpenAIResponsesClient`) wraps tools in the OpenAI envelope
`{"type": "function", "function": {...}}`. Modern function-calling
models (Qwen3 family, GLM, Kimi, ...) saw the envelope at training
time — TITO/MITO send envelope server-side via apply_chat_template,
so the model recognises the tools list and emits `<tool_call>` blocks.

Under `use_renderer=true`, the client-side renderer was given bare-form
tools and produced an out-of-distribution prompt: the chat templates
serialise whatever they receive verbatim (Qwen3's `tool | tojson` is a
passthrough), so the model saw a tool list it had never been trained on
and reliably failed to emit `<tool_call>` — producing zero rewards on
multi-turn ToolEnv rollouts and tripping the zero_advantage filter at
step 0.

The bug has existed since `RendererClient` was introduced (#1068, the
"feat: add renderers package" PR) but only fires with a multi-turn tool
env and `use_renderer=true`. It went undetected because reverse-text —
the only smoke test exercised against the renderer client — has no tools.

Cast to `ToolSpec` keeps type-check parity with the currently-pinned
renderers release (which still types `ToolSpec` as bare); the cast
becomes redundant once the renderers package publishes an
envelope-shaped `ToolSpec`. Two regression tests pin the contract.
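
A rough sketch of what the fixed method plausibly looks like after this commit, assuming a Pydantic `Tool` with `name`, `description`, `parameters`, and an optional `strict` field; the field names, the `ToolSpec` import, and the construction details are assumptions based on the description above, not the merged diff:

```python
from typing import cast

# Sketch only: Tool and ToolSpec are assumed to come from the repo and the
# pinned renderers release; their definitions are not reproduced here.
def to_native_tool(tool: "Tool") -> "ToolSpec":
    function: dict = {
        "name": tool.name,
        "description": tool.description,
        "parameters": tool.parameters,
    }
    if getattr(tool, "strict", None) is not None:
        function["strict"] = tool.strict  # propagate strict only when set
    envelope = {"type": "function", "function": function}
    # The pinned renderers release still types ToolSpec as the bare form, so the
    # cast keeps type checkers quiet until an envelope-shaped ToolSpec ships.
    return cast("ToolSpec", envelope)
```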

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The function now mirrors OpenAIChatCompletionsClient.to_native_tool
shape — the envelope contract is self-documenting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tool.description and Tool.parameters are non-optional in the Pydantic
model — the fallbacks were dead code carried over from the prior
bare-form implementation. Function now mirrors
OpenAIChatCompletionsClient.to_native_tool exactly modulo the typed
model vs dict-literal construction style.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vllm.SamplingParams.max_tokens defaults to 16. /v1/chat/completions
masks this server-side via get_max_tokens(), so TITO/MITO callers that
omit max_tokens get the full remaining context. /inference/v1/generate
(the new disagg endpoint this client talks to) hands SamplingParams to
the engine verbatim and skips that defaulting — the 16-token default
leaks through and silently caps every generation at 16 tokens. That's
exactly long enough to start a `<tool_call>\n{"name": "...", "arguments":
{"` envelope but not long enough to close one, so the JSON parse fails,
tool_calls comes back None, no_tools_called fires, and every tool-using
rollout produces reward 0.

Until vLLM patches serve_tokens to apply the same defaulting, query
max_model_len once per (server, model) from /v1/models and default to
max_model_len - len(prompt_ids) when the caller omitted max_tokens.
Cache forever per process — max_model_len is fixed at server startup.

TODO markers on the cache and the call site point to the upstream fix
(vllm/entrypoints/serve/disagg/serving.py::serve_tokens) so the dead
code is easy to reap once that lands.
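
A sketch of the client-side defaulting this commit describes (later reverted in favor of the server-side fix, per the last commit below); the httpx usage, the `max_model_len` field in the `/v1/models` response, and the function names are assumptions for illustration:

```python
from functools import lru_cache

import httpx


@lru_cache(maxsize=None)
def get_max_model_len(base_url: str, model: str) -> int:
    # max_model_len is fixed at server startup, so caching per (base_url, model)
    # for the life of the process is safe. The model-card field name is assumed.
    resp = httpx.get(f"{base_url}/v1/models", timeout=10.0)
    resp.raise_for_status()
    cards = {card["id"]: card for card in resp.json()["data"]}
    return cards[model]["max_model_len"]


def resolve_max_tokens(base_url: str, model: str, prompt_ids: list[int],
                       max_tokens: int | None) -> int:
    # Mirror the defaulting /v1/chat/completions applies server-side: when the
    # caller omits max_tokens, allow the full remaining context rather than
    # letting vLLM's SamplingParams default of 16 leak through.
    if max_tokens is not None:
        return max_tokens
    return get_max_model_len(base_url, model) - len(prompt_ids)
```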

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 831f8bc.

Comment thread on verifiers/clients/renderer_client.py (outdated)
…rompt_len"

This reverts commit 831f8bc.

The fix moved server-side: prime-rl's PrimeRlServingTokens now applies
get_max_tokens() defaulting in serve_tokens (PrimeIntellect-ai/prime-rl#2408,
commit 913cc4ca), matching every other vLLM endpoint. The client-side
workaround was always a band-aid and is no longer needed for prime-rl
deployments. Other vLLM 0.20 deployments hitting /inference/v1/generate
still need the upstream fix or to apply the prime-rl override locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>