Skip to content

feat: support renderer stop token IDs#9141

Draft
AmeenP wants to merge 1 commit intoai-dynamo:bis/dynamo-rlfrom
AmeenP:codex/renderer-stop-token-ids
Draft

feat: support renderer stop token IDs#9141
AmeenP wants to merge 1 commit intoai-dynamo:bis/dynamo-rlfrom
AmeenP:codex/renderer-stop-token-ids

Conversation

@AmeenP
Copy link
Copy Markdown
Contributor

@AmeenP AmeenP commented May 5, 2026

Summary

Adds renderer-style stop_token_ids support to the Dynamo chat completions path on top of bis/dynamo-rl.

  • accepts top-level stop_token_ids as an RL/verifiers passthrough field
  • parses it for NvCreateChatCompletionRequest
  • forwards it into StopConditions.stop_token_ids_hidden, the existing decoder path that hides stop tokens from output
  • adds a request-level test covering nvext.token_data + stop_token_ids extraction

This keeps the transport on /v1/chat/completions with nvext.token_data; it does not add /generate or revive /v1/chat/completions/tokens.

Compatibility

This is the Dynamo-side piece for Prime-RL/verifiers renderer parity. Companion verifiers PR: PrimeIntellect-ai/verifiers#1287

Validation

  • git diff --check

Attempted locally:

  • cargo fmt --check is blocked by unrelated formatting diffs already present in bis/dynamo-rl (lib/llm/src/http/service/openai.rs, lib/llm/src/protocols/openai/chat_completions/delta.rs).
  • RUSTC=$(rustup which --toolchain 1.90.0 rustc) $(rustup which --toolchain 1.90.0 cargo) test -p dynamo-llm --test test_common_ext test_chat_completions_stop_token_ids_extraction starts compiling but fails on macOS-only build incompatibilities in existing dynamo-llm Linux storage paths (dynamo_memory::numa, DiskStorage, fallocate, O_DIRECT).

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 5, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 5, 2026

👋 Hi AmeenP! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors.Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions Bot added external-contribution Pull request is from an external contributor frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels May 5, 2026
Signed-off-by: AmeenP <ameenp360@gmail.com>
@AmeenP AmeenP changed the title Support renderer stop token IDs feat: support renderer stop token IDs May 5, 2026
@github-actions github-actions Bot added the feat label May 5, 2026
@AmeenP AmeenP force-pushed the codex/renderer-stop-token-ids branch from ec45e57 to bf00fb2 Compare May 5, 2026 06:32
biswapanda added a commit that referenced this pull request May 5, 2026
…clusion

Three improvements borrowed from PR #9141 (Ameen Patel) on top of the
existing PASSTHROUGH_EXTRA_FIELDS expansion:

1. get_stop_token_ids returns Result<Option<Vec<TokenIdType>>>, not
   Option<Vec<TokenIdType>>. Malformed payloads (e.g.
   stop_token_ids: "not-an-array") now surface as a typed 400 with the
   diagnostic 'stop_token_ids must be an array of unsigned token IDs:
   {err}'. extract_stop_conditions propagates the Result via ?.

   Replaces the prior silent-fallback Option<> variant which dropped
   malformed inputs without telling the caller. Silent drops on RL
   correctness primitives (stop conditions affect what tokens
   the engine emits) is the bug class CR-9 was about — same principle
   applies here.

2. Mutual-exclusion between messages and pre-tokenized input is now
   scoped to the canonical TOP-LEVEL prompt_token_ids extension only.
   The legacy nvext.token_data channel — which the verifiers
   dynamo_chat_nvext renderer transport (#1287) uses with placeholder
   messages 'role: user, content: (token-in mode)' — is allowed to
   coexist with non-empty messages. validate_messages still gates the
   empty-messages-with-no-tokens case.

   Without this relaxation, the renderer transport's placeholder
   pattern would 400 on every request.

3. Two new tests in test_common_ext.rs:
   - test_chat_completions_stop_token_ids_extraction: positive case
     with nvext.token_data + top-level stop_token_ids (lifted from
     PR #9141 verbatim).
   - test_chat_completions_stop_token_ids_malformed_returns_400:
     verifies the typed-error path on bad input.

Pre-existing test struct-init sites in test_common_ext.rs were missing
required fields (return_token_ids, tokens) added to the
NvCreateChatCompletionRequest struct since the tests were written. Three
sites updated to construct cleanly. cargo test test_common_ext: 15
tests, 15 passes.
biswapanda added a commit that referenced this pull request May 8, 2026
Cleanup-only — no behavior change. Strips review-tracker noise that
accumulated on top of PR-added text during iteration:

  - "Closes hhzhang16 HH-19/HH-21/HH-22/HH-23/HH-25/HH-26/HH-27"
  - "CR-8 / CR-9 / CR-10 closure" prefixes on serde-error / doc-attach fixes
  - Branch-name references: bis/parity-tokenize-tcp, bis/prime-rl-merged
  - Internal PR numbers: #6094, #7699, #8197, #9141
  - Phase numbers from internal design docs (rl-support.md Phase 1/4/5)
  - "prime-rl" mentions in narrative copy and mermaid diagrams →
    generic "RL trainer / RL orchestrator / external client"

Technical content (semantics, invariants, why-this-exists rationale)
preserved everywhere; only the internal-process scaffolding is removed.

Scope verification: every removed line is one this branch ADDED
(diff main..HEAD shows the removed text on a `+` line). No edits land
on pre-existing main-branch comments. Specifically reverted the
nvext.rs cleanup attempt — its target lines (GAIE Stage 1/2,
SGLang-specific) live on main, not in this PR's diff.

Files touched:
  components/src/dynamo/vllm/handlers.py            +9  -10
  components/src/dynamo/vllm/worker_factory.py      +6  -4
  docs/dynamo-RL-api.md                             +19 -32
  lib/llm/src/http/service/openai.rs                +32 -34
  lib/llm/src/protocols/openai/chat_completions/delta.rs  +4 -4
  lib/llm/src/protocols/openai/completions/delta.rs       +3 -3
  lib/llm/src/protocols/openai/validate.rs                +20 -20

cargo check -p dynamo-llm: clean (1 pre-existing benign warning).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external-contribution Pull request is from an external contributor feat frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` size/M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant