fix: don't pass task stop sequences to vLLM for reasoning models #3700
Open

jwmacd wants to merge 1 commit into EleutherAI:main from

Conversation
When `think_end_token` is set, task-level stop sequences like `"\n\n"` (the fewshot delimiter default) fire inside `<think>` blocks and truncate generation before any response is produced.

Two changes:

1. `vllm_causallms.py`: When `think_end_token` is set, only pass EOS to vLLM's `SamplingParams`. Task stop sequences remain in the cached `gen_kwargs` for post-processing.
2. `utils.py`: Reorder `postprocess_generated_text` to strip thinking content before applying stop sequences, so stops match the actual response rather than the reasoning trace.

Non-reasoning models are completely unaffected: the code path only diverges when `think_end_token` is set.
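A minimal sketch of the first change, assuming hypothetical names (`build_engine_stops`, an `eos` string, a `think_end_token` option) rather than the actual code in `vllm_causallms.py`:

```python
# Sketch: when a think_end_token is configured, withhold task stop
# sequences from the engine and keep only EOS, so generation cannot be
# cut off inside the <think> block. Names here are illustrative.
def build_engine_stops(task_stops, eos, think_end_token=None):
    if think_end_token is not None:
        # Reasoning model: only EOS goes to vLLM's SamplingParams;
        # task stops are applied later, during post-processing.
        return [eos]
    # Non-reasoning path is unchanged: pass everything through.
    return list(task_stops) + [eos]
```

The task stops are not dropped, only deferred: they stay in the cached `gen_kwargs` and are applied after the thinking trace has been stripped.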
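The reordering in `postprocess_generated_text` can be sketched like this (an illustrative standalone helper, not the repo's exact code): strip everything up to and including the think-end token first, and only then apply stop sequences to what remains.

```python
def postprocess_generated_text(text, stop_sequences, think_end_token=None):
    # 1) Strip the reasoning trace first: drop everything up to and
    #    including the think-end token, if present.
    if think_end_token is not None and think_end_token in text:
        text = text.split(think_end_token, 1)[1]
    # 2) Only then truncate at each stop sequence, so a stop like
    #    "\n\n" occurring inside <think>...</think> cannot fire.
    for stop in stop_sequences:
        if stop and stop in text:
            text = text.split(stop, 1)[0]
    return text
```

With the old order, a `"\n\n"` inside the reasoning trace would truncate the output before the `</think>` token was ever reached.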
Scope
This affects all `generate_until` tasks. 17 tasks in the repo don't specify an explicit `until` in their `generation_kwargs` and so inherit the fewshot delimiter (typically `"\n\n"`) as a stop sequence. Any reasoning model evaluated on these tasks may produce truncated or empty output without this fix.
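The inheritance described above can be illustrated with a small sketch (hypothetical resolution logic, not lm-evaluation-harness's actual loader): tasks without an explicit `until` fall back to the fewshot delimiter as their stop sequence.

```python
def resolve_until(generation_kwargs, fewshot_delimiter="\n\n"):
    # Illustrative: tasks that set `until` keep it; tasks that don't
    # inherit the fewshot delimiter as their only stop sequence.
    until = generation_kwargs.get("until")
    if until is None:
        return [fewshot_delimiter]
    return until if isinstance(until, list) else [until]
```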
Test results
Tested with Kimi-K2.5 (MoE, `reasoning_parser=kimi_k2`) on JSONSchemaBench (a `generate_until` task; default `until: ["\n\n"]` from the fewshot delimiter; `max_model_len=65536`, `max_gen_toks=32768`):
|                    | JS-Easy | JS-Medium | JS-Hard |
|--------------------|---------|-----------|---------|
| Before (unpatched) | 0.00    | 0.00      | 0.00    |
| After (patched)    | 0.99    | 0.96      | 0.88    |
Before: all 1,531 samples scored 0.0 on both json_validity and schema_compliance; generation was truncated inside `<think>` blocks before any JSON was produced. Total eval time was 778 s (near-immediate truncation per sample).
After: real scores, with a 34-minute eval time, full thinking traces, and complete JSON output.