fix: don't pass task stop sequences to vLLM for reasoning models#3700

Open
jwmacd wants to merge 1 commit into EleutherAI:main from jwmacd:fix/reasoning-stop-sequences

Conversation


@jwmacd jwmacd commented Apr 12, 2026

When think_end_token is set, task-level stop sequences like "\n\n" (the fewshot delimiter default) fire inside `<think>` blocks and truncate generation before any response is produced.

Two changes:

  1. vllm_causallms.py: When think_end_token is set, only pass EOS to vLLM's SamplingParams. Task stop sequences remain in the cached gen_kwargs for post-processing.
  2. utils.py: Reorder postprocess_generated_text to strip thinking content before applying stop sequences, so stops match the actual response rather than the reasoning trace.
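
Change 1 can be illustrated with a minimal sketch of the stop-selection logic (the function and parameter names here are illustrative, not the repo's actual identifiers):

```python
def select_vllm_stops(task_stops, eos, think_end_token=None):
    """Return the stop sequences to hand to vLLM's SamplingParams.

    Reasoning models emit a <think>...</think> trace before the visible
    response, so task-level stops like "\n\n" would fire inside the trace
    and truncate generation. In that case only EOS is passed to vLLM;
    the task stops are deferred to post-processing.
    """
    if think_end_token:
        return [eos]                   # defer task stops to post-processing
    return list(task_stops) + [eos]    # non-reasoning path unchanged
```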

Non-reasoning models are unaffected — the code path only diverges when think_end_token is set.
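
Change 2's reordering can be sketched with a hypothetical post-processing helper (the actual postprocess_generated_text in utils.py differs in detail; this only shows the strip-then-stop order):

```python
def postprocess_generated_text(text, stops, think_end_token=None):
    # 1) Strip the reasoning trace first, so stop sequences are matched
    #    against the visible response rather than the <think> block.
    if think_end_token and think_end_token in text:
        text = text.split(think_end_token, 1)[1].lstrip()
    # 2) Then truncate at task-level stop sequences.
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            text = text[:idx]
    return text
```

With the old order (stops applied first), the "\n\n" between two reasoning paragraphs would have truncated the whole output before the trace ever closed.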

Scope

This affects all generate_until tasks. 17 tasks in the repo don't specify an explicit until in their generation_kwargs and inherit the fewshot delimiter (typically "\n\n") as a stop sequence. Any reasoning model evaluated on these tasks may produce truncated or empty output without this fix.

Test results

Tested with Kimi-K2.5 (MoE, reasoning_parser=kimi_k2) on JSONSchemaBench (generate_until task, default until: ["\n\n"] from fewshot delimiter, max_model_len=65536, max_gen_toks=32768):

|                    | JS-Easy | JS-Medium | JS-Hard |
|--------------------|---------|-----------|---------|
| Before (unpatched) | 0.00    | 0.00      | 0.00    |
| After (patched)    | 0.99    | 0.96      | 0.88    |

Before: all 1,531 samples scored 0.0 on both json_validity and schema_compliance — generation was truncated inside `<think>` blocks before any JSON was produced. Total eval time 778s (near-immediate truncation per sample).

After: real scores, 34 min eval time with full thinking traces and JSON output.

@jwmacd jwmacd requested a review from 0xSMT as a code owner April 12, 2026 20:24

CLAassistant commented Apr 12, 2026

CLA assistant check
All committers have signed the CLA.
