/goal Stop hook fails JSON validation when sub-evaluator wraps response in markdown
TL;DR
Happy's /goal Stop hook delegates the goal-met evaluation to a Claude Code sub-process whose stdout is parsed as strict JSON ({ok: boolean, reason: string}). When the underlying LLM is DeepSeek V4 Flash (via ANTHROPIC_BASE_URL), the evaluator non-deterministically wraps its JSON output in markdown headers + code fences. The parent then emits Stop hook error: JSON validation failed and the /goal loop parks indefinitely, even though preventedContinuation: false is recorded in the event.
A 3-window sweep on identical workload reproduced the bug in 2 of 3 runs. Tolerant JSON extraction in the evaluator would fix it.
Environment
- Happy CLI: 1.1.8 (npm
happy)
- Claude Code: 2.1.177
- Backend: DeepSeek V4 Flash via Anthropic-compatible endpoint
ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic
ANTHROPIC_DEFAULT_*_MODEL=deepseek-v4-flash
- OS: Ubuntu 22.04 (VPS,
/usr/lib/node_modules/happy)
Reproduction
Reliable on V4 Flash. Run any /goal with a clearly checkable success condition and let it complete the work. The Stop hook is invoked and ~2/3 of the time emits the JSON validation failure.
A controlled synthetic reproduction (/goal "Run 8 iterations. Each iter: read 4 × 25MB files, spawn 4 serial subagents, summarize, repeat") run at three CLAUDE_CODE_AUTO_COMPACT_WINDOW values:
| Run |
Window |
Iters completed |
Stop hook stdout (literal) |
Outcome |
| T1 |
100K |
8/8 |
{"ok":true,"reason":"Goal achieved"} |
Clean exit |
| T2 |
150K |
8/8 |
**Stop condition: NOT SATISFIED** newline newline ```json newline {"ok":false,"reason":"..."} newline ``` |
JSON validation failed, session parked |
| T3 |
200K |
8/8 |
First call valid {"ok":false,...}, then 14-min recovery loop, then second call markdown-wrapped JSON |
JSON validation failed, session parked |
So window size is not the variable — V4 Flash's adherence to "JSON only, no markdown" is the variable, and it's non-deterministic on identical input.
Evidence from Claude Code session JSONL
From /root/.claude-glm/projects/.../{uuid}.jsonl:
{
"type": "attachment",
"attachment": {
"type": "hook_non_blocking_error",
"hookName": "Stop",
"hookEvent": "Stop",
"stderr": "JSON validation failed",
"stdout": "**Stop condition: NOT SATISFIED**\n\n```json\n{\n \"ok\": false,\n \"reason\": \"...\"\n}\n```",
"exitCode": 1
}
}
preventedContinuation: false, level: "suggestion" — but the user-visible effect IS a hard park: agent stops dispatching, sits at ❯ prompt with /goal active indefinitely.
Why it matters
This isn't a "Claude API on Claude" issue — Anthropic models reliably honor strict-JSON prompts. But:
- Happy supports
ANTHROPIC_BASE_URL overrides (third-party models)
- DeepSeek V4 Flash is a common cost-saving backend (1M context, ~1/10th the price of Claude Opus)
- Long-running
/goal sessions (12-24hr autonomous bug-hunting in our case) are exactly where stop-hook reliability matters most
We've had /goal sessions complete real-world work (e.g. landing a HIGH-severity finding in a 4-hour run) only to park on the Stop hook and require manual harvest of the artifacts.
Suggested fix (defensive parser)
Replace the strict JSON.parse(stdout) in the /goal Stop hook evaluator with a tolerant extractor:
- Strip leading/trailing whitespace.
- Find the first/last
{...} balanced object via a brace-counting scan.
JSON.parse on each candidate.
- Accept the first parse that matches the expected schema
{ok: boolean, reason: string}.
- Reject candidates with missing/incompatible fields.
This way:
- Pure JSON output still works (no behavior change for Claude models).
- Markdown-fenced JSON works (handles V4 Flash markdown wrapping).
- Prose with
{...} not matching schema is rejected (no false positives).
Bonus suggestion (prompt hardening)
When formulating the Stop hook evaluator's system prompt, consider adding:
"If a requested condition cannot be verified using available CLI-visible state after one explicit check, record it as unverifiable, do not loop on it, and evaluate completion using the remaining verifiable criteria."
This prevents the T3 failure mode where the evaluator pedantically returns ok: false on a met goal because some brittle sub-condition (e.g. "report context %") wasn't satisfied, which then sends the agent into an unrecoverable recovery loop.
Willing to help test
We run a fleet of long-running /goal sessions on V4 Flash daily — happy to test a patched build with the tolerant extractor and report repro-rate before/after.
cc @ex3ndr (or whoever maintains the /goal hook)
/goalStop hook fails JSON validation when sub-evaluator wraps response in markdownTL;DR
Happy's
/goalStop hook delegates the goal-met evaluation to a Claude Code sub-process whose stdout is parsed as strict JSON ({ok: boolean, reason: string}). When the underlying LLM is DeepSeek V4 Flash (viaANTHROPIC_BASE_URL), the evaluator non-deterministically wraps its JSON output in markdown headers + code fences. The parent then emitsStop hook error: JSON validation failedand the/goalloop parks indefinitely, even thoughpreventedContinuation: falseis recorded in the event.A 3-window sweep on identical workload reproduced the bug in 2 of 3 runs. Tolerant JSON extraction in the evaluator would fix it.
Environment
happy)ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropicANTHROPIC_DEFAULT_*_MODEL=deepseek-v4-flash/usr/lib/node_modules/happy)Reproduction
Reliable on V4 Flash. Run any
/goalwith a clearly checkable success condition and let it complete the work. The Stop hook is invoked and ~2/3 of the time emits the JSON validation failure.A controlled synthetic reproduction (
/goal"Run 8 iterations. Each iter: read 4 × 25MB files, spawn 4 serial subagents, summarize, repeat") run at threeCLAUDE_CODE_AUTO_COMPACT_WINDOWvalues:{"ok":true,"reason":"Goal achieved"}**Stop condition: NOT SATISFIED**newline newline```jsonnewline{"ok":false,"reason":"..."}newline```JSON validation failed, session parked{"ok":false,...}, then 14-min recovery loop, then second call markdown-wrapped JSONJSON validation failed, session parkedSo window size is not the variable — V4 Flash's adherence to "JSON only, no markdown" is the variable, and it's non-deterministic on identical input.
Evidence from Claude Code session JSONL
From
/root/.claude-glm/projects/.../{uuid}.jsonl:{ "type": "attachment", "attachment": { "type": "hook_non_blocking_error", "hookName": "Stop", "hookEvent": "Stop", "stderr": "JSON validation failed", "stdout": "**Stop condition: NOT SATISFIED**\n\n```json\n{\n \"ok\": false,\n \"reason\": \"...\"\n}\n```", "exitCode": 1 } }preventedContinuation: false,level: "suggestion"— but the user-visible effect IS a hard park: agent stops dispatching, sits at❯prompt with/goal activeindefinitely.Why it matters
This isn't a "Claude API on Claude" issue — Anthropic models reliably honor strict-JSON prompts. But:
ANTHROPIC_BASE_URLoverrides (third-party models)/goalsessions (12-24hr autonomous bug-hunting in our case) are exactly where stop-hook reliability matters mostWe've had
/goalsessions complete real-world work (e.g. landing a HIGH-severity finding in a 4-hour run) only to park on the Stop hook and require manual harvest of the artifacts.Suggested fix (defensive parser)
Replace the strict
JSON.parse(stdout)in the/goalStop hook evaluator with a tolerant extractor:{...}balanced object via a brace-counting scan.JSON.parseon each candidate.{ok: boolean, reason: string}.This way:
{...}not matching schema is rejected (no false positives).Bonus suggestion (prompt hardening)
When formulating the Stop hook evaluator's system prompt, consider adding:
This prevents the T3 failure mode where the evaluator pedantically returns
ok: falseon a met goal because some brittle sub-condition (e.g. "report context %") wasn't satisfied, which then sends the agent into an unrecoverable recovery loop.Willing to help test
We run a fleet of long-running
/goalsessions on V4 Flash daily — happy to test a patched build with the tolerant extractor and report repro-rate before/after.cc @ex3ndr (or whoever maintains the
/goalhook)