/goal Stop hook fails JSON validation when sub-evaluator wraps response in markdown (V4 Flash backend)

# `/goal` Stop hook fails JSON validation when sub-evaluator wraps response in markdown

## TL;DR

Happy's `/goal` Stop hook delegates the goal-met evaluation to a Claude Code sub-process whose stdout is parsed as strict JSON (`{ok: boolean, reason: string}`). When the underlying LLM is **DeepSeek V4 Flash** (via `ANTHROPIC_BASE_URL`), the evaluator non-deterministically wraps its JSON output in markdown: a bold status line followed by a fenced `json` block. The parent then emits `Stop hook error: JSON validation failed` and the `/goal` loop parks indefinitely, even though the event records `preventedContinuation: false`.

A 3-window sweep on an identical workload reproduced the bug in 2 of 3 runs. Tolerant JSON extraction when parsing the evaluator's stdout would fix it.

## Environment

- Happy CLI: **1.1.8** (npm `happy`)
- Claude Code: **2.1.177**
- Backend: DeepSeek V4 Flash via Anthropic-compatible endpoint
  - `ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic`
  - `ANTHROPIC_DEFAULT_*_MODEL=deepseek-v4-flash`
- OS: Ubuntu 22.04 (VPS, `/usr/lib/node_modules/happy`)

## Reproduction

Reproduces readily on V4 Flash, though not on every run. Run any `/goal` with a clearly checkable success condition and let it complete the work. When the Stop hook is invoked, it emits the JSON validation failure roughly two times out of three (2 of 3 runs in the sweep below).

We ran a controlled synthetic reproduction (`/goal` "Run 8 iterations. Each iter: read 4 × 25MB files, spawn 4 serial subagents, summarize, repeat") at three `CLAUDE_CODE_AUTO_COMPACT_WINDOW` values:

| Run | Window | Iters completed | Stop hook stdout (literal) | Outcome |
|-----|--------|-----------------|-----------------------------|---------|
| T1 | 100K | 8/8 | `{"ok":true,"reason":"Goal achieved"}` | Clean exit |
| T2 | 150K | 8/8 | `**Stop condition: NOT SATISFIED**` newline newline ` ```json` newline `{"ok":false,"reason":"..."}` newline ` ``` ` | `JSON validation failed`, **session parked** |
| T3 | 200K | 8/8 | First call valid `{"ok":false,...}`, then 14-min recovery loop, then second call markdown-wrapped JSON | `JSON validation failed`, **session parked** |

So window size does not appear to be the variable: T3 produced both a valid and a markdown-wrapped response at the same window. The variable is V4 Flash's adherence to "JSON only, no markdown", which is non-deterministic on identical input.
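To make the failure concrete, here is a minimal sketch of a strict check applied to the T1 and T2 outputs from the table. It is an assumed model of the parent's validation (we have not inspected the real code), and `strictValidate` is a hypothetical name.

```typescript
// Assumed model of the parent's check: parse the whole stdout, then test the schema.
// strictValidate is a hypothetical name, not a Happy or Claude Code function.
function strictValidate(stdout: string): boolean {
  try {
    const value: unknown = JSON.parse(stdout);
    if (typeof value !== "object" || value === null) return false;
    const record = value as Record<string, unknown>;
    return typeof record.ok === "boolean" && typeof record.reason === "string";
  } catch {
    return false; // any non-JSON byte around the object makes JSON.parse throw
  }
}

// Three backticks, escaped so this sample can sit inside a fenced block.
const FENCE = "\x60\x60\x60";

const t1Output = '{"ok":true,"reason":"Goal achieved"}';
const t2Output =
  "**Stop condition: NOT SATISFIED**\n\n" +
  FENCE + 'json\n{"ok":false,"reason":"..."}\n' + FENCE;

console.log(strictValidate(t1Output)); // true
console.log(strictValidate(t2Output)); // false
```

The T2 payload contains a perfectly valid verdict; only the surrounding markdown makes the whole-string parse fail.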

## Evidence from Claude Code session JSONL

From `/root/.claude-glm/projects/.../{uuid}.jsonl`:

```json
{
  "type": "attachment",
  "attachment": {
    "type": "hook_non_blocking_error",
    "hookName": "Stop",
    "hookEvent": "Stop",
    "stderr": "JSON validation failed",
    "stdout": "**Stop condition: NOT SATISFIED**\n\n```json\n{\n  \"ok\": false,\n  \"reason\": \"...\"\n}\n```",
    "exitCode": 1
  }
}
```

The same event also records `preventedContinuation: false` and `level: "suggestion"` (not shown in the excerpt above). Despite that, the user-visible effect is a hard park: the agent stops dispatching and sits at the `❯` prompt with `/goal active` indefinitely.
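To gauge how often this happens across sessions, the session JSONL can be scanned for these events. The following is a minimal sketch, assuming each line is one JSON object shaped like the excerpt above. `findHookErrors` and `HookError` are hypothetical names, not part of Happy or Claude Code.

```typescript
// Hypothetical helper; the field names mirror the JSONL excerpt, nothing else is assumed.
interface HookError {
  hookName: string;
  stderr: string;
  stdout: string;
}

function findHookErrors(jsonlText: string, hookName: string): HookError[] {
  const hits: HookError[] = [];
  for (const line of jsonlText.split("\n")) {
    if (line.trim() === "") continue;
    let entry: unknown;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip truncated or non-JSON lines
    }
    if (typeof entry !== "object" || entry === null) continue;
    const outer = entry as Record<string, unknown>;
    const attachment = outer.attachment;
    if (outer.type !== "attachment") continue;
    if (typeof attachment !== "object" || attachment === null) continue;
    const inner = attachment as Record<string, unknown>;
    if (inner.type === "hook_non_blocking_error" && inner.hookName === hookName) {
      hits.push({
        hookName,
        stderr: String(inner.stderr ?? ""),
        stdout: String(inner.stdout ?? ""),
      });
    }
  }
  return hits;
}
```

The input could come from reading a session file as UTF-8 text, for example with Node's `fs.readFileSync(path, "utf8")`. Counting entries whose `stderr` is `JSON validation failed` gives a per-session failure tally.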

## Why it matters

This isn't a "Claude API on Claude" issue — Anthropic models reliably honor strict-JSON prompts. But:

1. Happy supports `ANTHROPIC_BASE_URL` overrides (third-party models)
2. DeepSeek V4 Flash is a common cost-saving backend (1M context, ~1/10th the price of Claude Opus)
3. Long-running `/goal` sessions (12-24hr autonomous bug-hunting in our case) are exactly where stop-hook reliability matters most

We've had `/goal` sessions complete real-world work (e.g. landing a HIGH-severity finding in a 4-hour run) only to park on the Stop hook and require manual harvest of the artifacts.

## Suggested fix (defensive parser)

Replace the strict `JSON.parse(stdout)` in the `/goal` Stop hook evaluator with a tolerant extractor:

1. Strip leading/trailing whitespace.
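2. Find each balanced `{...}` candidate with a brace-counting scan that ignores braces inside string literals (the `reason` text may itself contain `{` or `}`).
3. Run `JSON.parse` on each candidate, in order of appearance.
4. Accept the first parse that matches the expected schema `{ok: boolean, reason: string}`.
5. Reject candidates with missing or incompatible fields; if no candidate qualifies, fail validation as today.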

This way:
- Pure JSON output still works (no behavior change for Claude models).
- Markdown-fenced JSON works (handles V4 Flash markdown wrapping).
- Prose with `{...}` not matching schema is rejected (no false positives).
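The steps above could be sketched as follows. This is an illustration under our assumptions, not Happy's actual code: `StopVerdict`, `isStopVerdict`, `matchingBrace`, and `extractStopVerdict` are hypothetical names. The sketch retries from the next `{` when a candidate fails to parse, and skips the nested objects of a candidate that parses but has the wrong shape.

```typescript
// Hypothetical names throughout; none of these are Happy or Claude Code APIs.
interface StopVerdict {
  ok: boolean;
  reason: string;
}

// Schema check for {ok: boolean, reason: string}.
function isStopVerdict(value: unknown): value is StopVerdict {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record.ok === "boolean" && typeof record.reason === "string";
}

// Index of the "}" that closes the "{" at `start`, or -1 if it never closes.
// Braces inside double-quoted strings are ignored.
function matchingBrace(text: string, start: number): number {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) return i;
    }
  }
  return -1;
}

// Returns the first schema-valid verdict found in stdout, or null if there is none.
function extractStopVerdict(stdout: string): StopVerdict | null {
  const text = stdout.trim();
  let start = text.indexOf("{");
  while (start !== -1) {
    const end = matchingBrace(text, start);
    let next = start + 1;
    if (end !== -1) {
      try {
        const parsed: unknown = JSON.parse(text.slice(start, end + 1));
        if (isStopVerdict(parsed)) return parsed;
        next = end + 1; // valid JSON with the wrong shape: skip its nested objects
      } catch {
        // Not valid JSON; retry from the next opening brace.
      }
    }
    start = text.indexOf("{", next);
  }
  return null;
}

// Three backticks, escaped so this sample can sit inside a fenced block.
const TICKS = "\x60\x60\x60";
const wrapped =
  "**Stop condition: NOT SATISFIED**\n\n" +
  TICKS + 'json\n{\n  "ok": false,\n  "reason": "..."\n}\n' + TICKS;
console.log(JSON.stringify(extractStopVerdict(wrapped))); // {"ok":false,"reason":"..."}
```

Returning `null` rather than throwing leaves the caller free to keep the current `JSON validation failed` path when nothing qualifies. Refusing to descend into a wrong-shaped object is a deliberate choice here, so a verdict-like object buried inside some other JSON structure is not accepted by accident.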

## Bonus suggestion (prompt hardening)

When formulating the Stop hook evaluator's system prompt, consider adding:

> "If a requested condition cannot be verified using available CLI-visible state after one explicit check, record it as unverifiable, do not loop on it, and evaluate completion using the remaining verifiable criteria."

This targets the T3 failure mode, where the evaluator pedantically returns `ok: false` on a met goal because some brittle sub-condition (e.g. "report context %") wasn't satisfied, which then sends the agent into a prolonged recovery loop (14 minutes in T3) before the next Stop hook call.

## Willing to help test

We run a fleet of long-running `/goal` sessions on V4 Flash daily — happy to test a patched build with the tolerant extractor and report repro-rate before/after.

---

cc @ex3ndr (or whoever maintains the `/goal` hook)


/goal Stop hook fails JSON validation when sub-evaluator wraps response in markdown (V4 Flash backend) #1405
