From c6022362ce7e2edc52b356a9a4efb45556cf86dc Mon Sep 17 00:00:00 2001
From: Lu Nelson
Date: Mon, 8 Jun 2026 09:40:50 +0200
Subject: [PATCH 01/17] Add ln-induct skill: generative diagnostic lens from review-bot comments

Induces a fault-type lens from PR review-bot comments (or supplied observations), gates token-vs-symptom via a three-part promotion test, audits both family and ownership axes, and proposes graduating recurring lenses into ln-review's contract catalog.

Wire into ln-consult routing and the ln-skills praxis reference; back-reference from ln-review.

Amp-Thread-ID: https://ampcode.com/threads/T-019ea618-c2ac-721d-809a-a72bfd9ce453
Co-authored-by: Amp
---
 .agents/skills/ln-consult/SKILL.md |   1 +
 .agents/skills/ln-induct/SKILL.md  | 121 +++++++++++++++++++++++++++++
 .agents/skills/ln-review/SKILL.md  |   2 +
 docs/praxis/ln-skills.md           |   2 +
 4 files changed, 126 insertions(+)
 create mode 100644 .agents/skills/ln-induct/SKILL.md

diff --git a/.agents/skills/ln-consult/SKILL.md b/.agents/skills/ln-consult/SKILL.md
index 82548c877..5c1027533 100644
--- a/.agents/skills/ln-consult/SKILL.md
+++ b/.agents/skills/ln-consult/SKILL.md
@@ -109,6 +109,7 @@ Spikes are the escape hatch, not the default.
 | Module interface needs exploration | structural | `ln-design` |
 | Full or light scope card exists, ready to code | bounded, hardening, bugfix | `ln-build` |
 | Technical uncertainty blocks progress, or a cheap investigation could invalidate planned work | any | `ln-spike` |
+| Review-bot comments or point findings may be symptomatic of a systemic fault | any | `ln-induct` |
 | Code works but needs restructuring | refactor | `ln-refactor` |
 | Code works but quality / architecture needs audit | any | `ln-review` |
 | Docs are stale, overweight, or milestone context needs cleanup | structural / maintenance | `ln-sync` |
diff --git a/.agents/skills/ln-induct/SKILL.md b/.agents/skills/ln-induct/SKILL.md
new file mode 100644
index 000000000..3f4a8ebf3
--- /dev/null
+++ b/.agents/skills/ln-induct/SKILL.md
@@ -0,0 +1,121 @@
+---
+name: ln-induct
+description: "Treat PR review-bot comments (or similar point observations) as samples from a latent defect distribution: induce the operative fault-type, then audit the codebase for unsampled instances. Use when small review findings may be symptomatic of a systemic-ish fault or fallacy, and you want a generative diagnostic lens rather than a one-off fix."
+argument-hint: "[pasted comments/observations, or empty to fetch the current branch's PR review comments]"
+---
+
+# Ln Induct
+
+A bot comment is a *sample*, not a fix. Each point finding is one draw from a latent defect distribution the author can't see. The move: infer the distribution from the samples, then go fishing for the instances nobody sampled.
+
+This skill **generates** lenses. `ln-review`'s `contract` category is the **library** of lenses that have already stabilized. `ln-induct` induces a fresh lens from this batch of evidence; when a lens recurs across PRs, step 6 proposes graduating it into `ln-review`.
+
+Read `memory/SPEC.md` first when it exists (lexicon, live architecture register, §Acknowledged Blind Spots). Read `memory/PLAN.md` for active frontier context when the touched area is in-flight.
+
+## Anti-sprawl is the point of the skill
+
+A generative audit *wants* to manufacture work — it goes looking for more. Left ungated it becomes completionist sprawl and topical caricature (`AGENTS.md`, user-global §Local necessity over category default). The triage gate (step 3) is what keeps this a diagnostic instrument and not a make-work generator. **Find and fix are separate**: this skill produces a triaged report and names adjacent work; it does not auto-implement. Routing to `ln-build`/`ln-refactor` is a separate, human-gated step.
+
+## Input
+
+Evidence to work from: $ARGUMENTS
+
+## 1. Ingest the evidence
+
+Two sources:
+
+- **Supplied directly** — if `$ARGUMENTS` carries comments or observations, use those verbatim. Any source counts: PR bot, human reviewer, a thing you noticed.
+- **Fetched from the remote** — if `$ARGUMENTS` is empty, **confirm with the user** that you'll look up review comments for the current branch's PR, then fetch them. Use whatever remote-review access is available — GitHub is the usual case (`gh` / `cli-gh-axi`), but do not lock to one provider; GitLab, Graphite (`gt`), or another host are equally valid. Pick the access path that fits the repo.
+
+Normalize each item to `(location, claim, suggested fix)`. Drop nothing yet.
+
+## 2. Abstract each item to a fault type (the lens)
+
+For each item, climb the abstraction ladder from the concrete comment toward the fault *type* behind it. The stopping rule is the whole craft here:
+
+> **Stop at the lowest rung that is both mechanically searchable AND names a repair.**
+
+- Too low → you've restated the comment. No lift.
+- Too high → "code should be correct." Useless.
+- Just right → "a `Map` built from a list keyed by an assumed-unique field" — you can grep it, and you know the fix.
+
+The lens must be a *fishing instrument*, not a category label.
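The "just right" rung can be made concrete. The TypeScript sketch below is a hypothetical illustration, not part of the skill: `Item`, `indexById`, `indexByIdStrict`, and `findMapFromListSites` are invented names. It shows three things:

- the fault: a `Map` built from a list keeps only the last entry for a repeated key, so earlier entries vanish without error;
- the **enforce it loudly** repair;
- a crude regex standing in for the family-axis grep.

The regex only nominates candidate lines. Each hit still has to be verified as a real instance.

```typescript
// Hypothetical record type; any list keyed by an assumed-unique field fits the lens.
interface Item {
  id: string;
  value: number;
}

// The fault: a repeated id is not an error. The Map constructor sets entries in
// order, so the LAST item for a key wins and earlier ones are silently dropped.
function indexById(items: Item[]): Map<string, Item> {
  return new Map(items.map((item) => [item.id, item] as const));
}

// Repair class "enforce it loudly": throw on collision instead of overwriting.
function indexByIdStrict(items: Item[]): Map<string, Item> {
  const index = new Map<string, Item>();
  for (const item of items) {
    if (index.has(item.id)) {
      throw new Error(`duplicate id "${item.id}": key is not unique`);
    }
    index.set(item.id, item);
  }
  return index;
}

// Family-axis sweep (crude): report 1-based line numbers shaped like
// `new Map(<expr>.map(`. Matching the shape does not prove the fault.
function findMapFromListSites(source: string): number[] {
  const pattern = /new Map\s*\(\s*[\w.$]+\s*\.map\s*\(/;
  return source
    .split("\n")
    .map((line, i) => (pattern.test(line) ? i + 1 : -1))
    .filter((lineNo) => lineNo > 0);
}
```

The strict variant turns a silent data loss into a loud failure at the point where the uniqueness assumption breaks. That is the property the triage gate below looks for.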
+Record the climb (`comment → rung → rung → lens`) so the abstraction is auditable and the user can challenge it.
+
+Seed your climb with the stabilized lenses in `ln-review` §Contract integrity as **priors**, not a checklist — they bias what to look for, but the operative lens is induced from *this* evidence and may be new. A batch may yield several distinct lenses, or none worth promoting.
+
+## 3. Triage: is it symptomatic? (the gate)
+
+For each induced lens, decide **fix-in-place** vs **generalize-and-audit**. Promote to audit only when **all three** hold:
+
+1. **Plausible recurrence** — a pattern a developer or agent reaches for repeatedly, not a freak.
+2. **Cheap search exists** — there is a real family-grep or ownership seam you can actually sweep.
+3. **High-value failure mode** — the fault is *silent / latent* (data silently dropped, a contract silently unhonored, a wrong default silently chosen). Loud faults self-report and don't need this skill.
+
+Fail any one → fix in place (or route the single finding), record nothing further, move on. Most items will not promote, and that is the correct outcome.
+
+## 4. Audit for unsampled instances
+
+For each promoted lens, fish along **both** axes — not just the easy one:
+
+- **Family axis** (syntactic / structural): find every site sharing the pattern's shape. Grep-shaped, fast.
+- **Ownership axis** (responsibility / seam): audit everything a seam *owns*, to catch same-responsibility faults that share no syntax. This is the higher-value, harder sweep. **Force at least one ownership-seam question per promoted lens** — otherwise the skill quietly degenerates into "grep for the pattern."
+
+Collect each hit as a candidate finding. Verify it is a real instance, not a false positive that merely matches the shape.
+
+## 5. Report
+
+Emit triaged findings. For each: the **assumed contract** in one sentence, the **failure mode** when it breaks, the **repair class**, and a **confidence**.
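The step 3 gate is a strict conjunction, and the report's `gate:` field names whichever test failed. The TypeScript sketch below assumes the reviewer has already judged each test as a boolean; the skill leaves those judgments to the human or agent. `GateInput`, `GateResult`, `triage`, and `gateLabel` are hypothetical names.

```typescript
// The three promotion tests from step 3, already judged by the reviewer.
interface GateInput {
  plausibleRecurrence: boolean; // a pattern reached for repeatedly, not a freak
  cheapSearchExists: boolean; // a real family-grep or ownership seam to sweep
  silentFailureMode: boolean; // silent/latent fault; loud faults self-report
}

type GateResult =
  | { verdict: "promoted" }
  | { verdict: "fix-in-place"; failed: (keyof GateInput)[] };

// Promote only when ALL three hold; otherwise record which tests failed.
function triage(input: GateInput): GateResult {
  const failed = (Object.keys(input) as (keyof GateInput)[]).filter(
    (test) => !input[test],
  );
  return failed.length === 0
    ? { verdict: "promoted" }
    : { verdict: "fix-in-place", failed };
}

// Render the `gate:` field of the "Lenses induced" list in the report template.
function gateLabel(result: GateResult): string {
  return result.verdict === "promoted"
    ? "promoted"
    : `fix-in-place: ${result.failed.join(", ")}`;
}
```

Returning every failed test, rather than stopping at the first, gives the report enough detail for a user to challenge the triage.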
+Repair classes (from `ln-review` §Contract integrity, extend if the induced lens needs a new one):
+
+- **enforce it loudly** — fail on violation (throw on collision, assert the invariant)
+- **thread the real value** — carry provenance instead of hardcoding it
+- **name the contract** — a predicate / type / comment that makes the assumption explicit
+- **normalize at the boundary** — for ambient-environment leaks (paths, `cwd`, ordering)
+
+Name adjacent work; do not implement it.
+
+## 6. Propose graduation
+
+Last step, proposal only. If an induced lens recurred here, or matches one this skill has surfaced before, **propose** adding it to `ln-review` §Contract integrity (or as a new review category) — the same promote-stabilized-truth move `ln-sync` uses. State the lens, its cue, and its repair. Leave the edit to the user; do not modify `ln-review` unprompted.
+
+## Canonical reconciliation
+
+Reconcile only durable truth:
+
+- A recurring lens worth a permanent review pass → propose the `ln-review` edit (step 6).
+- A confirmed systemic blind spot → propose an entry in `memory/SPEC.md` §Acknowledged Blind Spots.
+- Findings tied to active frontier work → note against `memory/PLAN.md` status.
+- One-off findings with no durable implication → no canonical update.
+
+Do not create alternate ledgers or audit docs. Canonical docs are `memory/SPEC.md` and `memory/PLAN.md`; the lens library lives in `ln-review`.
+
+## Output
+
+```md
+## Induction: [evidence source]
+
+**Samples:** [n comments/observations ingested]
+
+### Lenses induced
+1. [lens] — climb: `comment → … → lens` · gate: [promoted | fix-in-place: which test failed]
+
+### Findings (promoted lenses only)
+| # | Lens | Location | Assumed contract | Failure mode | Repair | Confidence |
+| - | ---- | -------- | ---------------- | ------------ | ------ | ---------- |
+
+### Graduation proposals
+- [lens] → `ln-review` §Contract integrity (recurred: [evidence]) | none
+```
+
+## Routing
+
+After the report, present the relevant options to the user (use `tool-ask-question`):
+
+| #   | Label                | Target        | Why |
+| --- | -------------------- | ------------- | --- |
+| 1   | Scope the fixes      | `ln-scope`    | Findings need buildable cards or durable seam updates |
+| 2   | Build a fix          | `ln-build`    | A finding is settled and ready for red-green-refactor |
+| 3   | Plan a cluster       | `ln-refactor` | Findings cluster across a seam into a structural change |
+| 4   | Graduate the lens    | manual edit   | A recurring lens should join `ln-review`'s catalog |
+| 5   | Reconcile blind spot | `ln-sync`     | A confirmed systemic gap belongs in SPEC §Blind Spots |
+
+Recommended depends on the findings: clusters → **3**, isolated silent faults → **1**, nothing promoted → stop and say so.
diff --git a/.agents/skills/ln-review/SKILL.md b/.agents/skills/ln-review/SKILL.md
index 937d5b1b3..07fd98cd5 100644
--- a/.agents/skills/ln-review/SKILL.md
+++ b/.agents/skills/ln-review/SKILL.md
@@ -62,6 +62,8 @@ Concrete cues to look for:
 
 Collect findings as numbered items (category: `contract`). Frame each as: the assumed contract in one sentence, the failure mode when it breaks, and which of the three repairs applies. Most are concrete fixes (`ln-scope`/`ln-build`); clusters across a seam route to `ln-refactor`.
 
+This catalog is the stabilized lens library. `ln-induct` is the generator that induces fresh lenses from review-bot evidence and proposes graduating recurring ones into this list.
+
 ### Oracle coverage (category: `oracle-coverage`)
 
 If `memory/SPEC.md` §Oracle Strategy by Loop Tier exists, check whether recent work implemented the oracles declared by the relevant `memory/PLAN.md` frontier definition. If a full or light scope card is available in session context, use it as a higher-resolution slice supplement, not the primary source of truth. Look for:
diff --git a/docs/praxis/ln-skills.md b/docs/praxis/ln-skills.md
index af15f017d..cb006bfbd 100644
--- a/docs/praxis/ln-skills.md
+++ b/docs/praxis/ln-skills.md
@@ -117,6 +117,7 @@ Posture ranks the next *vertical* slice; it has no completeness test, so vertica
 | `ln-scope` | A frontier item or next step needs a thin vertical slice with target behavior and acceptance criteria. | Scope card / slice definition. |
 | `ln-build` | A scoped slice is ready for TDD implementation. | Code, tests, inner-loop verification, and PLAN updates when appropriate. |
 | `ln-diagnose` | Something is broken, failing, flaky, slow, or nondeterministic. | Trusted repro loop, falsified hypotheses, regression oracle, route back to planning if needed. |
+| `ln-induct` | Review-bot comments or point observations may be symptomatic of a systemic-ish fault. | An induced diagnostic lens, an audit for unsampled instances, and a triaged report. |
 | `ln-review` | After implementation bursts, or when architecture/model hygiene needs an opinionated audit. | Quality findings and next-step recommendations. |
 | `ln-refactor` | Working code needs restructuring without behavior change. | Refactor plan as tiny safe commits. |
 
@@ -154,6 +155,7 @@ There is currently no project-local `ln-map` skill in `.agents/skills/`. If you
 | “Can this technical approach work?” | `ln-spike` |
 | “Can we make the idea tangible before committing?” | `ln-prototype` |
 | “Why is this failing?” | `ln-diagnose` |
+| “Is this small finding a symptom of something systemic?” | `ln-induct` |
 | “Is this code still conceptually clean?” | `ln-review` |
 | “How do we restructure safely?” | `ln-refactor` |
 | “Are the docs still true?” | `ln-sync` |

From 2d1b382956d1194b80b250fe1da34494fa1cd5aa Mon Sep 17 00:00:00 2001
From: Lu Nelson
Date: Mon, 8 Jun 2026 09:51:25 +0200
Subject: [PATCH 02/17] Harden resource-body-depth card for new-thread builders

Anchor every prompt-resource family to an authoritative source (goals/methods have no README; goals->D59-L, methods->D58-L, strategies/lenses->README+SPEC), add a per-family facet checklist, make verification self-checkable via a required structural test, and add concurrency guardrails scoping edits to .pi/skills/**/*.md.

Amp-Thread-ID: https://ampcode.com/threads/T-019ea2fc-9f12-767d-bd7a-08497f7307fd
Co-authored-by: Amp
---
 .../crosscut-know--resource-body-depth.md | 75 ++++++++++++++++---
 1 file changed, 64 insertions(+), 11 deletions(-)

diff --git a/memory/cards/crosscut-know--resource-body-depth.md b/memory/cards/crosscut-know--resource-body-depth.md
index 36cd4cc03..9d5b1fcc2 100644
--- a/memory/cards/crosscut-know--resource-body-depth.md
+++ b/memory/cards/crosscut-know--resource-body-depth.md
@@ -17,14 +17,24 @@ Created: 2026-06-07
 - **Volatile state:** the bodies are genuinely thin — every resource is ~5 lines (`goals/*`,
   `lenses/*`, `methods/{commit-graph,read-context,review-for-gaps}`, all four non-freestyle
   `strategies/*`); only `methods/{infer-and-capture,generate-proposal,run-structured-exchange}`
-  reach 12–15 lines. The contracts for what each body should contain already exist in the
-  family READMEs ([strategies/README.md](file:///Users/lunelson/Code/hashintel/brunch-next/src/.pi/skills/strategies/README.md)
-  lists the required facets; [lenses/README.md](file:///Users/lunelson/Code/hashintel/brunch-next/src/.pi/skills/lenses/README.md)).
+  reach 12–15 lines (use these three as the **shape exemplar** for body depth).
+- **Source-anchoring gotcha (new-thread-critical):** only **strategies/** and **lenses/** have a
+  README contract; **goals/** and **methods/** do **not**. Do not invent content — anchor every
+  body to the authoritative source named in §Content sources below. The one-line manifest
+  descriptions in [`.pi/agents/state.ts`](file:///Users/lunelson/Code/hashintel/brunch-next/src/.pi/agents/state.ts)
+  (`GOAL_RESOURCES`, `STRATEGY_RESOURCES`, `LENS_RESOURCES`, `METHOD_RESOURCES`) already encode
+  each resource's intended one-line intent; the body expands that intent, it must not contradict it.
+- **Concurrency note (new-thread-critical):** another agent is actively building the
+  `elicitation-backlog` frontier in `src/graph/` and `src/db/`. This card touches **only**
+  `src/.pi/skills/**/*.md` (plus optionally `state.ts` descriptions / `compose.test.ts`). Do **not**
+  edit `graph/`, `db/`, or the elicitation-backlog card — that is another tenant's blast radius.
 - **Drift note (handled in reconciliation, not here):** the Seam 3b *exchange-tool `.description()` /
   `promptGuidelines`* ● row is **already done** — all 7 exchange tools under
   `src/.pi/extensions/exchanges/` carry `description` + `promptGuidelines`. That row is reclassified
   `built` in the ledger; it is **out of scope** for this card.
-- **Main open risk:** prose quality is eyeball-judged — verification is review-based, not a test.
+- **Main open risk:** prose *quality* stays partly judgment-based, but acceptance does not depend on
+  it — a required structural test (§Verification Approach) gives every body an objective non-trivial-depth
+  floor and a self-checkable facet checklist (§Content sources) replaces "read it and decide."
 
 Posture: **earned** (inherited from cross-cut Seam 3a/3b — Fill=`earned`; settled scaffolding, just
 unbuilt bodies). This is content materialization into existing topology, not a new seam.
@@ -38,10 +48,40 @@ Frontier-level cross-cutting obligations:
 - Keep each body scoped to its own axis; do not duplicate cross-axis content (goal vs strategy vs
   lens vs method are orthogonal, D59-L/D25-L).
 
+### Content sources (per family — read these before writing any body)
+
+Every body expands its **manifest one-liner** in `.pi/agents/state.ts`; that one-liner is the
+binding intent the body may not contradict. Beyond that, each family has a distinct authoritative
+anchor and facet checklist:
+
+```pseudo tree
+goals/ (4: grounding-advance, elicit-expand, commit-converge, capture-posture)
+  authority  SPEC D59-L (defines all four goals + grade-derivation) + GOAL_RESOURCES one-liner
+             no README — D59-L IS the contract
+  facets     what the agent pursues · what evidence advances it · what NOT to claim/do ·
+             how it relates to its grade band (D64-L) · capture-posture never writes spec/graph truth
+strategies/ (4 remaining: step-wise-decision-tree, step-wise-disambiguate, propose-graph, project-graph)
+  authority  strategies/README.md §"Prompt resource contents" + STRATEGY_RESOURCES one-liner + SPEC D25-L/D26-L
+  exemplar   strategies/freestyle.md (recently deepened — match this depth)
+  facets     what the agent does · turn structure · commitment mechanism (D26-L) ·
+             available graph ops · category-selection rubric for graph-writing strategies
+lenses/ (3: intent, design, oracle)
+  authority  lenses/README.md §"Topology-driven question ranking" + LENS_RESOURCES one-liner + SPEC D25-L/D56-L
+  facets     topical/plane focus · favored kinds/edges · how it shapes interpretation ·
+             topology-driven "what to ask next" heuristics from the README table
+methods/ (6: run-structured-exchange, infer-and-capture, commit-graph, read-context, generate-proposal, review-for-gaps)
+  authority  SPEC D58-L ("method resources are the prompt-level home for tool-routing/sequencing guidance") + METHOD_RESOURCES one-liner
+             no README — D58-L IS the contract
+  exemplar   methods/{generate-proposal,run-structured-exchange,infer-and-capture}.md (already 12–15 lines)
+  facets     concrete tool-routing/sequencing (NOT a restatement of the tool description) ·
+             when to invoke · what to compose it with · what stays out of scope
+```
+
 ### Objective
 
 Deepen the thin `.pi/skills/{goals,strategies,lenses,methods}` resource bodies so each carries the
-real per-axis instruction its README contract requires, without changing the manifest registry.
+real per-axis instruction its authoritative source (§Content sources) requires, without changing the
+manifest registry.
 
 ### Acceptance Criteria
 
@@ -58,16 +98,26 @@ resource body depth
 │   └── ✓ each method body gives concrete tool-routing/sequencing guidance (the D58-L method role),
 │         not a restatement of the tool description
 └── consistency
-    ├── ✓ no body contradicts its README contract or another axis's responsibility
-    └── ✓ manifest descriptions in state.ts still match each deepened body's intent
+    ├── ✓ no body contradicts its §Content sources authority or another axis's responsibility
+    ├── ✓ each body expands (does not contradict) its state.ts manifest one-liner
+    └── ✓ no new capability/authority/tool invented beyond what the source already grants
 ```
 
 ### Verification Approach
 
+Builder-portable, no human-only step required to pass the card:
+
 ```
-- Inner: review-based — each body read against its family README contract; build/lint proves resources still load.
-- Inner (light, if cheap): a structural test asserting each resource exceeds a trivial threshold
-  and the manifest location resolves to a readable file (extends existing compose/readability tests).
+- Self-check (objective): for each body, walk its §Content sources facet checklist and confirm
+  every facet is addressed in prose; confirm the body still reads as an expansion of its
+  state.ts one-liner and invents no new authority/tool.
+- Structural test (REQUIRED): extend the existing compose/readability test (compose.test.ts) to assert,
+  for every manifest entry across all four families, that location resolves to a readable file whose
+  body exceeds a non-trivial line/char threshold (i.e. beyond the current ~5-line placeholders).
+  This converts "bodies are thin" into a failing assertion before the pass and a passing one after.
+- Gate: `npm run verify` (fix → test → build) — proves all resources still load and the manifest
+  location wiring is intact.
+- Human review is optional polish AFTER the gate is green; it is not required for acceptance.
 ```
 
 ### Cross-cutting obligations
@@ -92,9 +142,12 @@ src/.pi/skills/
 ├── lenses/{intent,design,oracle}.md ~
 └── methods/{run-structured-exchange,infer-and-capture,commit-graph,read-context,generate-proposal,review-for-gaps}.md ~
 src/.pi/agents/state.ts ? (only if a manifest description needs to match a deepened body)
-src/.pi/agents/compose.test.ts ? (only if a light structural/readability assertion is added)
+src/.pi/agents/compose.test.ts ~ (REQUIRED: structural non-trivial-depth + location-resolves assertion)
 ```
 
+Stay inside this tree. Do **not** touch `src/graph/**`, `src/db/**`, or `memory/PLAN.md` /
+`memory/CROSS_CUT_PLAN.md` — the `elicitation-backlog` builder owns those concurrently.
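The required structural assertion can be sketched independently of the real manifest. This TypeScript sketch assumes each manifest entry exposes a `location` path. The card names `GOAL_RESOURCES`, `STRATEGY_RESOURCES`, `LENS_RESOURCES`, and `METHOD_RESOURCES` but does not give their shape, so `ResourceEntry`, `DepthFailure`, `findThinResources`, and the 10-line default threshold are all hypothetical. The file reader is injected so the check runs without the repository.

```typescript
// Hypothetical manifest entry shape; the real registries live in state.ts.
interface ResourceEntry {
  family: "goals" | "strategies" | "lenses" | "methods";
  name: string;
  location: string;
}

interface DepthFailure {
  entry: ResourceEntry;
  reason: string;
}

// Collects every entry whose location is unreadable or whose body is too thin.
// `readFile` returns undefined when the location does not resolve.
function findThinResources(
  entries: ResourceEntry[],
  readFile: (path: string) => string | undefined,
  minLines = 10, // assumed threshold, comfortably above the ~5-line placeholders
): DepthFailure[] {
  const failures: DepthFailure[] = [];
  for (const entry of entries) {
    const body = readFile(entry.location);
    if (body === undefined) {
      failures.push({ entry, reason: "location does not resolve to a readable file" });
      continue;
    }
    // Count non-blank lines so padding a body with empty lines cannot pass.
    const lines = body.split("\n").filter((line) => line.trim() !== "").length;
    if (lines < minLines) {
      failures.push({ entry, reason: `only ${lines} non-blank lines (< ${minLines})` });
    }
  }
  return failures;
}
```

In `compose.test.ts`, a reader built on Node's `readFileSync` would be passed in, returning `undefined` when the read throws. The test would then assert that the result is empty. That assertion fails against the current placeholders and passes once the bodies are deepened.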
+
 ### Promotion checklist
 
 All **no** — stays a light/earned content card:

From b2475d765ff6f2d482398b57040e4490d20d29d1 Mon Sep 17 00:00:00 2001
From: Lu Nelson
Date: Mon, 8 Jun 2026 09:51:49 +0200
Subject: [PATCH 03/17] Materialize elicitation backlog substrate

Amp-Thread-ID: https://ampcode.com/threads/T-019ea2ec-7506-74ed-a4e0-99b8d800442f
Co-authored-by: Amp
---
 drizzle/0003_outstanding_black_bird.sql     |  19 +
 drizzle/meta/0003_snapshot.json             | 766 ++++++++++++++++++
 drizzle/meta/_journal.json                  |   7 +
 memory/CROSS_CUT_PLAN.md                    |   8 +-
 memory/PLAN.md                              |  25 +-
 memory/SPEC.md                              |   4 +-
 .../cards/elicitation-backlog--substrate.md | 184 -----
 src/db/README.md                            |  15 +-
 src/db/row-schemas.ts                       |   5 +
 src/db/schema.ts                            |  34 +-
 src/graph/README.md                         |  16 +-
 src/graph/command-executor.test.ts          | 231 +++++-
 src/graph/command-executor.ts               | 354 +++++++-
 src/graph/index.ts                          |  13 +
 src/graph/queries.test.ts                   | 110 +++
 src/graph/queries.ts                        |  59 ++
 src/graph/schema/elicitation-backlog.ts     |  34 +
 src/web/README.md                           |   2 +-
 18 files changed, 1670 insertions(+), 216 deletions(-)
 create mode 100644 drizzle/0003_outstanding_black_bird.sql
 create mode 100644 drizzle/meta/0003_snapshot.json
 delete mode 100644 memory/cards/elicitation-backlog--substrate.md
 create mode 100644 src/graph/schema/elicitation-backlog.ts

diff --git a/drizzle/0003_outstanding_black_bird.sql b/drizzle/0003_outstanding_black_bird.sql
new file mode 100644
index 000000000..abdb841be
--- /dev/null
+++ b/drizzle/0003_outstanding_black_bird.sql
@@ -0,0 +1,19 @@
+CREATE TABLE `elicitation_backlog` (
+	`id` integer PRIMARY KEY AUTOINCREMENT NOT NULL,
+	`spec_id` integer NOT NULL,
+	`kind` text NOT NULL,
+	`question` text NOT NULL,
+	`status` text DEFAULT 'open' NOT NULL,
+	`basis` text DEFAULT 'explicit' NOT NULL,
+	`readiness_band` text NOT NULL,
+	`plane_affinity` text,
+	`lens_affinity` text,
+	`arose_from_entry_id` integer,
+	`resolved_by_node_id` integer,
+	`rationale` text,
+	`created_at_lsn` integer NOT NULL,
+	`closed_at_lsn` integer,
+	FOREIGN KEY (`spec_id`) REFERENCES `specs`(`id`) ON UPDATE no action ON DELETE no action,
+	FOREIGN KEY (`arose_from_entry_id`) REFERENCES `elicitation_backlog`(`id`) ON UPDATE no action ON DELETE no action,
+	FOREIGN KEY (`resolved_by_node_id`) REFERENCES `nodes`(`id`) ON UPDATE no action ON DELETE no action
+);
diff --git a/drizzle/meta/0003_snapshot.json b/drizzle/meta/0003_snapshot.json
new file mode 100644
index 000000000..f6bcc8e78
--- /dev/null
+++ b/drizzle/meta/0003_snapshot.json
@@ -0,0 +1,766 @@
+{
+  "version": "6",
+  "dialect": "sqlite",
+  "id": "d9b2bf4a-2462-4820-b5ef-4ad514c15a1d",
+  "prevId": "d52c1722-788f-4bc4-9b5d-4bb832520ac4",
+  "tables": {
+    "change_log": {
+      "name": "change_log",
+      "columns": {
+        "spec_id": {
+          "name": "spec_id",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "lsn": {
+          "name": "lsn",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "operation": {
+          "name": "operation",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "payload": {
+          "name": "payload",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "created_at": {
+          "name": "created_at",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false,
+          "default": "(datetime('now'))"
+        }
+      },
+      "indexes": {},
+      "foreignKeys": {
+        "change_log_spec_id_specs_id_fk": {
+          "name": "change_log_spec_id_specs_id_fk",
+          "tableFrom": "change_log",
+          "tableTo": "specs",
+          "columnsFrom": [
+            "spec_id"
+          ],
+          "columnsTo": [
+            "id"
+          ],
+          "onDelete": "no action",
+          "onUpdate": "no action"
+        }
+      },
+      "compositePrimaryKeys": {
+        "change_log_spec_lsn_pk": {
+          "columns": [
+            "spec_id",
+            "lsn"
+          ],
+          "name": "change_log_spec_lsn_pk"
+        }
+      },
+      "uniqueConstraints": {},
+      "checkConstraints": {}
+    },
+    "edges": {
+      "name": "edges",
+      "columns": {
+        "id": {
+          "name": "id",
+          "type": "integer",
+          "primaryKey": true,
+          "notNull": true,
+          "autoincrement": true
+        },
+        "spec_id": {
+          "name": "spec_id",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "category": {
+          "name": "category",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "source_id": {
+          "name": "source_id",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "target_id": {
+          "name": "target_id",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "stance": {
+          "name": "stance",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": false,
+          "autoincrement": false
+        },
+        "basis": {
+          "name": "basis",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false,
+          "default": "'explicit'"
+        },
+        "rationale": {
+          "name": "rationale",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": false,
+          "autoincrement": false
+        },
+        "created_at_lsn": {
+          "name": "created_at_lsn",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "updated_at_lsn": {
+          "name": "updated_at_lsn",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        }
+      },
+      "indexes": {},
+      "foreignKeys": {
+        "edges_spec_id_specs_id_fk": {
+          "name": "edges_spec_id_specs_id_fk",
+          "tableFrom": "edges",
+          "tableTo": "specs",
+          "columnsFrom": [
+            "spec_id"
+          ],
+          "columnsTo": [
+            "id"
+          ],
+          "onDelete": "no action",
+          "onUpdate": "no action"
+        },
+        "edges_source_id_nodes_id_fk": {
+          "name": "edges_source_id_nodes_id_fk",
+          "tableFrom": "edges",
+          "tableTo": "nodes",
+          "columnsFrom": [
+            "source_id"
+          ],
+          "columnsTo": [
+            "id"
+          ],
+          "onDelete": "no action",
+          "onUpdate": "no action"
+        },
+        "edges_target_id_nodes_id_fk": {
+          "name": "edges_target_id_nodes_id_fk",
+          "tableFrom": "edges",
+          "tableTo": "nodes",
+          "columnsFrom": [
+            "target_id"
+          ],
+          "columnsTo": [
+            "id"
+          ],
+          "onDelete": "no action",
+          "onUpdate": "no action"
+        }
+      },
+      "compositePrimaryKeys": {},
+      "uniqueConstraints": {},
+      "checkConstraints": {}
+    },
+    "elicitation_backlog": {
+      "name": "elicitation_backlog",
+      "columns": {
+        "id": {
+          "name": "id",
+          "type": "integer",
+          "primaryKey": true,
+          "notNull": true,
+          "autoincrement": true
+        },
+        "spec_id": {
+          "name": "spec_id",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "kind": {
+          "name": "kind",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "question": {
+          "name": "question",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "status": {
+          "name": "status",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false,
+          "default": "'open'"
+        },
+        "basis": {
+          "name": "basis",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false,
+          "default": "'explicit'"
+        },
+        "readiness_band": {
+          "name": "readiness_band",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "plane_affinity": {
+          "name": "plane_affinity",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": false,
+          "autoincrement": false
+        },
+        "lens_affinity": {
+          "name": "lens_affinity",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": false,
+          "autoincrement": false
+        },
+        "arose_from_entry_id": {
+          "name": "arose_from_entry_id",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": false,
+          "autoincrement": false
+        },
+        "resolved_by_node_id": {
+          "name": "resolved_by_node_id",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": false,
+          "autoincrement": false
+        },
+        "rationale": {
+          "name": "rationale",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": false,
+          "autoincrement": false
+        },
+        "created_at_lsn": {
+          "name": "created_at_lsn",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "closed_at_lsn": {
+          "name": "closed_at_lsn",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": false,
+          "autoincrement": false
+        }
+      },
+      "indexes": {},
+      "foreignKeys": {
+        "elicitation_backlog_spec_id_specs_id_fk": {
+          "name": "elicitation_backlog_spec_id_specs_id_fk",
+          "tableFrom": "elicitation_backlog",
+          "tableTo": "specs",
+          "columnsFrom": [
+            "spec_id"
+          ],
+          "columnsTo": [
+            "id"
+          ],
+          "onDelete": "no action",
+          "onUpdate": "no action"
+        },
+        "elicitation_backlog_arose_from_entry_id_elicitation_backlog_id_fk": {
+          "name": "elicitation_backlog_arose_from_entry_id_elicitation_backlog_id_fk",
+          "tableFrom": "elicitation_backlog",
+          "tableTo": "elicitation_backlog",
+          "columnsFrom": [
+            "arose_from_entry_id"
+          ],
+          "columnsTo": [
+            "id"
+          ],
+          "onDelete": "no action",
+          "onUpdate": "no action"
+        },
+        "elicitation_backlog_resolved_by_node_id_nodes_id_fk": {
+          "name": "elicitation_backlog_resolved_by_node_id_nodes_id_fk",
+          "tableFrom": "elicitation_backlog",
+          "tableTo": "nodes",
+          "columnsFrom": [
+            "resolved_by_node_id"
+          ],
+          "columnsTo": [
+            "id"
+          ],
+          "onDelete": "no action",
+          "onUpdate": "no action"
+        }
+      },
+      "compositePrimaryKeys": {},
+      "uniqueConstraints": {},
+      "checkConstraints": {}
+    },
+    "graph_clock": {
+      "name": "graph_clock",
+      "columns": {
+        "spec_id": {
+          "name": "spec_id",
+          "type": "integer",
+          "primaryKey": true,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "lsn": {
+          "name": "lsn",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false,
+          "default": 0
+        }
+      },
+      "indexes": {},
+      "foreignKeys": {
+        "graph_clock_spec_id_specs_id_fk": {
+          "name": "graph_clock_spec_id_specs_id_fk",
+          "tableFrom": "graph_clock",
+          "tableTo": "specs",
+          "columnsFrom": [
+            "spec_id"
+          ],
+          "columnsTo": [
+            "id"
+          ],
+          "onDelete": "no action",
+          "onUpdate": "no action"
+        }
+      },
+      "compositePrimaryKeys": {},
+      "uniqueConstraints": {},
+      "checkConstraints": {}
+    },
+    "node_kind_counters": {
+      "name": "node_kind_counters",
+      "columns": {
+        "id": {
+          "name": "id",
+          "type": "integer",
+          "primaryKey": true,
+          "notNull": true,
+          "autoincrement": true
+        },
+        "spec_id": {
+          "name": "spec_id",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "plane": {
+          "name": "plane",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "kind": {
+          "name": "kind",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "next_ordinal": {
+          "name": "next_ordinal",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false,
+          "default": 1
+        }
+      },
+      "indexes": {
+        "node_kind_counters_spec_plane_kind_unique": {
+          "name": "node_kind_counters_spec_plane_kind_unique",
+          "columns": [
+            "spec_id",
+            "plane",
+            "kind"
+          ],
+          "isUnique": true
+        }
+      },
+      "foreignKeys": {
+        "node_kind_counters_spec_id_specs_id_fk": {
+          "name": "node_kind_counters_spec_id_specs_id_fk",
+          "tableFrom": "node_kind_counters",
+          "tableTo": "specs",
+          "columnsFrom": [
+            "spec_id"
+          ],
+          "columnsTo": [
+            "id"
+          ],
+          "onDelete": "no action",
+          "onUpdate": "no action"
+        }
+      },
+      "compositePrimaryKeys": {},
+      "uniqueConstraints": {},
+      "checkConstraints": {}
+    },
+    "nodes": {
+      "name": "nodes",
+      "columns": {
+        "id": {
+          "name": "id",
+          "type": "integer",
+          "primaryKey": true,
+          "notNull": true,
+          "autoincrement": true
+        },
+        "spec_id": {
+          "name": "spec_id",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "plane": {
+          "name": "plane",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "kind": {
+          "name": "kind",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "kind_ordinal": {
+          "name": "kind_ordinal",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "title": {
+          "name": "title",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "body": {
+          "name": "body",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": false,
+          "autoincrement": false
+        },
+        "basis": {
+          "name": "basis",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false,
+          "default": "'explicit'"
+        },
+        "source": {
+          "name": "source",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": false,
+          "autoincrement": false
+        },
+        "detail": {
+          "name": "detail",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": false,
+          "autoincrement": false
+        },
+        "created_at_lsn": {
+          "name": "created_at_lsn",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "updated_at_lsn": {
+          "name": "updated_at_lsn",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        }
+      },
+      "indexes": {
+        "nodes_spec_plane_kind_ordinal_unique": {
+          "name": "nodes_spec_plane_kind_ordinal_unique",
+          "columns": [
+            "spec_id",
+            "plane",
+            "kind",
+            "kind_ordinal"
+          ],
+          "isUnique": true
+        }
+      },
+      "foreignKeys": {
+        "nodes_spec_id_specs_id_fk": {
+          "name": "nodes_spec_id_specs_id_fk",
+          "tableFrom": "nodes",
+          "tableTo": "specs",
+          "columnsFrom": [
+            "spec_id"
+          ],
+          "columnsTo": [
+            "id"
+          ],
+          "onDelete": "no action",
+          "onUpdate": "no action"
+        }
+      },
+      "compositePrimaryKeys": {},
+      "uniqueConstraints": {},
+      "checkConstraints": {}
+    },
+    "reconciliation_need": {
+      "name": "reconciliation_need",
+      "columns": {
+        "id": {
+          "name": "id",
+          "type": "integer",
+          "primaryKey": true,
+          "notNull": true,
+          "autoincrement": true
+        },
+        "spec_id": {
+          "name": "spec_id",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "target_kind": {
+          "name": "target_kind",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "target_edge_id": {
+          "name": "target_edge_id",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": false,
+          "autoincrement": false
+        },
+        "target_a_id": {
+          "name": "target_a_id",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": false,
+          "autoincrement": false
+        },
+        "target_b_id": {
+          "name": "target_b_id",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": false,
+          "autoincrement": false
+        },
+        "kind": {
+          "name": "kind",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "status": {
+          "name": "status",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false,
+          "default": "'open'"
+        },
+        "reason": {
+          "name": "reason",
+          "type": "text",
+          "primaryKey": false,
+          "notNull": false,
+          "autoincrement": false
+        },
+        "created_at_lsn": {
+          "name": "created_at_lsn",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": true,
+          "autoincrement": false
+        },
+        "resolved_at_lsn": {
+          "name": "resolved_at_lsn",
+          "type": "integer",
+          "primaryKey": false,
+          "notNull": false,
+          "autoincrement": false
+        }
+      },
+      "indexes": {},
+      "foreignKeys": {
+        "reconciliation_need_spec_id_specs_id_fk": {
+          "name": "reconciliation_need_spec_id_specs_id_fk",
+          "tableFrom": "reconciliation_need",
+          "tableTo": "specs",
+          "columnsFrom": [
+            "spec_id"
+          ],
+          "columnsTo": [
+            "id"
+          ],
+          "onDelete": "no action",
+          "onUpdate": "no action"
+        },
+        "reconciliation_need_target_edge_id_edges_id_fk": {
+          "name": "reconciliation_need_target_edge_id_edges_id_fk",
+          "tableFrom": "reconciliation_need",
+          "tableTo": "edges",
+          "columnsFrom": [
+            "target_edge_id"
+          ],
+          "columnsTo": [
+            "id"
+          ],
+          "onDelete": "no action",
+          "onUpdate": "no action"
+        },
+        "reconciliation_need_target_a_id_nodes_id_fk": {
+          "name": "reconciliation_need_target_a_id_nodes_id_fk",
+          "tableFrom": "reconciliation_need",
+          "tableTo": "nodes",
+          "columnsFrom": [
+            "target_a_id"
+          ],
+          "columnsTo": [
+            "id"
+          ],
+          "onDelete": "no action",
+          "onUpdate": "no action"
+        },
"reconciliation_need_target_b_id_nodes_id_fk": { + "name": "reconciliation_need_target_b_id_nodes_id_fk", + "tableFrom": "reconciliation_need", + "tableTo": "nodes", + "columnsFrom": [ + "target_b_id" + ], + "columnsTo": [ + "id" + ], + "onDelete": "no action", + "onUpdate": "no action" + } + }, + "compositePrimaryKeys": {}, + "uniqueConstraints": {}, + "checkConstraints": {} + }, + "specs": { + "name": "specs", + "columns": { + "id": { + "name": "id", + "type": "integer", + "primaryKey": true, + "notNull": true, + "autoincrement": true + }, + "name": { + "name": "name", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "slug": { + "name": "slug", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "readiness_grade": { + "name": "readiness_grade", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false, + "default": "'grounding_onboarding'" + } + }, + "indexes": {}, + "foreignKeys": {}, + "compositePrimaryKeys": {}, + "uniqueConstraints": {}, + "checkConstraints": {} + } + }, + "views": {}, + "enums": {}, + "_meta": { + "schemas": {}, + "tables": {}, + "columns": {} + }, + "internal": { + "indexes": {} + } +} \ No newline at end of file diff --git a/drizzle/meta/_journal.json b/drizzle/meta/_journal.json index 13c7dcac5..019f38df1 100644 --- a/drizzle/meta/_journal.json +++ b/drizzle/meta/_journal.json @@ -22,6 +22,13 @@ "when": 1780668000000, "tag": "0002_spec_scoped_graph_clock", "breakpoints": true + }, + { + "idx": 3, + "version": "6", + "when": 1780904720280, + "tag": "0003_outstanding_black_bird", + "breakpoints": true } ] } \ No newline at end of file diff --git a/memory/CROSS_CUT_PLAN.md b/memory/CROSS_CUT_PLAN.md index 1f9545c24..27f430fdf 100644 --- a/memory/CROSS_CUT_PLAN.md +++ b/memory/CROSS_CUT_PLAN.md @@ -117,7 +117,7 @@ DoD: every ● row is `have` or `built`. 
| goals / strategies / lenses scaffolding + legal-tuple gating | have | ● | — | — | `.pi/agents/state.ts` | | goal/strategy/lens **content depth** | partial | ● | earned | card `memory/cards/crosscut-know--resource-body-depth.md` | scaffolding present, bodies thin | | `freestyle` strategy | built | ● | — | done — pin-only strategy (8de7f166) | AUTO-excluded, no added authority; D66-L | -| "what to ask next" driver | spec | ● | proving | PLAN frontier `elicitation-backlog` | substrate tracer promoted; per-turn driver remains a follow-on after the flat-table proof lands | +| "what to ask next" driver | partial | ● | proving | unscoped follow-on | flat-table substrate landed via FE-823; live per-turn driver + capture-reflection remain follow-on work | ### Seam 3b — KNOW / mechanics (methods) @@ -268,9 +268,9 @@ order is coverage-driven: close ● ledger rows seam by seam. `elicitation_backlog`-driven "what to ask next" (D65-L); goal/strategy/lens/method body depth; exchange-tool `.description()` / `promptGuidelines` fix (**built** — drift correction; all 7 exchange tools already carry both). Skill-commands (Q6) stay deferred. **Scoped:** - `memory/cards/elicitation-backlog--substrate.md` (D65-L substrate tracer; promoted to the active - PLAN frontier `elicitation-backlog`; the per-turn driver + capture-reflection stay an unscoped follow-on) and - `memory/cards/crosscut-know--resource-body-depth.md` (the goal/strategy/lens/method body pass). + FE-823 landed the D65-L substrate tracer (flat table, `createSpec` seed, command/query seam); + the live per-turn driver + capture-reflection remain an unscoped follow-on, and + `memory/cards/crosscut-know--resource-body-depth.md` still holds the goal/strategy/lens/method body pass. 5. **Spec reconcile** — promote the D40-L/D59-L one-line refinements (on confirmation), land Q1 negative-query touch, fold D65-L/D66-L outcomes into SPEC/PLAN. 
diff --git a/memory/PLAN.md b/memory/PLAN.md index 94c23f914..226452e7c 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -27,7 +27,7 @@ All delivery frontiers must also continue materializing the locked source topolo The multi-spec workspace model is now explicit: a workspace is the cwd; multiple specs may coexist under it; each session binds to exactly one spec; each POC spec owns its own intent graph; cross-spec claim sharing/adoption is deferred (D11-L, D21-L, D61-L). Delivery work must target an explicit selected/current spec and must not accidentally recreate a workspace-global graph. -Planning is currently carrying two shapes at once: canonical frontier sequencing in this file, and a temporary elicitor capability ledger in `memory/CROSS_CUT_PLAN.md`. The authority split must stay hard: `PLAN.md` owns frontier ids, ordering, and dependency judgments; `CROSS_CUT_PLAN.md` only inventories the temporary READ/WRITE/KNOW row surface. The current planning move is therefore to promote any cross-cut row that has escaped row-sized work back into a real frontier. `elicitation-backlog` is the first such promotion; the remaining prompt-resource body-depth pass stays temporary cross-cut completion work. +Planning is currently carrying two shapes at once: canonical frontier sequencing in this file, and a temporary elicitor capability ledger in `memory/CROSS_CUT_PLAN.md`. The authority split must stay hard: `PLAN.md` owns frontier ids, ordering, and dependency judgments; `CROSS_CUT_PLAN.md` only inventories the temporary READ/WRITE/KNOW row surface. The current planning move is therefore to promote any cross-cut row that has escaped row-sized work back into a real frontier. `elicitation-backlog` was the first such promotion and is now landed; the remaining prompt-resource body-depth pass stays temporary cross-cut completion work. 
After the current elicitor work, the strongest follow-on coverage frontier is `graph-observed-shapes`: decide the observed-shape inventory per consumer, then align graph/RPC/web to it. `runtime-affordances-and-legality` remains the next likely coverage frontier behind that. Exchange/capture breadth is explicitly deferred until its surviving inventory is honest enough to enumerate without recreating the deleted stub surface. @@ -35,14 +35,13 @@ After the current elicitor work, the strongest follow-on coverage frontier is `g ### Active -1. `elicitation-backlog` — proving frontier promoted out of the temporary elicitor coverage ledger; current build target is the substrate tracer in `memory/cards/elicitation-backlog--substrate.md`, while prompt-resource body depth remains temporary cross-cut completion work. +1. `minimal-authority-shell` — now the next delivery-safety frontier after the elicitation-backlog substrate landed; prompt-resource body depth remains temporary cross-cut completion work outside `PLAN.md`. ### Next -1. `minimal-authority-shell` — still the next delivery-safety frontier once the current elicitation substrate lands. -2. `poc-live-ship-gate` — final fresh-cwd runbook remains the delivery gate, but its prepared live-mention-autocomplete slice is currently parked off the critical path. -3. `graph-observed-shapes` — next coverage frontier candidate: decide the observed-shape inventory per consumer, then align graph/RPC/web to it. -4. `runtime-affordances-and-legality` — follow-on coverage frontier for shared posture legality/default surfaces once graph observed shapes stop dominating. +1. `poc-live-ship-gate` — final fresh-cwd runbook remains the delivery gate, but its prepared live-mention-autocomplete slice is currently parked off the critical path. +2. `graph-observed-shapes` — next coverage frontier candidate: decide the observed-shape inventory per consumer, then align graph/RPC/web to it. +3. 
`runtime-affordances-and-legality` — follow-on coverage frontier for shared posture legality/default surfaces once graph observed shapes stop dominating. ### Parallel / Low-conflict @@ -92,9 +91,9 @@ After the current elicitor work, the strongest follow-on coverage frontier is `g ### elicitation-backlog - **Name:** Elicitation backlog substrate and agenda read-back -- **Linear:** unassigned (promoted from the temporary elicitor cross-cut; no dedicated tracker yet) +- **Linear:** [FE-823](https://linear.app/hash/issue/FE-823/elicitation-backlog-substrate-and-agenda-read-back) - **Kind:** structural / bounded feature -- **Status:** active +- **Status:** done - **Certainty:** proving - **Retires:** A24-L — test whether a flat prospective register is sufficient before any plane/pointer promotion. - **Lights up:** `createSpec` seed → `CommandExecutor` backlog mutation → per-spec read-back on the real graph boundary. @@ -110,7 +109,7 @@ After the current elicitor work, the strongest follow-on coverage frontier is `g - **Cross-cutting obligations:** Preserve D4-L/D20-L command boundary, D16-L/A4-L one `{specId, lsn}` mutation clock, D63-L basis-as-provenance-directness, D52-L graph-owned table + read, and D65-L flat-table-only modeling — no graph node/plane and no unknown→unknown edges. - **Traceability:** D4-L, D8-L, D16-L, D20-L, D52-L, D63-L, D64-L, D65-L / A24-L. - **Design docs:** `memory/SPEC.md` D65-L; `docs/design/GRAPH_MODEL.md`. -- **Current execution pointer:** `memory/cards/elicitation-backlog--substrate.md`; the remaining prompt-resource body pass stays in `memory/CROSS_CUT_PLAN.md` as temporary coverage completion work. +- **Current execution pointer:** Done 2026-06-08 on FE-823. Materialized `elicitation_backlog` as a flat table plus generated migration, seeded grounding questions at `createSpec`, routed create/close mutations through `CommandExecutor` on the shared spec-local LSN/change-log seam, and added graph-owned per-spec read-back. 
The remaining prompt-resource body pass stays in `memory/CROSS_CUT_PLAN.md` as temporary coverage completion work; the live per-turn driver remains a follow-on, not frontier completion debt. ### minimal-authority-shell @@ -275,6 +274,8 @@ After the current elicitor work, the strongest follow-on coverage frontier is `g - **Design docs:** `.fixtures/seeds/bilal-port/README.md`; `docs/design/GRAPH_MODEL.md`; `docs/praxis/manual-testing.md`. ## Recently Completed +- 2026-06-08 `elicitation-backlog` (FE-823) — Done: materialized `elicitation_backlog` as a flat spec-scoped table with generated migration, seeded the grounding agenda at `createSpec`, routed create/close entry mutations through `CommandExecutor` on the shared `{specId, lsn}` / `change_log` boundary, and added graph-owned per-spec open-entry read-back. Reconciled D65-L/A24-L and updated graph/db topology docs. Verified: `src/graph/command-executor.test.ts`, `src/graph/queries.test.ts`, and `npm run verify`. + - 2026-06-06 `project-graph-review-cycle` (FE-809) — Done: `project-graph` now has active review tools at commitment readiness, real agent proposal generation reaches `present_review_set`, approval goes through public `session.submitExchangeResponse`, `CommandExecutor.acceptReviewSet` commits the exact reviewed batch with `basis: explicit`, and graph/session invalidations publish with `{specId, lsn}`. Verified: `src/.pi/agents/state.test.ts`, `src/.pi/__tests__/prompting.test.ts`, `src/probes/project-graph-review-cycle-proof.test.ts`, and real run `.fixtures/runs/project-graph-review-cycle/2026-06-06-project-graph-review-cycle/`. 
- 2026-06-06 `topology-readmes-and-boundaries` — Done: root product entrypoints moved to `app/`/`workspace/`/`scripts`; reusable graph/session/exchanges/workspace projection helpers moved to `projections/`; reusable markdown/text renderers moved to `renderers/`; `src/projections/topology-boundaries.test.ts` now guards the projection/renderer adapter boundary; and D40-L runtime-state policy now shares `elicit-read-only` tool-policy definitions from `projections/session/runtime-policy.ts` while `.pi/extensions/runtime` remains the Pi tool adapter. Verified: targeted topology/runtime tests and `npm run verify`. @@ -292,8 +293,8 @@ nodes: graph-tool-resilience [done · P0] materialized graph write contract and broadened A14 proof capture-response-to-graph [done · P0] structured answer -> graph truth -> observer update project-graph-review-cycle [done · P1] real project-graph review-set approval loop - elicitation-backlog [active · proving] materialize D65-L prospective agenda substrate and read-back - minimal-authority-shell [next · P1] thin safety posture for current POC paths + elicitation-backlog [done · proving] materialized D65-L prospective agenda substrate and read-back + minimal-authority-shell [active · P1] thin safety posture for current POC paths poc-live-ship-gate [next · P1] final fresh-cwd composed product runbook graph-observed-shapes [next · proving] decide consumer-specific observed-shape inventory, then align graph/RPC/web runtime-affordances-and-legality [next · proving] keep posture legality/default surfaces shared across transports @@ -326,7 +327,7 @@ horizon: geolog-and-petri-execution notes: - - `elicitation-backlog` is the promoted D65-L row from `memory/CROSS_CUT_PLAN.md`; the remaining temporary cross-cut work is `memory/cards/crosscut-know--resource-body-depth.md`. + - `elicitation-backlog` was the promoted D65-L row from `memory/CROSS_CUT_PLAN.md`; the remaining temporary cross-cut work is `memory/cards/crosscut-know--resource-body-depth.md`. 
- Completed prerequisites: `agents-composition-layer` supplies runtime prompt/resource posture, and `live-graph-observer` supplies the read-only web observer path expected by `capture-response-to-graph` and `poc-live-ship-gate`. - `graph-observed-shapes` is intentionally consumer-specific: do not assume every agent read shape belongs on the web observer. - `exchanges-and-generalized-capture` stays deferred until the surviving inventory is honest enough to close; do not regrow deleted `capture-*` symmetry in the meantime. diff --git a/memory/SPEC.md b/memory/SPEC.md index ea829d757..17243aa4c 100644 --- a/memory/SPEC.md +++ b/memory/SPEC.md @@ -118,7 +118,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c | A20-L | The chosen Drizzle line and row-schema derivation path can be settled during the prep envelope without forcing later M4 rework: Brunch can prove migrations, SQLite fidelity, monotonic counter allocation, change-log writes, and runtime-schema derivation on one representative persistence slice before CRUD proper starts. | high | **validated** | D16-L, D41-L | **Validated by A20-L spike (2026-06-01).** Stack: `drizzle-orm@0.45.2` + `drizzle-kit@0.31.10` + `better-sqlite3@12.8.0` + `drizzle-typebox@0.3.3` + `@sinclair/typebox@0.34.14`. Proved: (1) `drizzle-typebox` derives valid TypeBox insert/select schemas from Drizzle tables; `Value.Check` validates/rejects correctly. (2) Batch `commitGraph`-shaped transaction (multi-node → intra-batch ref resolution → multi-edge → LSN allocation → change-log append) works atomically; full rollback on FK violation or domain-validation throw. (3) `update().returning()` works for atomic monotonic counter increment; `insert().returning()` gives auto-increment IDs for ref resolution; JSON detail column round-trips cleanly. 
(4) Pi tool parameters (`typebox` v1.x) and Drizzle row schemas (`@sinclair/typebox` v0.34 via `drizzle-typebox`) serve different roles and never cross — shared enum `const` arrays bridge both. | | A21-L | The POC can treat coherence as a bounded product verdict over structural legality plus explicitly detected contradictions, gaps, and unresolved reconciliation needs, without solving a general theory of “spec coherence.” | low | open | D8-L | M8 must sharpen the coherence rubric before implementation: known-bad adversarial briefs should show what counts as incoherent, what is merely immature/underspecified, and what should become a reconciliation need. | | A22-L | The elicitor can perform synchronous post-exchange capture well enough for the POC: high-confidence extractive facts and readiness-grade updates can be committed immediately, while low-confidence implications can be kept out of graph truth and used as disambiguation material. | medium | partially validated | D18-L, D26-L, D45-L, I30-L | 2026-06-05 `capture-response-to-graph` validated the product wiring for narrow labeled text facts (`Goal:`, `Context:`, `Constraint:`, `Criterion:`) on `session.submitExchangeResponse`. 2026-06-07 generalized the same explicit-text capture core onto `session.submitMessage`: ordinary labeled user text now appends to transcript truth, commits through `graph/capture` → `CommandExecutor.commitGraph({basis: explicit})`, targets the transcript binding's spec, and publishes graph invalidations; explicit interruptions are transcript-visible but do not capture or silently answer a pending exchange. Broader LLM capture quality and readiness-grade updates remain fitness evidence. | -| A24-L | A flat `elicitation_backlog` table (prospective memory) is sufficient to drive elicitor questioning and seed grounding without graph structure — no `unknown` plane/node and no unknown→unknown edges; apparent dependency among open questions is mediated by the claims their resolution produces. 
| medium | open | D65-L | The seeded grounding loop plus capture-reflection across elicitation fixtures; if genuine unknown→unknown dependency or rich traversal emerges, promote the table to a plane (rows→nodes, FK pointers→edges). | +| A24-L | A flat `elicitation_backlog` table (prospective memory) is sufficient to drive elicitor questioning and seed grounding without graph structure — no `unknown` plane/node and no unknown→unknown edges; apparent dependency among open questions is mediated by the claims their resolution produces. | medium | partially validated | D65-L | 2026-06-08 FE-823 materialized the flat table, `createSpec` seed set, `CommandExecutor` create/close mutations, and graph-owned per-spec read-back on the real LSN/change-log seam. Remaining proof is the live per-turn driver plus capture-reflection across elicitation fixtures; if genuine unknown→unknown dependency or rich traversal emerges, promote the table to a plane (rows→nodes, FK pointers→edges). | ### Active Decisions @@ -148,7 +148,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c - **D62-L — Graph nodes have stable spec-scoped human reference codes projected from stored `kind_ordinal`, separate from integer storage IDs.** `NodeId` remains the SQLite integer primary key/FK used internally. The database stores `kind` and `kind_ordinal`; user/agent-facing handles such as `G1`, `CON2`, `R3`, `CR4`, `VM1`, or `SL2` are projection strings formed by a hard-coded presentation lookup from `kind` to a 1–3 capital-letter label plus `kind_ordinal`. The rendered code string is not a graph column. Labels are unique across all node kinds so `#`-mentions can parse by longest-prefix match, then resolve to `(kind, kind_ordinal)` and finally to `NodeId`. 
`kind_ordinal` is monotonic per `(spec_id, plane, kind)`, allocated by the `CommandExecutor` in the same transaction as node creation from a counter row (`node_kind_counters` or equivalent), not by `MAX(kind_ordinal)+1`; ordinals are never reused after deletion or supersession. DB constraints must make `(spec_id, plane, kind, kind_ordinal)` unique; there is no `(spec_id, code)` uniqueness constraint because `code` is not stored. Context renders and prompt contexts should use projected codes as primary handles and reserve raw integer IDs for internal diagnostics/adapters. Depends on: D14-L, D16-L, D20-L, D54-L, D56-L, D61-L. Supersedes: the string-`NodeId` examples in earlier GRAPH_MODEL text and the previous app's application-only `MAX(kind_ordinal)+1` allocation pattern. - **D63-L — Graph `basis` records item-level approval strength, not the mutation pathway.** Accepted nodes and edges use `basis ∈ explicit | implicit`. `explicit` means the user directly stated the graph item or approved the exact node/edge in a review set; `implicit` means the user accepted a concept/proposal and the agent materialized specific graph items to match it without per-item review (the `propose-graph` direct-commit path). The mutation pathway lives in `change_log.operation` and payload (`commit_graph`, `accept_review_set`, post-exchange capture, etc.), while epistemic attribution lives in `Node.source` and proposal UI metadata may still carry `epistemic_status`. Low-confidence inferred material is still not graph truth; it remains in preface/capture analysis/review drafts/reconciliation needs until clarified or accepted. 
More abstractly, `basis` is a *provenance-directness* marker — directly from the user (`explicit`) versus agent-materialized from user input (`implicit`) — of which item-level approval strength is the claim-flavored reading; this lets the same `explicit | implicit` distinction apply to non-claim registers such as the elicitation backlog (user-raised vs agent-inferred, D65-L). Depends on: D26-L, D27-L, D53-L, D54-L, D55-L. Supersedes: `basis = accepted_review_set` as a persisted graph enum value and any interpretation of `basis` as a provenance/path field. - **D64-L — Readiness bands are non-exclusive derived node-kind groupings used for elicitor goals, context filters, and grade rubrics; they are not structural legality gates.** Bands are `grounding`, `elicitation`, and `commitment`. A node kind may belong to multiple bands (for example `constraint` can contribute to grounding when it is the constraint anchor and to elicitation when it bounds solution space). Bands guide what the elicitor is trying to complete at a given `readiness_grade`, what graph filters and rendered context can show, and what evidence a readiness validator considers. The `CommandExecutor` must not reject a clear `requirement`, `criterion`, `check`, design node, or other later-band kind merely because the spec is at an earlier grade; readiness controls objectives and unlocks, not what graph truth may contain. Depends on: D45-L, D56-L, D57-L, D59-L, D60-L. Supersedes: treating the intent `basic | structural | reasoning` category as the readiness taxonomy or treating readiness as a per-kind creation whitelist. 
-- **D65-L — The elicitation backlog is a prospective process-agenda register (the elicitor's "prospective memory"), distinct from both reconciliation needs and graph truth.** The single term `unknown` conflated two concepts with different ontological status and resolution mechanism: (a) a *process gap* — something the user has not answered yet, knowable now by asking — and (b) a *domain gap* — something nobody knows and cannot economically find out now (the deferred `risk` node, Future Direction §Vocabulary evolution). Only (a) drives elicitation, and it is modelled as an **`elicitation_backlog`** entry, not a graph node. The register is forward-looking but **async and unordered** — the name `elicitation_backlog` (chosen over `agenda`/`need`) signals that entries are logged opportunistically and need not drive the next turn: an entry logged now may only become relevant in a later grade or under a different lens. It is seeded at spec creation with grounding-band questions, read by the elicitor every turn to choose what to ask next, and grown by capture-reflection (each round may spawn new entries). Its resolution produces one of: a **claim** (answered → graph node), a **`risk`** (asked but unknowable → durable spec content, deferred), or **more entries**. It is the *prospective* sibling of the *retrospective* `reconciliation_need` coherence register (D8-L) — two registers, two loops; the elicitation register is the elicitor's per-turn agenda, the reconciliation register is the async reviewer's post-mutation repair queue (D29-L). It is a **flat table, not a graph plane/node**, because its only real relations are filter attributes (plane/lens affinity, D64-L grade-band, `open | closed` status) plus foreign-key pointers (`arose_from`, `resolved_by`); apparent unknown→unknown dependency ("answer B before A") is illusory — it is mediated by the claims that resolving a need produces, which already carry `dependency` edges (D51-L). 
A table with those FK pointers is a degenerate bipartite graph, forward-compatible with promotion to a plane only if genuine unknown→unknown structure later emerges; this keeps the locked graph-of-claims (D54-L/D56-L/D51-L) untouched and supplies the missing substrate for the "what to ask next" objective and generalized capture. `basis` applies via its provenance-directness reading (D63-L): a user-raised need is `explicit`, an agent-inferred need is `implicit`. Open for scope/slice design: the seed mechanism, whether mutations route through `CommandExecutor` and share the spec-local LSN, and whether the register thins or merely complements the `goal` axis (D59-L). Depends on: D8-L, D45-L, D59-L, D63-L, D64-L. Supersedes: treating `unknown` as a graph node kind or cross-plane node/plane for driving elicitation. +- **D65-L — The elicitation backlog is a prospective process-agenda register (the elicitor's "prospective memory"), distinct from both reconciliation needs and graph truth.** The single term `unknown` conflated two concepts with different ontological status and resolution mechanism: (a) a *process gap* — something the user has not answered yet, knowable now by asking — and (b) a *domain gap* — something nobody knows and cannot economically find out now (the deferred `risk` node, Future Direction §Vocabulary evolution). Only (a) drives elicitation, and it is modelled as an **`elicitation_backlog`** entry, not a graph node. The register is forward-looking but **async and unordered** — the name `elicitation_backlog` (chosen over `agenda`/`need`) signals that entries are logged opportunistically and need not drive the next turn: an entry logged now may only become relevant in a later grade or under a different lens. It is seeded at spec creation with grounding-band questions, read by the elicitor every turn to choose what to ask next, and grown by capture-reflection (each round may spawn new entries). 
Its resolution produces one of: a **claim** (answered → graph node), a **`risk`** (asked but unknowable → durable spec content, deferred), or **more entries**. It is the *prospective* sibling of the *retrospective* `reconciliation_need` coherence register (D8-L) — two registers, two loops; the elicitation register is the elicitor's per-turn agenda, the reconciliation register is the async reviewer's post-mutation repair queue (D29-L). It is a **flat table, not a graph plane/node**, because its only real relations are filter attributes (plane/lens affinity, D64-L grade-band, `open | closed` status) plus foreign-key pointers (`arose_from`, `resolved_by`); apparent unknown→unknown dependency ("answer B before A") is illusory — it is mediated by the claims that resolving a need produces, which already carry `dependency` edges (D51-L). A table with those FK pointers is a degenerate bipartite graph, forward-compatible with promotion to a plane only if genuine unknown→unknown structure later emerges; this keeps the locked graph-of-claims (D54-L/D56-L/D51-L) untouched and supplies the missing substrate for the "what to ask next" objective and generalized capture. `basis` applies via its provenance-directness reading (D63-L): a user-raised need is `explicit`, an agent-inferred need is `implicit`. The substrate is now settled: the backlog is seeded at `createSpec`, create/close mutations route through `CommandExecutor`, and those writes share the spec-local LSN + `change_log` boundary. Still open: whether the register merely complements or eventually thins the `goal` axis (D59-L), and how the live per-turn driver plus capture-reflection should rank and close entries. Depends on: D8-L, D45-L, D59-L, D63-L, D64-L. Supersedes: treating `unknown` as a graph node kind or cross-plane node/plane for driving elicitation. 
#### Authority & mutation diff --git a/memory/cards/elicitation-backlog--substrate.md b/memory/cards/elicitation-backlog--substrate.md deleted file mode 100644 index 939be556e..000000000 --- a/memory/cards/elicitation-backlog--substrate.md +++ /dev/null @@ -1,184 +0,0 @@ -# Elicitation-backlog substrate (Seam 3a — "what to ask next" driver) - -Frontier: elicitation-backlog -Status: active -Mode: single -Created: 2026-06-07 - -## Orientation - -- **Containing seam:** the KNOW/orient layer's missing substrate. `CROSS_CUT_PLAN.md` - Seam 3a's open ● *"what to ask next" driver* row points at D65-L `elicitation_backlog` — - a **flat table** (prospective process-agenda), the prospective sibling of the retrospective - `reconciliation_need` register (D8-L). Today the elicitor has no per-turn agenda store. -- **Relevant frontier item:** `elicitation-backlog` in `memory/PLAN.md`. This tracer was - promoted out of the temporary elicitor cross-cut because the D65-L substrate now carries - real frontier weight; the row it closes remains tracked in `memory/CROSS_CUT_PLAN.md`. -- **Volatile state:** the sibling `reconciliation_need` is currently **type-only** — see - [src/graph/schema/reconciliation-need.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/graph/schema/reconciliation-need.ts) - ("Phase 1 lock-and-materialize: type definitions only; Drizzle table + CommandExecutor write - paths land with subsequent M4 slices"). So this slice **materializes the prospective register - first**, justified by its direct POC value (it drives "what to ask next"); mirror the - `reconciliation_need` shape so the retrospective register can later materialize symmetrically. - `createSpec` ([command-executor.ts#L449](file:///Users/lunelson/Code/hashintel/brunch-next/src/graph/command-executor.ts#L449)) - is the seed point — it already runs a transaction allocating a spec-local LSN + `change_log` row. -- **Topology note (post-35eff395):** the `snapshot` architecture noun is retired. 
The `read | project - | render` split now governs: **domain read/query logic stays in the owning domain** - ([src/graph/queries.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/graph/queries.ts), formerly - `snapshot.ts`); `src/projections/` is reserved for **reusable multi-consumer/multi-source DTOs**. - So the backlog read-back lives in `graph/` (single-owner domain read), **not** `src/projections/` - — add a projection only if a later consumer (RPC/web) actually reuses the shape. -- **Main open risk:** D65-L lists three open scope-design items. This card resolves two and - defers one (below). The load-bearing assumption A24-L (a flat table suffices, no graph plane) - is exactly what landing this frontier tracer tests. - -Posture: **proving** (inherited from frontier `elicitation-backlog`; this is still the former cross-cut Seam 3a D65-L tracer). - -Slice-design decisions made here (resolving D65-L "open for scope/slice design"): - -1. **Mutations route through `CommandExecutor`, sharing the spec-local LSN + `change_log`** — - mirrors D8-L reconciliation needs and preserves the single mutation boundary (D4-L/D20-L). - This is the card's recommended resolution of D65-L's open routing fork; building it ratifies it. -2. **Seed at spec creation** (`createSpec`) with a small grounding-band question set. -3. **Goal-axis relationship (complement vs thin `goal`, D59-L) is DEFERRED** — not decided here; - this slice only stands up the substrate and a read-back, not the goal-layer interaction. - -Frontier-level cross-cutting obligations this slice carries: - -- **D4-L/D20-L:** all backlog mutations route through the command layer and return structured results. -- **D16-L/A4-L:** each mutation allocates exactly one `{specId, lsn}` through the spec's `graph_clock`. -- **D63-L:** `basis` is provenance-directness — a user-raised need is `explicit`, an - agent-inferred need is `implicit`; do not overload it as a mutation-path field. 
-- **D52-L:** `graph/` owns the table + mutation **and the read** (domain query in `queries.ts`); - `db/` is imported only by `graph/`. A `src/projections/` DTO is added only if a consumer reuses it. -- **D65-L shape lock:** flat table only — FK pointers (`arose_from`, `resolved_by`), filter - attributes (plane/lens affinity, grade band, `open|closed`); **no** graph node/plane, no - unknown→unknown edges. Keep it forward-compatible with promotion to a plane, do not pre-build one. - -### Target Behavior - -A flat `elicitation_backlog` table is materialized through `CommandExecutor`, seeded with -grounding-band questions at spec creation, and read back per spec through the command and domain-read boundary. - -### Boundary Crossings - -```pseudo -→ elicitation_backlog Drizzle table (db/schema.ts) + generated migration -→ graph/schema/elicitation-backlog.ts domain types (mirror reconciliation-need shape) -→ CommandExecutor: create-entry / close-entry mutations (one spec-local LSN + change_log each) -→ createSpec seed hook (grounding-band questions on new spec) -→ domain read (graph/queries.ts): list open backlog entries for a spec -→ SPEC reconciliation (A24-L progress; D65-L routing/seed forks resolved) -``` - -### Risks and Assumptions - -``` -- RISK: materializing the prospective register before the retrospective sibling creates schema asymmetry. - → MITIGATION: mirror the reconciliation_need type/column shape (id, specId, kind/affinity, - target/FK pointers, rationale, createdAtLsn, resolvedAtLsn) so the sibling materializes symmetrically. -- RISK: the seed question set hard-codes content that should be data/config. - → MITIGATION: keep the seed list a single small named constant in graph/ (not scattered); - it is a starting agenda, not a closed vocabulary — entries are mutable through the command path. -- RISK: backlog mutation drifts into a second mutation engine separate from commitGraph. 
- → MITIGATION: reuse the CommandExecutor transaction/clock/change_log helpers; backlog ops are - new operations on the same boundary, not a parallel writer. -- ASSUMPTION: a flat table (FK pointers + filter attrs) is sufficient to drive elicitor questioning - without a graph plane or unknown→unknown edges. - → IMPACT IF FALSE: if genuine unknown→unknown dependency or rich traversal emerges, the table - promotes to a plane (rows→nodes, FK→edges) — a larger reshape touching the locked graph model. - → VALIDATE: seed→store→read tracer plus later capture-reflection across fixtures; rich - dependency that the FK pointers cannot express is the falsifier. - → [→ memory/SPEC.md §Assumptions A24-L] -- ASSUMPTION: routing backlog mutations through CommandExecutor (sharing the spec-local LSN) is the - right home, not a separate store. - → IMPACT IF FALSE: backlog gets its own clock/audit; rework of the mutation surface. - → VALIDATE: the tracer's change_log + LSN assertions; mirrors the settled D8-L need register. -``` - -### Posture check - -Proving tracer scoring on two axes: - -- **Proof of life:** stands up an entirely new substrate end-to-end — seed at spec creation → - command-layer store → read-back — that no current store provides. -- **Uncertainty:** retires the load-bearing half of A24-L (flat table suffices). The tracer breaks - if the flat shape cannot carry seeded grounding-band agenda items with their FK pointers. - -It deliberately does **not** build the per-turn "what to ask next" prompt injection or -capture-reflection spawning/closing — those depend on what the seeded substrate reveals and are -held back by the anti-speculation gate (see follow-on). 
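The tracer's invariants can be modeled in memory as a rough sketch: seeding at spec creation, one spec-local LSN and one `change_log` row per mutation, sibling-spec isolation, and no writes on a structural rejection. `InMemoryBacklogStore`, `GROUNDING_SEEDS`, and the `close_elicitation_backlog_entry` operation name are hypothetical. The real path runs inside a Drizzle transaction through `CommandExecutor`.

```typescript
// Hypothetical in-memory model of the seed -> store -> read tracer.
// Not the real CommandExecutor: no database, no Drizzle transaction.
interface ChangeLogRow {
  readonly specId: number;
  readonly lsn: number;
  readonly operation: string;
}

interface BacklogEntry {
  readonly id: number;
  readonly specId: number;
  readonly question: string;
  status: 'open' | 'closed';
  readonly createdAtLsn: number;
  closedAtLsn: number | null;
}

// Illustrative subset of the grounding-band seed questions.
const GROUNDING_SEEDS = [
  'What is the thing or domain we are specifying?',
  'Who is this for, or who is most affected by it?',
] as const;

class InMemoryBacklogStore {
  private readonly clocks = new Map<number, number>(); // per-spec graph_clock
  private nextSpecId = 1;
  private nextEntryId = 1;
  readonly changeLog: ChangeLogRow[] = [];
  readonly entries: BacklogEntry[] = [];

  // Each successful mutation allocates exactly one spec-local LSN and
  // appends exactly one change_log row.
  private commit(specId: number, operation: string): number {
    const lsn = (this.clocks.get(specId) ?? 0) + 1;
    this.clocks.set(specId, lsn);
    this.changeLog.push({ specId, lsn, operation });
    return lsn;
  }

  private insertEntry(specId: number, question: string, lsn: number): number {
    const id = this.nextEntryId++;
    this.entries.push({ id, specId, question, status: 'open', createdAtLsn: lsn, closedAtLsn: null });
    return id;
  }

  createSpec(): { specId: number; lsn: number } {
    const specId = this.nextSpecId++;
    const lsn = this.commit(specId, 'create_spec');
    // Seeds ride on the create-spec LSN instead of allocating their own.
    for (const question of GROUNDING_SEEDS) this.insertEntry(specId, question, lsn);
    return { specId, lsn };
  }

  // Returns null for a structurally illegal request and writes nothing.
  createEntry(specId: number, question: string): { id: number; lsn: number } | null {
    if (!this.clocks.has(specId) || !question.trim()) return null;
    const lsn = this.commit(specId, 'create_elicitation_backlog_entry');
    return { id: this.insertEntry(specId, question.trim(), lsn), lsn };
  }

  // Only an open entry of the same spec can be closed.
  closeEntry(specId: number, id: number): number | null {
    const entry = this.entries.find((e) => e.id === id && e.specId === specId);
    if (!entry || entry.status === 'closed') return null;
    const lsn = this.commit(specId, 'close_elicitation_backlog_entry');
    entry.status = 'closed';
    entry.closedAtLsn = lsn;
    return lsn;
  }

  listOpen(specId: number): BacklogEntry[] {
    return this.entries.filter((e) => e.specId === specId && e.status === 'open');
  }
}
```

Because each spec has its own clock, two specs created back to back both start at LSN 1. A bare LSN therefore only orders events inside one spec.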
- -### Acceptance Criteria - -```pseudo tree -elicitation_backlog substrate -├── table + types -│ ├── ✓ elicitation_backlog table exists with a generated migration and mirrors the reconciliation_need shape -│ └── ✓ domain types enumerate status (open|closed), basis (explicit|implicit), grade band, and FK pointers -├── command-layer mutation -│ ├── ✓ creating an entry allocates one spec-local LSN and one change_log row -│ ├── ✓ closing an entry sets resolved_by / closed_at_lsn and writes one change_log row -│ └── ✓ a malformed entry returns structural_illegal and writes no rows -├── seed at spec creation -│ ├── ✓ createSpec seeds the grounding-band question set for the new spec -│ └── ✓ seeded entries are open, explicit, and scoped to that spec only (sibling specs unaffected) -└── read-back - └── ✓ listing open backlog entries for a spec returns the seeded set with stable fields -``` - -### Verification Approach - -``` -- Inner: CommandExecutor unit tests — create/close mutation, LSN/change_log, structural_illegal, spec scoping. -- Inner: migration/schema test — table present; seed-on-createSpec count and field assertions. -- Middle: domain-read test (graph/queries) — seeded entries read back per spec; sibling-spec isolation. -``` - -### Cross-cutting obligations - -``` -- Reuse the CommandExecutor boundary; no direct db/ writes outside graph/; no second clock/audit. -- Flat table only — no graph node/plane, no unknown→unknown edges (D65-L). -- basis stays provenance-directness (D63-L); seeded grounding questions are explicit. -- Mirror reconciliation_need shape for forward-symmetric materialization. -``` - -### Expected touched paths (tentative) - -```pseudo tree -src/db/ -├── schema.ts ~ (elicitation_backlog table + enum arrays) -└── row-schemas.ts ? 
-drizzle/ -└── 0003_*.sql + (generated migration) -src/graph/ -├── schema/elicitation-backlog.ts + (domain types; mirror reconciliation-need.ts) -├── command-executor.ts ~ (create/close entry + seed hook in createSpec) -├── command-executor.test.ts ~ -├── command-executor/ -│ └── elicitation-backlog-types.ts +? -├── queries.ts ~ (domain read: list backlog entries per spec) -├── queries.test.ts ~ -└── index.ts ~ -src/projections/ ? (only if a consumer reuses the read shape — not by default) -memory/SPEC.md ~ (A24-L progress; D65-L routing/seed forks resolved) -docs/design/GRAPH_MODEL.md ? (if the need-register section gains a prospective sibling note) -``` - -### Foreseeable follow-on (NOT scoped — anti-speculation gate) - -The per-turn **"what to ask next" driver** — compose-time injection of open backlog entries -into the elicitor turn (D58-L), plus **capture-reflection** that spawns new entries and closes -resolved ones on each exchange/message — is intentionally **not pre-scoped**. Its exact read -shape and capture-reflection wiring would shift based on what the seeded substrate reveals -(entry volume, field usefulness, goal-axis relationship). Scope it after this tracer lands. - -### Traceability - -- **SPEC:** D65-L (the register), A24-L (flat-table assumption — the falsifier), D8-L - (retrospective sibling template), D4-L/D20-L/D16-L (command boundary + LSN), D63-L (basis), - D64-L (grade bands), D52-L (topology). On build, reconcile A24-L progress and resolve the - D65-L routing/seed open items; defer the goal-axis fork. -- **Cross-cut:** advances `CROSS_CUT_PLAN.md` Seam 3a *"what to ask next" driver* ● (substrate - half; behavioral driver remains a follow-on). diff --git a/src/db/README.md b/src/db/README.md index ee3cd5d2a..997bcec18 100644 --- a/src/db/README.md +++ b/src/db/README.md @@ -94,11 +94,16 @@ owned by their boundary. 
## Current schema posture -The current graph tables are spec-scoped: `specs`, `nodes`, `edges`, -`node_kind_counters`, `graph_clock`, `change_log`, and -`reconciliation_need`. `graph_clock` is keyed by `spec_id`; `change_log` carries -`spec_id` and is keyed by `(spec_id, lsn)`, so a bare LSN is comparable only -inside one spec. +The current graph and graph-adjacent tables are spec-scoped: `specs`, `nodes`, +`edges`, `node_kind_counters`, `graph_clock`, `change_log`, +`reconciliation_need`, and `elicitation_backlog`. `graph_clock` is keyed by +`spec_id`; `change_log` carries `spec_id` and is keyed by `(spec_id, lsn)`, so +a bare LSN is comparable only inside one spec. + +`elicitation_backlog` is the prospective sibling of `reconciliation_need`: a +flat process-agenda register, not a graph plane or node table. Only its storage +lives here; the graph-owned command and query code in `src/graph/` still +owns its semantics. `nodes.kind_ordinal` is persisted as the storage half of the D62-L projected-code contract.
`node_kind_counters` owns monotonic per-`(spec_id, plane, kind)` diff --git a/src/db/row-schemas.ts b/src/db/row-schemas.ts index af1cb4a62..0abc61ecd 100644 --- a/src/db/row-schemas.ts +++ b/src/db/row-schemas.ts @@ -13,6 +13,7 @@ import { createInsertSchema, createSelectSchema } from 'drizzle-typebox'; import { changeLog, edges, + elicitationBacklog, graphClock, nodeKindCounters, nodes, @@ -45,3 +46,7 @@ export const selectNodeKindCounterSchema = createSelectSchema(nodeKindCounters); // --- Reconciliation need schemas --- export const insertReconciliationNeedSchema = createInsertSchema(reconciliationNeed); export const selectReconciliationNeedSchema = createSelectSchema(reconciliationNeed); + +// --- Elicitation backlog schemas --- +export const insertElicitationBacklogSchema = createInsertSchema(elicitationBacklog); +export const selectElicitationBacklogSchema = createSelectSchema(elicitationBacklog); diff --git a/src/db/schema.ts b/src/db/schema.ts index a33f3d5a9..6def8eae7 100644 --- a/src/db/schema.ts +++ b/src/db/schema.ts @@ -9,7 +9,14 @@ */ import { sql } from 'drizzle-orm'; -import { integer, primaryKey, sqliteTable, text, uniqueIndex } from 'drizzle-orm/sqlite-core'; +import { + type AnySQLiteColumn, + integer, + primaryKey, + sqliteTable, + text, + uniqueIndex, +} from 'drizzle-orm/sqlite-core'; // --------------------------------------------------------------------------- // Shared enum arrays — the single source for text enum columns, @@ -58,6 +65,12 @@ export const READINESS_GRADES = [ 'planning_ready', ] as const; +export const READINESS_BANDS = ['grounding', 'elicitation', 'commitment'] as const; + +export const LENS_AFFINITIES = ['intent', 'design', 'oracle'] as const; + +export const ELICITATION_BACKLOG_STATUSES = ['open', 'closed'] as const; + // --------------------------------------------------------------------------- // Tables // --------------------------------------------------------------------------- @@ -173,3 +186,22 @@ export const 
reconciliationNeed = sqliteTable('reconciliation_need', { created_at_lsn: integer().notNull(), resolved_at_lsn: integer(), }); + +export const elicitationBacklog = sqliteTable('elicitation_backlog', { + id: integer().primaryKey({ autoIncrement: true }), + spec_id: integer() + .notNull() + .references(() => specs.id), + kind: text().notNull(), // open taxonomy: grounding anchors today, richer agenda kinds later + question: text().notNull(), + status: text({ enum: ELICITATION_BACKLOG_STATUSES }).notNull().default('open'), + basis: text({ enum: NODE_BASES }).notNull().default('explicit'), + readiness_band: text({ enum: READINESS_BANDS }).notNull(), + plane_affinity: text({ enum: ['intent', 'oracle', 'design', 'plan'] }), + lens_affinity: text({ enum: LENS_AFFINITIES }), + arose_from_entry_id: integer().references((): AnySQLiteColumn => elicitationBacklog.id), + resolved_by_node_id: integer().references(() => nodes.id), + rationale: text(), + created_at_lsn: integer().notNull(), + closed_at_lsn: integer(), +}); diff --git a/src/graph/README.md b/src/graph/README.md index f69301c61..a2a7d4eef 100644 --- a/src/graph/README.md +++ b/src/graph/README.md @@ -8,7 +8,9 @@ SPEC decisions: D4-L, D20-L, D27-L, D51-L, D52-L, D53-L, D54-L, D62-L, D63-L - **CommandExecutor** (`command-executor.ts`) — the single mutation boundary for graph/spec writes. It hides structural validation, transaction mechanics, spec-local LSN allocation, per-kind node ordinal allocation, change-log append, - and structured command results. + and structured command results. It also owns prospective-register writes for + `elicitation_backlog` (`createSpec` seeding plus create/close entry commands), + because the backlog shares the same spec-local LSN and audit boundary. - **commitGraph** — atomic batch mutation for `propose-graph`: one tool call, one transaction, one selected-spec LSN, all-or-nothing. 
It accepts product @@ -28,16 +30,17 @@ SPEC decisions: D4-L, D20-L, D27-L, D51-L, D52-L, D53-L, D54-L, D62-L, D63-L projection. - **Readers / query functions** (`queries.ts`) — graph reads at multiple detail levels: active-context and graph-truth overview, node - neighborhood, selected-spec graph-code lookup, and open reconciliation needs. - These return typed domain objects or internal ids, not Drizzle rows. + neighborhood, selected-spec graph-code lookup, open reconciliation needs, and + open elicitation-backlog entries. These return typed domain objects or + internal ids, not Drizzle rows. - **Preview harness helpers** (`render-preview.ts`) — deterministic fixture-seed + selected-spec read helpers for render-preview scripts/tests that need real graph data without bypassing the command/read seams. - **Domain schema types** (`schema/`) — `GraphNode`, `GraphEdge`, - `ReconciliationNeed`, kind/category types, per-kind node ordinals, and derived - intent-kind grouping. + `ReconciliationNeed`, `ElicitationBacklogEntry`, kind/category types, + per-kind node ordinals, and derived intent-kind grouping. - **Policy** (`policy/category-policy.ts`) — edge-category semantics such as cascade behavior, reconciliation triggers, and projection effects. 
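The reader contract (readers return typed domain objects, not Drizzle rows) amounts to a row → domain mapping. The sketch below makes several assumptions. `ElicitationBacklogRow` mirrors the `elicitation_backlog` columns from `src/db/schema.ts`. `BacklogEntryView`, `toBacklogEntry`, `openEntriesForSpec`, and the ordering by creation LSN are hypothetical and may not match the actual `ElicitationBacklogEntry` type or `getOpenElicitationBacklogEntries`.

```typescript
// Storage row: column names follow the elicitation_backlog table.
interface ElicitationBacklogRow {
  readonly id: number;
  readonly spec_id: number;
  readonly kind: string;
  readonly question: string;
  readonly status: 'open' | 'closed';
  readonly basis: 'explicit' | 'implicit';
  readonly readiness_band: 'grounding' | 'elicitation' | 'commitment';
  readonly plane_affinity: 'intent' | 'oracle' | 'design' | 'plan' | null;
  readonly lens_affinity: 'intent' | 'design' | 'oracle' | null;
  readonly arose_from_entry_id: number | null;
  readonly resolved_by_node_id: number | null;
  readonly rationale: string | null;
  readonly created_at_lsn: number;
  readonly closed_at_lsn: number | null;
}

// Hypothetical domain shape: camelCase keys, absent optionals instead of nulls.
interface BacklogEntryView {
  readonly id: number;
  readonly specId: number;
  readonly kind: string;
  readonly question: string;
  readonly basis: ElicitationBacklogRow['basis'];
  readonly readinessBand: ElicitationBacklogRow['readiness_band'];
  readonly planeAffinity?: NonNullable<ElicitationBacklogRow['plane_affinity']>;
  readonly lensAffinity?: NonNullable<ElicitationBacklogRow['lens_affinity']>;
  readonly aroseFromEntryId?: number;
  readonly rationale?: string;
  readonly createdAtLsn: number;
}

// Map one storage row to the domain object, dropping null columns.
function toBacklogEntry(row: ElicitationBacklogRow): BacklogEntryView {
  return {
    id: row.id,
    specId: row.spec_id,
    kind: row.kind,
    question: row.question,
    basis: row.basis,
    readinessBand: row.readiness_band,
    ...(row.plane_affinity !== null ? { planeAffinity: row.plane_affinity } : {}),
    ...(row.lens_affinity !== null ? { lensAffinity: row.lens_affinity } : {}),
    ...(row.arose_from_entry_id !== null ? { aroseFromEntryId: row.arose_from_entry_id } : {}),
    ...(row.rationale !== null ? { rationale: row.rationale } : {}),
    createdAtLsn: row.created_at_lsn,
  };
}

// Open entries for one spec only; ordering by creation LSN, then id, is an assumption.
function openEntriesForSpec(rows: readonly ElicitationBacklogRow[], specId: number): BacklogEntryView[] {
  return rows
    .filter((row) => row.spec_id === specId && row.status === 'open')
    .sort((a, b) => a.created_at_lsn - b.created_at_lsn || a.id - b.id)
    .map(toBacklogEntry);
}
```

With the mapping kept in the owning domain, callers never see snake_case columns or nullable storage fields. A `src/projections/` DTO would only become worthwhile if a second consumer reused this shape.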
@@ -86,6 +89,7 @@ graph/ CommandExecutor command input/result types createSpec + create/close elicitation-backlog entry updateReadinessGrade createNode per-kind node ordinal allocation @@ -114,6 +118,7 @@ graph/ getGraphOverview getNodeNeighborhood resolveGraphNodeCode + getOpenElicitationBacklogEntries getOpenReconciliationNeeds row -> domain mapping @@ -125,6 +130,7 @@ graph/ openWorkspaceCommandExecutor(cwd) schema/ + elicitation-backlog.ts nodes.ts edges.ts reconciliation-need.ts diff --git a/src/graph/command-executor.test.ts b/src/graph/command-executor.test.ts index 984cc426d..55e68cbd7 100644 --- a/src/graph/command-executor.test.ts +++ b/src/graph/command-executor.test.ts @@ -9,7 +9,15 @@ import { eq } from 'drizzle-orm'; import { describe, expect, it, beforeEach } from 'vitest'; import { createDb, type BrunchDb } from '../db/connection.js'; -import { graphClock, changeLog, nodes, nodeKindCounters, reconciliationNeed, specs } from '../db/schema.js'; +import { + changeLog, + elicitationBacklog, + graphClock, + nodeKindCounters, + nodes, + reconciliationNeed, + specs, +} from '../db/schema.js'; import { CommandExecutor } from './command-executor.js'; function createTestDb(): BrunchDb { @@ -397,6 +405,70 @@ describe('CommandExecutor', () => { ).toEqual([{ specId: result.specId, lsn: 1 }]); }); + it('seeds explicit grounding backlog entries for the new spec at create-spec LSN', () => { + const result = executor.createSpec({ name: 'Grounded Spec', slug: 'grounded-spec' }); + expect(result.status).toBe('success'); + if (result.status !== 'success') throw new Error('unreachable'); + + expect( + db + .select({ + kind: elicitationBacklog.kind, + question: elicitationBacklog.question, + status: elicitationBacklog.status, + basis: elicitationBacklog.basis, + readinessBand: elicitationBacklog.readiness_band, + planeAffinity: elicitationBacklog.plane_affinity, + lensAffinity: elicitationBacklog.lens_affinity, + createdAtLsn: elicitationBacklog.created_at_lsn, + }) + 
.from(elicitationBacklog) + .where(eq(elicitationBacklog.spec_id, result.specId)) + .all(), + ).toEqual([ + { + kind: 'domain_anchor_question', + question: 'What is the thing or domain we are specifying?', + status: 'open', + basis: 'explicit', + readinessBand: 'grounding', + planeAffinity: 'intent', + lensAffinity: 'intent', + createdAtLsn: result.lsn, + }, + { + kind: 'protagonist_anchor_question', + question: 'Who is this for, or who is most affected by it?', + status: 'open', + basis: 'explicit', + readinessBand: 'grounding', + planeAffinity: 'intent', + lensAffinity: 'intent', + createdAtLsn: result.lsn, + }, + { + kind: 'pain_anchor_question', + question: 'What problem, pain, or pull is driving this work?', + status: 'open', + basis: 'explicit', + readinessBand: 'grounding', + planeAffinity: 'intent', + lensAffinity: 'intent', + createdAtLsn: result.lsn, + }, + { + kind: 'constraint_anchor_question', + question: 'What constraints or non-negotiable boundaries already shape it?', + status: 'open', + basis: 'explicit', + readinessBand: 'grounding', + planeAffinity: 'intent', + lensAffinity: 'intent', + createdAtLsn: result.lsn, + }, + ]); + }); + it('scopes create_spec audit LSNs to the newly created spec', () => { const specA = executor.createSpec({ name: 'Spec A', slug: 'spec-a' }); const specB = executor.createSpec({ name: 'Spec B', slug: 'spec-b' }); @@ -606,6 +678,163 @@ describe('CommandExecutor', () => { }); }); + describe('createElicitationBacklogEntry', () => { + it('creates an open backlog entry and preserves the arose-from pointer', () => { + const parent = executor.createElicitationBacklogEntry({ + specId, + kind: 'domain_anchor_question', + question: 'What is the thing or domain we are specifying?', + readinessBand: 'grounding', + planeAffinity: 'intent', + lensAffinity: 'intent', + }); + expect(parent.status).toBe('success'); + if (parent.status !== 'success') throw new Error('unreachable'); + + const child = 
executor.createElicitationBacklogEntry({ + specId, + kind: 'follow_on_question', + question: 'Which user is blocked most by the current version?', + readinessBand: 'grounding', + planeAffinity: 'intent', + lensAffinity: 'intent', + aroseFromEntryId: parent.id, + }); + + expect(child.status).toBe('success'); + if (child.status !== 'success') throw new Error('unreachable'); + + expect( + db.select().from(elicitationBacklog).where(eq(elicitationBacklog.id, child.id)).get(), + ).toMatchObject({ + spec_id: specId, + kind: 'follow_on_question', + question: 'Which user is blocked most by the current version?', + status: 'open', + basis: 'explicit', + readiness_band: 'grounding', + plane_affinity: 'intent', + lens_affinity: 'intent', + arose_from_entry_id: parent.id, + created_at_lsn: child.lsn, + closed_at_lsn: null, + }); + }); + + it('rejects malformed entries without writing rows or advancing the clock', () => { + const result = executor.createElicitationBacklogEntry({ + specId, + kind: ' ', + question: ' ', + readinessBand: 'later' as never, + }); + + expect(result.status).toBe('structural_illegal'); + if (result.status !== 'structural_illegal') throw new Error('unreachable'); + expect(result.diagnostics.map((diagnostic) => diagnostic.field)).toEqual( + expect.arrayContaining(['kind', 'question', 'readinessBand']), + ); + expect(db.select().from(elicitationBacklog).all()).toEqual([]); + expect(graphClockLsn(db, specId)).toBe(0); + expect(db.select().from(changeLog).all()).toEqual([]); + }); + }); + + describe('closeElicitationBacklogEntry', () => { + it('closes an open entry and records resolvedByNodeId and closedAtLsn', () => { + const entry = executor.createElicitationBacklogEntry({ + specId, + kind: 'domain_anchor_question', + question: 'What is the thing or domain we are specifying?', + readinessBand: 'grounding', + }); + expect(entry.status).toBe('success'); + if (entry.status !== 'success') throw new Error('unreachable'); + + const node = executor.createNode({ + 
specId, + plane: 'intent', + kind: 'goal', + title: 'Clarified goal', + }); + expect(node.status).toBe('success'); + if (node.status !== 'success') throw new Error('unreachable'); + + const close = executor.closeElicitationBacklogEntry({ + specId, + id: entry.id, + resolvedByNodeId: node.nodeId, + }); + + expect(close.status).toBe('success'); + if (close.status !== 'success') throw new Error('unreachable'); + expect(close.lsn).toBeGreaterThan(node.lsn); + expect( + db + .select({ + status: elicitationBacklog.status, + resolvedByNodeId: elicitationBacklog.resolved_by_node_id, + closedAtLsn: elicitationBacklog.closed_at_lsn, + }) + .from(elicitationBacklog) + .where(eq(elicitationBacklog.id, entry.id)) + .get(), + ).toEqual({ + status: 'closed', + resolvedByNodeId: node.nodeId, + closedAtLsn: close.lsn, + }); + }); + + it('rejects a resolved-by node from another spec', () => { + const entry = executor.createElicitationBacklogEntry({ + specId, + kind: 'domain_anchor_question', + question: 'What is the thing or domain we are specifying?', + readinessBand: 'grounding', + }); + expect(entry.status).toBe('success'); + if (entry.status !== 'success') throw new Error('unreachable'); + + const otherSpec = executor.createSpec({ name: 'Other Spec', slug: 'other-spec' }); + expect(otherSpec.status).toBe('success'); + if (otherSpec.status !== 'success') throw new Error('unreachable'); + const otherNode = executor.createNode({ + specId: otherSpec.specId, + plane: 'intent', + kind: 'goal', + title: 'Sibling goal', + }); + expect(otherNode.status).toBe('success'); + if (otherNode.status !== 'success') throw new Error('unreachable'); + + const close = executor.closeElicitationBacklogEntry({ + specId, + id: entry.id, + resolvedByNodeId: otherNode.nodeId, + }); + + expect(close.status).toBe('structural_illegal'); + if (close.status !== 'structural_illegal') throw new Error('unreachable'); + expect(close.diagnostics[0]!.field).toBe('resolvedByNodeId'); + expect( + db + .select({ + 
status: elicitationBacklog.status, + resolvedByNodeId: elicitationBacklog.resolved_by_node_id, + closedAtLsn: elicitationBacklog.closed_at_lsn, + }) + .from(elicitationBacklog) + .where(eq(elicitationBacklog.id, entry.id)) + .get(), + ).toEqual({ + status: 'open', + resolvedByNodeId: null, + closedAtLsn: null, + }); + }); + }); + // --- resolveReconciliationNeed --- describe('resolveReconciliationNeed', () => { diff --git a/src/graph/command-executor.ts b/src/graph/command-executor.ts index d97a0bc4b..0ac0d5912 100644 --- a/src/graph/command-executor.ts +++ b/src/graph/command-executor.ts @@ -36,7 +36,8 @@ import type { StructuralIllegal, } from './command-executor/commit-graph-types.js'; import { translateReviewSetPayloadToCommitGraph } from './review-set.js'; -import { type NodeBasis, type NodePlane } from './schema/nodes.js'; +import type { ElicitationBacklogLensAffinity } from './schema/elicitation-backlog.js'; +import { type NodeBasis, type NodePlane, type ReadinessBand } from './schema/nodes.js'; export type ReadinessGrade = (typeof schema.READINESS_GRADES)[number]; export type { @@ -100,6 +101,19 @@ export interface CreateSpecSuccess { readonly lsn: number; } +/** Successful elicitation-backlog creation. */ +export interface ElicitationBacklogSuccess { + readonly status: 'success'; + readonly id: number; + readonly lsn: number; +} + +/** Successful elicitation-backlog close. */ +export interface ElicitationBacklogCloseSuccess { + readonly status: 'success'; + readonly lsn: number; +} + /** Successful spec readiness-grade update. 
*/ export interface UpdateReadinessGradeSuccess { readonly status: 'success'; @@ -122,6 +136,8 @@ export type CommandResult = | ReconNeedSuccess | ReconNeedResolveSuccess | CreateSpecSuccess + | ElicitationBacklogSuccess + | ElicitationBacklogCloseSuccess | UpdateReadinessGradeSuccess | StructuralIllegal | NeedsHuman @@ -140,6 +156,12 @@ export type ResolveReconNeedResult = ReconNeedResolveSuccess | StructuralIllegal /** Result of a createSpec command. */ export type CreateSpecResult = CreateSpecSuccess | StructuralIllegal; +/** Result of a createElicitationBacklogEntry command. */ +export type CreateElicitationBacklogEntryResult = ElicitationBacklogSuccess | StructuralIllegal; + +/** Result of a closeElicitationBacklogEntry command. */ +export type CloseElicitationBacklogEntryResult = ElicitationBacklogCloseSuccess | StructuralIllegal; + /** Result of an updateReadinessGrade command. */ export type UpdateReadinessGradeResult = UpdateReadinessGradeSuccess | StructuralIllegal; @@ -176,6 +198,26 @@ export interface AcceptReviewSetInput { readonly payload: unknown; } +/** Input for creating an elicitation-backlog entry. */ +export interface CreateElicitationBacklogEntryInput { + readonly specId: number; + readonly kind: string; + readonly question: string; + readonly basis?: NodeBasis | undefined; + readonly readinessBand: ReadinessBand; + readonly planeAffinity?: NodePlane | undefined; + readonly lensAffinity?: ElicitationBacklogLensAffinity | undefined; + readonly aroseFromEntryId?: number | undefined; + readonly rationale?: string | undefined; +} + +/** Input for closing an elicitation-backlog entry. */ +export interface CloseElicitationBacklogEntryInput { + readonly specId: number; + readonly id: number; + readonly resolvedByNodeId?: number | undefined; +} + /** Input for creating a single graph node. 
*/ export interface CreateNodeInput { readonly specId: number; @@ -236,6 +278,50 @@ const VALID_KINDS_BY_PLANE: Record = { const KINDS_REQUIRING_DETAIL = new Set(['decision', 'term']); const VALID_READINESS_GRADES = schema.READINESS_GRADES as unknown as string[]; const VALID_NODE_BASES = schema.NODE_BASES as unknown as string[]; +const VALID_READINESS_BANDS = schema.READINESS_BANDS as unknown as string[]; +const VALID_LENS_AFFINITIES = schema.LENS_AFFINITIES as unknown as string[]; + +const SEEDED_ELICITATION_BACKLOG: readonly { + readonly kind: string; + readonly question: string; + readonly basis: NodeBasis; + readonly readinessBand: ReadinessBand; + readonly planeAffinity: NodePlane; + readonly lensAffinity: ElicitationBacklogLensAffinity; +}[] = [ + { + kind: 'domain_anchor_question', + question: 'What is the thing or domain we are specifying?', + basis: 'explicit', + readinessBand: 'grounding', + planeAffinity: 'intent', + lensAffinity: 'intent', + }, + { + kind: 'protagonist_anchor_question', + question: 'Who is this for, or who is most affected by it?', + basis: 'explicit', + readinessBand: 'grounding', + planeAffinity: 'intent', + lensAffinity: 'intent', + }, + { + kind: 'pain_anchor_question', + question: 'What problem, pain, or pull is driving this work?', + basis: 'explicit', + readinessBand: 'grounding', + planeAffinity: 'intent', + lensAffinity: 'intent', + }, + { + kind: 'constraint_anchor_question', + question: 'What constraints or non-negotiable boundaries already shape it?', + basis: 'explicit', + readinessBand: 'grounding', + planeAffinity: 'intent', + lensAffinity: 'intent', + }, +] as const; function isReadinessGrade(value: string): value is ReadinessGrade { return VALID_READINESS_GRADES.includes(value); @@ -245,6 +331,18 @@ function isNodeBasis(value: string): value is NodeBasis { return VALID_NODE_BASES.includes(value); } +function isNodePlane(value: string): value is NodePlane { + return value === 'intent' || value === 'oracle' || value === 
'design' || value === 'plan'; +} + +function isReadinessBand(value: string): value is ReadinessBand { + return VALID_READINESS_BANDS.includes(value); +} + +function isElicitationBacklogLensAffinity(value: string): value is ElicitationBacklogLensAffinity { + return VALID_LENS_AFFINITIES.includes(value); +} + function validateCreateNode(input: CreateNodeInput): Diagnostic[] { const diagnostics: Diagnostic[] = []; @@ -299,6 +397,45 @@ function validateCreateNode(input: CreateNodeInput): Diagnostic[] { return diagnostics; } +function validateCreateElicitationBacklogEntry(input: CreateElicitationBacklogEntryInput): Diagnostic[] { + const diagnostics: Diagnostic[] = []; + + if (!input.kind.trim()) { + diagnostics.push({ field: 'kind', message: 'kind must be non-empty' }); + } + + if (!input.question.trim()) { + diagnostics.push({ field: 'question', message: 'question must be non-empty' }); + } + + if (input.basis !== undefined && !isNodeBasis(input.basis)) { + diagnostics.push({ field: 'basis', message: 'basis must be explicit or implicit' }); + } + + if (!isReadinessBand(input.readinessBand)) { + diagnostics.push({ + field: 'readinessBand', + message: `"${String(input.readinessBand)}" is not a valid readiness band`, + }); + } + + if (input.planeAffinity !== undefined && !isNodePlane(input.planeAffinity)) { + diagnostics.push({ + field: 'planeAffinity', + message: `"${String(input.planeAffinity)}" is not a valid plane affinity`, + }); + } + + if (input.lensAffinity !== undefined && !isElicitationBacklogLensAffinity(input.lensAffinity)) { + diagnostics.push({ + field: 'lensAffinity', + message: `"${String(input.lensAffinity)}" is not a valid lens affinity`, + }); + } + + return diagnostics; +} + function validateDecisionDetail(detail: unknown, diagnostics: Diagnostic[]): void { if (typeof detail !== 'object' || detail === null) { diagnostics.push({ field: 'detail', message: 'must be an object' }); @@ -445,6 +582,23 @@ export class CommandExecutor { return 
existing.nextOrdinal; } + private seedElicitationBacklog(tx: Pick, specId: number, lsn: number): void { + tx.insert(schema.elicitationBacklog) + .values( + SEEDED_ELICITATION_BACKLOG.map((entry) => ({ + spec_id: specId, + kind: entry.kind, + question: entry.question, + basis: entry.basis, + readiness_band: entry.readinessBand, + plane_affinity: entry.planeAffinity, + lens_affinity: entry.lensAffinity, + created_at_lsn: lsn, + })), + ) + .run(); + } + /** Create a spec row through the command boundary. */ createSpec(input: CreateSpecInput): CreateSpecResult { const diagnostics: Diagnostic[] = []; @@ -471,6 +625,8 @@ export class CommandExecutor { const lsn = this.createInitialSpecClock(tx, row!.id); + this.seedElicitationBacklog(tx, row!.id, lsn); + tx.insert(schema.changeLog) .values({ spec_id: row!.id, @@ -484,6 +640,202 @@ export class CommandExecutor { }); } + /** Create an elicitation-backlog entry through the command boundary. */ + createElicitationBacklogEntry( + input: CreateElicitationBacklogEntryInput, + ): CreateElicitationBacklogEntryResult { + const diagnostics = validateCreateElicitationBacklogEntry(input); + if (diagnostics.length > 0) { + return { status: 'structural_illegal', diagnostics }; + } + + return this.db.transaction((tx) => { + const specRow = tx + .select({ id: schema.specs.id }) + .from(schema.specs) + .where(eq(schema.specs.id, input.specId)) + .get(); + if (!specRow) { + return { + status: 'structural_illegal' as const, + diagnostics: [{ field: 'specId', message: `spec ${input.specId} does not exist` }], + }; + } + + if (input.aroseFromEntryId != null) { + const parent = tx + .select({ id: schema.elicitationBacklog.id, specId: schema.elicitationBacklog.spec_id }) + .from(schema.elicitationBacklog) + .where(eq(schema.elicitationBacklog.id, input.aroseFromEntryId)) + .get(); + + if (!parent) { + return { + status: 'structural_illegal' as const, + diagnostics: [ + { + field: 'aroseFromEntryId', + message: `elicitation backlog entry 
${input.aroseFromEntryId} does not exist`, + }, + ], + }; + } + + if (parent.specId !== input.specId) { + return { + status: 'structural_illegal' as const, + diagnostics: [ + { + field: 'aroseFromEntryId', + message: + `elicitation backlog entry ${input.aroseFromEntryId} belongs to a different spec ` + + `(command spec ${input.specId})`, + }, + ], + }; + } + } + + const lsn = this.bumpExistingSpecLsn(tx, input.specId); + + const entry = tx + .insert(schema.elicitationBacklog) + .values({ + spec_id: input.specId, + kind: input.kind.trim(), + question: input.question.trim(), + basis: input.basis ?? 'explicit', + readiness_band: input.readinessBand, + plane_affinity: input.planeAffinity ?? null, + lens_affinity: input.lensAffinity ?? null, + arose_from_entry_id: input.aroseFromEntryId ?? null, + rationale: input.rationale ?? null, + created_at_lsn: lsn, + }) + .returning({ id: schema.elicitationBacklog.id }) + .get(); + + tx.insert(schema.changeLog) + .values({ + spec_id: input.specId, + lsn, + operation: 'create_elicitation_backlog_entry', + payload: JSON.stringify({ + id: entry!.id, + specId: input.specId, + kind: input.kind.trim(), + readinessBand: input.readinessBand, + planeAffinity: input.planeAffinity, + lensAffinity: input.lensAffinity, + ...(input.aroseFromEntryId != null ? { aroseFromEntryId: input.aroseFromEntryId } : {}), + }), + }) + .run(); + + return { status: 'success' as const, id: entry!.id, lsn }; + }); + } + + /** Close an elicitation-backlog entry through the command boundary. 
*/ + closeElicitationBacklogEntry(input: CloseElicitationBacklogEntryInput): CloseElicitationBacklogEntryResult { + return this.db.transaction((tx) => { + const entry = tx + .select() + .from(schema.elicitationBacklog) + .where( + and( + eq(schema.elicitationBacklog.id, input.id), + eq(schema.elicitationBacklog.spec_id, input.specId), + ), + ) + .get(); + + if (!entry) { + return { + status: 'structural_illegal' as const, + diagnostics: [ + { + field: 'id', + message: `elicitation backlog entry ${input.id} does not exist for spec ${input.specId}`, + }, + ], + }; + } + + if (entry.status === 'closed') { + return { + status: 'structural_illegal' as const, + diagnostics: [{ field: 'id', message: `elicitation backlog entry ${input.id} is already closed` }], + }; + } + + if (input.resolvedByNodeId != null) { + const node = tx + .select({ id: schema.nodes.id, specId: schema.nodes.spec_id }) + .from(schema.nodes) + .where(eq(schema.nodes.id, input.resolvedByNodeId)) + .get(); + + if (!node) { + return { + status: 'structural_illegal' as const, + diagnostics: [ + { + field: 'resolvedByNodeId', + message: `node ${input.resolvedByNodeId} does not exist`, + }, + ], + }; + } + + if (node.specId !== input.specId) { + return { + status: 'structural_illegal' as const, + diagnostics: [ + { + field: 'resolvedByNodeId', + message: + `node ${input.resolvedByNodeId} belongs to a different spec ` + + `(command spec ${input.specId})`, + }, + ], + }; + } + } + + const lsn = this.bumpExistingSpecLsn(tx, input.specId); + + tx.update(schema.elicitationBacklog) + .set({ + status: 'closed', + resolved_by_node_id: input.resolvedByNodeId ?? 
null, + closed_at_lsn: lsn, + }) + .where( + and( + eq(schema.elicitationBacklog.id, input.id), + eq(schema.elicitationBacklog.spec_id, input.specId), + ), + ) + .run(); + + tx.insert(schema.changeLog) + .values({ + spec_id: input.specId, + lsn, + operation: 'close_elicitation_backlog_entry', + payload: JSON.stringify({ + id: input.id, + specId: input.specId, + ...(input.resolvedByNodeId != null ? { resolvedByNodeId: input.resolvedByNodeId } : {}), + }), + }) + .run(); + + return { status: 'success' as const, lsn }; + }); + } + /** Read all spec rows. */ listSpecs(): SpecRecord[] { return this.db.select().from(schema.specs).all().map(specRecordFromRow); diff --git a/src/graph/index.ts b/src/graph/index.ts index 1d0948a43..028fd2b78 100644 --- a/src/graph/index.ts +++ b/src/graph/index.ts @@ -50,6 +50,12 @@ export type { ReconciliationNeedTarget, } from './schema/reconciliation-need.js'; +export type { + ElicitationBacklogEntry, + ElicitationBacklogLensAffinity, + ElicitationBacklogStatus, +} from './schema/elicitation-backlog.js'; + export { CATEGORY_POLICY, type CategoryPolicy, @@ -64,6 +70,7 @@ export { getGraphSliceByReadinessBands, getRelatedNodes, getNodeNeighborhood, + getOpenElicitationBacklogEntries, getOpenReconciliationNeeds, } from './queries.js'; export type { @@ -99,6 +106,10 @@ export type { CommitGraphDryRunResult, CommitGraphResult, CommitGraphSuccess, + CloseElicitationBacklogEntryInput, + CloseElicitationBacklogEntryResult, + CreateElicitationBacklogEntryInput, + CreateElicitationBacklogEntryResult, CreateNodeInput, CreateNodeResult, CreateReconNeedInput, @@ -106,6 +117,8 @@ export type { CreateSpecResult, CreateSpecSuccess, DryRunSuccess, + ElicitationBacklogCloseSuccess, + ElicitationBacklogSuccess, CreateReconNeedResult, Diagnostic, NeedsHuman, diff --git a/src/graph/queries.test.ts b/src/graph/queries.test.ts index c87101109..b047466fb 100644 --- a/src/graph/queries.test.ts +++ b/src/graph/queries.test.ts @@ -15,6 +15,7 @@ import { 
CommandExecutor } from './command-executor.js'; import { getGraphGaps, getGraphOverview, + getOpenElicitationBacklogEntries, getGraphSliceByKinds, getGraphSliceByReadinessBands, getNodeNeighborhood, @@ -739,3 +740,112 @@ describe('getOpenReconciliationNeeds', () => { expect(needs).toEqual([]); }); }); + +describe('getOpenElicitationBacklogEntries', () => { + let db: BrunchDb; + let executor: CommandExecutor; + let specId: number; + + beforeEach(() => { + db = createTestDb(); + executor = new CommandExecutor(db); + const created = executor.createSpec({ name: 'Test Spec', slug: 'test-spec' }); + expect(created.status).toBe('success'); + if (created.status !== 'success') throw new Error('unreachable'); + specId = created.specId; + }); + + it('returns seeded grounding entries as typed domain objects', () => { + const entries = getOpenElicitationBacklogEntries(db, specId); + + expect( + entries.map((entry) => ({ + kind: entry.kind, + question: entry.question, + status: entry.status, + basis: entry.basis, + readinessBand: entry.readinessBand, + planeAffinity: entry.planeAffinity, + lensAffinity: entry.lensAffinity, + createdAtLsn: entry.createdAtLsn, + })), + ).toEqual([ + { + kind: 'domain_anchor_question', + question: 'What is the thing or domain we are specifying?', + status: 'open', + basis: 'explicit', + readinessBand: 'grounding', + planeAffinity: 'intent', + lensAffinity: 'intent', + createdAtLsn: 1, + }, + { + kind: 'protagonist_anchor_question', + question: 'Who is this for, or who is most affected by it?', + status: 'open', + basis: 'explicit', + readinessBand: 'grounding', + planeAffinity: 'intent', + lensAffinity: 'intent', + createdAtLsn: 1, + }, + { + kind: 'pain_anchor_question', + question: 'What problem, pain, or pull is driving this work?', + status: 'open', + basis: 'explicit', + readinessBand: 'grounding', + planeAffinity: 'intent', + lensAffinity: 'intent', + createdAtLsn: 1, + }, + { + kind: 'constraint_anchor_question', + question: 'What 
constraints or non-negotiable boundaries already shape it?', + status: 'open', + basis: 'explicit', + readinessBand: 'grounding', + planeAffinity: 'intent', + lensAffinity: 'intent', + createdAtLsn: 1, + }, + ]); + }); + + it('filters to open entries for the requested spec only', () => { + const other = executor.createSpec({ name: 'Other Spec', slug: 'other-spec' }); + expect(other.status).toBe('success'); + if (other.status !== 'success') throw new Error('unreachable'); + + const created = executor.createElicitationBacklogEntry({ + specId, + kind: 'follow_on_question', + question: 'What evidence would prove this is working?', + readinessBand: 'elicitation', + planeAffinity: 'oracle', + lensAffinity: 'oracle', + }); + expect(created.status).toBe('success'); + if (created.status !== 'success') throw new Error('unreachable'); + + const resolvedNode = executor.createNode({ + specId, + plane: 'intent', + kind: 'goal', + title: 'Goal clarified', + }); + expect(resolvedNode.status).toBe('success'); + if (resolvedNode.status !== 'success') throw new Error('unreachable'); + + const close = executor.closeElicitationBacklogEntry({ + specId, + id: created.id, + resolvedByNodeId: resolvedNode.nodeId, + }); + expect(close.status).toBe('success'); + + expect(getOpenElicitationBacklogEntries(db, specId)).toHaveLength(4); + expect(getOpenElicitationBacklogEntries(db, other.specId)).toHaveLength(4); + }); +}); diff --git a/src/graph/queries.ts b/src/graph/queries.ts index dffb7c90b..16b0a5716 100644 --- a/src/graph/queries.ts +++ b/src/graph/queries.ts @@ -15,6 +15,7 @@ import type { BrunchDb } from '../db/connection.js'; import * as schema from '../db/schema.js'; import type { Lsn } from './atoms.js'; import type { GraphEdge } from './schema/edges.js'; +import type { ElicitationBacklogEntry } from './schema/elicitation-backlog.js'; import { NODE_KIND_METADATA, parseGraphNodeCode, @@ -630,3 +631,61 @@ export function getOpenReconciliationNeeds(db: BrunchDb, specId: number): Reconc 
.all(); return rows.map(rowToReconNeed); } + +function rowToElicitationBacklogEntry( + row: typeof schema.elicitationBacklog.$inferSelect, +): ElicitationBacklogEntry { + type MutableElicitationBacklogEntry = { + -readonly [K in keyof ElicitationBacklogEntry]: ElicitationBacklogEntry[K]; + }; + + const entry: MutableElicitationBacklogEntry = { + id: String(row.id), + specId: row.spec_id, + kind: row.kind, + question: row.question, + status: row.status as ElicitationBacklogEntry['status'], + basis: row.basis as ElicitationBacklogEntry['basis'], + readinessBand: row.readiness_band as ElicitationBacklogEntry['readinessBand'], + createdAtLsn: row.created_at_lsn, + }; + + if (row.plane_affinity != null) { + entry.planeAffinity = row.plane_affinity as NonNullable; + } + + if (row.lens_affinity != null) { + entry.lensAffinity = row.lens_affinity as NonNullable; + } + + if (row.arose_from_entry_id != null) { + entry.aroseFromEntryId = String(row.arose_from_entry_id); + } + + if (row.resolved_by_node_id != null) { + entry.resolvedByNodeId = row.resolved_by_node_id; + } + + if (row.rationale != null) { + entry.rationale = row.rationale; + } + + if (row.closed_at_lsn != null) { + entry.closedAtLsn = row.closed_at_lsn; + } + + return entry; +} + +/** + * Return all open elicitation-backlog entries for a single spec. 
+ */ +export function getOpenElicitationBacklogEntries(db: BrunchDb, specId: number): ElicitationBacklogEntry[] { + const rows = db + .select() + .from(schema.elicitationBacklog) + .where(and(eq(schema.elicitationBacklog.status, 'open'), eq(schema.elicitationBacklog.spec_id, specId))) + .orderBy(schema.elicitationBacklog.created_at_lsn, schema.elicitationBacklog.id) + .all(); + return rows.map(rowToElicitationBacklogEntry); +} diff --git a/src/graph/schema/elicitation-backlog.ts b/src/graph/schema/elicitation-backlog.ts new file mode 100644 index 000000000..96aba4b6d --- /dev/null +++ b/src/graph/schema/elicitation-backlog.ts @@ -0,0 +1,34 @@ +/** + * Elicitation-backlog type definitions. + * + * Canonical reference: memory/SPEC.md D65-L + * + * The elicitation_backlog is the elicitor's prospective process-agenda register: + * open questions the user has not answered yet, seeded at spec creation and grown + * later by capture-reflection. It is a flat table, not a graph node/plane. + */ + +import * as schema from '../../db/schema.js'; +import type { Lsn, NodeId } from '../atoms.js'; +import type { NodeBasis, NodePlane, ReadinessBand } from './nodes.js'; + +export type ElicitationBacklogStatus = (typeof schema.ELICITATION_BACKLOG_STATUSES)[number]; + +export type ElicitationBacklogLensAffinity = (typeof schema.LENS_AFFINITIES)[number]; + +export interface ElicitationBacklogEntry { + readonly id: string; + readonly specId: number; + readonly kind: string; + readonly question: string; + readonly status: ElicitationBacklogStatus; + readonly basis: NodeBasis; + readonly readinessBand: ReadinessBand; + readonly planeAffinity?: NodePlane; + readonly lensAffinity?: ElicitationBacklogLensAffinity; + readonly aroseFromEntryId?: string; + readonly resolvedByNodeId?: NodeId; + readonly rationale?: string; + readonly createdAtLsn: Lsn; + readonly closedAtLsn?: Lsn; +} diff --git a/src/web/README.md b/src/web/README.md index f49d0c746..face26946 100644 --- a/src/web/README.md +++ 
b/src/web/README.md @@ -200,7 +200,7 @@ web/ graph overview / node-neighborhood route features/ - structured-exchange/ + exchanges/ PendingExchangePanel.tsx response controls for request_answer / request_choice / request_choices / request_review From 06784029a57b0b4f619b9e4e5332ceab5751c35f Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Mon, 8 Jun 2026 09:58:01 +0200 Subject: [PATCH 04/17] Deepen prompt resource bodies --- src/.pi/agents/compose.test.ts | 21 ++++++++++++++++++- src/.pi/skills/goals/capture-posture.md | 10 ++++++--- src/.pi/skills/goals/commit-converge.md | 10 ++++++--- src/.pi/skills/goals/elicit-expand.md | 10 ++++++--- src/.pi/skills/goals/grounding-advance.md | 10 ++++++--- src/.pi/skills/lenses/design.md | 10 ++++++--- src/.pi/skills/lenses/intent.md | 10 ++++++--- src/.pi/skills/lenses/oracle.md | 10 ++++++--- src/.pi/skills/methods/commit-graph.md | 10 ++++++--- src/.pi/skills/methods/read-context.md | 10 ++++++--- src/.pi/skills/methods/review-for-gaps.md | 10 ++++++--- src/.pi/skills/strategies/freestyle.md | 13 +++++++----- src/.pi/skills/strategies/project-graph.md | 10 ++++++--- src/.pi/skills/strategies/propose-graph.md | 10 ++++++--- .../strategies/step-wise-decision-tree.md | 10 ++++++--- .../strategies/step-wise-disambiguate.md | 10 ++++++--- 16 files changed, 126 insertions(+), 48 deletions(-) diff --git a/src/.pi/agents/compose.test.ts b/src/.pi/agents/compose.test.ts index 2f520504d..637ffa48f 100644 --- a/src/.pi/agents/compose.test.ts +++ b/src/.pi/agents/compose.test.ts @@ -1,4 +1,4 @@ -import { access } from 'node:fs/promises'; +import { access, readFile } from 'node:fs/promises'; import { dirname, relative } from 'node:path'; import { fileURLToPath } from 'node:url'; @@ -10,6 +10,7 @@ import { } from '../../projections/session/runtime-state.js'; import type { WorkspacePostureState } from '../../session/workspace-session-coordinator.js'; import { composeAgentPrompt } from './compose.js'; +import { GOAL_RESOURCES, 
LENS_RESOURCES, METHOD_RESOURCES, STRATEGY_RESOURCES } from './state.js'; const projectRoot = dirname(dirname(dirname(dirname(fileURLToPath(import.meta.url))))); @@ -269,4 +270,22 @@ describe('composeAgentPrompt', () => { await expect(access(entry.location)).resolves.toBeUndefined(); } }); + + it('keeps every manifest prompt resource readable and non-trivial', async () => { + const entries = [ + ...Object.values(GOAL_RESOURCES), + ...Object.values(STRATEGY_RESOURCES), + ...Object.values(LENS_RESOURCES), + ...Object.values(METHOD_RESOURCES), + ]; + + for (const entry of entries) { + expect(relative(projectRoot, entry.location).startsWith('src/.pi/skills/')).toBe(true); + const body = await readFile(entry.location, 'utf8'); + expect( + body.length, + `${entry.name} should carry prompt-resource guidance beyond a placeholder`, + ).toBeGreaterThanOrEqual(700); + } + }); }); diff --git a/src/.pi/skills/goals/capture-posture.md b/src/.pi/skills/goals/capture-posture.md index 1706248c9..ccdc2f9c9 100644 --- a/src/.pi/skills/goals/capture-posture.md +++ b/src/.pi/skills/goals/capture-posture.md @@ -1,5 +1,9 @@ -# Goal: capture-posture +# capture-posture -Confirm workspace posture that affects how Brunch should work: certainty, stakes, audience, horizon, migration, and sourcing posture. +Pursue this goal when workspace posture is missing, stale, or contradicted by how the user wants the work done. Your job is to confirm operating constraints such as certainty, stakes, audience, horizon, migration posture, and sourcing posture so later prompts apply the right discipline. -Posture is workspace-scoped product state. Do not write it as spec graph truth unless the user separately frames a claim about the product being specified. +Evidence advances this goal when the user explicitly chooses or corrects posture values, or when they state working constraints that can be confirmed back to them. 
Ask small confirmation questions: whether compatibility matters, whether the audience is external, whether dependencies should be resisted, or whether the horizon is only the current slice. + +Do not store posture as spec truth, graph truth, or a readiness-grade fact. Do not infer it silently from code style or from your own preference. Capture-posture may influence the runtime header and subsequent behavior, but it never creates requirements, design decisions, or graph nodes by itself. + +This goal is always legal because posture affects every readiness band. It is orthogonal to strategy and lens: you can confirm posture during ordinary chat or structured exchange, but keep the payload about how to work, not what the product specification means. diff --git a/src/.pi/skills/goals/commit-converge.md b/src/.pi/skills/goals/commit-converge.md index a30e610c9..d0a170be5 100644 --- a/src/.pi/skills/goals/commit-converge.md +++ b/src/.pi/skills/goals/commit-converge.md @@ -1,5 +1,9 @@ -# Goal: commit-converge +# commit-converge -Reduce open ambiguity into reviewable commitments. Use this only when the spec has enough readiness for commitment work. +Pursue this goal when the spec is ready to reduce uncertainty into reviewable commitments. Your job is to help the user decide what should become accepted graph truth, not to keep generating more alternatives indefinitely. -Prefer atomic review-set or graph-command paths. Do not partially commit a batch or silently downgrade rejected material into graph truth. +Evidence advances this goal when it produces exact commitments: accepted requirements, constraints, invariants, decisions with rejected alternatives, criteria, examples, checks, or review-set items. Prefer summarizing the candidate commitment, naming the evidence or tradeoff, and asking for approval, changes, or rejection. 
If using a graph-writing strategy, keep the commitment mechanism honest: direct user statements and approved review-set items are explicit; concept-level materialization through `propose-graph` is implicit. + +Do not claim convergence because an idea sounds plausible. Do not hide rejected alternatives, skip rationale, or turn a concept-level acceptance into item-level explicit approval. If the user is still discovering the problem, route back toward grounding or expansion rather than forcing a commitment ritual. + +This goal maps to the commitment readiness band in D64-L. It may use `project-graph` review sets, `propose-graph` direct commits, or single-exchange confirmation, but the goal is always the same: make the chosen graph state reviewable and durable. diff --git a/src/.pi/skills/goals/elicit-expand.md b/src/.pi/skills/goals/elicit-expand.md index ddccf0f76..b12f52d8b 100644 --- a/src/.pi/skills/goals/elicit-expand.md +++ b/src/.pi/skills/goals/elicit-expand.md @@ -1,5 +1,9 @@ -# Goal: elicit-expand +# elicit-expand -Grow the selected spec while ambiguity is still useful. Surface options, tradeoffs, missing claims, and candidate relationships without forcing premature closure. +Pursue this goal when the selected spec has enough frame for productive exploration, but ambiguity is still useful. Your job is to expand graph truth and elicitation backlog coverage without prematurely locking a design or plan. -Keep new material tied to the selected spec. If a claim is low-confidence, ask or mark it as uncertain rather than committing it as graph truth. +Evidence advances this goal when the user supplies clear candidate claims: requirements, assumptions, constraints, examples, criteria, decisions, or terms that make the spec more complete. It also advances when a question exposes a meaningful fork, gap, or unknown that should stay in the elicitation backlog until answered. 
Use the active lens to decide which part of the graph to expand, but keep the objective broad: make the specification richer and better distinguished. + +Do not collapse every answer into a commitment. Do not overfit to one implementation path, and do not convert low-confidence implications into graph truth. If the user gives tentative language, preserve that tentativeness as an assumption, backlog entry, or follow-up rather than laundering it into a requirement. + +This goal belongs to the elicitation readiness band in D64-L. It sits between grounding and convergence: ask questions that widen useful option space, then leave durable commitments to `commit-converge` when the user is ready to approve or reject concrete claims. diff --git a/src/.pi/skills/goals/grounding-advance.md b/src/.pi/skills/goals/grounding-advance.md index a570e00fc..245e2dee5 100644 --- a/src/.pi/skills/goals/grounding-advance.md +++ b/src/.pi/skills/goals/grounding-advance.md @@ -1,5 +1,9 @@ -# Goal: grounding-advance +# grounding-advance -Establish the selected spec's basic frame: what it is, who it is for, what problem it answers, and what value would make it worth continuing. +Pursue this goal when the selected spec still needs its basic initiative frame. Your job is to establish enough grounding-band evidence for the user and agent to know what problem the spec answers, who it is for, what value it seeks, and what constraints or context make the effort real. -Stay elicitation-first. Prefer one structured question or contrast at a time. Do not claim the grade is ready to advance without concrete grounding evidence. +Evidence advances this goal when it produces explicit graph-worthy material such as goals, thesis/context statements, canonical terms, or constraint anchors. Later-band facts may still be captured when the user clearly states them, but they do not by themselves prove grounding readiness. 
Prefer questions that turn vague project talk into named actors, pains, success signals, boundaries, and vocabulary. + +Do not claim the spec is ready merely because the user has supplied requirements or implementation ideas. Do not refuse useful later-band content; capture it honestly while continuing to name the missing frame. Do not invent a posture, audience, or problem statement to fill the gap. + +This goal maps to the lower readiness bands in D64-L: it is about gathering grounding evidence and judging whether the spec can move beyond onboarding. When in doubt, ask for the smallest missing anchor rather than proposing a whole plan. \ No newline at end of file diff --git a/src/.pi/skills/lenses/design.md b/src/.pi/skills/lenses/design.md index 7b011bd66..f3becd685 100644 --- a/src/.pi/skills/lenses/design.md +++ b/src/.pi/skills/lenses/design.md @@ -1,5 +1,9 @@ -# Lens: design +# design -Focus on modules, interfaces, dependency direction, ownership, and boundary pressure. +Use this lens when the spec pressure is about modules, interfaces, ownership, boundaries, or architecture. The plane focus is design: how accepted intent could be realized without prematurely treating implementation detail as product truth. -Design observations can inform the spec, but do not overfit implementation structure into user intent. When a boundary choice is durable, make the decision explicit. +Favor design-plane modules and interfaces, plus realization or boundary edges back to intent claims. Useful questions ask what owns a responsibility, what information crosses a boundary, what should be hidden, what depends on what, and where invalid states should be made unrepresentable. When design uncovers a missing requirement, capture or ask through the intent lens rather than smuggling it in as architecture. + +Interpretation rule: design statements are commitments about shape, dependency direction, and information hiding. 
Separate a user preference for an implementation from a requirement the implementation serves. If two modules seem to own the same fact, ask which boundary should own mutation or projection. + +Topology-driven next questions: inspect requirements with no realization, modules with unclear interfaces, conflicting boundary edges into the same target, or assumptions that many design nodes depend on. Prefer the question that makes dependency direction or ownership legible. diff --git a/src/.pi/skills/lenses/intent.md b/src/.pi/skills/lenses/intent.md index 09f261538..3039e25fc 100644 --- a/src/.pi/skills/lenses/intent.md +++ b/src/.pi/skills/lenses/intent.md @@ -1,5 +1,9 @@ -# Lens: intent +# intent -Focus on intent-plane claims: goals, thesis, terms, context, requirements, assumptions, constraints, invariants, decisions, criteria, and examples. +Use this lens when the conversation is about what the product/spec means: goals, thesis/context, terms, requirements, assumptions, constraints, invariants, decisions, criteria, and examples. The plane focus is intent; design and oracle material may appear only as support or downstream consequence. -Ask what claim would become clearer or safer if captured. Avoid turning design or oracle observations into intent claims unless the user frames them that way. +Favor graph kinds and edges that clarify claim shape. Goals should derive requirements; assumptions with high fanout should be validated or downgraded; decisions should name rejected alternatives and rationale; constraints should bind a target through boundary edges; examples should illustrate or challenge requirements. Proof/support edges may be noted when evidence is already present, but do not turn verification planning into the center of this lens. + +Interpretation rule: translate user language into the smallest honest intent claim. 
"Must" often points to requirement, "probably" to assumption, "we picked" to decision, "always true" to invariant, and concrete cases to examples. If the category support is weak, ask a disambiguating question rather than guessing. + +Topology-driven next questions: look for goals with no derived requirements, requirements with no examples, decisions with empty rejected alternatives, and conflicting boundaries. Ask about the most graph-shaping absence first. diff --git a/src/.pi/skills/lenses/oracle.md b/src/.pi/skills/lenses/oracle.md index 395f35f46..b72b3ddbe 100644 --- a/src/.pi/skills/lenses/oracle.md +++ b/src/.pi/skills/lenses/oracle.md @@ -1,5 +1,9 @@ -# Lens: oracle +# oracle -Focus on proof obligations, checks, validation methods, evidence, and blind spots. +Use this lens when the conversation is about how claims will be checked, witnessed, or kept honest. The plane focus is oracle: checks, validation methods, evidence, obligations, criteria, and blind spots. -Prefer observable obligations over generic test labels. Name what a check would prove and what rival behavior it would fail to rule out. +Favor oracle-plane checks and validation methods, criteria/examples in intent when they express expected behavior, and proof/support edges from evidence to claims. Ask what would convince the user, what counterexample would break the claim, what fixture or probe would reveal failure, and which obligation remains unwitnessed. + +Interpretation rule: do not confuse an implementation task with an oracle. A good oracle says what observation would discriminate success from failure. If the user gives a metric, ask what claim it validates; if they give a requirement, ask what evidence would prove it. Treat absence honestly as verification debt, not as a passed check. 
+ +Topology-driven next questions: prioritize requirements with no incoming proof, criteria with no outgoing proof target, high-fanout assumptions with low confidence, and review/proposal material that lacks evidence. Ask the smallest question that turns an unwitnessed claim into a checkable obligation. diff --git a/src/.pi/skills/methods/commit-graph.md b/src/.pi/skills/methods/commit-graph.md index 62d11be53..5bb4719a0 100644 --- a/src/.pi/skills/methods/commit-graph.md +++ b/src/.pi/skills/methods/commit-graph.md @@ -1,5 +1,9 @@ -# Method: commit-graph +# commit-graph -Commit graph truth only through Brunch graph tools backed by CommandExecutor. Treat `structural_illegal`, `policy_blocked`, and `version_conflict` results as meaningful diagnostics. +Use this method only after the active strategy has established a legal commitment path. It is sequencing guidance for graph writes, not permission to treat every answer as durable truth. -Do not mutate storage directly. Do not split one conceptual batch into hidden partial writes. +Before committing, read enough selected-spec context to resolve existing projected codes and avoid duplicate or contradictory nodes. Decide the basis from the commitment path: explicit for direct user statements or approved review-set items, implicit for `propose-graph` concept-level materialization. Prepare one coherent batch of nodes and edges; edges must use the closed graph category set and justify stance where proof/support is used. + +Invoke `commit_graph` when the batch can be validated atomically and the user-facing commitment is already settled. On `structural_illegal`, use diagnostics to repair and retry within the current strategy's budget; do not expose half-written state or manually patch around CommandExecutor. On ambiguity, stop and ask or route through a proposal/review strategy. + +Compose this with `read-context` before the write and `infer-and-capture` when the write follows a completed exchange. 
Out of scope: direct database writes, raw file edits, invented edge categories, partial acceptance, or using graph commits for workspace posture. diff --git a/src/.pi/skills/methods/read-context.md b/src/.pi/skills/methods/read-context.md index 010136733..8d4367e21 100644 --- a/src/.pi/skills/methods/read-context.md +++ b/src/.pi/skills/methods/read-context.md @@ -1,5 +1,9 @@ -# Method: read-context +# read-context -Use pushed context handles first. When detail matters, call the relevant read tool for selected-spec graph or node context. +Use this method when pushed prompt context is insufficient for the next elicitation move. It tells you how to sequence selected-spec reads without turning context gathering into a separate research project. -Context reads are read-only. Do not treat them as mutation authority or workspace-global truth. +Start from the handles in the runtime prompt: selected spec, readiness grade, active goal/strategy/lens, workspace posture, and any graph overview. Pull more context only when it will change the next question, proposal, capture decision, or graph write. Prefer compact overview for orientation and focused node neighborhoods for a specific claim or projected code. + +Use read-only context tools such as `read_graph` and `read_session_context` where available. Keep graph truth distinct from active-context projections: accepted records are truth, while rendered summaries are orientation. If the user mentions a node code, resolve it through the product read path rather than guessing from memory. + +Compose this before `generate-proposal`, `commit-graph`, and topology-driven lens questions. Out of scope: filesystem exploration unrelated to the selected spec, direct DB inspection, or treating stale prompt context as proof when a fresh graph read is needed. 
diff --git a/src/.pi/skills/methods/review-for-gaps.md b/src/.pi/skills/methods/review-for-gaps.md index ff9d9633b..4c199f19c 100644 --- a/src/.pi/skills/methods/review-for-gaps.md +++ b/src/.pi/skills/methods/review-for-gaps.md @@ -1,5 +1,9 @@ -# Method: review-for-gaps +# review-for-gaps -Review the current commitment or proposal for missing evidence, contradictions, weak edges, and unresolved decisions. +Use this method to inspect accepted or proposed commitments for missing support, contradictions, and verification debt. It is a review pass over graph meaning, not a license to rewrite the graph by yourself. -Name the gap and the consequence. Do not invent a broad review framework when one concrete missing proof or claim would unblock the next turn. +Sequence the review from the active lens. For intent, look for goals with no requirements, requirements with no examples, assumptions with high fanout, decisions without rejected alternatives, and conflicting boundaries. For design, look for unclear ownership, unbacked realization edges, and dependency direction that contradicts the stated module boundary. For oracle, look for claims without proof, criteria without targets, and obligations without evidence. + +Invoke context reads first, then either ask a single clarifying question, generate a review-set proposal if item-level approval is needed, or record a gap through the product substrate when the current tools expose one. If the gap is merely a question for the user, keep it prospective; if it is a contradiction in accepted graph truth, route it toward reconciliation. + +Compose with `read-context` and, when proposing repairs, `generate-proposal`. Out of scope: inventing new truth to close the gap, adding broad audit frameworks, or silently downgrading accepted commitments. 
diff --git a/src/.pi/skills/strategies/freestyle.md b/src/.pi/skills/strategies/freestyle.md index f7817624b..f641d8421 100644 --- a/src/.pi/skills/strategies/freestyle.md +++ b/src/.pi/skills/strategies/freestyle.md @@ -1,6 +1,9 @@ -Use `freestyle` only when the user explicitly pins it. +# freestyle -- Let the user drive with ordinary turns instead of forcing an offer-first structured exchange every turn. -- Keep structured exchange tools available when a typed question, option set, or review would sharpen the next step. -- Grow graph truth only through the ordinary-message capture path or other existing Brunch graph write seams; `freestyle` itself adds no authority. -- Do not treat `freestyle` as permission to skip capture discipline: only directly stated, high-confidence facts should become graph truth. +Use this strategy only when explicitly pinned by the user or system; AUTO must not select it. It lets the user drive with ordinary conversational turns while keeping Brunch structured exchanges available when they become useful. + +Turn structure is light: respond to the user's immediate intent, read context when it changes the answer, and ask structured follow-ups only when a typed exchange would reduce ambiguity or support capture. There is no mandatory offer-first ritual and no pending exchange to satisfy, so slash/skill-style user initiative is acceptable. + +Commitment mechanism is ordinary-turn capture. Directly stated, high-confidence facts may be captured with explicit basis through the same generalized capture path as structured responses. Low-confidence implications, guesses, and broad summaries stay out of graph truth unless the user confirms them. + +Available graph operations are context reads and legal capture/commit paths that the current goal and grade permit. Do not treat freestyle as higher authority, a new operational mode, or a bypass around review-set/direct-commit distinctions. 
It changes interaction style only; goal and lens still decide what the work is about. diff --git a/src/.pi/skills/strategies/project-graph.md b/src/.pi/skills/strategies/project-graph.md index ea22f85e8..edf38c9a1 100644 --- a/src/.pi/skills/strategies/project-graph.md +++ b/src/.pi/skills/strategies/project-graph.md @@ -1,5 +1,9 @@ -# Strategy: project-graph +# project-graph -Generate a review-set proposal from established material. The proposal should be dry-run-valid before it reaches the user. +Use this strategy when graph material should be reviewed item-by-item before becoming truth. Your job is to derive candidate nodes and edges from existing context, dry-run them, and present a review set the user can approve, request changes on, or reject. -Approval commits the whole batch atomically. Request-changes regenerates or narrows the proposal; rejection does not create graph truth. +Turn structure: read the relevant graph context, generate candidate graph material, dry-run it through the review/proposal path, then surface only dry-run-valid material with `present_review_set` and `request_review`. Include enough rationale, grounding/support metadata, and lens labeling for the user to judge the proposal. If the user requests changes, generate a successor proposal rather than patching truth in place. + +Commitment mechanism: D26-L review-set flow. Nothing is durable until review-set approval; approval commits the whole accepted set atomically through `acceptReviewSet` / CommandExecutor, and exact approved items use `basis: explicit`. Partial acceptance is not representable. + +Available graph operations are read context, generate proposal, dry-run validation, and review exchange; do not call `commit_graph` directly as a shortcut. Use the same closed edge category rubric as graph commits, and abstain from proposing edges whose category cannot be justified. 
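One way to make "partial acceptance is not representable" concrete is to give the review decision no per-item field at all. This TypeScript sketch assumes hypothetical `ReviewSet`, `ReviewDecision`, and `resolveReview` shapes; it is not the signature of `acceptReviewSet` or CommandExecutor. Approval returns the whole set with `basis: "explicit"`, request-changes points at a successor proposal, and rejection commits nothing.

```typescript
// Hypothetical shapes for illustration; not Brunch's acceptReviewSet signature.
type Basis = "explicit" | "implicit";

interface ProposedItem {
  ref: string; // intra-proposal reference
  summary: string;
}

interface ReviewSet {
  id: string;
  items: ProposedItem[];
}

// No variant carries an item list, so "approve only some items" cannot be expressed.
type ReviewDecision =
  | { kind: "approve" }
  | { kind: "request_changes"; note: string }
  | { kind: "reject" };

type ReviewOutcome =
  | { kind: "committed"; items: Array<ProposedItem & { basis: Basis }> }
  | { kind: "successor_needed"; predecessorId: string; note: string }
  | { kind: "nothing_committed" };

function resolveReview(set: ReviewSet, decision: ReviewDecision): ReviewOutcome {
  switch (decision.kind) {
    case "approve":
      // Whole set, all-or-nothing; exactly approved items carry explicit basis.
      return {
        kind: "committed",
        items: set.items.map((item) => ({ ...item, basis: "explicit" as const })),
      };
    case "request_changes":
      // Truth is not patched in place; a successor proposal is generated instead.
      return { kind: "successor_needed", predecessorId: set.id, note: decision.note };
    case "reject":
      return { kind: "nothing_committed" };
    default: {
      // Compile-time exhaustiveness check: a new decision kind fails to type-check here.
      const unreachable: never = decision;
      return unreachable;
    }
  }
}
```

Encoding the rule in the type means a caller cannot construct a partial approval, so the restriction does not depend on every code path remembering it.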
diff --git a/src/.pi/skills/strategies/propose-graph.md b/src/.pi/skills/strategies/propose-graph.md index 1ef645f8f..00daf9839 100644 --- a/src/.pi/skills/strategies/propose-graph.md +++ b/src/.pi/skills/strategies/propose-graph.md @@ -1,5 +1,9 @@ -# Strategy: propose-graph +# propose-graph -Offer a concept-level graph proposal after enough context exists. The user accepts or rejects the concept; accepted concepts may be committed through Brunch graph tools. +Use this strategy when the user has accepted a concept-level direction and a coherent new subgraph would be more useful than one more question. Your job is to offer the concept, get user acceptance of that concept, then materialize the graph through Brunch graph tools. -Never bypass CommandExecutor-backed graph tools. If tool diagnostics reject the batch, use the diagnostics to retry or explain the failure. +Turn structure: read selected-spec context, summarize the proposed concept in user language, state the expected graph shape at a high level, and ask for acceptance, changes, or rejection. Once accepted, generate one `commit_graph` batch with nodes and edges that fit the accepted concept. Keep retries internal when structural diagnostics say the batch is illegal. + +Commitment mechanism: D26-L direct commit. The user accepts the concept, not every node and edge, so created graph items use `basis: implicit` under D63-L. Do not present this as item-level explicit approval and do not use review-set approval language. + +Available graph operations: `read_graph` for context and existing projected codes; `commit_graph` for one atomic batch with intra-batch refs and existing-node refs. Category rubric: dependency for prerequisite/blocks, proof for evidence-to-claim with stance, support for weaker argumentative support, realization for implementation/design fulfillment, boundary for constraints on targets, composition for part-whole, association for loose relation, supersession for replacement. 
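The category rubric reads naturally as a closed lookup with an abstain path. A minimal TypeScript sketch follows: the eight category names come from the rubric, while the relation labels, the `Stance` values, and the names `RUBRIC`, `categorize`, and `isEdgeCategory` are assumptions for illustration. Returning `null` is the abstain case; treating a proof edge offered without a stance as grounds to abstain is also an assumption.

```typescript
// The eight categories named in the rubric, as a closed set.
const EDGE_CATEGORIES = [
  "dependency",
  "proof",
  "support",
  "realization",
  "boundary",
  "composition",
  "association",
  "supersession",
] as const;

type EdgeCategory = (typeof EDGE_CATEGORIES)[number];

// Hypothetical stance values for proof edges.
type Stance = "for" | "against";

// Hypothetical relation labels an agent might infer, mapped per the rubric.
// A Map avoids accidental hits on Object.prototype keys such as "toString".
const RUBRIC = new Map<string, EdgeCategory>([
  ["prerequisite", "dependency"],
  ["blocks", "dependency"],
  ["evidence_for_claim", "proof"],
  ["argues_for", "support"],
  ["implements", "realization"],
  ["constrains", "boundary"],
  ["part_of", "composition"],
  ["related_to", "association"],
  ["replaces", "supersession"],
]);

interface ProposedEdge {
  category: EdgeCategory;
  stance?: Stance;
}

// Returns null (abstain) when the relation has no justified category,
// or when a proof edge is offered without a stance.
function categorize(relation: string, stance?: Stance): ProposedEdge | null {
  const category = RUBRIC.get(relation);
  if (category === undefined) return null;
  if (category === "proof") return stance ? { category, stance } : null;
  return { category };
}

// Guards model output against the closed set before it reaches commit_graph.
function isEdgeCategory(value: string): value is EdgeCategory {
  return (EDGE_CATEGORIES as readonly string[]).includes(value);
}
```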
diff --git a/src/.pi/skills/strategies/step-wise-decision-tree.md b/src/.pi/skills/strategies/step-wise-decision-tree.md index 6bdf2f404..fa3673939 100644 --- a/src/.pi/skills/strategies/step-wise-decision-tree.md +++ b/src/.pi/skills/strategies/step-wise-decision-tree.md @@ -1,5 +1,9 @@ -# Strategy: step-wise-decision-tree +# step-wise-decision-tree -Ask one structured question, wait for the answer, then choose the next branch. Keep the branch reason visible so the human can correct the direction. +Use this strategy to ask one structured question at a time and branch from the answer. The user should experience a clear local decision tree: one prompt, bounded response shape, then the next question chosen from what their answer made true or still unclear. -Use radio or checkbox choices when the options are known; use freeform when the next distinction is not yet enumerable. +Turn structure: read the active goal and lens, inspect the pushed or pulled context, choose the single highest-value missing item, then present one typed exchange. Prefer `present_question`/`request_answer` for open text and `present_options` with `request_choice` or `request_choices` when the branch set is already known. After the response, capture only high-confidence direct statements and choose the next branch; do not batch a questionnaire. + +Commitment mechanism: this is a single-exchange flow under D26-L. Graph items directly stated by the user may be captured synchronously with explicit basis; uncertain implications become follow-up questions or backlog entries. + +Available graph operations are read context first, then capture/commit only through Brunch graph tools when the answer supplies clear graph truth. Use the strategy README classification guide lightly: "must" suggests requirement, "probably" suggests assumption, "picked Y over Z" suggests decision, and weak support means abstain rather than guess. 
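The classification guide is meant to be applied lightly, so a code rendering of it should only suggest a kind and abstain otherwise. The sketch below encodes the three cue phrases as regular expressions; `suggestKind`, `CUES`, and the patterns are hypothetical, and a real agent would weigh the whole answer rather than match keywords.

```typescript
type NodeKind = "requirement" | "assumption" | "decision";

// Cue patterns from the guide, checked in order; "picked ... over" is tested
// first so a decision statement that also says "must" is not misread.
// No /g flag, so RegExp.test keeps no lastIndex state between calls.
const CUES: Array<[RegExp, NodeKind]> = [
  [/\bpicked\b.+\bover\b/i, "decision"],
  [/\bmust\b/i, "requirement"],
  [/\bprobably\b/i, "assumption"],
];

// Returns a suggested kind, or null to abstain when no cue is present.
function suggestKind(statement: string): NodeKind | null {
  for (const [pattern, kind] of CUES) {
    if (pattern.test(statement)) return kind;
  }
  return null;
}
```

A `null` result maps to the guide's "weak support means abstain": ask a follow-up question instead of capturing.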
diff --git a/src/.pi/skills/strategies/step-wise-disambiguate.md b/src/.pi/skills/strategies/step-wise-disambiguate.md index f0c90f785..1989fc047 100644 --- a/src/.pi/skills/strategies/step-wise-disambiguate.md +++ b/src/.pi/skills/strategies/step-wise-disambiguate.md @@ -1,5 +1,9 @@ -# Strategy: step-wise-disambiguate +# step-wise-disambiguate -Collapse ambiguity with contrastive examples. Present two or three plausible meanings and ask which is closer, what differs, or what should be combined. +Use this strategy when several plausible meanings would lead to different graph truth. Your job is to collapse ambiguity with contrastive examples instead of asking the user to define terms in the abstract. -Use this when words or goals feel overloaded. Preserve the user's chosen distinction as the next working vocabulary. +Turn structure: name the ambiguity, offer two or three concrete interpretations, and ask the user which example is closer or what distinction is missing. Each option should differ on one graph-relevant axis: requirement vs constraint, assumption vs decision, goal vs success criterion, design boundary vs implementation preference, or proof vs example. Use `present_options` when the alternatives are crisp; use `present_question` when the user needs to rewrite the distinction. + +Commitment mechanism: this remains a single-exchange flow. The chosen contrast can be captured as explicit graph truth only when the user's answer states or approves the exact claim. Otherwise, use it to refine the next question. + +Available graph operations are context reads, then capture after the answer. Do not call `commit_graph` for a whole generated subgraph in this strategy. For category selection, treat contrastive signal phrases as evidence, not proof: if the user says "we don't care about X," test constraint vs negative example; if they say "we chose Y because," test decision with rejected alternatives. 
From 55a53da78f0f2a98563c91b3b26d2e9e154fd6a5 Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Mon, 8 Jun 2026 10:35:01 +0200 Subject: [PATCH 05/17] Scope two parallel worktree streams: graph-observed-shapes + minimal-authority-shell Scopes two streams that, together with the in-flight resource-body-depth stream, make three mutually write-disjoint streams launchable from a clean base: - graph-observed-shapes--coverage-ledger: ratify the consumer-specific read-shape inventory (tool=6, RPC/web=2 is intentional) + coverage-guard test - minimal-authority-shell--audit-and-guard: pre-audit found most criteria already met (discriminants exist, needs_human unused, elicit blocks bash/edit/write, D34-L command policy present), so an audit + guard slice Records the parallel-stream plan and the src/.pi/agents/state.ts single-writer invariant in PLAN.md, plus Current execution pointers on both frontiers. Amp-Thread-ID: https://ampcode.com/threads/T-019ea2fc-9f12-767d-bd7a-08497f7307fd Co-authored-by: Amp --- memory/PLAN.md | 4 +- .../graph-observed-shapes--coverage-ledger.md | 168 ++++++++++++++++++ ...inimal-authority-shell--audit-and-guard.md | 167 +++++++++++++++++ 3 files changed, 338 insertions(+), 1 deletion(-) create mode 100644 memory/cards/graph-observed-shapes--coverage-ledger.md create mode 100644 memory/cards/minimal-authority-shell--audit-and-guard.md diff --git a/memory/PLAN.md b/memory/PLAN.md index 226452e7c..d8dc51eaf 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -132,6 +132,7 @@ After the current elicitor work, the strongest follow-on coverage frontier is `g - **Cross-cutting obligations:** This is a minimal shell, not full M6. Do not widen into comprehensive RBAC/permissions unless a current POC path needs it. - **Traceability:** R5, R6, R10 / D20-L, D34-L, D40-L / A18-L, A3-L. - **Design docs:** `memory/SPEC.md` D20-L/D34-L/D40-L; `docs/reference/pi-extensions.md`. +- **Current execution pointer:** Scoped 2026-06-08 — active scope file `memory/cards/minimal-authority-shell--audit-and-guard.md`.
Pre-audit during scoping found most criteria already met (CommandResult discriminants exist; `needs_human` defined but never produced; elicit already blocks bash/edit/write; D34-L command policy already at `.pi/extensions/commands/policy.ts`), so the slice is an authority-matrix audit + guard test + A18-L residue naming, not a build-out. The card forbids touching `src/.pi/agents/state.ts` so it can run as an independent worktree stream alongside `resource-body-depth` and `graph-observed-shapes`. ### poc-live-ship-gate @@ -179,7 +180,7 @@ After the current elicitor work, the strongest follow-on coverage frontier is `g - **Cross-cutting obligations:** Do not promote all read shapes everywhere. `list_by_kind` / `list_by_band` are plausible web shapes; `related` / `gaps` may remain agent/RPC-only. Keep graph-owned read logic out of `db/`, and keep `src/renderers/` limited to durable LLM/session text rather than arbitrary observer DTOs. - **Traceability:** D33-L, D51-L, D52-L, D60-L, D64-L. - **Design docs:** `src/graph/README.md`; `src/rpc/README.md`; `src/web/README.md`. -- **Current execution pointer:** To author via `ln-scope` as a `Mode: coverage` ledger once the active frontier closes. +- **Current execution pointer:** Scoped 2026-06-08 — active scope file `memory/cards/graph-observed-shapes--coverage-ledger.md` (the coverage-ledger slice: ratify the consumer-specific read-shape inventory + install a coverage-guard test; no transport shape ships in this slice). Any "required but missing" row spawns a separate follow-on alignment card scoped after the ledger is accepted. ### runtime-affordances-and-legality @@ -328,6 +329,7 @@ horizon: notes: - `elicitation-backlog` was the promoted D65-L row from `memory/CROSS_CUT_PLAN.md`; the remaining temporary cross-cut work is `memory/cards/crosscut-know--resource-body-depth.md`. 
+ - Parallel worktree streams (2026-06-08): three mutually write-disjoint streams may run concurrently from a clean committed base — (A) `crosscut-know--resource-body-depth` → `src/.pi/skills/**`; (B) `graph-observed-shapes--coverage-ledger` → `src/graph/README.md` + `rpc`/`web` READMEs + one guard test; (C) `minimal-authority-shell--audit-and-guard` → `src/.pi/extensions/runtime/` + guard test. Invariant: **`src/.pi/agents/state.ts` is a single-writer file** — only one stream may edit it at a time (A may touch manifest descriptions; B and C must not). `poc-live-ship-gate` stays gated behind `minimal-authority-shell` (hard edge); `runtime-affordances-and-legality` and `exchanges-and-generalized-capture` stay parked (shape not yet forced) and are not cold-startable worktree streams. - Completed prerequisites: `agents-composition-layer` supplies runtime prompt/resource posture, and `live-graph-observer` supplies the read-only web observer path expected by `capture-response-to-graph` and `poc-live-ship-gate`. - `graph-observed-shapes` is intentionally consumer-specific: do not assume every agent read shape belongs on the web observer. - `exchanges-and-generalized-capture` stays deferred until the surviving inventory is honest enough to close; do not regrow deleted `capture-*` symmetry in the meantime. 
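The write-disjointness and single-writer rules are mechanical enough to check before launching worktrees. This TypeScript check is hypothetical (not an existing repo tool). It treats each stream's primary write set as path prefixes, transcribed from the stream descriptions above and the scope cards' lane rules, and reports any pair of streams whose prefixes overlap. Listing `src/.pi/agents/state.ts` under stream A only encodes the single-writer invariant.

```typescript
// Hypothetical pre-launch check; stream names and write prefixes are transcribed
// by hand, not read from any config file.
interface Stream {
  name: string;
  writes: string[]; // path prefixes; a trailing "/" covers a whole directory
}

const STREAMS: Stream[] = [
  // A is the only stream allowed to touch state.ts (single-writer invariant).
  { name: "A resource-body-depth", writes: ["src/.pi/skills/", "src/.pi/agents/state.ts"] },
  {
    name: "B graph-observed-shapes",
    writes: [
      "src/graph/README.md",
      "src/rpc/README.md",
      "src/web/README.md",
      "src/graph/observed-shapes-coverage.test.ts",
    ],
  },
  { name: "C minimal-authority-shell", writes: ["src/.pi/extensions/runtime/"] },
];

// Two write paths collide when one is a prefix of the other,
// e.g. "src/.pi/agents/" covers "src/.pi/agents/state.ts".
function overlaps(a: string, b: string): boolean {
  return a.startsWith(b) || b.startsWith(a);
}

// Returns one message per colliding pair of paths across different streams.
function writeConflicts(streams: Stream[]): string[] {
  const found: string[] = [];
  for (let i = 0; i < streams.length; i++) {
    for (let j = i + 1; j < streams.length; j++) {
      for (const p of streams[i].writes) {
        for (const q of streams[j].writes) {
          if (overlaps(p, q)) {
            found.push(`${streams[i].name} vs ${streams[j].name}: ${p} / ${q}`);
          }
        }
      }
    }
  }
  return found;
}
```

Stream C's conditional touches (the `?` paths in its card) are left out here; adding them keeps the streams disjoint as long as none of them lands under another stream's prefixes.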
diff --git a/memory/cards/graph-observed-shapes--coverage-ledger.md b/memory/cards/graph-observed-shapes--coverage-ledger.md new file mode 100644 index 000000000..6cf564f30 --- /dev/null +++ b/memory/cards/graph-observed-shapes--coverage-ledger.md @@ -0,0 +1,168 @@ +# Graph observed-shape coverage ledger + +Frontier: graph-observed-shapes +Status: active +Mode: single +Created: 2026-06-08 + +## Orientation + +- **Containing seam:** the graph read surface — domain reads in + [src/graph/queries.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/graph/queries.ts) + exposed to three consumers: the Pi `read_graph` tool + ([src/.pi/extensions/graph/index.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/.pi/extensions/graph/index.ts)), + public RPC ([src/rpc/methods/graph.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/rpc/methods/graph.ts)), + and the web observer ([src/web/queries/graph.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/web/queries/graph.ts)). + Spec-scoped reader wiring is in [src/graph/workspace-store.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/graph/workspace-store.ts) (`SpecScopedReaders` / `forSpec`). +- **Relevant frontier item:** `graph-observed-shapes` in + [memory/PLAN.md](file:///Users/lunelson/Code/hashintel/brunch-next/memory/PLAN.md) §Frontier Definitions + (`Status: next`, `Certainty: proving`). Its execution pointer says: author via `ln-scope` as a + coverage ledger once the active frontier closes. This card is that ledger slice. +- **Volatile state:** the read surface is **asymmetric by consumer**. The `read_graph` tool exposes + 6 shapes (`overview`, `neighborhood`, `list_by_kind`, `list_by_band`, `gaps`, `related`); RPC and + web expose only 2 (`overview`, `neighborhood`). 
Two further graph-owned register reads + (`getOpenReconciliationNeeds`, `getOpenElicitationBacklogEntries`) have **no transport consumer + yet** (tests only; `elicitation_backlog` read-back is the per-turn-driver follow-on from FE-823). +- **Main open risk / insight:** the asymmetry is **probably correct, not a gap**. The frontier's real + job is to *decide and ratify* which shapes each consumer needs — agent/RPC-only shapes are allowed + to stay agent/RPC-only — and to guard that decision so new shapes don't bleed onto the web + accidentally. The risk is treating "tool has 6, web has 2" as a coverage hole and over-promoting. + +Posture: **proving** (inherited from `graph-observed-shapes`). Reshaped to give the decision teeth: +landing this slice *stabilizes the D60-L read-shape ownership seam* (invariants axis) via a durable +ledger + a coverage-guard test, rather than being a pure study/doc step. + +Frontier-level cross-cutting obligations (from the frontier definition): + +- **D60-L:** read-shape ownership stays explicit; each required consumer shape has exactly one + canonical owner (the domain read in `graph/queries.ts`), not adapter-local formatting standing in + for a durable read shape. +- **D33-L:** web is a read-only observer; web adoption of a shape must be deliberate, never accidental + bleed-through from agent/RPC needs. +- **D52-L:** `src/projections/` exists only for reusable multi-consumer DTOs. Single-owner reads stay + in their owning domain. Do not create a graph projection module to host a single-consumer shape. +- Keep graph-owned read logic out of `db/`; keep `renderers/` limited to durable LLM/session text, + not arbitrary observer DTOs. 
+ +### Target Behavior + +A closed observed-shape coverage ledger exists as a durable artifact that classifies every +`src/graph/queries.ts` read shape as required or deferred per consumer with one named canonical owner, +and a guard test asserts each consumer's actual graph-read surface equals its ledger-required set. + +### Boundary Crossings + +``` +→ src/graph/queries.ts (the canonical read shapes — owners) +→ src/graph/README.md (ledger artifact: shape × consumer matrix + owner column) +→ src/rpc/README.md (consumer-subset note pointing at the ledger) +→ src/web/README.md (consumer-subset note pointing at the ledger) +→ a coverage-guard test (asserts actual surfaces == ledger-required sets) +``` + +### Risks and Assumptions + +``` +- RISK: the ledger could be read as a mandate to add the 4 tool-only shapes to RPC/web. + → MITIGATION: the ledger marks list_by_kind/list_by_band as "web-eligible, DEFERRED until a web + feature needs them" and related/gaps as "agent/RPC-only"; no transport shape is added in this + slice. Any "required but missing" row spawns a SEPARATE follow-on alignment card (scoped after + the ledger is accepted, because its scope depends on this card's decisions). +- RISK: a coverage-guard test that hardcodes string lists could rot silently. + → MITIGATION: derive the actual sets from the real surfaces where cheap (read_graph mode union, + web query-keys graph group, RPC graph method names) and compare to the ledger's declared sets, + so adding a real shape without updating the ledger fails the test. +- ASSUMPTION: the current asymmetry (tool 6 / RPC 2 / web 2) is intentional, not a delivery gap. + → IMPACT IF FALSE: if a POC web feature actually needs list_by_kind/list_by_band now, this slice + under-delivers and an alignment card is needed immediately — but that card is cheap and additive + and does not invalidate the ledger. 
+ → VALIDATE: the ledger decision itself; the frontier definition already states list_by_kind/ + list_by_band are "plausible web shapes" (eligible, not yet required) and related/gaps "may + remain agent/RPC-only". + → [→ memory/SPEC.md D60-L read-shape ownership] +``` + +### Posture check + +Proving posture. This slice scores on the **invariants** axis: it locates and stabilizes the +read-shape ownership seam (D60-L) by ratifying the consumer-specific inventory and installing a +regression guard against accidental web/RPC bleed-through. It is reshaped from a pure decision/doc +step into a slice with a failing-then-passing test, so it *tells us something*: it proves the +tool-vs-transport asymmetry is the intended contract. No high-impact assumption is left unretired — +the only assumption (asymmetry is intentional) is the decision this card closes. + +### Acceptance Criteria + +```pseudo tree +observed-shape coverage ledger +├── ledger artifact (src/graph/README.md) +│ ├── ✓ every src/graph/queries.ts read shape appears as a row (8 shapes incl. both register reads) +│ ├── ✓ each row marks required | deferred | n/a for each consumer (tool, RPC, web) +│ ├── ✓ each required shape names exactly one canonical owner (graph/queries.ts function) +│ └── ✓ deferred rows carry a one-line reason (e.g. 
"web-eligible, await web feature"; +│ "agent/RPC-only"; "agent-internal register read, no transport consumer yet") +├── decisions encoded +│ ├── ✓ overview + neighborhood = required for tool, RPC, and web (already present) +│ ├── ✓ list_by_kind + list_by_band = required tool; web-eligible but DEFERRED; RPC follows web +│ ├── ✓ gaps + related = required tool; agent/RPC-only; NOT web +│ └── ✓ reconciliation_needs + elicitation_backlog = agent-internal; deferred from RPC/web +├── consumer-subset notes +│ ├── ✓ src/rpc/README.md states its graph subset {overview, nodeNeighborhood} + points at the ledger +│ └── ✓ src/web/README.md states its graph subset {overview, nodeNeighborhood} + points at the ledger +└── guard test + ├── ✓ asserts read_graph tool mode set == ledger tool-required set + ├── ✓ asserts RPC graph method set == ledger RPC-required set {overview, nodeNeighborhood} + └── ✓ asserts web graph query-key group == ledger web-required set {overview, nodeNeighborhood} +``` + +### Verification Approach + +``` +- Inner: unit/structural test — the coverage-guard test (derives actual consumer surfaces, compares + to declared ledger-required sets); existing graph query / RPC / web query tests still pass. +- Inner (gate): `npm run verify` (fix → test → build) proves no surface or wiring regressed. +- Middle/Outer: none — no new transport shape ships in this slice, so no observer/probe change is + needed. (A future alignment card, if one is spawned, owns its own middle-tier read-path proof.) +``` + +### Cross-cutting obligations + +``` +- D60-L: one canonical owner per required shape; no adapter-local read shape masquerading as durable. +- D33-L: web stays read-only; no web shape added in this slice; ledger makes web adoption deliberate. +- D52-L: no new src/projections/ module for a single-consumer shape; the only shared DTOs are the + existing GraphOverview / NeighborhoodResult types already imported by web — confirm, don't expand. 
+- Keep graph read logic out of db/; keep renderers/ for durable text, not observer DTOs. +``` + +### Expected touched paths (tentative) + +```pseudo tree +src/graph/ +├── README.md ~ (ledger artifact: shape × consumer matrix + owner column) +├── observed-shapes-coverage.test.ts + (coverage-guard test) — OR extend an existing graph test +└── queries.ts ? (read-only; touched only if a row needs an owner comment) +src/rpc/README.md ~ (graph consumer-subset note → ledger) +src/web/README.md ~ (graph consumer-subset note → ledger) +``` + +No overlap with the active `crosscut-know--resource-body-depth` builder (`src/.pi/skills/**`) or any +`src/db/**` work. This card writes only to `src/graph/`, `src/rpc/README.md`, `src/web/README.md`. + +## Follow-on note (do NOT pre-scope here) + +If the ledger marks any shape **required but missing** for a transport consumer, that alignment +(graph → RPC → web wiring for that shape) is a separate card scoped *after* this ledger is accepted — +its scope depends on this card's decisions, so per the chain anti-speculation rule it is not +pre-scoped. The expected outcome is that **no transport shape is currently required-but-missing**, so +the frontier likely closes with ratification + guard rather than new wiring. + +### Traceability + +- **SPEC:** D60-L (read-shape ownership), D33-L (web read-only observer), D52-L (projections = + reusable multi-consumer DTOs only), D51-L (graph code projection), D64-L (readiness bands feeding + `list_by_band`). +- **Frontier:** closes the `graph-observed-shapes` "closed enumerated coverage ledger" and + "one canonical owner per required shape" acceptance leaves; ratifies the consumer-specific + asymmetry the frontier was created to make legible. +- **Design docs:** `src/graph/README.md`, `src/rpc/README.md`, `src/web/README.md`. 
diff --git a/memory/cards/minimal-authority-shell--audit-and-guard.md b/memory/cards/minimal-authority-shell--audit-and-guard.md new file mode 100644 index 000000000..0a0f9ce9d --- /dev/null +++ b/memory/cards/minimal-authority-shell--audit-and-guard.md @@ -0,0 +1,167 @@ +# Minimal POC authority shell — audit and guard + +Frontier: minimal-authority-shell +Status: active +Mode: single +Created: 2026-06-08 + +## Orientation + +- **Containing seam:** the POC authority surface over current graph/session write paths — + `CommandExecutor` result discriminants in + [src/graph/command-executor.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/graph/command-executor.ts), + the `elicit` tool policy in + [src/projections/session/runtime-policy.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/projections/session/runtime-policy.ts) + applied by [src/.pi/extensions/runtime/index.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/.pi/extensions/runtime/index.ts), + the D34-L command containment in + [src/.pi/extensions/commands/policy.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/.pi/extensions/commands/policy.ts), + and the public RPC mutation surfacing in + [src/rpc/methods/session.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/rpc/methods/session.ts). +- **Relevant frontier item:** `minimal-authority-shell` (FE-810) in + [memory/PLAN.md](file:///Users/lunelson/Code/hashintel/brunch-next/memory/PLAN.md) §Frontier Definitions + (`Status: next` / now active, `Kind: hardening`, `Certainty: proving`). Branch to create: + `ln/fe-810-minimal-authority-shell`. +- **Volatile state (pre-audited during scoping — start informed, not cold):** + - The `CommandResult` union **already defines** `success | structural_illegal | needs_human | + policy_blocked | version_conflict`; mutation paths already return `success` / `structural_illegal`. 
+ - `needs_human` is **defined but never produced** by any current path — no `return { status: + 'needs_human' }` exists. So criterion (3) is mostly "confirm it is representable end-to-end and + no path assumes a TUI-only dialog," not a large build. + - `elicit` policy **already blocks** `bash | edit | write` (allow-list `read | grep | find | ls`) + via the `tool_call` and `user_bash` hooks; `setActiveTools` hides the rest. + - D34-L command containment **already exists** at `.pi/extensions/commands/policy.ts`. + - Public RPC mutations (`session.submitExchangeResponse`) **already surface** structured + discriminants (`captured | no_capture | structural_illegal | accepted | request_changes | + rejected`) rather than throwing for expected outcomes. +- **Main open risk:** **over-building.** Most criteria are already met; the real work is an audit + + regression guard + naming the A18-L residue, NOT inventing M6 RBAC, a new authority service, or a + `needs_human` producer that no POC path actually needs. + +Posture: **proving** (inherited from `minimal-authority-shell`). Reshaped to score on the +**invariants** axis: landing this slice locks the "CommandExecutor discriminants are the only graph +mutation outcome surface" invariant with a guard test and ratifies the elicit tool-authority +contract, so accidental future bypass fails a test rather than silently shipping. + +Frontier-level cross-cutting obligations: + +- **D20-L:** `CommandExecutor` result discriminants are the only graph mutation outcome surface for + agent, RPC, and capture writes — no path throws for an expected authority/validation outcome. +- **D34-L:** keep command containment in `.pi/extensions/commands/policy.ts`; do not reintroduce a + branch-only module or treat command-name collisions as allowlisting. +- **D40-L:** tool authority is a pure derivation over the shared projected runtime policy; do not add + a second authority list. 
**Do not modify `src/.pi/agents/state.ts`** in this slice — import its + `activeToolNamesForPosture` read-only; the manifest/legality file is reserved for other streams. +- **A18-L:** strict interactive built-in suppression remains a Pi upstream/API limit; name it + explicitly as accepted residue, do not pretend to close it. + +### Target Behavior + +The current POC graph/session write and tool-authority paths are proven by a single authority-matrix +guard test to route every mutation outcome through `CommandExecutor` discriminants, block the +identified side-effecting tools in `elicit`, and represent `needs_human` as a structured headless/RPC +result rather than a TUI-only dialog — with the A18-L residue named, not closed. + +### Boundary Crossings + +``` +→ src/graph/command-executor.ts (CommandResult discriminants — the outcome vocabulary) +→ src/projections/session/runtime-policy.ts (elicit allow/block policy — read/confirm) +→ src/.pi/extensions/runtime/index.ts (policy application hooks — read/confirm) +→ src/rpc/methods/session.ts (discriminant → RPC shape mapping; needs_human representable) +→ a new authority-matrix guard test (asserts the four criteria over current POC paths) +``` + +### Risks and Assumptions + +``` +- RISK: the slice balloons into full M6 RBAC / a standalone authority service. + → MITIGATION: acceptance is audit + guard + residue-naming; the frontier explicitly forbids a new + authority service. If the audit finds a genuine missing producer/blocker, fill ONLY that one + concrete gap; anything larger routes back to ln-plan, it does not expand this card. +- RISK: adding a needs_human producer the POC does not actually reach (speculative). + → MITIGATION: only assert needs_human is representable end-to-end (type + RPC/headless mapping + + no TUI-dialog assumption). Do not invent a POC path that produces it unless one already reaches + a human-only action; the audit determines this. 
+- ASSUMPTION: the elicit block-list (bash/edit/write) is the complete set of "side-effecting tools + identified as unsafe for the POC." + → IMPACT IF FALSE: a side-effecting tool stays callable in elicit; small, additive fix to the + shared policy block-list. + → VALIDATE: the audit enumerates registered tools vs the elicit allow/block sets and asserts no + side-effecting tool is reachable. + → [→ memory/SPEC.md A18-L, D34-L] +``` + +### Posture check + +Proving posture, invariants axis. Landing this slice **locates and locks** the authority seam: the +guard test makes the D20-L "discriminants are the only mutation outcome" and the elicit tool-authority +contract executable, so the next person who adds a bypassing write path or an unguarded +side-effecting tool fails a test. It tells us something concrete — it converts "the POC looks safe" +into "the POC's authority contract is asserted." No high-impact assumption is left unretired; the one +assumption (block-list completeness) is validated by the audit the card performs. 
+ +### Acceptance Criteria + +```pseudo tree +minimal authority shell +├── discriminant surface (D20-L) +│ ├── ✓ every current graph mutation path (agent graph tool, capture write, review accept) +│ │ returns a CommandResult discriminant; none throws for an expected authority/validation outcome +│ └── ✓ RPC/headless maps each discriminant to a structured response shape (no TUI-only assumption) +├── elicit tool authority (D40-L) +│ ├── ✓ elicit blocks every identified side-effecting tool (bash/edit/write) via tool_call + user_bash +│ ├── ✓ no registered side-effecting tool is reachable in elicit (allow-list is complete for the POC) +│ └── ✓ tool authority derives from the shared projected policy only (no second list; state.ts untouched) +├── needs_human representability (criterion 3) +│ ├── ✓ a needs_human CommandResult maps to a structured headless/RPC result, not a thrown TUI dialog +│ └── ✓ if no current POC path produces needs_human, that is recorded as intended (representable, unused) +└── scope discipline + ├── ✓ no new standalone authority service introduced + └── ✓ A18-L strict-built-in-suppression residue is named explicitly, not silently treated as closed +``` + +### Verification Approach + +``` +- Inner: an authority-matrix guard test (new) over current POC paths — asserts discriminant coverage, + elicit block/allow completeness, and needs_human structured representability. Existing + command-executor / runtime-policy / rpc handler tests still pass. +- Inner (gate): `npm run verify` (fix → test → build). +- Outer: manual smoke ONLY if a TUI-visible policy path changes (likely none; this is audit + guard). +``` + +### Cross-cutting obligations + +``` +- D20-L: discriminants are the only mutation outcome surface; no throw for expected outcomes. +- D34-L: command containment stays in .pi/extensions/commands/policy.ts. +- D40-L: tool authority is a pure derivation; DO NOT modify src/.pi/agents/state.ts (read-only import). 
+- A18-L: name strict built-in suppression as accepted Pi-upstream residue. +- This is a minimal shell, not M6: no RBAC, no permissions matrix, no authority service. +``` + +### Expected touched paths (tentative) + +```pseudo tree +src/.pi/extensions/runtime/ +└── authority-matrix.test.ts + (the guard test — primary deliverable) +src/projections/session/runtime-policy.ts ? (read/confirm; touch only if block-list incomplete) +src/.pi/extensions/runtime/index.ts ? (read/confirm; touch only if a hook gap is found) +src/graph/command-executor.ts ? (read-only unless a discriminant gap is found) +src/rpc/methods/session.ts ? (touch only if needs_human mapping is missing) +``` + +Lane discipline for parallel worktrees: +- **Does not** write `src/.pi/skills/**` (the `resource-body-depth` builder owns that). +- **Does not** write `src/graph/README.md`, `src/rpc/README.md`, `src/web/README.md`, or + `src/graph/observed-shapes-coverage.test.ts` (the `graph-observed-shapes` ledger owns those). +- **Does not** write `src/.pi/agents/state.ts` (reserved single-writer file; import read-only). + +### Traceability + +- **SPEC:** D20-L (command-result discriminants), D34-L (command containment), D40-L (projected tool + authority), A18-L (strict-built-in-suppression residue), A3-L. +- **Requirements:** R5, R6, R10. +- **Frontier:** satisfies the `minimal-authority-shell` acceptance leaves via audit + guard; any + concrete gap the audit surfaces is filled in-place, anything larger routes back to `ln-plan`. +- **Design docs:** `memory/SPEC.md` D20-L/D34-L/D40-L; `docs/reference/pi-extensions.md`. From 052adb6e28be53deb3b53d217b2f24732c53fe0c Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Mon, 8 Jun 2026 10:39:25 +0200 Subject: [PATCH 06/17] Reconcile cross-cut completion: prompt-resource body depth done Stream A (resource-body-depth) landed in 1ca02e38. 
Mark both Seam 3a/3b content-depth rows built in CROSS_CUT_PLAN.md, record the completion in PLAN.md (Recently Completed + parallel-stream note: A done, B and C remain), and delete the now-exhausted scope card. All cross-cut row-sized work is complete; only the unscoped live 'what to ask next' driver remains. Amp-Thread-ID: https://ampcode.com/threads/T-019ea2fc-9f12-767d-bd7a-08497f7307fd Co-authored-by: Amp --- memory/CROSS_CUT_PLAN.md | 20 +-- memory/PLAN.md | 6 +- .../crosscut-know--resource-body-depth.md | 165 ------------------ 3 files changed, 14 insertions(+), 177 deletions(-) delete mode 100644 memory/cards/crosscut-know--resource-body-depth.md diff --git a/memory/CROSS_CUT_PLAN.md b/memory/CROSS_CUT_PLAN.md index 27f430fdf..09b440cdb 100644 --- a/memory/CROSS_CUT_PLAN.md +++ b/memory/CROSS_CUT_PLAN.md @@ -36,7 +36,7 @@ itself. - `memory/PLAN.md` owns frontier ids, sequencing, dependency judgment, and which work is active next. - This file owns only the temporary elicitor READ / WRITE / KNOW row inventory and its aggregate coverage DoD. -- When one row escapes row-sized work, it gets promoted back into PLAN. As of 2026-06-08, the D65-L row is now the active PLAN frontier `elicitation-backlog`; the remaining prompt-resource body-depth pass stays temporary cross-cut work. +- When one row escapes row-sized work, it gets promoted back into PLAN. As of 2026-06-08, the D65-L row is now the active PLAN frontier `elicitation-backlog` (landed), and the prompt-resource body-depth pass landed in 1ca02e38. All ● rows are now `have`/`built`; the only remaining cross-cut residue is the live per-turn "what to ask next" driver, which is an unscoped PLAN follow-on, not row-sized work. ## The seams (locked) @@ -115,7 +115,7 @@ DoD: every ● row is `have` or `built`. 
| Capability | Status | Req | Fill | Owner / next | Notes | | --- | --- | --- | --- | --- | --- | | goals / strategies / lenses scaffolding + legal-tuple gating | have | ● | — | — | `.pi/agents/state.ts` | -| goal/strategy/lens **content depth** | partial | ● | earned | card `memory/cards/crosscut-know--resource-body-depth.md` | scaffolding present, bodies thin | +| goal/strategy/lens **content depth** | built | ● | — | done — deepened bodies + manifest-wide depth test (1ca02e38) | each body now carries its facet guidance; ≥700-char floor guarded in `compose.test.ts` | | `freestyle` strategy | built | ● | — | done — pin-only strategy (8de7f166) | AUTO-excluded, no added authority; D66-L | | "what to ask next" driver | partial | ● | proving | unscoped follow-on | flat-table substrate landed via FE-823; live per-turn driver + capture-reflection remain follow-on work | @@ -126,7 +126,7 @@ DoD: every ● row is `have` or `built`. | Capability | Status | Req | Fill | Owner / next | Notes | | --- | --- | --- | --- | --- | --- | | 6 method resources scaffolding | have | ● | — | — | run-structured-exchange, infer-and-capture, commit-graph, read-context, generate-proposal, review-for-gaps | -| method **content depth** | partial | ● | earned | content pass | bodies thin | +| method **content depth** | built | ● | — | done — deepened bodies + manifest-wide depth test (1ca02e38) | each method gives tool-routing/sequencing guidance, not tool-description restatement | | generalized capture (free text, files, refs; iterative passes) | built | ● | — | done — labeled-text core on `session.submitMessage` (5f5e6ac8) | POC bar = directly-labeled facts; richer free-text/files/refs remain A22-L fitness evidence; D66-L | | exchange-tool `.description()` / `promptGuidelines` | built | ● | — | done — all 7 exchange tools carry both (drift correction 2026-06-07) | `src/.pi/extensions/exchanges/*` already match the `commit_graph` pattern | | skill-commands (`gap-review`, `arbitrary-enhance`) | 
new | ○ | proving | Q6 (deferred) | off critical path | @@ -264,13 +264,13 @@ order is coverage-driven: close ● ledger rows seam by seam. This also closed the Seam 3a `freestyle` and Seam 3b generalized-capture ● rows. No posture-switch tool to build (Q4 dissolved); user/system posture surface is deferred to the Q-state affordance reducer. -4. **Seam 3a/3b content pass** — `freestyle` strategy (**built**, 8de7f166) + - `elicitation_backlog`-driven "what to ask next" (D65-L); goal/strategy/lens/method body - depth; exchange-tool `.description()` / `promptGuidelines` fix (**built** — drift correction; - all 7 exchange tools already carry both). Skill-commands (Q6) stay deferred. **Scoped:** - FE-823 landed the D65-L substrate tracer (flat table, `createSpec` seed, command/query seam); - the live per-turn driver + capture-reflection remain an unscoped follow-on, and - `memory/cards/crosscut-know--resource-body-depth.md` still holds the goal/strategy/lens/method body pass. +4. **Seam 3a/3b content pass** — **COMPLETE** (all ● rows built): `freestyle` strategy + (8de7f166), generalized-capture core (5f5e6ac8), exchange-tool `.description()` / + `promptGuidelines` (drift correction 2026-06-07), and goal/strategy/lens/method body depth + (1ca02e38 — deepened bodies + a manifest-wide ≥700-char depth test in `compose.test.ts`). + FE-823 landed the D65-L substrate tracer (flat table, `createSpec` seed, command/query seam). + Skill-commands (Q6) stay deferred; the live per-turn "what to ask next" driver + + capture-reflection remain an unscoped PLAN follow-on. 5. **Spec reconcile** — promote the D40-L/D59-L one-line refinements (on confirmation), land Q1 negative-query touch, fold D65-L/D66-L outcomes into SPEC/PLAN. 
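The manifest-wide depth guard recorded in the ledger can be sketched as a pure check over manifest entries. This sketch makes several assumptions:

- entries are shaped as `{name, description, location}`, per D58-L;
- the ≥700-char floor is measured on the trimmed body;
- `checkBodyDepth` and the injected `readBody` reader are hypothetical.

The real assertion in `compose.test.ts` may be structured differently.

```typescript
// Manifest entries advertise {name, description, location} (D58-L); the body lives at `location`.
interface ResourceEntry {
  readonly name: string;
  readonly description: string;
  readonly location: string;
}

interface DepthFailure {
  name: string;
  reason: "unreadable" | "too-thin";
  chars?: number;
}

const DEPTH_FLOOR_CHARS = 700;

// Collects every entry whose body cannot be read or falls below the floor.
function checkBodyDepth(
  entries: readonly ResourceEntry[],
  readBody: (location: string) => string | undefined,
  floor: number = DEPTH_FLOOR_CHARS,
): DepthFailure[] {
  const failures: DepthFailure[] = [];
  for (const entry of entries) {
    const body = readBody(entry.location);
    if (body === undefined) {
      failures.push({ name: entry.name, reason: "unreadable" });
      continue;
    }
    const chars = body.trim().length;
    if (chars < floor) {
      failures.push({ name: entry.name, reason: "too-thin", chars });
    }
  }
  return failures;
}

// Illustrative in-memory bodies: one deep, one thin placeholder, one missing.
const bodies = new Map<string, string>([
  ["src/.pi/skills/goals/grounding-advance.md", "x".repeat(800)],
  ["src/.pi/skills/lenses/intent.md", "Focus on intent.\n"],
]);
const entries: ResourceEntry[] = [
  { name: "grounding-advance", description: "(illustrative)", location: "src/.pi/skills/goals/grounding-advance.md" },
  { name: "intent", description: "(illustrative)", location: "src/.pi/skills/lenses/intent.md" },
  { name: "oracle", description: "(illustrative)", location: "src/.pi/skills/lenses/oracle.md" },
];

console.log(checkBodyDepth(entries, (location) => bodies.get(location)));
```

Injecting the reader keeps the check pure. A real test could pass a wrapper around `node:fs` `readFileSync` that returns `undefined` on error; this demo uses an in-memory map instead.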
diff --git a/memory/PLAN.md b/memory/PLAN.md index d8dc51eaf..6c5ba8366 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -275,6 +275,8 @@ After the current elicitor work, the strongest follow-on coverage frontier is `g - **Design docs:** `.fixtures/seeds/bilal-port/README.md`; `docs/design/GRAPH_MODEL.md`; `docs/praxis/manual-testing.md`. ## Recently Completed +- 2026-06-08 cross-cut prompt-resource body-depth pass (Seam 3a/3b) — Done (1ca02e38): deepened every thin `src/.pi/skills/{goals,strategies,lenses,methods}` body to carry its per-axis facet guidance (goals→D59-L, strategies/lenses→README+D25-L, methods→D58-L tool-routing role), and added a manifest-wide readability/depth test in `src/.pi/agents/compose.test.ts` asserting every `{GOAL,STRATEGY,LENS,METHOD}_RESOURCES` location resolves and clears a ≥700-char floor. `state.ts` untouched. This closed the last row-sized cross-cut completion work; `memory/CROSS_CUT_PLAN.md` ● rows are now all built. Verified: `npm run verify` (551 tests, build). + - 2026-06-08 `elicitation-backlog` (FE-823) — Done: materialized `elicitation_backlog` as a flat spec-scoped table with generated migration, seeded the grounding agenda at `createSpec`, routed create/close entry mutations through `CommandExecutor` on the shared `{specId, lsn}` / `change_log` boundary, and added graph-owned per-spec open-entry read-back. Reconciled D65-L/A24-L and updated graph/db topology docs. Verified: `src/graph/command-executor.test.ts`, `src/graph/queries.test.ts`, and `npm run verify`. - 2026-06-06 `project-graph-review-cycle` (FE-809) — Done: `project-graph` now has active review tools at commitment readiness, real agent proposal generation reaches `present_review_set`, approval goes through public `session.submitExchangeResponse`, `CommandExecutor.acceptReviewSet` commits the exact reviewed batch with `basis: explicit`, and graph/session invalidations publish with `{specId, lsn}`. 
Verified: `src/.pi/agents/state.test.ts`, `src/.pi/__tests__/prompting.test.ts`, `src/probes/project-graph-review-cycle-proof.test.ts`, and real run `.fixtures/runs/project-graph-review-cycle/2026-06-06-project-graph-review-cycle/`. @@ -328,8 +330,8 @@ horizon: geolog-and-petri-execution notes: - - `elicitation-backlog` was the promoted D65-L row from `memory/CROSS_CUT_PLAN.md`; the remaining temporary cross-cut work is `memory/cards/crosscut-know--resource-body-depth.md`. - - Parallel worktree streams (2026-06-08): three mutually write-disjoint streams may run concurrently from a clean committed base — (A) `crosscut-know--resource-body-depth` → `src/.pi/skills/**`; (B) `graph-observed-shapes--coverage-ledger` → `src/graph/README.md` + `rpc`/`web` READMEs + one guard test; (C) `minimal-authority-shell--audit-and-guard` → `src/.pi/extensions/runtime/` + guard test. Invariant: **`src/.pi/agents/state.ts` is a single-writer file** — only one stream may edit it at a time (A may touch manifest descriptions; B and C must not). `poc-live-ship-gate` stays gated behind `minimal-authority-shell` (hard edge); `runtime-affordances-and-legality` and `exchanges-and-generalized-capture` stay parked (shape not yet forced) and are not cold-startable worktree streams. + - `elicitation-backlog` was the promoted D65-L row from `memory/CROSS_CUT_PLAN.md`; the prompt-resource body-depth pass (the last temporary cross-cut completion work) landed in 1ca02e38, so `memory/CROSS_CUT_PLAN.md` now has no row-sized work left — its only residue is the unscoped live "what to ask next" driver. + - Parallel worktree streams (2026-06-08): stream (A) `crosscut-know--resource-body-depth` → `src/.pi/skills/**` is **done** (1ca02e38). 
Two write-disjoint streams remain cold-startable from a clean committed base — (B) `graph-observed-shapes--coverage-ledger` → `src/graph/README.md` + `rpc`/`web` READMEs + one guard test; (C) `minimal-authority-shell--audit-and-guard` → `src/.pi/extensions/runtime/` + guard test. Invariant: **`src/.pi/agents/state.ts` is a single-writer file** — B and C must not edit it. `poc-live-ship-gate` stays gated behind `minimal-authority-shell` (hard edge); `runtime-affordances-and-legality` and `exchanges-and-generalized-capture` stay parked (shape not yet forced) and are not cold-startable worktree streams. - Completed prerequisites: `agents-composition-layer` supplies runtime prompt/resource posture, and `live-graph-observer` supplies the read-only web observer path expected by `capture-response-to-graph` and `poc-live-ship-gate`. - `graph-observed-shapes` is intentionally consumer-specific: do not assume every agent read shape belongs on the web observer. - `exchanges-and-generalized-capture` stays deferred until the surviving inventory is honest enough to close; do not regrow deleted `capture-*` symmetry in the meantime. diff --git a/memory/cards/crosscut-know--resource-body-depth.md b/memory/cards/crosscut-know--resource-body-depth.md deleted file mode 100644 index 9d5b1fcc2..000000000 --- a/memory/cards/crosscut-know--resource-body-depth.md +++ /dev/null @@ -1,165 +0,0 @@ -# Prompt-resource body depth (Seam 3a/3b content pass) - -Frontier: n/a (cross-cut Seam 3a/3b; D58-L) | tracker/branch = the active cross-cut push -Status: active -Mode: single -Created: 2026-06-07 - -## Orientation - -- **Containing seam:** the KNOW layer's Brunch-owned prompt resources under - `src/.pi/skills/{goals,strategies,lenses,methods}` — the markdown bodies the agent loads with - `read` when an axis is active (D58-L manifest mechanism). `CROSS_CUT_PLAN.md` Seam 3a/3b both - carry a *content depth* ● row: "scaffolding present, bodies thin." 
-- **Relevant frontier item:** none in `memory/PLAN.md`; this stays the remaining temporary - cross-cut completion work after D65-L `elicitation_backlog` was promoted back into PLAN. - It is the earned content half of cross-cut working-order step 4. -- **Volatile state:** the bodies are genuinely thin — every resource is ~5 lines - (`goals/*`, `lenses/*`, `methods/{commit-graph,read-context,review-for-gaps}`, all four - non-freestyle `strategies/*`); only `methods/{infer-and-capture,generate-proposal,run-structured-exchange}` - reach 12–15 lines (use these three as the **shape exemplar** for body depth). -- **Source-anchoring gotcha (new-thread-critical):** only **strategies/** and **lenses/** have a - README contract; **goals/** and **methods/** do **not**. Do not invent content — anchor every - body to the authoritative source named in §Content sources below. The one-line manifest - descriptions in [`.pi/agents/state.ts`](file:///Users/lunelson/Code/hashintel/brunch-next/src/.pi/agents/state.ts) - (`GOAL_RESOURCES`, `STRATEGY_RESOURCES`, `LENS_RESOURCES`, `METHOD_RESOURCES`) already encode - each resource's intended one-line intent; the body expands that intent, it must not contradict it. -- **Concurrency note (new-thread-critical):** another agent is actively building the - `elicitation-backlog` frontier in `src/graph/` and `src/db/`. This card touches **only** - `src/.pi/skills/**/*.md` (plus optionally `state.ts` descriptions / `compose.test.ts`). Do **not** - edit `graph/`, `db/`, or the elicitation-backlog card — that is another tenant's blast radius. -- **Drift note (handled in reconciliation, not here):** the Seam 3b *exchange-tool - `.description()` / `promptGuidelines`* ● row is **already done** — all 7 exchange tools under - `src/.pi/extensions/exchanges/` carry `description` + `promptGuidelines`. That row is reclassified - `built` in the ledger; it is **out of scope** for this card. 
-- **Main open risk:** prose *quality* stays partly judgment-based, but acceptance does not depend on - it — a required structural test (§Verification Approach) gives every body an objective non-trivial-depth - floor and a self-checkable facet checklist (§Content sources) replaces "read it and decide." - -Posture: **earned** (inherited from cross-cut Seam 3a/3b — Fill=`earned`; settled scaffolding, -just unbuilt bodies). This is content materialization into existing topology, not a new seam. - -Frontier-level cross-cutting obligations: - -- **D58-L:** bodies stay Brunch-owned markdown loaded on demand; the manifest advertises - `{name, description, location}`, the body carries detail. Do not move detail into code or descriptions. -- **D39-L:** resource location stays code-owned in `.pi/agents/state.ts`; this card edits bodies - only, not the manifest registry. -- Keep each body scoped to its own axis; do not duplicate cross-axis content (goal vs strategy vs - lens vs method are orthogonal, D59-L/D25-L). - -### Content sources (per family — read these before writing any body) - -Every body expands its **manifest one-liner** in `.pi/agents/state.ts`; that one-liner is the -binding intent the body may not contradict. 
Beyond that, each family has a distinct authoritative -anchor and facet checklist: - -```pseudo tree -goals/ (4: grounding-advance, elicit-expand, commit-converge, capture-posture) - authority SPEC D59-L (defines all four goals + grade-derivation) + GOAL_RESOURCES one-liner - no README — D59-L IS the contract - facets what the agent pursues · what evidence advances it · what NOT to claim/do · - how it relates to its grade band (D64-L) · capture-posture never writes spec/graph truth -strategies/ (4 remaining: step-wise-decision-tree, step-wise-disambiguate, propose-graph, project-graph) - authority strategies/README.md §"Prompt resource contents" + STRATEGY_RESOURCES one-liner + SPEC D25-L/D26-L - exemplar strategies/freestyle.md (recently deepened — match this depth) - facets what the agent does · turn structure · commitment mechanism (D26-L) · - available graph ops · category-selection rubric for graph-writing strategies -lenses/ (3: intent, design, oracle) - authority lenses/README.md §"Topology-driven question ranking" + LENS_RESOURCES one-liner + SPEC D25-L/D56-L - facets topical/plane focus · favored kinds/edges · how it shapes interpretation · - topology-driven "what to ask next" heuristics from the README table -methods/ (6: run-structured-exchange, infer-and-capture, commit-graph, read-context, generate-proposal, review-for-gaps) - authority SPEC D58-L ("method resources are the prompt-level home for tool-routing/sequencing guidance") + METHOD_RESOURCES one-liner - no README — D58-L IS the contract - exemplar methods/{generate-proposal,run-structured-exchange,infer-and-capture}.md (already 12–15 lines) - facets concrete tool-routing/sequencing (NOT a restatement of the tool description) · - when to invoke · what to compose it with · what stays out of scope -``` - -### Objective - -Deepen the thin `.pi/skills/{goals,strategies,lenses,methods}` resource bodies so each carries the -real per-axis instruction its authoritative source (§Content sources) requires, 
without changing the -manifest registry. - -### Acceptance Criteria - -```pseudo tree -resource body depth -├── goals (4) -│ └── ✓ each goal body states the objective, what evidence advances it, and what NOT to claim/do -├── strategies (4 remaining; freestyle already deepened) -│ └── ✓ each body covers the strategies/README facets: what the agent does, turn structure, -│ commitment mechanism, available graph ops, and category-selection rubric where applicable -├── lenses (3) -│ └── ✓ each lens body states its topical focus, what kinds/edges it favors, and how it shapes interpretation -├── methods (6) -│ └── ✓ each method body gives concrete tool-routing/sequencing guidance (the D58-L method role), -│ not a restatement of the tool description -└── consistency - ├── ✓ no body contradicts its §Content sources authority or another axis's responsibility - ├── ✓ each body expands (does not contradict) its state.ts manifest one-liner - └── ✓ no new capability/authority/tool invented beyond what the source already grants -``` - -### Verification Approach - -Builder-portable, no human-only step required to pass the card: - -``` -- Self-check (objective): for each body, walk its §Content sources facet checklist and confirm - every facet is addressed in prose; confirm the body still reads as an expansion of its - state.ts one-liner and invents no new authority/tool. -- Structural test (REQUIRED): extend the existing compose/readability test (compose.test.ts) to assert, - for every manifest entry across all four families, that location resolves to a readable file whose - body exceeds a non-trivial line/char threshold (i.e. beyond the current ~5-line placeholders). - This converts "bodies are thin" into a failing assertion before the pass and a passing one after. -- Gate: `npm run verify` (fix → test → build) — proves all resources still load and the manifest - location wiring is intact. 
-- Human review is optional polish AFTER the gate is green; it is not required for acceptance. -``` - -### Cross-cutting obligations - -``` -- Bodies are prompt resources, not code: keep instruction in markdown, not in descriptions/manifest. -- Preserve orthogonality (D59-L/D25-L): a strategy body must not absorb goal/lens content. -- Do not touch the exchange-tool description row (already built) or the manifest registry (D39-L). -``` - -### Assumption dependency - -`None` — this slice's correctness does not hinge on a live `memory/SPEC.md` §Assumption; the -axis scaffolding and the D58-L manifest mechanism are settled and built. - -### Expected touched paths (tentative) - -```pseudo tree -src/.pi/skills/ -├── goals/{grounding-advance,elicit-expand,commit-converge,capture-posture}.md ~ -├── strategies/{step-wise-decision-tree,step-wise-disambiguate,propose-graph,project-graph}.md ~ -├── lenses/{intent,design,oracle}.md ~ -└── methods/{run-structured-exchange,infer-and-capture,commit-graph,read-context,generate-proposal,review-for-gaps}.md ~ -src/.pi/agents/state.ts ? (only if a manifest description needs to match a deepened body) -src/.pi/agents/compose.test.ts ~ (REQUIRED: structural non-trivial-depth + location-resolves assertion) -``` - -Stay inside this tree. Do **not** touch `src/graph/**`, `src/db/**`, or `memory/PLAN.md` / -`memory/CROSS_CUT_PLAN.md` — the `elicitation-backlog` builder owns those concurrently. - -### Promotion checklist - -All **no** — stays a light/earned content card: - -- Changes a requirement? No. — Creates/retires an assumption? No. — Depends on unvalidated - high-impact assumption? No. — Makes/reverses a design decision? No. — New seam invariant? No. -- Changes a cross-cutting verification layer? No. — Crosses >2 seams? No (one resource tree). -- First touch in an unfamiliar seam? No. — Can't name the seam/rationale? No (D58-L, the READMEs). 
- -### Traceability - -- **SPEC:** D58-L (resource-manifest mechanism), D59-L/D25-L (axis orthogonality). -- **Cross-cut:** closes `CROSS_CUT_PLAN.md` Seam 3a *goal/strategy/lens content depth* ● and - Seam 3b *method content depth* ●. The Seam 3b *exchange-tool description* ● is reclassified - `built` (drift) during reconciliation, not by this card. From aae5961971cce92ab98bfa7e97996eb79b8afb7d Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Mon, 8 Jun 2026 10:48:26 +0200 Subject: [PATCH 07/17] Graduate rename-residue lens into review skills; scope PR 177 refactor Add 'rename blast-radius includes the data plane' cue to ln-review Contract integrity (and mirror into ln-judo-review): a reviewer-bot comment on renamed data samples one token of a wider syndrome; regenerate committed artifacts wholesale rather than field-patching, and check legacy-input policy against posture. Add memory/REFACTOR.md scoping the PR 177 (FE-811) reconciliation: regenerate two stale reference runs, add a residue guard, fix the edge-direction label. Output of an ln-induct run on PR 177 comments. Amp-Thread-ID: https://ampcode.com/threads/T-019ea618-c2ac-721d-809a-a72bfd9ce453 Co-authored-by: Amp --- .agents/skills/ln-judo-review/SKILL.md | 2 +- .agents/skills/ln-review/SKILL.md | 1 + memory/REFACTOR.md | 116 +++++++++++++++++++++++++ 3 files changed, 118 insertions(+), 1 deletion(-) create mode 100644 memory/REFACTOR.md diff --git a/.agents/skills/ln-judo-review/SKILL.md b/.agents/skills/ln-judo-review/SKILL.md index 90e1ac453..f3bad9ad5 100644 --- a/.agents/skills/ln-judo-review/SKILL.md +++ b/.agents/skills/ln-judo-review/SKILL.md @@ -32,7 +32,7 @@ Make the change easy, then make the easy change (Beck): if the diff feels tangle Boring code over magic (Hunt & Thomas): generic mechanisms that hide simple data-shape assumptions are a defect, not a feature. 
-Ambient-contract reliance: an invariant the code assumes but never enforces, threads, or names — uniqueness keys that silently last-win, dedups that drop kept data, hardcoded literals standing in for upstream provenance, persisted absolute paths/`cwd` leaking into committed fixtures, magic shape-checks instead of named predicates. The judo move is to make the contract intentional: enforce it loudly, thread the real value, or name it — not to tidy the assumption in place. (Full cue list in `ln-review` §Contract integrity.) +Ambient-contract reliance: an invariant the code assumes but never enforces, threads, or names — uniqueness keys that silently last-win, dedups that drop kept data, hardcoded literals standing in for upstream provenance, persisted absolute paths/`cwd` leaking into committed fixtures, renames propagated to code/docs but not to committed fixtures or serialized artifacts, magic shape-checks instead of named predicates. The judo move is to make the contract intentional: enforce it loudly, thread the real value, or name it — not to tidy the assumption in place. (Full cue list in `ln-review` §Contract integrity.) Functional core / imperative shell (Gary Bernhardt): when independent work is needlessly serialized, or related updates can leave state half-applied, ask whether orchestration should be separated from business logic — and whether the cleaner structure is parallel or atomic. diff --git a/.agents/skills/ln-review/SKILL.md b/.agents/skills/ln-review/SKILL.md index 07fd98cd5..0ebc24734 100644 --- a/.agents/skills/ln-review/SKILL.md +++ b/.agents/skills/ln-review/SKILL.md @@ -56,6 +56,7 @@ Concrete cues to look for: - A dedup or "first wins / last wins" that silently drops data the caller meant to keep. Repair: thread distinct keys, or fail loudly. - A hardcoded literal standing in for a value that should be carried from upstream (`respondsToPresentTool: 'present_options'` when the originating tool varies). Repair: thread the real provenance. 
- Persisted or serialized data that assumes an ambient environment (absolute paths, `cwd`, tempdirs, machine-local roots leaking into committed fixtures). Repair: name the portable contract and normalize at the boundary. +- A field/path/identifier rename propagated to code and docs but **not** to committed fixtures, serialized artifacts, or legacy inputs → reference data silently straddles old and new contracts (`graphSnapshotJson` survives in a committed `report.json` after writers moved to `graphOverviewJson`; a `workspace.snapshot` topic lingers in a captured run). A reviewer-bot comment on such data usually samples *one* stale token of a wider syndrome. Repair: regenerate the committed artifacts wholesale (don't field-patch the single token a reviewer happened to flag), add a residue guard that fails on retired tokens, and decide legacy-input policy explicitly against project posture — a generic "accept both for backward compat" can violate a pre-release/no-shim posture. - A magic check inferring readiness/state from an object's incidental shape instead of a named constant or predicate. Repair: name the predicate against the canonical constant. - Ordering or position encoded by a numeric index/splice rather than by structure. Repair: make the order declarative. - A type alias or name that implies a wider contract than it points at. Repair: point it at the real union, or rename. diff --git a/memory/REFACTOR.md b/memory/REFACTOR.md new file mode 100644 index 000000000..a3b694e3c --- /dev/null +++ b/memory/REFACTOR.md @@ -0,0 +1,116 @@ +# Refactor: reconcile PR 177 rename residue + edge-direction label + +> Source: `ln-induct` run on PR 177 (FE-811) review comments. Temporary execution +> aid — delete when complete or superseded (per `AGENTS.md` §ln-refactor). +> Builder works on branch `ln/fe-811-poc-live-ship-blockers`. 
+ +## Problem Statement + +PR 177 renamed several identifiers across code and docs, but the migration +stopped at the code/doc plane and never reached the **data plane**. Two +committed reference runs were generated before the renames and never +regenerated, so they straddle old and new contracts silently: + +- `.fixtures/runs/fixture-curation/fixture-curation-2026-06-05T104440Z/` +- `.fixtures/runs/project-graph-review-cycle/2026-06-06-project-graph-review-cycle/` + +Concretely, against the current writer contract (`src/probes/*` now emit +`graphOverviewJson` → `graph-overview.json`; `src/rpc/product-updates.ts` only +emits topic `workspace.state`): + +- both runs' `report.json` carry `artifacts.graphSnapshotJson` → `graph-snapshot.json` +- both runs ship a stale `graph-snapshot.json` file (writers now produce `graph-overview.json`) +- the `project-graph-review-cycle` run's `report.json:88` carries a stale + `"topic": "workspace.snapshot"` + +The Cursor bot sampled only the first of these in only the artifact field; the +others are the unsampled tail of the same syndrome. Field-patching the one the +bot named would leave the run still wrong on the others — the false confidence +this whole lens predicts. + +Separately, a presentation-layer bug: `formatRelatedNodesResult` in +`src/.pi/extensions/graph/command-adapter.ts:255` labels each result edge +`outgoing`/`incoming` from a one-sided check (`source ∈ anchors`). The query +layer (`src/graph/queries.ts`) correctly traverses multi-hop and node↔node +edges, so at hop ≥ 2 an edge between two non-anchor nodes (source ∉ anchors) is +silently mislabeled `incoming`. 
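The one-sided label and the proposed two-endpoint fix can be compared side by side. This is a minimal sketch that assumes:

- a hypothetical `Edge` with `source`/`target` ids;
- the anchor set passed as a `Set<string>`.

The real `formatRelatedNodesResult` signature and result types are not reproduced here.

```typescript
// Hypothetical edge shape; the real related-nodes result may carry more fields.
interface Edge {
  readonly source: string;
  readonly target: string;
}

type Direction = "outgoing" | "incoming" | "lateral";

// Current behavior as described: only `source` is checked against the anchors.
function labelOneSided(edge: Edge, anchors: ReadonlySet<string>): Direction {
  return anchors.has(edge.source) ? "outgoing" : "incoming";
}

// Proposed fix: classify by both endpoints, falling back to "lateral".
function labelByEndpoints(edge: Edge, anchors: ReadonlySet<string>): Direction {
  if (anchors.has(edge.source)) return "outgoing";
  if (anchors.has(edge.target)) return "incoming";
  return "lateral";
}

// 2-hop example: anchor A -> B -> C; the B -> C edge touches no anchor.
const anchors = new Set(["A"]);
const hop2: Edge = { source: "B", target: "C" };

console.log(labelOneSided(hop2, anchors)); // incoming (mislabeled)
console.log(labelByEndpoints(hop2, anchors)); // lateral
```

The rule order follows the refactor plan. Because the `source` check runs first, an edge whose endpoints are both anchors reports `outgoing`.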
+ +**Data-plane delta:** + +```pseudo +tree current (per stale run) tree desired + report.json report.json + artifacts.graphSnapshotJson --> artifacts.graphOverviewJson + productUpdates[].topic productUpdates[].topic + "workspace.snapshot" --> "workspace.state" + graph-snapshot.json --> graph-overview.json (file renamed/regenerated) +``` + +## Solution + +Regenerate both reference runs from their committed session transcripts (the +runs are replay-deterministic — the probe reads `session.jsonl` + seed and +derives artifacts; no live model calls), so every committed identifier matches +the current writer contract. Then install a guard so a future rename cannot +silently leave reference-data residue. Finally, fix the edge-direction label to +classify by both endpoints. + +### Non-goals (do not do these) + +- **Do NOT add `snapshottedLsn` backward-compatibility.** Copilot suggested the + reader at `src/projections/session/runtime-state.ts:121` accept both + `seenLsn` and the legacy `snapshottedLsn`. This contradicts the repo's + pre-release posture (`AGENTS.md`: no back-compat shims unless explicitly + required). No committed transcript carries the legacy field. Leave the + single-field read as-is. +- Do not widen scope to other probe runs — the audit confirmed the other five + committed `report.json` files carry no graph-artifact or workspace-topic keys. +- Do not touch the `queries.ts` traversal — it is correct. + +## Commits + +Ordered; each leaves the suite green. Behavioral change last. + +1. **Regenerate the `fixture-curation-2026-06-05T104440Z` reference run.** + Replay its committed `session.jsonl` through the fixture-curation probe so + `report.json` emits `graphOverviewJson` → `graph-overview.json`, and the + stale `graph-snapshot.json` is replaced by `graph-overview.json`. Confirm the + probe test still passes against the regenerated run. 
+ - Touches: `.fixtures/runs/fixture-curation/fixture-curation-2026-06-05T104440Z/*` + - Driver: `src/probes/fixture-curation-loop.ts` (entrypoint writes to the run dir) + +2. **Regenerate the `2026-06-06-project-graph-review-cycle` reference run.** + Same regeneration; this additionally resolves the stale + `"topic":"workspace.snapshot"` to `"workspace.state"`. Confirm + `src/probes/project-graph-review-cycle-proof.test.ts` passes. + - Touches: `.fixtures/runs/project-graph-review-cycle/2026-06-06-project-graph-review-cycle/*` + - Driver: `src/probes/project-graph-review-cycle-proof.ts` + +3. **Add a contract-residue guard test (enforce loudly).** A test that scans + every committed `report.json` under `.fixtures/runs/**` and fails if any + contains a retired contract token (`graphSnapshotJson`, `graph-snapshot`, + `workspace.snapshot`). Green only after commits 1–2. This is the lens's + "enforce it loudly" repair: a future rename that forgets the data plane now + fails CI instead of shipping silent drift. + - Touches: new test near `src/probes/` (e.g. `src/probes/fixture-contract-residue.test.ts`) + - Note: `.fixtures` is gitignored but force-committed — enumerate files via + `git ls-files '.fixtures/**/report.json'`, not a glob that respects ignore. + +4. **Fix the edge-direction label (behavioral).** In + `formatRelatedNodesResult` (`src/.pi/extensions/graph/command-adapter.ts:255`), + classify by both endpoints: `source ∈ anchors → outgoing`, + `target ∈ anchors → incoming`, else `lateral`. Add a regression test that + builds a 2-hop related result containing a node↔node edge and asserts it is + labeled `lateral`, not `incoming`. + - Touches: `src/.pi/extensions/graph/command-adapter.ts` + its test + +## Verification + +- Per commit: `npm run fix` (inner loop). +- Gate before handing off: `npm run verify` (fix → test → build). +- Commit 3's guard must be RED if either regeneration is skipped, GREEN after. 
+- Commit 4's regression test must be RED against the current one-sided label. +``` + +The lens that produced this is `ln-review` §Contract integrity → "rename +blast-radius includes the data plane." From e1310c145ab18f5b0445f8f713f6154767e3beb4 Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Mon, 8 Jun 2026 10:54:10 +0200 Subject: [PATCH 08/17] Ratify graph observed-shape ledger --- memory/PLAN.md | 7 +- memory/SPEC.md | 2 +- .../graph-observed-shapes--coverage-ledger.md | 2 +- src/graph/README.md | 19 ++- src/graph/observed-shapes-coverage.test.ts | 113 ++++++++++++++++++ src/rpc/README.md | 2 +- src/web/README.md | 2 +- 7 files changed, 138 insertions(+), 9 deletions(-) create mode 100644 src/graph/observed-shapes-coverage.test.ts diff --git a/memory/PLAN.md b/memory/PLAN.md index 6c5ba8366..72cbbc019 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -40,8 +40,7 @@ After the current elicitor work, the strongest follow-on coverage frontier is `g ### Next 1. `poc-live-ship-gate` — final fresh-cwd runbook remains the delivery gate, but its prepared live-mention-autocomplete slice is currently parked off the critical path. -2. `graph-observed-shapes` — next coverage frontier candidate: decide the observed-shape inventory per consumer, then align graph/RPC/web to it. -3. `runtime-affordances-and-legality` — follow-on coverage frontier for shared posture legality/default surfaces once graph observed shapes stop dominating. +2. `runtime-affordances-and-legality` — follow-on coverage frontier for shared posture legality/default surfaces once graph observed shapes stop dominating. 
### Parallel / Low-conflict @@ -165,7 +164,7 @@ After the current elicitor work, the strongest follow-on coverage frontier is `g - **Name:** Graph observed-shape inventory by consumer - **Linear:** unassigned - **Kind:** structural -- **Status:** next +- **Status:** done - **Certainty:** proving - **Lights up:** One canonical observed-shape matrix across graph readers, RPC methods, and web observer surfaces. - **Stabilizes:** D60-L read-shape ownership, D33-L web read-only observer scope, and the rule that `src/projections/` exists only for reusable multi-consumer DTOs. @@ -180,7 +179,7 @@ After the current elicitor work, the strongest follow-on coverage frontier is `g - **Cross-cutting obligations:** Do not promote all read shapes everywhere. `list_by_kind` / `list_by_band` are plausible web shapes; `related` / `gaps` may remain agent/RPC-only. Keep graph-owned read logic out of `db/`, and keep `src/renderers/` limited to durable LLM/session text rather than arbitrary observer DTOs. - **Traceability:** D33-L, D51-L, D52-L, D60-L, D64-L. - **Design docs:** `src/graph/README.md`; `src/rpc/README.md`; `src/web/README.md`. -- **Current execution pointer:** Scoped 2026-06-08 — active scope file `memory/cards/graph-observed-shapes--coverage-ledger.md` (the coverage-ledger slice: ratify the consumer-specific read-shape inventory + install a coverage-guard test; no transport shape ships in this slice). Any "required but missing" row spawns a separate follow-on alignment card scoped after the ledger is accepted. +- **Current execution pointer:** Done 2026-06-08. `src/graph/README.md` now owns the closed observed-shape ledger: `read_graph` requires the six agent shapes, RPC and web require only `overview` + `neighborhood`, `list_by_kind` / `list_by_band` remain web-eligible deferred, and register reads remain deferred until a per-turn driver/consumer needs them. 
`src/graph/observed-shapes-coverage.test.ts` guards the tool/RPC/web required subsets; no transport shape shipped in this frontier. ### runtime-affordances-and-legality diff --git a/memory/SPEC.md b/memory/SPEC.md index 17243aa4c..f3d809456 100644 --- a/memory/SPEC.md +++ b/memory/SPEC.md @@ -251,7 +251,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c - **D58-L — Brunch prompt composition is a thin runtime header plus a gated prompt-resource manifest, not eager selection of every objective pack.** `.pi/agents/compose(agentId, sessionState, spec, workspace, context)` runs before Pi provider requests through Brunch's prompt extension and emits: **(1) agent control header** — keyed agent identity, model/thinking expectation, foreground role derived from `op_mode`, and mode/tool-authority summary; **(2) runtime-state header** — current pinned/AUTO `goal`, `strategy`, and `lens`, `spec.readiness_grade`, and workspace posture; **(3) resource manifests** — XML-style ``, ``, ``, and `` entries filtered by `.pi/agents/state.ts` legal tuples, grade, `op_mode`, and the agent allow-list, each carrying `{name, description, location}` for a Brunch-owned markdown resource under `src/.pi/{agents,skills}/`; the `{name, description, location}` triples are code-owned in `.pi/agents/state.ts`, not filesystem-discovered, honoring D39-L sealing; **(4) compact pushed context** — only the minimal context handles and rendered context needed to orient the turn, with deeper context access still governed by D60-L. Detailed goal/strategy/lens/method instructions live in Brunch prompt resources and are loaded by the agent with `read` when needed, following the same simple mechanism Pi uses for skills. Method resources are the prompt-level home for Brunch tool-routing and sequencing guidance; tool definitions remain boundary schemas/execution hooks, not the whole Brunch guide to when or how tools should be composed. 
`AUTO` means the axis is unpinned: the manifest lists legal choices and router instructions tell the agent to choose only from the current manifest, reading the selected resource before applying it when detail matters. Pinned axes point to the pinned resource; code enforces legality and tool gating but does not choose or concatenate large semantic packs on the agent's behalf. Pi-native skills may still carry startup-scoped capabilities, but runtime-state-gated availability is Brunch's manifest, not ambient Pi discovery. `.pi/agents/` is the keyed agent prompt assembly layer (`definitions/`, `contexts/`); `.pi/skills/` carries goal/strategy/lens/method resources; `.pi/agents/contexts/` is the D60-L agent-context orchestration layer (code), not a manifest resource family or general renderer bucket. Reusable text renderers may migrate to `renderers/` under D52-L. Composition is projection, not a behavioral state machine. Depends on: D23-L, D25-L, D39-L, D40-L, D52-L, D59-L, D60-L. Supersedes: the flat "base + mode + role + strategy + lens + grade + …" layering; the fixed all-packs concatenation in `compose-brunch-prompt.ts`; "role preset / runtime bundle" as the composition unit; direct Layer-2 eager prompt-pack injection as the default mechanism; top-level `src/agents/` for Pi-only agents; and `capability` as a parallel name for `method` / ``. - **D59-L — `goal` is a grade-derived, AUTO-able objective axis, distinct from strategy.** A *goal* is what the session agent currently pursues; a *strategy* is the reusable interaction shape used to pursue it — a goal is pursued *via* a strategy *through* a lens (three orthogonal axes). 
The goal set is derived/gated by `spec.readiness_grade`: `grounding-advance` (fill grounding and advance the grade), `elicit-expand` (expand the elicited specification graph while ambiguity remains productive), `commit-converge` (reduce / lock down reviewable commitments), plus an always-on `capture-posture` (capture or confirm dev `posture`, D45-L). `goal` defaults to the grade-derived objective, may be pinned, or left `AUTO`; in either case D58-L manifests advertise the legal resource(s) rather than injecting the whole goal body. For now `goal` is **internal/grade-derived and not part of the user posture-change surface** (it is too contingent to expose as a user-mutable axis); the pin affordance is reserved for system/internal logic, and unlike `strategy`/`lens` the user does not switch it (D40-L, Q4). `elicit-expand` and `commit-converge` intentionally form the diverge/converge pair for the elicitation diamond; `elicit-I` / `elicit-II` are retired because they were phase-like labels, not objectives. "Advance the grade" is a goal, not a strategy — though the `grounding-advance` goal may carry a dedicated default interaction pattern. Depends on: D45-L, D57-L, D58-L. Supersedes: conflating the elicit-lifecycle objective with strategy selection. - **D66-L — `freestyle` is a structure-optional elicitation strategy; it and generalized free-text capture are one slice.** `freestyle` joins the strategy axis (D25-L) as a fifth value alongside `step-wise-decision-tree`, `step-wise-disambiguate`, `propose-graph`, and `project-graph`. The four existing strategies impose structured-exchange turn discipline (offer-first `present_*`/`request_*` ritual, D37-L); `freestyle` makes that discipline *optional* — the turn may be ordinary user-driven chat, structured-exchange tools remain available (not prohibited), and user-invoked slash/skill-commands are ergonomic here precisely because no pending structured exchange is consuming the turn. 
It is **initiative/interaction-style, not authority**: it is not a new `op_mode`, adds no tool authority, and `op_mode`-gated tool policy (D40-L) is unchanged. Because freestyle has no mandatory exchange, the only way it grows graph truth is **generalized capture**, so the two land together: post-exchange capture (D18-L) is now wired onto the ordinary-message path (`session.submitMessage`, D49-L) over the same `session exchange` unit — which already spans plain user text — routing high-confidence directly-stated facts through `CommandExecutor.commitGraph({basis: explicit})` exactly as the structured-response capture tracer does, while low-confidence implications stay in preface / `capture_*` analysis (D47-L, D50-L) and never become graph truth. Freestyle therefore *composes with*, and does not replace, the `goal` (D59-L) and `lens` (D25-L) axes: the user still pursues `grounding-advance` / `elicit-expand` / etc., just through free chat, and freestyle capture can both resolve and spawn `elicitation_backlog` entries (D65-L). **AUTO must not select `freestyle`** — it is an explicit user pin only (a "let me just talk" escape hatch); the runtime manifest now omits it under AUTO while still allowing explicit pins, so spontaneous AUTO entry cannot silently abandon the offer-first product thesis (R16). Remaining open quality questions are limited to capture scope beyond directly-labeled facts (fitness evidence under A22-L, materially harder without a structured prompt), whether capture eventually runs on every freestyle turn or on demand, and the exact slash/skill-command surface (the Q6 method-vs-command question). Depends on: D18-L, D25-L, D26-L, D40-L, D45-L, D49-L, D50-L, D59-L, D63-L, D65-L. Refines: R16. Supersedes: treating offer-first (R16) as a universal per-turn session invariant; treating freestyle as a new operational mode or authority posture. 
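The D66-L rule that AUTO never selects `freestyle`, while an explicit pin still may, can be sketched as a small manifest filter. This is a minimal TypeScript sketch under stated assumptions, not the code in `.pi/agents/state.ts`: the `StrategyAxis` shape, the `PIN_ONLY` set, and `manifestStrategies` are hypothetical names, and the sketch ignores the grade, `op_mode`, and allow-list legality checks that D58-L also applies.

```typescript
// Hypothetical sketch of D66-L manifest gating; not the code in `.pi/agents/state.ts`.
type Strategy =
  | 'step-wise-decision-tree'
  | 'step-wise-disambiguate'
  | 'propose-graph'
  | 'project-graph'
  | 'freestyle';

// The five strategy-axis values named in D66-L.
const ALL_STRATEGIES: readonly Strategy[] = [
  'step-wise-decision-tree',
  'step-wise-disambiguate',
  'propose-graph',
  'project-graph',
  'freestyle',
];

// Pin-only strategies: legal when the user pins them, never chosen by AUTO.
const PIN_ONLY: ReadonlySet<Strategy> = new Set<Strategy>(['freestyle']);

// The strategy axis is either unpinned (AUTO) or pinned to one value.
type StrategyAxis = { kind: 'auto' } | { kind: 'pinned'; strategy: Strategy };

// Returns the strategies the runtime manifest would advertise for this turn.
function manifestStrategies(axis: StrategyAxis): Strategy[] {
  if (axis.kind === 'pinned') {
    // A pinned axis points at the pinned resource, freestyle included.
    return [axis.strategy];
  }
  // Under AUTO, list every legal choice except the pin-only ones.
  return ALL_STRATEGIES.filter((strategy) => !PIN_ONLY.has(strategy));
}
```

In this sketch `manifestStrategies({ kind: 'auto' })` lists the four structured strategies in declaration order, while `manifestStrategies({ kind: 'pinned', strategy: 'freestyle' })` returns `['freestyle']`. Keeping the exclusion as a set, rather than special-casing `freestyle` inline, leaves room for other pin-only values without touching the filter.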
-- **D60-L — Agent context splits into pull / projection / render / surface, distinguishes graph-truth from active-context reads, and keeps `workspace.state` separate.** **Agent context** = content the agent reasons over: `cwd` (filesystem kickoff heuristic — `.brunch?`, session count/length, README/markdown sizes, file counts), `graph` (overview/list/query), or `node` (variable-hop neighborhood). **PULL** is typed, read-only data access owned by the data layer (`graph/queries.ts` for graph/node; `session/` for cwd) and bypasses `CommandExecutor` (reads only); the typed value *is* the JSON form. Graph pulls must make the read projection explicit: `graph_truth` includes accepted truth records, while `active_context` hides superseded predecessors and must also omit edges whose endpoints are hidden so active-context reads do not contain dangling references. The graph read family should support the observed query shapes without becoming a generic records API: list nodes by kind(s), list nodes by D64-L readiness band(s), find nodes related to anchor node(s) by edge category/direction/hop depth, and find class-members lacking an edge of a given category in a given direction (gap query — a single named absence shape, not a generic NOT-predicate language). **PROJECTION** is optional info-preserving shaping for reusable DTOs; when multiple adapters need the same structured view, it belongs in `projections/`, but many callers can consume the typed read directly. **RENDER** turns a typed or projected value into either an LLM-friendly string or JSON (trivial serialization). Reusable lossy text/markdown rendering belongs in `renderers/`; `.pi/agents/contexts/` owns the agent-context orchestration decision — which typed pull to expose, how much detail to include, and how lens-plane/grade-depth shape the prompt-facing string — and may call reusable renderers. Rendered projected stable node codes (D62-L) remain the primary handles. 
**SURFACE** delivers it: *pushed* (compose injects at turn boundary), *pulled* (`read_graph`, `read_workspace_context`, `read_session_context` wrap the relevant reads/renderers — markdown in `toolResult.content`, typed JSON in `toolResult.details` per I33-L), or *rpc/ui*. The separate **workspace projection** (`workspace.state` — workspace/session/spec/chrome product state) is a different subject and keeps that name. Depends on: D35-L, D52-L, D53-L, D62-L, D64-L. Supersedes: pre-rendering context strings in the pull layer, scattering context-build logic across `graph/`, `.pi/agents/contexts/`, and tool adapters, or silently mixing graph-truth and active-context reads. +- **D60-L — Agent context splits into pull / projection / render / surface, distinguishes graph-truth from active-context reads, and keeps `workspace.state` separate.** **Agent context** = content the agent reasons over: `cwd` (filesystem kickoff heuristic — `.brunch?`, session count/length, README/markdown sizes, file counts), `graph` (overview/list/query), or `node` (variable-hop neighborhood). **PULL** is typed, read-only data access owned by the data layer (`graph/queries.ts` for graph/node; `session/` for cwd) and bypasses `CommandExecutor` (reads only); the typed value *is* the JSON form. Graph pulls must make the read projection explicit: `graph_truth` includes accepted truth records, while `active_context` hides superseded predecessors and must also omit edges whose endpoints are hidden so active-context reads do not contain dangling references. The graph read family should support the observed query shapes without becoming a generic records API: list nodes by kind(s), list nodes by D64-L readiness band(s), find nodes related to anchor node(s) by edge category/direction/hop depth, and find class-members lacking an edge of a given category in a given direction (gap query — a single named absence shape, not a generic NOT-predicate language). 
`src/graph/README.md` owns the consumer coverage ledger: `read_graph` exposes the six agent shapes, while RPC and web deliberately expose only overview + neighborhood until a scoped feature promotes another shape. **PROJECTION** is optional info-preserving shaping for reusable DTOs; when multiple adapters need the same structured view, it belongs in `projections/`, but many callers can consume the typed read directly. **RENDER** turns a typed or projected value into either an LLM-friendly string or JSON (trivial serialization). Reusable lossy text/markdown rendering belongs in `renderers/`; `.pi/agents/contexts/` owns the agent-context orchestration decision — which typed pull to expose, how much detail to include, and how lens-plane/grade-depth shape the prompt-facing string — and may call reusable renderers. Rendered projected stable node codes (D62-L) remain the primary handles. **SURFACE** delivers it: *pushed* (compose injects at turn boundary), *pulled* (`read_graph`, `read_workspace_context`, `read_session_context` wrap the relevant reads/renderers — markdown in `toolResult.content`, typed JSON in `toolResult.details` per I33-L), or *rpc/ui*. The separate **workspace projection** (`workspace.state` — workspace/session/spec/chrome product state) is a different subject and keeps that name. Depends on: D35-L, D52-L, D53-L, D62-L, D64-L. Supersedes: pre-rendering context strings in the pull layer, scattering context-build logic across `graph/`, `.pi/agents/contexts/`, and tool adapters, or silently mixing graph-truth and active-context reads. 
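The D60-L requirement that `active_context` hide superseded predecessors and also drop edges whose endpoints are hidden amounts to a two-step filter. The TypeScript below is a minimal sketch, assuming `graph_truth` returns accepted records unfiltered; the `GraphNode`/`GraphEdge`/`GraphRead` shapes, the `supersededBy` field, and `projectGraph` are hypothetical stand-ins, not the types or readers in `src/graph/queries.ts`.

```typescript
// Hypothetical record shapes; the real types live in `src/graph/queries.ts`.
interface GraphNode {
  id: string;
  // Set when a later accepted record supersedes this one (assumed field name).
  supersededBy?: string;
}

interface GraphEdge {
  source: string;
  target: string;
  category: string;
}

interface GraphRead {
  nodes: GraphNode[];
  edges: GraphEdge[];
}

type ReadProjection = 'graph_truth' | 'active_context';

// Applies the read projection to a read that already holds only accepted records.
function projectGraph(read: GraphRead, projection: ReadProjection): GraphRead {
  if (projection === 'graph_truth') {
    // graph_truth keeps superseded predecessors alongside their successors.
    return read;
  }
  // Step 1: hide superseded predecessors.
  const nodes = read.nodes.filter((node) => node.supersededBy === undefined);
  const visible = new Set(nodes.map((node) => node.id));
  // Step 2: drop edges with a hidden endpoint so no reference dangles.
  const edges = read.edges.filter(
    (edge) => visible.has(edge.source) && visible.has(edge.target),
  );
  return { nodes, edges };
}
```

Because edges are checked against the visible-node set, an edge survives only when both of its endpoints survive the node filter.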
### Critical Invariants diff --git a/memory/cards/graph-observed-shapes--coverage-ledger.md b/memory/cards/graph-observed-shapes--coverage-ledger.md index 6cf564f30..a571cc544 100644 --- a/memory/cards/graph-observed-shapes--coverage-ledger.md +++ b/memory/cards/graph-observed-shapes--coverage-ledger.md @@ -1,7 +1,7 @@ # Graph observed-shape coverage ledger Frontier: graph-observed-shapes -Status: active +Status: done Mode: single Created: 2026-06-08 diff --git a/src/graph/README.md b/src/graph/README.md index a2a7d4eef..7b72f3e80 100644 --- a/src/graph/README.md +++ b/src/graph/README.md @@ -1,7 +1,7 @@ # graph/ — Graph domain layer Canonical reference: `docs/design/GRAPH_MODEL.md` -SPEC decisions: D4-L, D20-L, D27-L, D51-L, D52-L, D53-L, D54-L, D62-L, D63-L +SPEC decisions: D4-L, D20-L, D27-L, D51-L, D52-L, D53-L, D54-L, D60-L, D62-L, D63-L ## Owns @@ -49,6 +49,23 @@ SPEC decisions: D4-L, D20-L, D27-L, D51-L, D52-L, D53-L, D54-L, D62-L, D63-L through `db/connection.ts` and returns a `CommandExecutor` plus bound query readers for adapters. +## Observed read-shape ledger + +D60-L read-shape ownership is explicit: every durable graph read shape has one canonical owner in `queries.ts`; adapters may expose only the subset they need. Deferred means eligible or known but not currently exposed for that consumer; `n/a` means deliberately outside that consumer's product role. + +| Shape | Canonical owner | `read_graph` tool | RPC | Web | Reason for deferred / n/a | +| --- | --- | --- | --- | --- | --- | +| `overview` | `getGraphOverview` | required | required | required | — | +| `neighborhood` | `getNodeNeighborhood` | required | required | required | — | +| `list_by_kind` | `getGraphSliceByKinds` | required | deferred | deferred | Web-eligible bounded graph slice; RPC follows a concrete web/client need. | +| `list_by_band` | `getGraphSliceByReadinessBands` | required | deferred | deferred | Web-eligible D64-L evidence slice; RPC follows a concrete web/client need. 
| +| `gaps` | `getGraphGaps` | required | n/a | n/a | Agent/RPC-only diagnostic shape; not a web observer projection. | +| `related` | `getRelatedNodes` | required | n/a | n/a | Agent/RPC-only traversal helper; not a web observer projection. | +| `reconciliation_needs` | `getOpenReconciliationNeeds` | deferred | deferred | deferred | Agent-internal register read; no transport consumer yet. | +| `elicitation_backlog` | `getOpenElicitationBacklogEntries` | deferred | deferred | deferred | Agent-internal prospective-register read; per-turn driver follow-on owns exposure. | + +`observed-shapes-coverage.test.ts` guards the required subsets against accidental drift: the tool mode union must stay at the six required agent shapes, while RPC and web stay at `overview` + `neighborhood` until a scoped feature deliberately promotes another row. + ## Clock and audit posture `graph_clock` and `change_log` are spec-scoped. `CommandExecutor.createSpec` diff --git a/src/graph/observed-shapes-coverage.test.ts b/src/graph/observed-shapes-coverage.test.ts new file mode 100644 index 000000000..3199c7789 --- /dev/null +++ b/src/graph/observed-shapes-coverage.test.ts @@ -0,0 +1,113 @@ +import { describe, expect, it } from 'vitest'; + +import { ReadGraphParams } from '../.pi/extensions/graph/tool-schemas.js'; +import { graphRpcMethods } from '../rpc/methods/graph.js'; +import { queryKeys } from '../web/query-keys.js'; + +const observedShapeLedger = [ + { + shape: 'overview', + owner: 'getGraphOverview', + tool: 'required', + rpc: 'required', + web: 'required', + }, + { + shape: 'neighborhood', + owner: 'getNodeNeighborhood', + tool: 'required', + rpc: 'required', + web: 'required', + }, + { + shape: 'list_by_kind', + owner: 'getGraphSliceByKinds', + tool: 'required', + rpc: 'deferred', + web: 'deferred', + }, + { + shape: 'list_by_band', + owner: 'getGraphSliceByReadinessBands', + tool: 'required', + rpc: 'deferred', + web: 'deferred', + }, + { + shape: 'gaps', + owner: 'getGraphGaps', + 
tool: 'required', + rpc: 'not_applicable', + web: 'not_applicable', + }, + { + shape: 'related', + owner: 'getRelatedNodes', + tool: 'required', + rpc: 'not_applicable', + web: 'not_applicable', + }, + { + shape: 'reconciliation_needs', + owner: 'getOpenReconciliationNeeds', + tool: 'deferred', + rpc: 'deferred', + web: 'deferred', + }, + { + shape: 'elicitation_backlog', + owner: 'getOpenElicitationBacklogEntries', + tool: 'deferred', + rpc: 'deferred', + web: 'deferred', + }, +] as const; + +type Consumer = 'tool' | 'rpc' | 'web'; + +function requiredShapesFor(consumer: Consumer): string[] { + return observedShapeLedger + .filter((row) => row[consumer] === 'required') + .map((row) => row.shape) + .sort(); +} + +function graphRpcShape(method: string): string { + return method.replace(/^graph\./, '').replace(/^nodeNeighborhood$/, 'neighborhood'); +} + +function webGraphShape(keyFactoryName: string): string { + return keyFactoryName.replace(/^nodeNeighborhood$/, 'neighborhood'); +} + +describe('graph observed-shape coverage ledger', () => { + it('names exactly one canonical graph query owner for every observed read shape', () => { + expect(observedShapeLedger).toHaveLength(8); + expect(observedShapeLedger.map((row) => row.owner)).toEqual([ + 'getGraphOverview', + 'getNodeNeighborhood', + 'getGraphSliceByKinds', + 'getGraphSliceByReadinessBands', + 'getGraphGaps', + 'getRelatedNodes', + 'getOpenReconciliationNeeds', + 'getOpenElicitationBacklogEntries', + ]); + }); + + it('keeps the read_graph tool surface aligned to the ledger-required shapes', () => { + expect([...ReadGraphParams.properties.mode.enum].sort()).toEqual(requiredShapesFor('tool')); + }); + + it('keeps the public RPC graph surface aligned to the ledger-required shapes', () => { + const actual = graphRpcMethods.map((definition) => graphRpcShape(definition.method)).sort(); + + expect(actual).toEqual(requiredShapesFor('rpc')); + }); + + it('keeps the web graph query-key surface aligned to the 
ledger-required shapes', () => { + const actual = Object.keys(queryKeys.graph).map(webGraphShape).sort(); + + expect(actual).toEqual(requiredShapesFor('web')); + }); +}); diff --git a/src/rpc/README.md b/src/rpc/README.md index 93aeddadb..507ea4d45 100644 --- a/src/rpc/README.md +++ b/src/rpc/README.md @@ -44,7 +44,7 @@ canonical stores: worldUpdate entries ``` -RPC handlers must not become a generic records API, REST read model, or canonical view store. Reads are named projections over the store that owns the fact. Mutations route through the owning product seam: session transcript operations through `session.*`, synchronous high-confidence response capture through `session.submitExchangeResponse` → `graph/capture` → `CommandExecutor`, review-set approval through `session.submitExchangeResponse` → `CommandExecutor.acceptReviewSet`, and other graph mutations through the agent/tool or `CommandExecutor` path that owns them. `dev.*` is the only exception family: methods in that namespace are explicitly gated local harnesses, absent from default discovery and absent from the read-only sidecar. +RPC handlers must not become a generic records API, REST read model, or canonical view store. Reads are named projections over the store that owns the fact. The current graph read subset is deliberately limited to `graph.overview` and `graph.nodeNeighborhood`; `src/graph/README.md` owns the observed-shape ledger and decides which graph-owned shapes are required, deferred, or not applicable per consumer. Mutations route through the owning product seam: session transcript operations through `session.*`, synchronous high-confidence response capture through `session.submitExchangeResponse` → `graph/capture` → `CommandExecutor`, review-set approval through `session.submitExchangeResponse` → `CommandExecutor.acceptReviewSet`, and other graph mutations through the agent/tool or `CommandExecutor` path that owns them. 
`dev.*` is the only exception family: methods in that namespace are explicitly gated local harnesses, absent from default discovery and absent from the read-only sidecar. ## Method registry diff --git a/src/web/README.md b/src/web/README.md index face26946..b0c8dfaad 100644 --- a/src/web/README.md +++ b/src/web/README.md @@ -4,7 +4,7 @@ Canonical references: `docs/architecture/prd.md` §Browser / web client, `src/rp This directory owns the browser client for `brunch --mode web`. The browser is a thin remote head over the Brunch host: one React app, one WebSocket-backed Brunch JSON-RPC client, TanStack Router for route/data preloading, and TanStack Query for cache ownership and update scheduling. -The web client must not read SQLite, Pi RPC, local JSONL, or `.brunch/workspace.json` directly. It speaks Brunch public RPC method names and renders product projections. +The web client must not read SQLite, Pi RPC, local JSONL, or `.brunch/workspace.json` directly. It speaks Brunch public RPC method names and renders product projections. Its current graph observer subset is `graph.overview` + `graph.nodeNeighborhood`; `src/graph/README.md` owns the observed-shape ledger and keeps additional graph-owned shapes deliberate rather than accidental bleed-through from agent/RPC needs. ## Current topology From 02fdbe8772f1fef751a7df2903cecccfc4e098a8 Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Mon, 8 Jun 2026 10:57:37 +0200 Subject: [PATCH 09/17] Distill ln-induct directive body (meta-distill pass) Resolve the competing-thesis framing into engine (induction) + governor (triage gate). Name canonical clusters: ladder of abstraction (step 2), Parnas/blast radius (ownership axis). Merge the posture-check insight into the find/fix principle (a true diagnosis can carry a wrong prescription). Fold the gitignored-data-plane search craft into one clause. Keep 'defects cluster' as a convergent phrase without attribution (the principle's density is convergent, not Beizer-rooted). 
Gate, stopping rule, output template, routing, and reconciliation tail unchanged. Amp-Thread-ID: https://ampcode.com/threads/T-019ea618-c2ac-721d-809a-a72bfd9ce453 Co-authored-by: Amp --- .agents/skills/ln-induct/SKILL.md | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/.agents/skills/ln-induct/SKILL.md b/.agents/skills/ln-induct/SKILL.md index 3f4a8ebf3..1472c1431 100644 --- a/.agents/skills/ln-induct/SKILL.md +++ b/.agents/skills/ln-induct/SKILL.md @@ -6,15 +6,16 @@ argument-hint: "[pasted comments/observations, or empty to fetch the current bra # Ln Induct -A bot comment is a *sample*, not a fix. Each point finding is one draw from a latent defect distribution the author can't see. The move: infer the distribution from the samples, then go fishing for the instances nobody sampled. +A bot comment is a *sample*, not a fix. Defects cluster: one finding is a single draw from a latent fault-type the author can't see. -This skill **generates** lenses. `ln-review`'s `contract` category is the **library** of lenses that have already stabilized. `ln-induct` induces a fresh lens from this batch of evidence; when a lens recurs across PRs, step 6 proposes graduating it into `ln-review`. +- **Engine:** infer the type from the sample, then fish for the instances nobody sampled. +- **Governor:** a generative audit wants to manufacture work, so a triage gate (step 3) decides what's worth fishing for. Without it the skill drifts into completionist sprawl and topical caricature (user-global `AGENTS.md` §Local necessity over category default). -Read `memory/SPEC.md` first when it exists (lexicon, live architecture register, §Acknowledged Blind Spots). Read `memory/PLAN.md` for active frontier context when the touched area is in-flight. +This skill **generates** lenses; `ln-review`'s `contract` category is the **library** of stabilized ones. Induce a fresh lens from this batch; when one recurs across PRs, propose graduating it (step 6). 
-## Anti-sprawl is the point of the skill +**Find and fix stay separate — including the bot's own fix.** Report and route; never auto-implement. A bot finding can be a true diagnosis carrying a wrong prescription, so validate its suggested repair against project posture, not just its claim. Routing to `ln-build`/`ln-refactor` is a separate, human-gated step. -A generative audit *wants* to manufacture work — it goes looking for more. Left ungated it becomes completionist sprawl and topical caricature (`AGENTS.md`, user-global §Local necessity over category default). The triage gate (step 3) is what keeps this a diagnostic instrument and not a make-work generator. **Find and fix are separate**: this skill produces a triaged report and names adjacent work; it does not auto-implement. Routing to `ln-build`/`ln-refactor` is a separate, human-gated step. +Read `memory/SPEC.md` first when it exists (lexicon, live architecture register, §Acknowledged Blind Spots); read `memory/PLAN.md` for active frontier context. ## Input @@ -31,7 +32,7 @@ Normalize each item to `(location, claim, suggested fix)`. Drop nothing yet. ## 2. Abstract each item to a fault type (the lens) -For each item, climb the abstraction ladder from the concrete comment toward the fault *type* behind it. The stopping rule is the whole craft here: +For each item, climb the ladder of abstraction (Hayakawa; Bret Victor) from the concrete comment toward the fault *type* behind it. The stopping rule is the whole craft: > **Stop at the lowest rung that is both mechanically searchable AND names a repair.** @@ -57,10 +58,10 @@ Fail any one → fix in place (or route the single finding), record nothing furt For each promoted lens, fish along **both** axes — not just the easy one: -- **Family axis** (syntactic / structural): find every site sharing the pattern's shape. Grep-shaped, fast. 
-- **Ownership axis** (responsibility / seam): audit everything a seam *owns*, to catch same-responsibility faults that share no syntax. This is the higher-value, harder sweep. **Force at least one ownership-seam question per promoted lens** — otherwise the skill quietly degenerates into "grep for the pattern." +- **Family axis** (syntactic / structural): every site sharing the pattern's shape. Grep-shaped, fast. +- **Ownership axis** (blast radius): everything a seam *owns* — same-responsibility faults that share no syntax (Parnas: a module's secret, not its shape). Higher-value, harder. **Force at least one ownership-seam question per promoted lens**, or the skill degenerates into "grep for the pattern." -Collect each hit as a candidate finding. Verify it is a real instance, not a false positive that merely matches the shape. +Mind the data plane: it is often gitignored, so `rg` silently reports it clean — enumerate committed artifacts with `git ls-files`, not an ignore-respecting glob. Verify each hit is a real instance, not a shape-only false positive. ## 5. Report From 0ed1f647111549827b918082bd84e2e50ac302ff Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Mon, 8 Jun 2026 10:55:18 +0200 Subject: [PATCH 10/17] FE-810 guard minimal authority shell --- memory/PLAN.md | 12 +- memory/SPEC.md | 2 +- ...inimal-authority-shell--audit-and-guard.md | 167 ------------------ .../runtime/authority-matrix.test.ts | 91 ++++++++++ 4 files changed, 99 insertions(+), 173 deletions(-) delete mode 100644 memory/cards/minimal-authority-shell--audit-and-guard.md create mode 100644 src/.pi/extensions/runtime/authority-matrix.test.ts diff --git a/memory/PLAN.md b/memory/PLAN.md index 72cbbc019..8d04b19ca 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -35,7 +35,7 @@ After the current elicitor work, the strongest follow-on coverage frontier is `g ### Active -1. 
`minimal-authority-shell` — now the next delivery-safety frontier after the elicitation-backlog substrate landed; prompt-resource body depth remains temporary cross-cut completion work outside `PLAN.md`. +- None. ### Next @@ -116,7 +116,7 @@ After the current elicitor work, the strongest follow-on coverage frontier is `g - **Linear:** [FE-810](https://linear.app/hash/issue/FE-810/minimal-poc-authority-shell-over-graphsession-actions) - **Branch:** to create — `ln/fe-810-minimal-authority-shell` - **Kind:** hardening -- **Status:** next +- **Status:** done - **Certainty:** proving - **Stabilizes:** D20-L/D40-L command-result and elicit-mode authority seams for the current POC graph/session paths. - **Objective:** Fill only the authority behavior required for a credible POC: graph writes keep returning structured command results, `elicit` suppresses obvious side-effecting tools, and headless/RPC paths surface structured `needs_human` where the POC actually reaches human-only actions. @@ -131,7 +131,7 @@ After the current elicitor work, the strongest follow-on coverage frontier is `g - **Cross-cutting obligations:** This is a minimal shell, not full M6. Do not widen into comprehensive RBAC/permissions unless a current POC path needs it. - **Traceability:** R5, R6, R10 / D20-L, D34-L, D40-L / A18-L, A3-L. - **Design docs:** `memory/SPEC.md` D20-L/D34-L/D40-L; `docs/reference/pi-extensions.md`. -- **Current execution pointer:** Scoped 2026-06-08 — active scope file `memory/cards/minimal-authority-shell--audit-and-guard.md`. Pre-audit during scoping found most criteria already met (CommandResult discriminants exist; `needs_human` defined but never produced; elicit already blocks bash/edit/write; D34-L command policy already at `.pi/extensions/commands/policy.ts`), so the slice is an authority-matrix audit + guard test + A18-L residue naming, not a build-out. 
The card forbids touching `src/.pi/agents/state.ts` so it can run as an independent worktree stream alongside `resource-body-depth` and `graph-observed-shapes`. +- **Current execution pointer:** Done 2026-06-08. Added `src/.pi/extensions/runtime/authority-matrix.test.ts` as the minimal authority guard: it locks the `CommandResult` discriminant vocabulary (including structured `needs_human` representability), proves `elicit-read-only` derives allowed/blocked tool authority from the shared projected runtime policy, and verifies the POC side-effecting tools (`bash`, `edit`, `write`) are not reachable in `elicit`. No standalone authority service was introduced, `src/.pi/agents/state.ts` stayed untouched, and A18-L strict built-in suppression remains named residue rather than closed. ### poc-live-ship-gate @@ -274,6 +274,8 @@ After the current elicitor work, the strongest follow-on coverage frontier is `g - **Design docs:** `.fixtures/seeds/bilal-port/README.md`; `docs/design/GRAPH_MODEL.md`; `docs/praxis/manual-testing.md`. ## Recently Completed +- 2026-06-08 `minimal-authority-shell` (FE-810) — Done: added the authority-matrix guard test over the current POC authority seam. The guard locks `CommandExecutor` mutation-result discriminants as the graph outcome vocabulary, proves `needs_human` is structured data rather than a TUI-only dialog, and asserts `elicit` tool authority comes from the shared projected runtime policy while blocking the identified side-effecting tools (`bash`, `edit`, `write`). No new authority service; `src/.pi/agents/state.ts` untouched; A18-L strict built-in suppression remains accepted Pi-upstream/API residue. Verified: `src/.pi/extensions/runtime/authority-matrix.test.ts` and `npm run verify`. 
+ - 2026-06-08 cross-cut prompt-resource body-depth pass (Seam 3a/3b) — Done (1ca02e38): deepened every thin `src/.pi/skills/{goals,strategies,lenses,methods}` body to carry its per-axis facet guidance (goals→D59-L, strategies/lenses→README+D25-L, methods→D58-L tool-routing role), and added a manifest-wide readability/depth test in `src/.pi/agents/compose.test.ts` asserting every `{GOAL,STRATEGY,LENS,METHOD}_RESOURCES` location resolves and clears a ≥700-char floor. `state.ts` untouched. This closed the last row-sized cross-cut completion work; `memory/CROSS_CUT_PLAN.md` ● rows are now all built. Verified: `npm run verify` (551 tests, build). - 2026-06-08 `elicitation-backlog` (FE-823) — Done: materialized `elicitation_backlog` as a flat spec-scoped table with generated migration, seeded the grounding agenda at `createSpec`, routed create/close entry mutations through `CommandExecutor` on the shared `{specId, lsn}` / `change_log` boundary, and added graph-owned per-spec open-entry read-back. Reconciled D65-L/A24-L and updated graph/db topology docs. Verified: `src/graph/command-executor.test.ts`, `src/graph/queries.test.ts`, and `npm run verify`. 
@@ -296,7 +298,7 @@ nodes: capture-response-to-graph [done · P0] structured answer -> graph truth -> observer update project-graph-review-cycle [done · P1] real project-graph review-set approval loop elicitation-backlog [done · proving] materialized D65-L prospective agenda substrate and read-back - minimal-authority-shell [active · P1] thin safety posture for current POC paths + minimal-authority-shell [done · P1] thin safety posture for current POC paths poc-live-ship-gate [next · P1] final fresh-cwd composed product runbook graph-observed-shapes [next · proving] decide consumer-specific observed-shape inventory, then align graph/RPC/web runtime-affordances-and-legality [next · proving] keep posture legality/default surfaces shared across transports @@ -330,7 +332,7 @@ horizon: notes: - `elicitation-backlog` was the promoted D65-L row from `memory/CROSS_CUT_PLAN.md`; the prompt-resource body-depth pass (the last temporary cross-cut completion work) landed in 1ca02e38, so `memory/CROSS_CUT_PLAN.md` now has no row-sized work left — its only residue is the unscoped live "what to ask next" driver. - - Parallel worktree streams (2026-06-08): stream (A) `crosscut-know--resource-body-depth` → `src/.pi/skills/**` is **done** (1ca02e38). Two write-disjoint streams remain cold-startable from a clean committed base — (B) `graph-observed-shapes--coverage-ledger` → `src/graph/README.md` + `rpc`/`web` READMEs + one guard test; (C) `minimal-authority-shell--audit-and-guard` → `src/.pi/extensions/runtime/` + guard test. Invariant: **`src/.pi/agents/state.ts` is a single-writer file** — B and C must not edit it. `poc-live-ship-gate` stays gated behind `minimal-authority-shell` (hard edge); `runtime-affordances-and-legality` and `exchanges-and-generalized-capture` stay parked (shape not yet forced) and are not cold-startable worktree streams. 
+ - Parallel worktree streams (2026-06-08): stream (A) `crosscut-know--resource-body-depth` → `src/.pi/skills/**` is **done** (1ca02e38), and stream (C) `minimal-authority-shell--audit-and-guard` → `src/.pi/extensions/runtime/` is **done**. One write-disjoint stream remains cold-startable from a clean committed base — (B) `graph-observed-shapes--coverage-ledger` → `src/graph/README.md` + `rpc`/`web` READMEs + one guard test. Invariant: **`src/.pi/agents/state.ts` is a single-writer file** — B must not edit it. `poc-live-ship-gate` is now unblocked by `minimal-authority-shell`; `runtime-affordances-and-legality` and `exchanges-and-generalized-capture` stay parked (shape not yet forced) and are not cold-startable worktree streams. - Completed prerequisites: `agents-composition-layer` supplies runtime prompt/resource posture, and `live-graph-observer` supplies the read-only web observer path expected by `capture-response-to-graph` and `poc-live-ship-gate`. - `graph-observed-shapes` is intentionally consumer-specific: do not assume every agent read shape belongs on the web observer. - `exchanges-and-generalized-capture` stays deferred until the surviving inventory is honest enough to close; do not regrow deleted `capture-*` symmetry in the meantime. diff --git a/memory/SPEC.md b/memory/SPEC.md index f3d809456..2246d8089 100644 --- a/memory/SPEC.md +++ b/memory/SPEC.md @@ -281,7 +281,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c | I22-L | Brunch TUI startup must not render prior session transcript entries or enter an agent loop until the user has explicitly activated a spec/session decision; creating a new spec implicitly creates its first session, creating a new session for an existing spec lands in a binding-only session, resuming a prior transcript is opt-in, and RPC/headless startup exposes structured initial-selection state rather than invoking TUI picker code. 
| covered (FE-744 coordinator tests; hierarchical spec/session picker model + component tests; `workspace.selectionState` / `workspace.activate` JSON-RPC contract tests with source assertion that RPC does not import TUI picker code; `src/probes/scripts/verify-startup-no-resume.sh` pty/ANSI-stripped TUI probe oracle proving stale transcript text is absent before explicit activation) | D11-L, D21-L, D22-L, D36-L | | I23-L | Every structured elicitation interaction that owns the response surface persists durable semantic display only through Pi `toolResult` rows rendered by `renderResult`; `renderCall` and live `ctx.ui.*` surfaces are transient. A structured-exchange tuple has a recoverable `present_*` result and, when required, exactly one matching terminal `request_*` result before the next agent turn consumes it. The target details model is checked by `schema` + `v`, `exchange_id`, and `tool_meta`; request outcomes are an exactly-one property-presence union; user-authored text is `comment` and runtime-authored text is `message`; present-side status/kind/expected-request aliases and capture graph payloads are invalid in the Zod-authored schema layer. `toolResult.content` is rich markdown suitable for both TUI transcript display and model context; `toolResult.details` carries structured projection/recovery data. 
| covered for current structured-exchange tools (registered sequential `present_question`, `present_options`, `present_review_set`, `request_answer`, `request_choice`, `request_choices`, and `request_review`; runtime details are emitted from canonical `schema`/`v`/snake_case Zod shapes; tests cover non-semantic `renderCall`, markdown `renderResult`, present/request details, unmatched-present recovery, active-vs-stub registry, JSON-editor fallback for multi-choice, terminal `answered`/`cancelled`/`unavailable` projection closure, option content/rationale parity, review-set `nodes`/`edges` details parity, invalid review proposal non-recovery, review pending-exchange recovery, public-RPC deterministic permutations, capture response-to-graph proof, and same-assistant-message `present_options → request_choice` ordering over a real Pi RPC run. The Zod-authored schema layer is covered by JSON Schema export, drift-rejection, and source-boundary tests for present/request/capture details. `present_candidates` remains a named stub and intentionally unregistered.) | D12-L, D13-L, D17-L, D37-L, D38-L, D41-L | | I24-L | A Brunch-launched Pi runtime does not load ambient user/project Pi context files, extensions, skills, prompt templates, themes, or behavior-shaping settings unless Brunch's sealed Pi settings/extension boundary explicitly allows them; Brunch-owned extension-discovered resources are identified as intentional product resources. | covered for TUI-launch settings/extension boundary by contract tests: ambient resource flags and explicit extension factories are preserved; hostile ambient global/project settings are ignored by the in-memory Brunch settings policy before and after reload; audited Pi settings getters are tracked in `src/.pi/brunch-pi-settings.ts`. Subagent subprocess inheritance remains future coverage under I29-L. 
| D2-L, D39-L | -| I25-L | The active `op_mode`, `strategy`, `lens`, and `goal` are reconstructable from linear `brunch.agent_runtime_state` entries at turn start and through `session.runtimeState`; concrete axis ids stay separate from the `auto` selection sentinel; the foreground session-agent role is derived from `op_mode`, not separately stored; tool gating follows the reconstructed `op_mode` so `elicit` cannot use execute/dangerous tools such as raw `bash`/`write` unless explicitly permitted. Runtime-state projection remains transcript-backed and exposes empty/default mention, world-watermark, and lifecycle slots without inventing hidden extension memory. | covered (`src/session/runtime-state.test.ts` covers default state, cumulative last-writer-wins posture, mention/world/lifecycle slot projection, and non-linear rejection; `src/rpc/handlers.test.ts` covers explicit-target `session.runtimeState` discovery/params/spec validation; `src/.pi/__tests__/operational-mode.test.ts` covers append/project/switch helpers over the reconciled axis vocabulary, AUTO selection for every objective axis, init idempotence, previous-state values, malformed/illegal tuple rejection, role derivation from `op_mode`, and Pi JSONL reload projection; `prompting.test.ts` covers prompt/tool-policy projection from the same transcript-backed runtime state, including selected-spec grade activation for commitment-grade `present_review_set` / `request_review` proposal tools). | D17-L, D23-L, D40-L, D58-L, D59-L | +| I25-L | The active `op_mode`, `strategy`, `lens`, and `goal` are reconstructable from linear `brunch.agent_runtime_state` entries at turn start and through `session.runtimeState`; concrete axis ids stay separate from the `auto` selection sentinel; the foreground session-agent role is derived from `op_mode`, not separately stored; tool gating follows the reconstructed `op_mode` so `elicit` cannot use execute/dangerous tools such as raw `bash`/`write` unless explicitly permitted. 
Runtime-state projection remains transcript-backed and exposes empty/default mention, world-watermark, and lifecycle slots without inventing hidden extension memory. | covered (`src/session/runtime-state.test.ts` covers default state, cumulative last-writer-wins posture, mention/world/lifecycle slot projection, and non-linear rejection; `src/rpc/handlers.test.ts` covers explicit-target `session.runtimeState` discovery/params/spec validation; `src/.pi/__tests__/operational-mode.test.ts` covers append/project/switch helpers over the reconciled axis vocabulary, AUTO selection for every objective axis, init idempotence, previous-state values, malformed/illegal tuple rejection, role derivation from `op_mode`, and Pi JSONL reload projection; `prompting.test.ts` covers prompt/tool-policy projection from the same transcript-backed runtime state, including selected-spec grade activation for commitment-grade `present_review_set` / `request_review` proposal tools; `src/.pi/extensions/runtime/authority-matrix.test.ts` covers the current POC authority matrix for `elicit-read-only`, blocking `bash`/`edit`/`write`, and structured `needs_human` result representability while leaving A18-L strict built-in suppression as residue). | D17-L, D23-L, D40-L, D58-L, D59-L | | I27-L | Session display names are presentation metadata only: every Brunch-created session gets a neutral workspace-global default `session_info` label (`Untitled Session N`) at creation, unchanged defaults do not collide across specs in one cwd, later user/generated names may replace the default, and no naming path mutates spec identity, session binding, or graph truth. 
| planned (creation/boundary tests for workspace-global default allocation across specs and replacement sessions; session-lifecycle naming tests with empty transcript/auth failure/success paths; picker/chrome projection tests read session names when present) | D6-L, D21-L, D35-L, D42-L | | I26-L | Runtime schema-library imports stay deliberately scoped: Zod may appear only in D41-L-acknowledged product/protocol schema seams such as `src/.pi/extensions/exchanges/schemas/`; TypeBox remains valid for unrelated Pi tool parameters, small config/frontmatter contracts, and future Drizzle-derived row schemas; no boundary may hand-author parallel Zod and TypeBox sources for the same shape. Drizzle row/insert/update schemas are not hand-authored alongside their target tables. | covered (structured-exchange schema tests prove Zod parse/export and assert semantic details contracts stay in `src/.pi/extensions/exchanges/schemas/`; the legacy `shared/model.ts` details interface is retired; structured-exchange TypeBox usage is quarantined to the single Pi `TSchema` cast adapter in `src/.pi/extensions/exchanges/pi-schema.ts`; grep-based architectural boundary test in `architecture.test.ts` enforces no direct `db/` imports outside `graph/`; Drizzle derivation via `drizzle-typebox` in `row-schemas.ts`) | D41-L | | I28-L | Auto-compaction output preserves the configured anchor set byte-stable: every entry kind listed in [src/.pi/extensions/compaction/index.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/.pi/extensions/compaction/index.ts) is reconstructable post-compaction according to its `select` rule (`first | latest | active-leaves | all-unresolved`); LLM-generated narrative summary never replaces or rephrases preserved-anchor content; extension failure falls through to Pi default compaction rather than dropping anchors silently. 
| planned (compaction round-trip property tests at M9 plus inner-loop anchor-rendering unit tests and TypeBox schema validation of the anchor contract) | D43-L; R15, R13; I3-L, I4-L, I8-L, I12-L | diff --git a/memory/cards/minimal-authority-shell--audit-and-guard.md b/memory/cards/minimal-authority-shell--audit-and-guard.md deleted file mode 100644 index 0a0f9ce9d..000000000 --- a/memory/cards/minimal-authority-shell--audit-and-guard.md +++ /dev/null @@ -1,167 +0,0 @@ -# Minimal POC authority shell — audit and guard - -Frontier: minimal-authority-shell -Status: active -Mode: single -Created: 2026-06-08 - -## Orientation - -- **Containing seam:** the POC authority surface over current graph/session write paths — - `CommandExecutor` result discriminants in - [src/graph/command-executor.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/graph/command-executor.ts), - the `elicit` tool policy in - [src/projections/session/runtime-policy.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/projections/session/runtime-policy.ts) - applied by [src/.pi/extensions/runtime/index.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/.pi/extensions/runtime/index.ts), - the D34-L command containment in - [src/.pi/extensions/commands/policy.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/.pi/extensions/commands/policy.ts), - and the public RPC mutation surfacing in - [src/rpc/methods/session.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/rpc/methods/session.ts). -- **Relevant frontier item:** `minimal-authority-shell` (FE-810) in - [memory/PLAN.md](file:///Users/lunelson/Code/hashintel/brunch-next/memory/PLAN.md) §Frontier Definitions - (`Status: next` / now active, `Kind: hardening`, `Certainty: proving`). Branch to create: - `ln/fe-810-minimal-authority-shell`. 
-- **Volatile state (pre-audited during scoping — start informed, not cold):** - - The `CommandResult` union **already defines** `success | structural_illegal | needs_human | - policy_blocked | version_conflict`; mutation paths already return `success` / `structural_illegal`. - - `needs_human` is **defined but never produced** by any current path — no `return { status: - 'needs_human' }` exists. So criterion (3) is mostly "confirm it is representable end-to-end and - no path assumes a TUI-only dialog," not a large build. - - `elicit` policy **already blocks** `bash | edit | write` (allow-list `read | grep | find | ls`) - via the `tool_call` and `user_bash` hooks; `setActiveTools` hides the rest. - - D34-L command containment **already exists** at `.pi/extensions/commands/policy.ts`. - - Public RPC mutations (`session.submitExchangeResponse`) **already surface** structured - discriminants (`captured | no_capture | structural_illegal | accepted | request_changes | - rejected`) rather than throwing for expected outcomes. -- **Main open risk:** **over-building.** Most criteria are already met; the real work is an audit + - regression guard + naming the A18-L residue, NOT inventing M6 RBAC, a new authority service, or a - `needs_human` producer that no POC path actually needs. - -Posture: **proving** (inherited from `minimal-authority-shell`). Reshaped to score on the -**invariants** axis: landing this slice locks the "CommandExecutor discriminants are the only graph -mutation outcome surface" invariant with a guard test and ratifies the elicit tool-authority -contract, so accidental future bypass fails a test rather than silently shipping. - -Frontier-level cross-cutting obligations: - -- **D20-L:** `CommandExecutor` result discriminants are the only graph mutation outcome surface for - agent, RPC, and capture writes — no path throws for an expected authority/validation outcome. 
-- **D34-L:** keep command containment in `.pi/extensions/commands/policy.ts`; do not reintroduce a - branch-only module or treat command-name collisions as allowlisting. -- **D40-L:** tool authority is a pure derivation over the shared projected runtime policy; do not add - a second authority list. **Do not modify `src/.pi/agents/state.ts`** in this slice — import its - `activeToolNamesForPosture` read-only; the manifest/legality file is reserved for other streams. -- **A18-L:** strict interactive built-in suppression remains a Pi upstream/API limit; name it - explicitly as accepted residue, do not pretend to close it. - -### Target Behavior - -The current POC graph/session write and tool-authority paths are proven by a single authority-matrix -guard test to route every mutation outcome through `CommandExecutor` discriminants, block the -identified side-effecting tools in `elicit`, and represent `needs_human` as a structured headless/RPC -result rather than a TUI-only dialog — with the A18-L residue named, not closed. - -### Boundary Crossings - -``` -→ src/graph/command-executor.ts (CommandResult discriminants — the outcome vocabulary) -→ src/projections/session/runtime-policy.ts (elicit allow/block policy — read/confirm) -→ src/.pi/extensions/runtime/index.ts (policy application hooks — read/confirm) -→ src/rpc/methods/session.ts (discriminant → RPC shape mapping; needs_human representable) -→ a new authority-matrix guard test (asserts the four criteria over current POC paths) -``` - -### Risks and Assumptions - -``` -- RISK: the slice balloons into full M6 RBAC / a standalone authority service. - → MITIGATION: acceptance is audit + guard + residue-naming; the frontier explicitly forbids a new - authority service. If the audit finds a genuine missing producer/blocker, fill ONLY that one - concrete gap; anything larger routes back to ln-plan, it does not expand this card. -- RISK: adding a needs_human producer the POC does not actually reach (speculative). 
- → MITIGATION: only assert needs_human is representable end-to-end (type + RPC/headless mapping + - no TUI-dialog assumption). Do not invent a POC path that produces it unless one already reaches - a human-only action; the audit determines this. -- ASSUMPTION: the elicit block-list (bash/edit/write) is the complete set of "side-effecting tools - identified as unsafe for the POC." - → IMPACT IF FALSE: a side-effecting tool stays callable in elicit; small, additive fix to the - shared policy block-list. - → VALIDATE: the audit enumerates registered tools vs the elicit allow/block sets and asserts no - side-effecting tool is reachable. - → [→ memory/SPEC.md A18-L, D34-L] -``` - -### Posture check - -Proving posture, invariants axis. Landing this slice **locates and locks** the authority seam: the -guard test makes the D20-L "discriminants are the only mutation outcome" and the elicit tool-authority -contract executable, so the next person who adds a bypassing write path or an unguarded -side-effecting tool fails a test. It tells us something concrete — it converts "the POC looks safe" -into "the POC's authority contract is asserted." No high-impact assumption is left unretired; the one -assumption (block-list completeness) is validated by the audit the card performs. 
- -### Acceptance Criteria - -```pseudo tree -minimal authority shell -├── discriminant surface (D20-L) -│ ├── ✓ every current graph mutation path (agent graph tool, capture write, review accept) -│ │ returns a CommandResult discriminant; none throws for an expected authority/validation outcome -│ └── ✓ RPC/headless maps each discriminant to a structured response shape (no TUI-only assumption) -├── elicit tool authority (D40-L) -│ ├── ✓ elicit blocks every identified side-effecting tool (bash/edit/write) via tool_call + user_bash -│ ├── ✓ no registered side-effecting tool is reachable in elicit (allow-list is complete for the POC) -│ └── ✓ tool authority derives from the shared projected policy only (no second list; state.ts untouched) -├── needs_human representability (criterion 3) -│ ├── ✓ a needs_human CommandResult maps to a structured headless/RPC result, not a thrown TUI dialog -│ └── ✓ if no current POC path produces needs_human, that is recorded as intended (representable, unused) -└── scope discipline - ├── ✓ no new standalone authority service introduced - └── ✓ A18-L strict-built-in-suppression residue is named explicitly, not silently treated as closed -``` - -### Verification Approach - -``` -- Inner: an authority-matrix guard test (new) over current POC paths — asserts discriminant coverage, - elicit block/allow completeness, and needs_human structured representability. Existing - command-executor / runtime-policy / rpc handler tests still pass. -- Inner (gate): `npm run verify` (fix → test → build). -- Outer: manual smoke ONLY if a TUI-visible policy path changes (likely none; this is audit + guard). -``` - -### Cross-cutting obligations - -``` -- D20-L: discriminants are the only mutation outcome surface; no throw for expected outcomes. -- D34-L: command containment stays in .pi/extensions/commands/policy.ts. -- D40-L: tool authority is a pure derivation; DO NOT modify src/.pi/agents/state.ts (read-only import). 
-- A18-L: name strict built-in suppression as accepted Pi-upstream residue. -- This is a minimal shell, not M6: no RBAC, no permissions matrix, no authority service. -``` - -### Expected touched paths (tentative) - -```pseudo tree -src/.pi/extensions/runtime/ -└── authority-matrix.test.ts + (the guard test — primary deliverable) -src/projections/session/runtime-policy.ts ? (read/confirm; touch only if block-list incomplete) -src/.pi/extensions/runtime/index.ts ? (read/confirm; touch only if a hook gap is found) -src/graph/command-executor.ts ? (read-only unless a discriminant gap is found) -src/rpc/methods/session.ts ? (touch only if needs_human mapping is missing) -``` - -Lane discipline for parallel worktrees: -- **Does not** write `src/.pi/skills/**` (the `resource-body-depth` builder owns that). -- **Does not** write `src/graph/README.md`, `src/rpc/README.md`, `src/web/README.md`, or - `src/graph/observed-shapes-coverage.test.ts` (the `graph-observed-shapes` ledger owns those). -- **Does not** write `src/.pi/agents/state.ts` (reserved single-writer file; import read-only). - -### Traceability - -- **SPEC:** D20-L (command-result discriminants), D34-L (command containment), D40-L (projected tool - authority), A18-L (strict-built-in-suppression residue), A3-L. -- **Requirements:** R5, R6, R10. -- **Frontier:** satisfies the `minimal-authority-shell` acceptance leaves via audit + guard; any - concrete gap the audit surfaces is filled in-place, anything larger routes back to `ln-plan`. -- **Design docs:** `memory/SPEC.md` D20-L/D34-L/D40-L; `docs/reference/pi-extensions.md`. 
diff --git a/src/.pi/extensions/runtime/authority-matrix.test.ts b/src/.pi/extensions/runtime/authority-matrix.test.ts new file mode 100644 index 000000000..3208a0069 --- /dev/null +++ b/src/.pi/extensions/runtime/authority-matrix.test.ts @@ -0,0 +1,91 @@ +import type { ExtensionAPI } from '@earendil-works/pi-coding-agent'; +import { describe, expect, it } from 'vitest'; + +import type { CommandResult } from '../../../graph/command-executor.js'; +import { + isToolBlockedForRuntimeState, + TOOL_POLICY_DEFINITIONS, +} from '../../../projections/session/runtime-policy.js'; +import { DEFAULT_BRUNCH_AGENT_STATE } from '../../../session/runtime-state.js'; +import { activeToolNamesForBrunchAgentState, projectBrunchAgentState } from './index.js'; + +const SIDE_EFFECTING_POC_TOOLS = ['bash', 'edit', 'write'] as const; +const REGISTERED_POC_TOOLS = [ + 'read', + 'grep', + 'find', + 'ls', + ...SIDE_EFFECTING_POC_TOOLS, + 'present_question', + 'request_answer', + 'commit_graph', +] as const; + +function piWithRegisteredTools(toolNames: readonly string[]): ExtensionAPI { + return { + getAllTools: () => toolNames.map((name) => ({ name })), + } as ExtensionAPI; +} + +function commandResultStatus(result: CommandResult): CommandResult['status'] { + return result.status; +} + +describe('minimal authority matrix', () => { + it('keeps the CommandExecutor discriminant vocabulary as the graph mutation outcome surface', () => { + const statuses = [ + commandResultStatus({ status: 'success', nodeId: 1, lsn: 1 }), + commandResultStatus({ + status: 'success', + lsn: 1, + createdNodes: {}, + edges: [], + }), + commandResultStatus({ + status: 'structural_illegal', + diagnostics: [{ field: 'nodes', message: 'invalid graph mutation' }], + }), + commandResultStatus({ status: 'needs_human' }), + commandResultStatus({ status: 'policy_blocked' }), + commandResultStatus({ status: 'version_conflict' }), + ]; + + expect(statuses).toEqual([ + 'success', + 'success', + 'structural_illegal', + 
'needs_human', + 'policy_blocked', + 'version_conflict', + ]); + }); + + it('derives elicit tool authority from the shared runtime policy and blocks side-effecting POC tools', () => { + const state = projectBrunchAgentState([{ data: { state: DEFAULT_BRUNCH_AGENT_STATE } }]); + const policy = TOOL_POLICY_DEFINITIONS[state.operationalModeDefinition.toolPolicyId]; + + expect(policy.id).toBe('elicit-read-only'); + expect(policy.baseAllowedToolNames).toEqual(['read', 'grep', 'find', 'ls']); + expect(policy.blockedToolNames).toEqual([...SIDE_EFFECTING_POC_TOOLS]); + + for (const toolName of SIDE_EFFECTING_POC_TOOLS) { + expect(isToolBlockedForRuntimeState(state, toolName)).toBe(true); + } + + expect(activeToolNamesForBrunchAgentState(piWithRegisteredTools(REGISTERED_POC_TOOLS), state)).toEqual([ + 'read', + 'grep', + 'find', + 'ls', + 'present_question', + 'request_answer', + ]); + }); + + it('represents needs_human as structured data instead of a TUI-only dialog', () => { + const result = { status: 'needs_human' } satisfies CommandResult; + + expect(commandResultStatus(result)).toBe('needs_human'); + expect(JSON.parse(JSON.stringify(result))).toEqual({ status: 'needs_human' }); + }); +}); From 838733919228518d1e02ee311344ad0ab24b1915 Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Mon, 8 Jun 2026 11:05:34 +0200 Subject: [PATCH 11/17] Reconcile post-merge drift from parallel B/C streams Both graph-observed-shapes (85e73ba7) and minimal-authority-shell (68474e3f) landed in parallel; each reconciled its own slice but could not see the other's completion. Fix the residual drift: mark graph-observed-shapes done in the dependency node, update the context paragraph and parallel-stream note to reflect all three streams landed with no collisions, and delete the exhausted graph-observed-shapes card (B left it as a done tombstone; ln-scope deletes exhausted cards, matching C's cleanup). 

Amp-Thread-ID: https://ampcode.com/threads/T-019ea2fc-9f12-767d-bd7a-08497f7307fd
Co-authored-by: Amp
---
 memory/PLAN.md                                     |   6 +-
 .../graph-observed-shapes--coverage-ledger.md      | 168 ------------------
 2 files changed, 3 insertions(+), 171 deletions(-)
 delete mode 100644 memory/cards/graph-observed-shapes--coverage-ledger.md

diff --git a/memory/PLAN.md b/memory/PLAN.md
index 8d04b19ca..8958006cb 100644
--- a/memory/PLAN.md
+++ b/memory/PLAN.md
@@ -29,7 +29,7 @@ The multi-spec workspace model is now explicit: a workspace is the cwd; multiple
 
 Planning is currently carrying two shapes at once: canonical frontier sequencing in this file, and a temporary elicitor capability ledger in `memory/CROSS_CUT_PLAN.md`. The authority split must stay hard: `PLAN.md` owns frontier ids, ordering, and dependency judgments; `CROSS_CUT_PLAN.md` only inventories the temporary READ/WRITE/KNOW row surface. The current planning move is therefore to promote any cross-cut row that has escaped row-sized work back into a real frontier. `elicitation-backlog` was the first such promotion and is now landed; the remaining prompt-resource body-depth pass stays temporary cross-cut completion work.
 
-After the current elicitor work, the strongest follow-on coverage frontier is `graph-observed-shapes`: decide the observed-shape inventory per consumer, then align graph/RPC/web to it. `runtime-affordances-and-legality` remains the next likely coverage frontier behind that. Exchange/capture breadth is explicitly deferred until its surviving inventory is honest enough to enumerate without recreating the deleted stub surface.
+The `graph-observed-shapes` coverage frontier has now landed (the consumer-specific read-shape inventory is ratified in `src/graph/README.md` and guarded by a drift test). With `minimal-authority-shell` also done, the active delivery path is `poc-live-ship-gate` (now unblocked). `runtime-affordances-and-legality` remains the next likely coverage frontier but stays parked until a posture/UI pass forces its shape. Exchange/capture breadth is explicitly deferred until its surviving inventory is honest enough to enumerate without recreating the deleted stub surface.
 
 ## Sequencing
 
@@ -300,7 +300,7 @@ nodes:
   elicitation-backlog [done · proving] materialized D65-L prospective agenda substrate and read-back
   minimal-authority-shell [done · P1] thin safety posture for current POC paths
   poc-live-ship-gate [next · P1] final fresh-cwd composed product runbook
-  graph-observed-shapes [next · proving] decide consumer-specific observed-shape inventory, then align graph/RPC/web
+  graph-observed-shapes [done · proving] ratified consumer-specific observed-shape ledger + drift guard; no transport shape shipped
   runtime-affordances-and-legality [next · proving] keep posture legality/default surfaces shared across transports
   probes-and-transcripts-evolution [parallel] continuous evidence substrate
   topology-readmes-and-boundaries [parallel] attach-to-frontier topology hardening
@@ -332,7 +332,7 @@ horizon:
 
 notes:
   - `elicitation-backlog` was the promoted D65-L row from `memory/CROSS_CUT_PLAN.md`; the prompt-resource body-depth pass (the last temporary cross-cut completion work) landed in 1ca02e38, so `memory/CROSS_CUT_PLAN.md` now has no row-sized work left — its only residue is the unscoped live "what to ask next" driver.
-  - Parallel worktree streams (2026-06-08): stream (A) `crosscut-know--resource-body-depth` → `src/.pi/skills/**` is **done** (1ca02e38), and stream (C) `minimal-authority-shell--audit-and-guard` → `src/.pi/extensions/runtime/` is **done**. One write-disjoint stream remains cold-startable from a clean committed base — (B) `graph-observed-shapes--coverage-ledger` → `src/graph/README.md` + `rpc`/`web` READMEs + one guard test. Invariant: **`src/.pi/agents/state.ts` is a single-writer file** — B must not edit it. `poc-live-ship-gate` is now unblocked by `minimal-authority-shell`; `runtime-affordances-and-legality` and `exchanges-and-generalized-capture` stay parked (shape not yet forced) and are not cold-startable worktree streams.
+  - Parallel worktree streams (2026-06-08): all three landed — (A) `crosscut-know--resource-body-depth` (1ca02e38), (B) `graph-observed-shapes--coverage-ledger` (85e73ba7), (C) `minimal-authority-shell--audit-and-guard` (68474e3f); each kept to its declared write paths and left `src/.pi/agents/state.ts` untouched, so the parallel run produced no collisions. `poc-live-ship-gate` is now unblocked (its hard dependency `minimal-authority-shell` is done); `runtime-affordances-and-legality` and `exchanges-and-generalized-capture` stay parked (shape not yet forced) and are not cold-startable worktree streams.
   - Completed prerequisites: `agents-composition-layer` supplies runtime prompt/resource posture, and `live-graph-observer` supplies the read-only web observer path expected by `capture-response-to-graph` and `poc-live-ship-gate`.
   - `graph-observed-shapes` is intentionally consumer-specific: do not assume every agent read shape belongs on the web observer.
   - `exchanges-and-generalized-capture` stays deferred until the surviving inventory is honest enough to close; do not regrow deleted `capture-*` symmetry in the meantime.
diff --git a/memory/cards/graph-observed-shapes--coverage-ledger.md b/memory/cards/graph-observed-shapes--coverage-ledger.md
deleted file mode 100644
index a571cc544..000000000
--- a/memory/cards/graph-observed-shapes--coverage-ledger.md
+++ /dev/null
@@ -1,168 +0,0 @@
-# Graph observed-shape coverage ledger
-
-Frontier: graph-observed-shapes
-Status: done
-Mode: single
-Created: 2026-06-08
-
-## Orientation
-
-- **Containing seam:** the graph read surface — domain reads in
-  [src/graph/queries.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/graph/queries.ts)
-  exposed to three consumers: the Pi `read_graph` tool
-  ([src/.pi/extensions/graph/index.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/.pi/extensions/graph/index.ts)),
-  public RPC ([src/rpc/methods/graph.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/rpc/methods/graph.ts)),
-  and the web observer ([src/web/queries/graph.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/web/queries/graph.ts)).
-  Spec-scoped reader wiring is in [src/graph/workspace-store.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/graph/workspace-store.ts) (`SpecScopedReaders` / `forSpec`).
-- **Relevant frontier item:** `graph-observed-shapes` in
-  [memory/PLAN.md](file:///Users/lunelson/Code/hashintel/brunch-next/memory/PLAN.md) §Frontier Definitions
-  (`Status: next`, `Certainty: proving`). Its execution pointer says: author via `ln-scope` as a
-  coverage ledger once the active frontier closes. This card is that ledger slice.
-- **Volatile state:** the read surface is **asymmetric by consumer**. The `read_graph` tool exposes
-  6 shapes (`overview`, `neighborhood`, `list_by_kind`, `list_by_band`, `gaps`, `related`); RPC and
-  web expose only 2 (`overview`, `neighborhood`). Two further graph-owned register reads
-  (`getOpenReconciliationNeeds`, `getOpenElicitationBacklogEntries`) have **no transport consumer
-  yet** (tests only; `elicitation_backlog` read-back is the per-turn-driver follow-on from FE-823).
-- **Main open risk / insight:** the asymmetry is **probably correct, not a gap**. The frontier's real
-  job is to *decide and ratify* which shapes each consumer needs — agent/RPC-only shapes are allowed
-  to stay agent/RPC-only — and to guard that decision so new shapes don't bleed onto the web
-  accidentally. The risk is treating "tool has 6, web has 2" as a coverage hole and over-promoting.
-
-Posture: **proving** (inherited from `graph-observed-shapes`). Reshaped to give the decision teeth:
-landing this slice *stabilizes the D60-L read-shape ownership seam* (invariants axis) via a durable
-ledger + a coverage-guard test, rather than being a pure study/doc step.
-
-Frontier-level cross-cutting obligations (from the frontier definition):
-
-- **D60-L:** read-shape ownership stays explicit; each required consumer shape has exactly one
-  canonical owner (the domain read in `graph/queries.ts`), not adapter-local formatting standing in
-  for a durable read shape.
-- **D33-L:** web is a read-only observer; web adoption of a shape must be deliberate, never accidental
-  bleed-through from agent/RPC needs.
-- **D52-L:** `src/projections/` exists only for reusable multi-consumer DTOs. Single-owner reads stay
-  in their owning domain. Do not create a graph projection module to host a single-consumer shape.
-- Keep graph-owned read logic out of `db/`; keep `renderers/` limited to durable LLM/session text,
-  not arbitrary observer DTOs.
-
-### Target Behavior
-
-A closed observed-shape coverage ledger exists as a durable artifact that classifies every
-`src/graph/queries.ts` read shape as required or deferred per consumer with one named canonical owner,
-and a guard test asserts each consumer's actual graph-read surface equals its ledger-required set.
-
-### Boundary Crossings
-
-```
-→ src/graph/queries.ts     (the canonical read shapes — owners)
-→ src/graph/README.md      (ledger artifact: shape × consumer matrix + owner column)
-→ src/rpc/README.md        (consumer-subset note pointing at the ledger)
-→ src/web/README.md        (consumer-subset note pointing at the ledger)
-→ a coverage-guard test    (asserts actual surfaces == ledger-required sets)
-```
-
-### Risks and Assumptions
-
-```
-- RISK: the ledger could be read as a mandate to add the 4 tool-only shapes to RPC/web.
-  → MITIGATION: the ledger marks list_by_kind/list_by_band as "web-eligible, DEFERRED until a web
-    feature needs them" and related/gaps as "agent/RPC-only"; no transport shape is added in this
-    slice. Any "required but missing" row spawns a SEPARATE follow-on alignment card (scoped after
-    the ledger is accepted, because its scope depends on this card's decisions).
-- RISK: a coverage-guard test that hardcodes string lists could rot silently.
-  → MITIGATION: derive the actual sets from the real surfaces where cheap (read_graph mode union,
-    web query-keys graph group, RPC graph method names) and compare to the ledger's declared sets,
-    so adding a real shape without updating the ledger fails the test.
-- ASSUMPTION: the current asymmetry (tool 6 / RPC 2 / web 2) is intentional, not a delivery gap.
-  → IMPACT IF FALSE: if a POC web feature actually needs list_by_kind/list_by_band now, this slice
-    under-delivers and an alignment card is needed immediately — but that card is cheap and additive
-    and does not invalidate the ledger.
-  → VALIDATE: the ledger decision itself; the frontier definition already states list_by_kind/
-    list_by_band are "plausible web shapes" (eligible, not yet required) and related/gaps "may
-    remain agent/RPC-only".
-  → [→ memory/SPEC.md D60-L read-shape ownership]
-```
-
-### Posture check
-
-Proving posture. This slice scores on the **invariants** axis: it locates and stabilizes the
-read-shape ownership seam (D60-L) by ratifying the consumer-specific inventory and installing a
-regression guard against accidental web/RPC bleed-through. It is reshaped from a pure decision/doc
-step into a slice with a failing-then-passing test, so it *tells us something*: it proves the
-tool-vs-transport asymmetry is the intended contract. No high-impact assumption is left unretired —
-the only assumption (asymmetry is intentional) is the decision this card closes.
-
-### Acceptance Criteria
-
-```pseudo tree
-observed-shape coverage ledger
-├── ledger artifact (src/graph/README.md)
-│   ├── ✓ every src/graph/queries.ts read shape appears as a row (8 shapes incl. both register reads)
-│   ├── ✓ each row marks required | deferred | n/a for each consumer (tool, RPC, web)
-│   ├── ✓ each required shape names exactly one canonical owner (graph/queries.ts function)
-│   └── ✓ deferred rows carry a one-line reason (e.g. "web-eligible, await web feature";
-│         "agent/RPC-only"; "agent-internal register read, no transport consumer yet")
-├── decisions encoded
-│   ├── ✓ overview + neighborhood = required for tool, RPC, and web (already present)
-│   ├── ✓ list_by_kind + list_by_band = required tool; web-eligible but DEFERRED; RPC follows web
-│   ├── ✓ gaps + related = required tool; agent/RPC-only; NOT web
-│   └── ✓ reconciliation_needs + elicitation_backlog = agent-internal; deferred from RPC/web
-├── consumer-subset notes
-│   ├── ✓ src/rpc/README.md states its graph subset {overview, nodeNeighborhood} + points at the ledger
-│   └── ✓ src/web/README.md states its graph subset {overview, nodeNeighborhood} + points at the ledger
-└── guard test
-    ├── ✓ asserts read_graph tool mode set == ledger tool-required set
-    ├── ✓ asserts RPC graph method set == ledger RPC-required set {overview, nodeNeighborhood}
-    └── ✓ asserts web graph query-key group == ledger web-required set {overview, nodeNeighborhood}
-```
-
-### Verification Approach
-
-```
-- Inner: unit/structural test — the coverage-guard test (derives actual consumer surfaces, compares
-  to declared ledger-required sets); existing graph query / RPC / web query tests still pass.
-- Inner (gate): `npm run verify` (fix → test → build) proves no surface or wiring regressed.
-- Middle/Outer: none — no new transport shape ships in this slice, so no observer/probe change is
-  needed. (A future alignment card, if one is spawned, owns its own middle-tier read-path proof.)
-```
-
-### Cross-cutting obligations
-
-```
-- D60-L: one canonical owner per required shape; no adapter-local read shape masquerading as durable.
-- D33-L: web stays read-only; no web shape added in this slice; ledger makes web adoption deliberate.
-- D52-L: no new src/projections/ module for a single-consumer shape; the only shared DTOs are the
-  existing GraphOverview / NeighborhoodResult types already imported by web — confirm, don't expand.
-- Keep graph read logic out of db/; keep renderers/ for durable text, not observer DTOs.
-```
-
-### Expected touched paths (tentative)
-
-```pseudo tree
-src/graph/
-├── README.md                         ~ (ledger artifact: shape × consumer matrix + owner column)
-├── observed-shapes-coverage.test.ts  + (coverage-guard test) — OR extend an existing graph test
-└── queries.ts                        ? (read-only; touched only if a row needs an owner comment)
-src/rpc/README.md                     ~ (graph consumer-subset note → ledger)
-src/web/README.md                     ~ (graph consumer-subset note → ledger)
-```
-
-No overlap with the active `crosscut-know--resource-body-depth` builder (`src/.pi/skills/**`) or any
-`src/db/**` work. This card writes only to `src/graph/`, `src/rpc/README.md`, `src/web/README.md`.
-
-## Follow-on note (do NOT pre-scope here)
-
-If the ledger marks any shape **required but missing** for a transport consumer, that alignment
-(graph → RPC → web wiring for that shape) is a separate card scoped *after* this ledger is accepted —
-its scope depends on this card's decisions, so per the chain anti-speculation rule it is not
-pre-scoped. The expected outcome is that **no transport shape is currently required-but-missing**, so
-the frontier likely closes with ratification + guard rather than new wiring.
-
-### Traceability
-
-- **SPEC:** D60-L (read-shape ownership), D33-L (web read-only observer), D52-L (projections =
-  reusable multi-consumer DTOs only), D51-L (graph code projection), D64-L (readiness bands feeding
-  `list_by_band`).
-- **Frontier:** closes the `graph-observed-shapes` "closed enumerated coverage ledger" and
-  "one canonical owner per required shape" acceptance leaves; ratifies the consumer-specific
-  asymmetry the frontier was created to make legible.
-- **Design docs:** `src/graph/README.md`, `src/rpc/README.md`, `src/web/README.md`.
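
The coverage guard that the deleted card describes reduces to one set comparison per consumer. A shape that appears on a surface but is absent from the ledger counts as bleed-through. A ledger-required shape that is missing from the surface is a gap. Below is a minimal TypeScript sketch of that comparison. `LEDGER_REQUIRED`, `surfaceDrift`, and `matchesLedger` are hypothetical names, and the literal arrays stand in for real surfaces. This is not the repository's guard test. Per the card, that test derives the actual sets from the `read_graph` mode union, the RPC graph method names, and the web query-key graph group.

```typescript
// Hypothetical sketch of the per-consumer ledger drift check described in the card.
// Names and literal sets are illustrative; the real guard test is not part of this patch.
type Consumer = 'tool' | 'rpc' | 'web';

// Ledger-required shape sets per consumer, taken from the card's "decisions encoded" leaves.
const LEDGER_REQUIRED: Record<Consumer, readonly string[]> = {
  tool: ['overview', 'neighborhood', 'list_by_kind', 'list_by_band', 'gaps', 'related'],
  rpc: ['overview', 'nodeNeighborhood'],
  web: ['overview', 'nodeNeighborhood'],
};

interface Drift {
  unledgered: string[]; // on the surface but not required by the ledger (bleed-through)
  missing: string[]; // required by the ledger but absent from the surface
}

function surfaceDrift(required: readonly string[], actual: readonly string[]): Drift {
  const req = new Set(required);
  const act = new Set(actual);
  return {
    unledgered: Array.from(act).filter((s) => !req.has(s)).sort(),
    missing: Array.from(req).filter((s) => !act.has(s)).sort(),
  };
}

// Set equality, as the guard asserts it, holds exactly when both drift lists are empty.
function matchesLedger(consumer: Consumer, actual: readonly string[]): boolean {
  const { unledgered, missing } = surfaceDrift(LEDGER_REQUIRED[consumer], actual);
  return unledgered.length === 0 && missing.length === 0;
}

// Usage: a web surface that quietly gains list_by_kind is reported as unledgered.
const webDrift = surfaceDrift(LEDGER_REQUIRED.web, ['overview', 'nodeNeighborhood', 'list_by_kind']);
console.log(webDrift, matchesLedger('web', ['overview', 'nodeNeighborhood']));
```

Suppose a real shape is added without a ledger update. The guard fails with a named diff rather than a bare inequality. That is the rot the card's "derive, don't hardcode" mitigation targets.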
From bfb1ac0f11f3e1e3b82a3d9df7c2dd294be0fc55 Mon Sep 17 00:00:00 2001
From: Lu Nelson
Date: Mon, 8 Jun 2026 11:28:58 +0200
Subject: [PATCH 12/17] Reconcile cross-cut closure and de-fog next coverage frontiers

Correct the premature 'cross-cut exhausted' claim: the Seam 3a 'what to ask next' driver row is still partial, so the seam stays open. Dispose it as the promoted elicitation-driver frontier (buildable now).

De-fog the remaining coverage frontiers rather than parking them:
- runtime-affordances-and-legality reclassified buildable-now; scoped as a coverage ledger card with only review-set/turn-mode rows tripwired.
- exchanges-and-generalized-capture is evidence-gated; named forcing function is a capture-quality fitness spike card (A22-L).

Amp-Thread-ID: https://ampcode.com/threads/T-019ea2fc-9f12-767d-bd7a-08497f7307fd
Co-authored-by: Amp
---
 memory/CROSS_CUT_PLAN.md                           |  17 +--
 memory/PLAN.md                                     |  49 +++++--
 .../cards/capture-quality--fitness-spike.md        |  82 ++++++++++++
 .../runtime-affordances--coverage-ledger.md        | 121 ++++++++++++++++++
 4 files changed, 252 insertions(+), 17 deletions(-)
 create mode 100644 memory/cards/capture-quality--fitness-spike.md
 create mode 100644 memory/cards/runtime-affordances--coverage-ledger.md

diff --git a/memory/CROSS_CUT_PLAN.md b/memory/CROSS_CUT_PLAN.md
index 09b440cdb..b847e0658 100644
--- a/memory/CROSS_CUT_PLAN.md
+++ b/memory/CROSS_CUT_PLAN.md
@@ -36,7 +36,7 @@ itself.
 
 - `memory/PLAN.md` owns frontier ids, sequencing, dependency judgment, and which work is active next.
 - This file owns only the temporary elicitor READ / WRITE / KNOW row inventory and its aggregate coverage DoD.
-- When one row escapes row-sized work, it gets promoted back into PLAN. As of 2026-06-08, the D65-L row is now the active PLAN frontier `elicitation-backlog` (landed), and the prompt-resource body-depth pass landed in 1ca02e38. All ● rows are now `have`/`built`; the only remaining cross-cut residue is the live per-turn "what to ask next" driver, which is an unscoped PLAN follow-on, not row-sized work.
+- When one row escapes row-sized work, it gets promoted back into PLAN. As of 2026-06-08, the D65-L substrate row is the landed PLAN frontier `elicitation-backlog`, and the prompt-resource body-depth pass landed in 1ca02e38. **The cross-cut is not yet exhausted:** the Seam 3a `"what to ask next" driver` row is still `partial · ●`, so by the aggregate DoD below this seam stays open. That row has now escaped row-sized work and is disposed as the PLAN frontier `elicitation-driver` — when it lands, this row flips to `built` and the cross-cut closes.
 
 ## The seams (locked)
 
@@ -117,7 +117,7 @@ DoD: every ● row is `have` or `built`.
 | goals / strategies / lenses scaffolding + legal-tuple gating | have | ● | — | — | `.pi/agents/state.ts` |
 | goal/strategy/lens **content depth** | built | ● | — | done — deepened bodies + manifest-wide depth test (1ca02e38) | each body now carries its facet guidance; ≥700-char floor guarded in `compose.test.ts` |
 | `freestyle` strategy | built | ● | — | done — pin-only strategy (8de7f166) | AUTO-excluded, no added authority; D66-L |
-| "what to ask next" driver | partial | ● | proving | unscoped follow-on | flat-table substrate landed via FE-823; live per-turn driver + capture-reflection remain follow-on work |
+| "what to ask next" driver | partial | ● | proving | promoted → PLAN `elicitation-driver` | flat-table substrate landed via FE-823; live per-turn driver + capture-reflection now disposed as the `elicitation-driver` frontier (last open ● row) |
 
 ### Seam 3b — KNOW / mechanics (methods)
 
@@ -264,13 +264,14 @@ order is coverage-driven: close ● ledger rows seam by seam.
    This also closed the Seam 3a `freestyle` and Seam 3b generalized-capture ● rows.
    No posture-switch tool to build (Q4 dissolved); user/system posture surface is deferred to the Q-state affordance reducer.
-4. **Seam 3a/3b content pass** — **COMPLETE** (all ● rows built): `freestyle` strategy
-   (8de7f166), generalized-capture core (5f5e6ac8), exchange-tool `.description()` /
+4. **Seam 3a/3b content pass** — **NEARLY COMPLETE** (one ● row still `partial`): `freestyle`
+   strategy (8de7f166), generalized-capture core (5f5e6ac8), exchange-tool `.description()` /
    `promptGuidelines` (drift correction 2026-06-07), and goal/strategy/lens/method body depth
-   (1ca02e38 — deepened bodies + a manifest-wide ≥700-char depth test in `compose.test.ts`).
-   FE-823 landed the D65-L substrate tracer (flat table, `createSpec` seed, command/query seam).
-   Skill-commands (Q6) stay deferred; the live per-turn "what to ask next" driver +
-   capture-reflection remain an unscoped PLAN follow-on.
+   (1ca02e38 — deepened bodies + a manifest-wide ≥700-char depth test in `compose.test.ts`) are
+   all built. FE-823 landed the D65-L substrate tracer (flat table, `createSpec` seed,
+   command/query seam). Skill-commands (Q6) stay deferred. The remaining open ● row is the live
+   per-turn "what to ask next" driver + capture-reflection, now disposed as the PLAN frontier
+   `elicitation-driver`; the seam is not done until it lands.
 5. **Spec reconcile** — promote the D40-L/D59-L one-line refinements (on confirmation), land Q1 negative-query touch, fold D65-L/D66-L outcomes into SPEC/PLAN.
diff --git a/memory/PLAN.md b/memory/PLAN.md
index 8958006cb..d28fea62f 100644
--- a/memory/PLAN.md
+++ b/memory/PLAN.md
@@ -27,9 +27,11 @@ All delivery frontiers must also continue materializing the locked source topolo
 
 The multi-spec workspace model is now explicit: a workspace is the cwd; multiple specs may coexist under it; each session binds to exactly one spec; each POC spec owns its own intent graph; cross-spec claim sharing/adoption is deferred (D11-L, D21-L, D61-L). Delivery work must target an explicit selected/current spec and must not accidentally recreate a workspace-global graph.
 
-Planning is currently carrying two shapes at once: canonical frontier sequencing in this file, and a temporary elicitor capability ledger in `memory/CROSS_CUT_PLAN.md`. The authority split must stay hard: `PLAN.md` owns frontier ids, ordering, and dependency judgments; `CROSS_CUT_PLAN.md` only inventories the temporary READ/WRITE/KNOW row surface. The current planning move is therefore to promote any cross-cut row that has escaped row-sized work back into a real frontier. `elicitation-backlog` was the first such promotion and is now landed; the remaining prompt-resource body-depth pass stays temporary cross-cut completion work.
+Planning is currently carrying two shapes at once: canonical frontier sequencing in this file, and a temporary elicitor capability ledger in `memory/CROSS_CUT_PLAN.md`. The authority split must stay hard: `PLAN.md` owns frontier ids, ordering, and dependency judgments; `CROSS_CUT_PLAN.md` only inventories the temporary READ/WRITE/KNOW row surface. The current planning move is therefore to promote any cross-cut row that has escaped row-sized work back into a real frontier. `elicitation-backlog` (the D65-L *substrate*) was the first such promotion and is landed; the prompt-resource body-depth pass is also built (1ca02e38). **The cross-cut is not yet exhausted:** its Seam 3a `"what to ask next" driver` row is still `partial · ●`, and the seam DoD holds a seam open while any `●` row is `partial`. That row — the *live per-turn elicitation-backlog driver* (read open entries → rank → select next question; capture-reflection grows/closes entries) — is a required elicitor capability that has escaped row-sized work, so per the cross-cut's own rule it is promoted here as the `elicitation-driver` frontier. It is buildable now (the FE-823 read-back exists) and is **not** POC-ship-critical (the POC delivery cut de-scopes elicitation quality), so it sequences as a coverage frontier, not a ship-gate blocker.
 
-The `graph-observed-shapes` coverage frontier has now landed (the consumer-specific read-shape inventory is ratified in `src/graph/README.md` and guarded by a drift test). With `minimal-authority-shell` also done, the active delivery path is `poc-live-ship-gate` (now unblocked). `runtime-affordances-and-legality` remains the next likely coverage frontier but stays parked until a posture/UI pass forces its shape. Exchange/capture breadth is explicitly deferred until its surviving inventory is honest enough to enumerate without recreating the deleted stub surface.
+The `graph-observed-shapes` coverage frontier has now landed (the consumer-specific read-shape inventory is ratified in `src/graph/README.md` and guarded by a drift test). With `minimal-authority-shell` also done, the active delivery path is `poc-live-ship-gate` (now unblocked).
+
+The remaining coverage frontiers are being deliberately de-fogged rather than left parked, because "wait for a forcing function" can hide capability layers we simply never built. Each is reclassified: `runtime-affordances-and-legality` is mostly **buildable-now** — its core is one Brunch-owned `affordances(resolvedState)` derivation over legality/default tables that already exist, so it is being re-inventoried as a coverage ledger (only its `active-review-set` / `turn-mode` rows are genuinely product-state-gated and stay tripwired). `exchanges-and-generalized-capture` is **evidence-gated**: the exchange topology is enumerable now, but capture quality beyond directly-labeled facts (A22-L) needs a measurement, so it is being attacked with a capture-quality spike rather than awaited. The `elicitation-driver` frontier (promoted above) is likewise buildable now.
 
 ## Sequencing
 
@@ -40,6 +42,9 @@ The `graph-observed-shapes` coverage frontier has now landed (the consumer-speci
 ### Next
 
 1. `poc-live-ship-gate` — final fresh-cwd runbook remains the delivery gate, but its prepared live-mention-autocomplete slice is currently parked off the critical path.
-2. `runtime-affordances-and-legality` — follow-on coverage frontier for shared posture legality/default surfaces once graph observed shapes stop dominating.
+2. `runtime-affordances-and-legality` — coverage frontier for shared posture legality/default surfaces; **buildable-now** and being scoped as an inventory ledger (`memory/cards/runtime-affordances--coverage-ledger.md`), not parked.
+3. `elicitation-driver` — coverage frontier for the live per-turn "what to ask next" driver promoted out of the cross-cut; buildable-now on the FE-823 substrate, not POC-ship-critical.
+4. `capture-quality-spike` — evidence spike that measures generalized-capture fitness (A22-L) so `exchanges-and-generalized-capture` can graduate from horizon on real evidence rather than waiting (`memory/cards/capture-quality--fitness-spike.md`).
 
 ### Parallel / Low-conflict
@@ -50,7 +54,7 @@ The `graph-observed-shapes` coverage frontier has now landed (the consumer-speci
 
 ### Horizon
 
-- `exchanges-and-generalized-capture` — revisit only when the surviving exchange/capture inventory is honest enough to enumerate; not yet a coverage frontier.
+- `exchanges-and-generalized-capture` — exchange topology is enumerable now, but generalized-capture breadth is gated on the `capture-quality-spike` evidence (above), not on waiting; graduates to a coverage frontier once the spike closes the inventory honestly.
 - `turn-boundary-reconciliation` — M7; graph revisions, `worldUpdate`, mention staleness, side-task/reviewer drains.
 - `coherence-first-class` — M8; bounded coherence verdicts backed by reconciliation needs.
 - `compaction-and-conflict-widening` — M9; long-horizon continuity through compaction.
@@ -110,6 +114,28 @@ The `graph-observed-shapes` coverage frontier has now landed (the consumer-speci
 - **Design docs:** `memory/SPEC.md` D65-L; `docs/design/GRAPH_MODEL.md`.
 - **Current execution pointer:** Done 2026-06-08 on FE-823. Materialized `elicitation_backlog` as a flat table plus generated migration, seeded grounding questions at `createSpec`, routed create/close mutations through `CommandExecutor` on the shared spec-local LSN/change-log seam, and added graph-owned per-spec read-back. The remaining prompt-resource body pass stays in `memory/CROSS_CUT_PLAN.md` as temporary coverage completion work; the live per-turn driver remains a follow-on, not frontier completion debt.
 
+### elicitation-driver
+
+- **Name:** Live per-turn "what to ask next" driver
+- **Linear:** unassigned
+- **Kind:** structural / bounded feature
+- **Status:** next
+- **Certainty:** proving
+- **Promoted from:** `memory/CROSS_CUT_PLAN.md` Seam 3a `"what to ask next" driver` row (D65-L), which remained `partial · ●` after the `elicitation-backlog` substrate landed. Per the cross-cut's own DoD a seam stays open while any `●` row is partial, so the row is disposed here as a real frontier rather than residue.
+- **Lights up:** open backlog entries → rank → select next question per turn; capture-reflection grows/closes entries.
+- **Stabilizes:** D65-L's live elicitation behavior on top of the flat `elicitation_backlog` substrate; closes the cross-cut Seam 3a row.
+- **Objective:** Add the per-turn driver that reads open backlog entries for the selected spec, ranks them (band/priority), selects the next question to surface, and reconciles entries from capture-reflection (open new, close answered) — all on the existing FE-823 read/write substrate.
+- **Why now / unlocks:** This is buildable now (the FE-823 substrate and per-spec read-back exist) and it closes the last required cross-cut row. It is **not** POC-ship-critical (the POC delivery cut de-scopes elicitation quality), so it sequences as a coverage frontier, not a ship-gate blocker.
+- **Acceptance:**
+  - A driver reads open entries for the selected spec and produces a deterministic ranked selection of the next question.
+  - Capture-reflection can open new entries and close answered ones through the existing `CommandExecutor` path; no second mutation clock.
+  - Selection is observable enough for a probe/transcript to prove the loop without inventing a planning plane or pointer.
+  - The cross-cut Seam 3a row flips from `partial · ●` to done when this lands.
+- **Verification:** Inner — ranking/selection and reconciliation tests over seeded backlog. Middle — per-turn driver read-back over a real graph boundary; sibling-spec isolation. Outer — probe showing rank → select → capture-reflection close across turns.
+- **Cross-cutting obligations:** Preserve the D4-L/D20-L command boundary and the D16-L/A4-L one-`{specId, lsn}` clock; keep the substrate flat (no graph plane, no unknown→unknown edges); no second planning system.
+- **Traceability:** D16-L, D20-L, D52-L, D63-L, D64-L, D65-L / A24-L.
+- **Design docs:** `memory/SPEC.md` D65-L; `docs/design/GRAPH_MODEL.md`.
+
 ### minimal-authority-shell
 
 - **Name:** Minimal POC authority shell over graph/session actions
@@ -200,6 +226,7 @@ The `graph-observed-shapes` coverage frontier has now landed (the consumer-speci
 - **Cross-cutting obligations:** Keep truth append-only in `brunch.agent_runtime_state`; affordances are pure derivations over shared tables. Do not add xstate or a persisted machine without new evidence.
 - **Traceability:** D25-L, D40-L, D59-L, D66-L.
 - **Design docs:** `memory/SPEC.md` D40-L/D59-L; `src/projections/README.md`; `src/session/README.md`.
+- **Current execution pointer:** Being scoped as a coverage ledger in `memory/cards/runtime-affordances--coverage-ledger.md`. The classification is **buildable-now, not parked**: the core is one Brunch-owned `affordances(resolvedState)` derivation over legality/default tables that already exist in `src/projections/session/runtime-policy.ts` and `src/.pi/agents/state.ts`. Only the `active-review-set` and freestyle-vs-structured `turn-mode` rows are genuinely product-state-gated; they stay tripwired in the ledger, not built speculatively.
 
 ### exchanges-and-generalized-capture
 
@@ -208,7 +235,7 @@ The `graph-observed-shapes` coverage frontier has now landed (the consumer-speci
 - **Kind:** structural
 - **Status:** horizon
 - **Certainty:** proving
-- **Blocked by:** An honest, closeable exchange/capture inventory; do not start while the surface still depends on deleted-stub symmetry or speculative breadth.
+- **Blocked by:** An honest, closeable exchange/capture inventory. The forcing function is now named and active: `capture-quality-spike` (`memory/cards/capture-quality--fitness-spike.md`) must produce A22-L fitness evidence over free text/files/refs before this graduates. This is evidence-gated, not wait-gated; do not start the breadth frontier while it still depends on deleted-stub symmetry or speculative breadth.
 - **Stabilizes:** The ownership split between `.pi/extensions/exchanges`, `projections/exchanges`, `renderers/exchanges`, and `session/structured-exchange-loop.ts`.
 - **Objective:** Revisit richer exchange payload families and generalized capture breadth only after the surviving surface is clear enough to enumerate.
 - **Why now / unlocks:** Recording this frontier here prevents the deleted `capture-*` topology from silently regrowing while preserving the likely future concern once capture breadth becomes honest.
@@ -276,7 +303,7 @@ The `graph-observed-shapes` coverage frontier has now landed (the consumer-speci
 ## Recently Completed
 
 - 2026-06-08 `minimal-authority-shell` (FE-810) — Done: added the authority-matrix guard test over the current POC authority seam. The guard locks `CommandExecutor` mutation-result discriminants as the graph outcome vocabulary, proves `needs_human` is structured data rather than a TUI-only dialog, and asserts `elicit` tool authority comes from the shared projected runtime policy while blocking the identified side-effecting tools (`bash`, `edit`, `write`). No new authority service; `src/.pi/agents/state.ts` untouched; A18-L strict built-in suppression remains accepted Pi-upstream/API residue. Verified: `src/.pi/extensions/runtime/authority-matrix.test.ts` and `npm run verify`.
-- 2026-06-08 cross-cut prompt-resource body-depth pass (Seam 3a/3b) — Done (1ca02e38): deepened every thin `src/.pi/skills/{goals,strategies,lenses,methods}` body to carry its per-axis facet guidance (goals→D59-L, strategies/lenses→README+D25-L, methods→D58-L tool-routing role), and added a manifest-wide readability/depth test in `src/.pi/agents/compose.test.ts` asserting every `{GOAL,STRATEGY,LENS,METHOD}_RESOURCES` location resolves and clears a ≥700-char floor. `state.ts` untouched. This closed the last row-sized cross-cut completion work; `memory/CROSS_CUT_PLAN.md` ● rows are now all built. Verified: `npm run verify` (551 tests, build).
+- 2026-06-08 cross-cut prompt-resource body-depth pass (Seam 3a/3b) — Done (1ca02e38): deepened every thin `src/.pi/skills/{goals,strategies,lenses,methods}` body to carry its per-axis facet guidance (goals→D59-L, strategies/lenses→README+D25-L, methods→D58-L tool-routing role), and added a manifest-wide readability/depth test in `src/.pi/agents/compose.test.ts` asserting every `{GOAL,STRATEGY,LENS,METHOD}_RESOURCES` location resolves and clears a ≥700-char floor. `state.ts` untouched. This closed the prompt-resource body-depth row, but the cross-cut is **not** exhausted: its Seam 3a `"what to ask next" driver` row (`partial · ●`) remains the last required row, now promoted to the `elicitation-driver` frontier. Verified: `npm run verify` (551 tests, build).
- 2026-06-08 `elicitation-backlog` (FE-823) — Done: materialized `elicitation_backlog` as a flat spec-scoped table with generated migration, seeded the grounding agenda at `createSpec`, routed create/close entry mutations through `CommandExecutor` on the shared `{specId, lsn}` / `change_log` boundary, and added graph-owned per-spec open-entry read-back. Reconciled D65-L/A24-L and updated graph/db topology docs. Verified: `src/graph/command-executor.test.ts`, `src/graph/queries.test.ts`, and `npm run verify`. @@ -301,7 +328,9 @@ nodes: minimal-authority-shell [done · P1] thin safety posture for current POC paths poc-live-ship-gate [next · P1] final fresh-cwd composed product runbook graph-observed-shapes [done · proving] ratified consumer-specific observed-shape ledger + drift guard; no transport shape shipped - runtime-affordances-and-legality [next · proving] keep posture legality/default surfaces shared across transports + runtime-affordances-and-legality [next · proving] buildable-now affordance(resolvedState) coverage ledger; review-set/turn-mode rows tripwired + elicitation-driver [next · proving] live per-turn what-to-ask-next driver on FE-823 substrate; closes cross-cut Seam 3a + capture-quality-spike [next · spike] A22-L fitness evidence to graduate exchanges-and-generalized-capture probes-and-transcripts-evolution [parallel] continuous evidence substrate topology-readmes-and-boundaries [parallel] attach-to-frontier topology hardening dev-seed-fixtures [parallel] rich seed data substrate for dev/observer testing @@ -313,6 +342,8 @@ edges: graph-tool-resilience -[hard]-> poc-live-ship-gate project-graph-review-cycle -[optional]-> poc-live-ship-gate minimal-authority-shell -[hard]-> poc-live-ship-gate + elicitation-backlog -[hard]-> elicitation-driver + capture-quality-spike -[evidence]-> exchanges-and-generalized-capture parallel obligations: probes-and-transcripts-evolution -[evidence]-> every P0/P1 frontier @@ -331,8 +362,8 @@ horizon: 
geolog-and-petri-execution notes: - - `elicitation-backlog` was the promoted D65-L row from `memory/CROSS_CUT_PLAN.md`; the prompt-resource body-depth pass (the last temporary cross-cut completion work) landed in 1ca02e38, so `memory/CROSS_CUT_PLAN.md` now has no row-sized work left — its only residue is the unscoped live "what to ask next" driver. - - Parallel worktree streams (2026-06-08): all three landed — (A) `crosscut-know--resource-body-depth` (1ca02e38), (B) `graph-observed-shapes--coverage-ledger` (85e73ba7), (C) `minimal-authority-shell--audit-and-guard` (68474e3f); each kept to its declared write paths and left `src/.pi/agents/state.ts` untouched, so the parallel run produced no collisions. `poc-live-ship-gate` is now unblocked (its hard dependency `minimal-authority-shell` is done); `runtime-affordances-and-legality` and `exchanges-and-generalized-capture` stay parked (shape not yet forced) and are not cold-startable worktree streams. + - `elicitation-backlog` was the promoted D65-L *substrate* row from `memory/CROSS_CUT_PLAN.md`; the prompt-resource body-depth pass landed in 1ca02e38. The cross-cut is **not** exhausted: its Seam 3a `"what to ask next" driver` row is still `partial · ●`, which by the seam DoD keeps the seam open. That row is now disposed as the `elicitation-driver` frontier (not residue), so the remaining cross-cut obligation has a named owner in `PLAN.md`. + - Parallel worktree streams (2026-06-08): all three landed — (A) `crosscut-know--resource-body-depth` (1ca02e38), (B) `graph-observed-shapes--coverage-ledger` (85e73ba7), (C) `minimal-authority-shell--audit-and-guard` (68474e3f); each kept to its declared write paths and left `src/.pi/agents/state.ts` untouched, so the parallel run produced no collisions. `poc-live-ship-gate` is now unblocked (its hard dependency `minimal-authority-shell` is done). 
The next coverage frontiers are de-fogged rather than parked: `runtime-affordances-and-legality` (buildable-now ledger) and `elicitation-driver` (buildable-now on the FE-823 substrate) are cold-startable worktree streams; `capture-quality-spike` is an evidence spike that gates `exchanges-and-generalized-capture`. - Completed prerequisites: `agents-composition-layer` supplies runtime prompt/resource posture, and `live-graph-observer` supplies the read-only web observer path expected by `capture-response-to-graph` and `poc-live-ship-gate`. - `graph-observed-shapes` is intentionally consumer-specific: do not assume every agent read shape belongs on the web observer. - `exchanges-and-generalized-capture` stays deferred until the surviving inventory is honest enough to close; do not regrow deleted `capture-*` symmetry in the meantime. diff --git a/memory/cards/capture-quality--fitness-spike.md b/memory/cards/capture-quality--fitness-spike.md new file mode 100644 index 000000000..0a57c32de --- /dev/null +++ b/memory/cards/capture-quality--fitness-spike.md @@ -0,0 +1,82 @@ +# Capture-quality fitness spike + +Frontier: capture-quality-spike (gates exchanges-and-generalized-capture) +Status: active +Mode: single +Created: 2026-06-08 + +## Orientation + +- **Containing seam:** post-exchange / ordinary-message capture. The production path commits only **directly-labeled** high-confidence facts today: `captureExplicitTextFacts` in `src/graph/capture/structured-response.ts` accepts `Goal:`/`Context:`/`Constraint:`/`Criterion:` lines and routes them through `CommandExecutor.commitGraph({basis: explicit})` (wired on `session.submitExchangeResponse` and `session.submitMessage`). Capture beyond labeled facts is unbuilt. +- **Relevant frontier item:** this spike is the **named forcing function** for the horizon frontier `exchanges-and-generalized-capture`. 
That frontier is *evidence-gated, not wait-gated* (PLAN.md): it cannot graduate until we have real measurement of capture fitness over free text/files/refs. The output of this card is **knowledge + evidence artifacts**, not production capture code. +- **Volatile handoff state:** no `HANDOFF.md`. The `capture-*` projector/renderer stubs were deliberately deleted in the snapshot migration (35eff395) precisely because the capture inventory was not honest yet; **do not** recreate them. The probe precedent is `src/probes/fixture-curation-loop.ts` (an LLM-driven measurement probe that emits report artifacts under `.fixtures/runs/`). +- **Main open risk:** the spike quietly turning into production capture work — adding LLM extraction into `src/graph/capture/` or materializing broad runtime/product seams. It must stay throwaway: measure fitness, record a confidence shift on A22-L, and recommend whether/how the frontier graduates. + +Posture: proving (this is a spike; output is evidence and a confidence shift, not a tracer). + +## Light scope card (spike) + +### Objective + +Produce real evidence of how reliably an LLM-driven capture step can extract high-confidence graph facts from free prose / files / refs **beyond** directly-labeled lines, so `exchanges-and-generalized-capture` can graduate (or stay parked) on measurement rather than guesswork. + +### Acceptance Criteria + +``` +✓ A spike probe under src/probes/ runs a capture-quality measurement over a small fixed scenario set + (free-prose answers, file/ref-bearing answers, implication-heavy answers) and emits a report artifact + under .fixtures/runs/capture-quality/ with per-scenario extraction vs expected-fact comparison. +✓ The report quantifies fitness against the A22-L split: high-confidence facts that SHOULD commit vs + low-confidence implications that should STAY OUT of graph truth (precision/recall or false-commit count). 
+✓ A short verdict is written (in the run artifact and/or a spike note) recording the confidence shift on + A22-L and a concrete recommendation: graduate the frontier, narrow it, or keep it parked with the next gate. +✓ No production capture behavior changes: src/graph/capture/ logic is not extended, and no capture-* + projector/renderer stubs are reintroduced. +``` + +### Verification Approach + +``` +- Inner: a deterministic harness test (like src/probes/fixture-curation-loop.test.ts) that proves the + probe's report/summarization mechanics WITHOUT requiring a live LLM (fixture-fed transcript in → summary out). +- Outer: the real LLM measurement run, recorded as artifacts under .fixtures/runs/capture-quality/ + (mixed-basis output stays in runs/, never registered as a reusable seed). +``` + +### Cross-cutting obligations + +``` +- Throwaway investigation: knowledge + evidence, not production capture code. +- Do not regrow deleted capture-* topology; do not reintroduce `snapshot` as an architecture noun. +- Any commit the probe demonstrates still routes through CommandExecutor with basis: explicit (D63-L); + the probe must not invent a side channel into graph truth. +- Keep src/renderers/ for durable text only; measurement output is run-artifact data, not a renderer. +``` + +### Assumption dependency + +Depends on: A22-L (capture is "partially validated" — labeled facts proven; broader fitness explicitly open). This spike exists precisely to move A22-L's evidence; building against it is sound because the spike's job is to test it, not assume it. + +### Expected touched paths (tentative) + +```pseudo +src/probes/ +├── capture-quality-loop.ts + # LLM measurement probe + report summarizer +└── capture-quality-loop.test.ts + # deterministic harness mechanics (no live LLM) + +.fixtures/runs/capture-quality/ + # real-run evidence artifacts (transcript, extraction, verdict) + +memory/SPEC.md ? 
# update A22-L evidence/status after the verdict (reconciliation) +``` + +### Promotion checklist + +- [ ] Does this change a requirement? +- [ ] Does this create, retire, or invalidate an assumption? — *expected:* it will shift A22-L evidence; reconcile SPEC after the verdict. +- [ ] Does this slice depend on an unvalidated high-impact assumption? — it tests one; that is the point of a spike. +- [ ] Does this make or reverse a non-trivial design decision? +- [ ] Does this establish a new seam-level invariant? +- [ ] Does this change a frontier-level cross-cutting obligation or verification architecture layer? +- [ ] Does it cross more than two major seams? +- [ ] Is this the first touch in an unfamiliar seam from a fresh thread? +- [ ] Can you not name the containing seam or current rationale from the live docs? diff --git a/memory/cards/runtime-affordances--coverage-ledger.md b/memory/cards/runtime-affordances--coverage-ledger.md new file mode 100644 index 000000000..230f4d345 --- /dev/null +++ b/memory/cards/runtime-affordances--coverage-ledger.md @@ -0,0 +1,121 @@ +# Runtime affordances coverage ledger + +Frontier: runtime-affordances-and-legality +Status: active +Mode: single +Created: 2026-06-08 + +## Orientation + +- **Containing seam:** the runtime posture legality/default surface. Truth is the append-only `brunch.agent_runtime_state` projection; the legality/default rules already live in `src/projections/session/runtime-policy.ts` (allowed lists + `defaultStrategy`/`defaultLens`/`defaultGoal`) and `src/.pi/agents/state.ts` (`AUTO_EXCLUDED_STRATEGIES`, `isGradeLegal`/grade gating, `selectAxisResources`). The current RPC projection in `src/rpc/methods/session.ts` exposes only the *current* selection per axis (`agent.strategy`/`lens`/`goal`), **not** the available options or the default-on-switch value. +- **Relevant frontier item:** `runtime-affordances-and-legality` (PLAN.md §Frontier Definitions). 
This is a **coverage** frontier in the same mold as the landed `graph-observed-shapes` — a closed enumerated ledger of which affordance shapes are canonical per consumer, guarded by a drift test, plus one shared derivation so no client re-implements legality. It is buildable now; the legality/default tables already exist. +- **Volatile handoff state:** no `HANDOFF.md`. The `snapshot`→`reads/projections/renderers` migration (35eff395) is landed; use current paths. `graph-observed-shapes` (85e73ba7) is the precedent: `src/graph/README.md` owns its ledger and `src/graph/observed-shapes-coverage.test.ts` guards required-subset coverage **without shipping any transport shape it does not need**. Mirror that discipline exactly. +- **Main open risk:** scope creep into building TUI/web posture-switch UI, or into an xstate/persisted state machine. This card is a **coverage ledger + one pure derivation**, not a control surface. The genuinely product-state-gated rows (`active-review-set` affordances, freestyle-vs-structured `turn-mode`) must stay tripwired in the ledger, not built. + +Posture: proving (inherited from `runtime-affordances-and-legality`). + +Frontier-level cross-cutting obligations this slice carries: + +- Keep runtime truth append-only in `brunch.agent_runtime_state`; affordances are **pure derivations** over the shared legality/default tables, never new persisted state. +- Do not add xstate or a persisted machine (PLAN cross-cutting obligation; SPEC D40-L projection-as-truth). +- Do not duplicate legality/default rules in any client (`web/`, `rpc/`, TUI); the derivation is the single owner. +- Preserve D66-L: `freestyle` is AUTO-excluded; the affordance derivation's available-under-AUTO set must reflect that, matching `AUTO_EXCLUDED_STRATEGIES`. 
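The obligations above can be shown in a small sketch: a pure derivation, a single owner for legality, and `freestyle` excluded under AUTO. The sketch takes both resolved state and a readiness grade, because the later revision of this card establishes that legality also depends on the grade. Only the names `affordances`, `isGradeLegal`, and `AUTO_EXCLUDED_STRATEGIES` come from the card. Every type, option value, grade level, and the `POLICIES` table are hypothetical stand-ins for the real tables in `runtime-policy.ts` and `state.ts`.

```typescript
// Hypothetical sketch: axis names follow the card, but the option values,
// grade scale, and policy table are illustrative stand-ins, not the repo's tables.
type Axis = "goal" | "strategy" | "lens";
type ReadinessGrade = "low" | "medium" | "high";

// Assumption: "AUTO" means the agent picks; any other value is an explicit pin.
interface ResolvedState {
  selections: Record<Axis, string>;
}

interface AxisPolicy {
  allowed: readonly string[];
  defaultValue: string;
  // Options missing from this map are legal at every grade.
  minGrade: Readonly<Partial<Record<string, ReadinessGrade>>>;
}

interface AxisAffordance {
  options: string[];
  defaultOnSwitch: string;
  current: string;
}

const GRADE_ORDER: readonly ReadinessGrade[] = ["low", "medium", "high"];
const AUTO_EXCLUDED_STRATEGIES: ReadonlySet<string> = new Set(["freestyle"]);

// Hypothetical shared table; the real one would be imported, never forked.
const POLICIES: Record<Axis, AxisPolicy> = {
  goal: { allowed: ["ground", "refine"], defaultValue: "ground", minGrade: { refine: "medium" } },
  strategy: { allowed: ["structured", "freestyle", "probe"], defaultValue: "structured", minGrade: { probe: "high" } },
  lens: { allowed: ["user", "system"], defaultValue: "user", minGrade: {} },
};

function isGradeLegal(policy: AxisPolicy, option: string, grade: ReadinessGrade): boolean {
  const min = policy.minGrade[option];
  return min === undefined || GRADE_ORDER.indexOf(grade) >= GRADE_ORDER.indexOf(min);
}

// Pure derivation: reads shared tables, persists nothing.
function affordances(state: ResolvedState, grade: ReadinessGrade): Record<Axis, AxisAffordance> {
  const axes: Axis[] = ["goal", "strategy", "lens"];
  const out = {} as Record<Axis, AxisAffordance>;
  for (const axis of axes) {
    const policy = POLICIES[axis];
    const isAuto = state.selections[axis] === "AUTO";
    const options = policy.allowed.filter(
      (opt) =>
        isGradeLegal(policy, opt, grade) &&
        !(axis === "strategy" && isAuto && AUTO_EXCLUDED_STRATEGIES.has(opt)),
    );
    out[axis] = { options, defaultOnSwitch: policy.defaultValue, current: state.selections[axis] };
  }
  return out;
}
```

With the same AUTO state, the strategy options differ between grade `"low"` and grade `"high"`. That difference is what the card means by the grade input being load-bearing.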
+ +## Full scope card + +### Target Behavior + +A single Brunch-owned `affordances(resolvedState)` derivation reports, per posture axis (goal / strategy / lens), the legal options and the default-on-switch value from the existing shared legality/default tables, and a closed coverage ledger in `src/session/README.md` records which affordance rows each consumer (agent, RPC, web) requires versus defers, guarded by a drift test. + +### Boundary Crossings + +``` +→ resolved runtime state (ResolvedBrunchAgentState from src/projections/session/runtime-policy.ts) +→ shared affordance derivation (new pure function over allowed lists + defaults + AUTO/grade rules) +→ coverage ledger (src/session/README.md): required vs deferred affordance rows per consumer +→ drift guard test (asserts the ledger's required subset against the real derivation + RPC schema) +``` + +### Risks and Assumptions + +``` +- RISK: the derivation re-implements legality instead of reusing src/.pi/agents/state.ts logic + → MITIGATION: extract/lift the existing allowed + AUTO-excluded + isGradeLegal logic into the + shared projection seam (projections/session) and have agent manifest composition consume it, + OR have the new derivation import the same source-of-truth tables; do not fork the rules. +- RISK: scope drifts into shipping the affordance shape onto the RPC/web transport + → MITIGATION: follow graph-observed-shapes — the ledger may mark a row "web-eligible deferred"; + shipping a transport shape is a separate later slice, not this card. +- ASSUMPTION: the legality/default knowledge needed for affordances is fully present in + runtime-policy.ts + state.ts and needs no new product state. + → IMPACT IF FALSE: a required affordance row would depend on active-review-set / turn-mode + product state that does not exist yet; that row is then a tripwired deferred row, not a gap. 
+ → VALIDATE: enumerate the ledger rows first; any row that cannot be derived from current tables + is marked product-state-gated with its tripwire, not built. + → memory/SPEC.md D40-L, D59-L +``` + +### Posture check + +Proving slice. It scores on **invariants** (locates and stabilizes the affordance-derivation seam as the single owner of legality/default truth across transports) and **proof of life** (a shared `affordances(resolvedState)` derivation exists and is consumed where legality was previously implicit). It retires the fog that runtime affordances are unbuildable until a UI pass: the ledger proves how much is derivable now. No high-impact assumption is left unretired — rows that cannot be derived become explicit tripwired deferrals. + +### Acceptance Criteria + +``` +✓ affordances-derivation.test.ts — affordances(resolvedState) returns, per axis (goal/strategy/lens), + the legal option set and the default-on-switch value, matching runtime-policy.ts defaults. +✓ affordances-derivation.test.ts — under AUTO the strategy options exclude `freestyle` + (parity with AUTO_EXCLUDED_STRATEGIES); under an explicit pin the pinned legal value is reported. +✓ affordances-derivation.test.ts — grade-illegal options are excluded, matching isGradeLegal. +✓ runtime-affordances-coverage.test.ts — the ledger's required affordance rows per consumer + (agent/RPC/web) are covered by the derivation and the RPC session schema; deferred rows are not forced. +✓ runtime-affordances-coverage.test.ts — `active-review-set` and `turn-mode` rows are present as + deferred/tripwired entries, not as built affordances. +✓ No client (web/, rpc/, TUI) re-derives availability/legality locally; legality has one owner. +✓ No xstate, no persisted machine, no new runtime-state table. +``` + +### Verification Approach + +``` +- Inner: unit tests (oracle: derivation against fixtures) — affordances() vs hand-specified legal/ + default/AUTO/grade expectations over ResolvedBrunchAgentState fixtures. 
+- Inner: drift/coverage test (oracle: ledger-vs-reality) — required-subset coverage like + graph-observed-shapes; fails if a required row loses its derivation or RPC field. +- Middle: only if a transport shape is actually adopted in this card (default: not adopted). +``` + +### Cross-cutting obligations + +``` +- Affordances are pure derivations over shared tables; runtime truth stays append-only. +- No client-side legality reimplementation; single owner for availability/default rules. +- Preserve D66-L freestyle AUTO-exclusion in the available-under-AUTO set. +- Keep src/renderers/ for durable LLM/session text only; affordances are structured data, not renderers. +``` + +### Expected touched paths (tentative) + +```pseudo +src/projections/session/ +├── affordances.ts + # affordances(resolvedState): legal options + default-on-switch +├── affordances.test.ts + +├── runtime-policy.ts ? # may export shared legality/default helpers if lifted here +└── runtime-state.ts ? # if RuntimeStateProjection gains a required affordance row + +src/.pi/agents/ +└── state.ts ? # consume shared derivation instead of forked legality logic + # (only if it reduces duplication; keep behavior identical) + +src/session/ +├── README.md ~ # owns the closed affordance coverage ledger +└── runtime-affordances-coverage.test.ts + # drift guard for required-vs-deferred rows per consumer + +src/rpc/methods/session.ts ? # only if a required affordance row must surface now +``` + +### Promotion checklist + +- [x] Already a full scope card. +- Build note: if enumerating the ledger reveals that a *required* (not deferred) row depends on + product state that does not exist, stop and route back through `ln-plan` — the frontier shape shifted. 
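The required-subset drift guard described in this card can be sketched as two checks over the ledger. The first check requires that every required row is actually provided by the derivation or the RPC schema; deferred rows are never forced. The second check requires that every deferred row names its tripwire. The `LedgerRow` shape, the row IDs, and the tripwire check are assumptions for illustration. The real ledger is a table in `src/session/README.md`, and the card does not specify how the test reads it.

```typescript
// Hypothetical ledger shape; the real ledger is a README table.
type Consumer = "agent" | "rpc" | "web";

interface LedgerRow {
  id: string; // e.g. "strategy.options" (illustrative)
  consumer: Consumer;
  status: "required" | "deferred";
  tripwire?: string; // product state that must exist before a deferred row is built
}

// Required rows that the derivation or transport schema does not actually expose.
// Deferred rows are skipped: they are recorded, not forced.
function missingRequiredRows(ledger: readonly LedgerRow[], provided: ReadonlySet<string>): string[] {
  return ledger
    .filter((row) => row.status === "required" && !provided.has(row.id))
    .map((row) => `${row.consumer}:${row.id}`);
}

// Assumption: a deferral without a named tripwire counts as a gap, not a decision.
function deferralsWithoutTripwire(ledger: readonly LedgerRow[]): string[] {
  return ledger.filter((row) => row.status === "deferred" && !row.tripwire).map((row) => row.id);
}
```

A coverage test would then assert that both lists are empty, so removing a required field from the derivation fails the build.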
From 23fd4354dd9685401c8fdf1fe09932d45afc6716 Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Mon, 8 Jun 2026 11:41:12 +0200 Subject: [PATCH 13/17] Address planner review: fix affordance input surface and sequencing - runtime-affordances card: widen the core derivation to affordances(resolvedState, readinessGrade); legality is not derivable from ResolvedBrunchAgentState alone (grade is a separate input per BrunchPostureToolPolicyInput / isGradeLegal). - Reorder Next: elicitation-driver precedes runtime-affordances so the last open cross-cut row closes before any fresh coverage frontier; runtime-affordances is parallel-eligible but must not preempt it. - Clarify the generalized-capture cross-cut row: 'built' = POC bar only; richer capture is owned by exchanges-and-generalized-capture, gated on capture-quality-spike, not unfinished in this row. Amp-Thread-ID: https://ampcode.com/threads/T-019ea2fc-9f12-767d-bd7a-08497f7307fd Co-authored-by: Amp --- memory/CROSS_CUT_PLAN.md | 2 +- memory/PLAN.md | 4 ++-- .../runtime-affordances--coverage-ledger.md | 17 ++++++++++------- 3 files changed, 13 insertions(+), 10 deletions(-) diff --git a/memory/CROSS_CUT_PLAN.md b/memory/CROSS_CUT_PLAN.md index b847e0658..c88f14042 100644 --- a/memory/CROSS_CUT_PLAN.md +++ b/memory/CROSS_CUT_PLAN.md @@ -127,7 +127,7 @@ DoD: every ● row is `have` or `built`. 
| --- | --- | --- | --- | --- | --- | | 6 method resources scaffolding | have | ● | — | — | run-structured-exchange, infer-and-capture, commit-graph, read-context, generate-proposal, review-for-gaps | | method **content depth** | built | ● | — | done — deepened bodies + manifest-wide depth test (1ca02e38) | each method gives tool-routing/sequencing guidance, not tool-description restatement | -| generalized capture (free text, files, refs; iterative passes) | built | ● | — | done — labeled-text core on `session.submitMessage` (5f5e6ac8) | POC bar = directly-labeled facts; richer free-text/files/refs remain A22-L fitness evidence; D66-L | +| generalized capture (free text, files, refs; iterative passes) | built | ● | — | done — labeled-text core on `session.submitMessage` (5f5e6ac8) | `built` = the **POC bar only** (directly-labeled facts). Richer free-text/files/refs capture is **out of this row's scope by design**, not unfinished here: it is gated on the `capture-quality-spike` (A22-L) and owned by the PLAN frontier `exchanges-and-generalized-capture`. D66-L | | exchange-tool `.description()` / `promptGuidelines` | built | ● | — | done — all 7 exchange tools carry both (drift correction 2026-06-07) | `src/.pi/extensions/exchanges/*` already match the `commit_graph` pattern | | skill-commands (`gap-review`, `arbitrary-enhance`) | new | ○ | proving | Q6 (deferred) | off critical path | diff --git a/memory/PLAN.md b/memory/PLAN.md index d28fea62f..d972f4c8c 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -42,8 +42,8 @@ The remaining coverage frontiers are being deliberately de-fogged rather than le ### Next 1. `poc-live-ship-gate` — final fresh-cwd runbook remains the delivery gate, but its prepared live-mention-autocomplete slice is currently parked off the critical path. -2. 
`runtime-affordances-and-legality` — coverage frontier for shared posture legality/default surfaces; **buildable-now** and being scoped as an inventory ledger (`memory/cards/runtime-affordances--coverage-ledger.md`), not parked. -3. `elicitation-driver` — coverage frontier for the live per-turn "what to ask next" driver promoted out of the cross-cut; buildable-now on the FE-823 substrate, not POC-ship-critical. +2. `elicitation-driver` — **first coverage follow-on**: it closes the last open required cross-cut row (Seam 3a `"what to ask next" driver`) and retires the temporary dual-plan state, so it sequences ahead of any fresh coverage frontier. Buildable-now on the FE-823 substrate; not POC-ship-critical. +3. `runtime-affordances-and-legality` — coverage frontier for shared posture legality/default surfaces; **buildable-now** and scoped as an inventory ledger (`memory/cards/runtime-affordances--coverage-ledger.md`). It is **parallel-eligible but must not preempt closing the cross-cut**: do not let it pull planning back into a second open frontier before `elicitation-driver` lands. (It writes disjoint paths from `elicitation-driver`, so it may run as a concurrent worktree stream — just not *instead of* the cross-cut closer.) 4. `capture-quality-spike` — evidence spike that measures generalized-capture fitness (A22-L) so `exchanges-and-generalized-capture` can graduate from horizon on real evidence rather than waiting (`memory/cards/capture-quality--fitness-spike.md`). ### Parallel / Low-conflict diff --git a/memory/cards/runtime-affordances--coverage-ledger.md b/memory/cards/runtime-affordances--coverage-ledger.md index 230f4d345..912f7bae3 100644 --- a/memory/cards/runtime-affordances--coverage-ledger.md +++ b/memory/cards/runtime-affordances--coverage-ledger.md @@ -8,6 +8,7 @@ Created: 2026-06-08 ## Orientation - **Containing seam:** the runtime posture legality/default surface. 
Truth is the append-only `brunch.agent_runtime_state` projection; the legality/default rules already live in `src/projections/session/runtime-policy.ts` (allowed lists + `defaultStrategy`/`defaultLens`/`defaultGoal`) and `src/.pi/agents/state.ts` (`AUTO_EXCLUDED_STRATEGIES`, `isGradeLegal`/grade gating, `selectAxisResources`). The current RPC projection in `src/rpc/methods/session.ts` exposes only the *current* selection per axis (`agent.strategy`/`lens`/`goal`), **not** the available options or the default-on-switch value. +- **Input-surface note (load-bearing):** legality is **not** derivable from `ResolvedBrunchAgentState` alone. That type (`runtime-policy.ts`) carries only mode/role/axis selections + definitions; the grade gate is a *separate* input — see `BrunchPostureToolPolicyInput` in `src/.pi/agents/state.ts` (`{ state: ResolvedBrunchAgentState; readinessGrade: ReadinessGrade }`) and the grade-sensitive filter in `selectAxisResources` / `isGradeLegal`. The shared derivation must therefore take **both** resolved state and readiness grade: `affordances(resolvedState, readinessGrade)`. Do not certify a grade-independent function as the legality seam. - **Relevant frontier item:** `runtime-affordances-and-legality` (PLAN.md §Frontier Definitions). This is a **coverage** frontier in the same mold as the landed `graph-observed-shapes` — a closed enumerated ledger of which affordance shapes are canonical per consumer, guarded by a drift test, plus one shared derivation so no client re-implements legality. It is buildable now; the legality/default tables already exist. - **Volatile handoff state:** no `HANDOFF.md`. The `snapshot`→`reads/projections/renderers` migration (35eff395) is landed; use current paths. `graph-observed-shapes` (85e73ba7) is the precedent: `src/graph/README.md` owns its ledger and `src/graph/observed-shapes-coverage.test.ts` guards required-subset coverage **without shipping any transport shape it does not need**. 
Mirror that discipline exactly. - **Main open risk:** scope creep into building TUI/web posture-switch UI, or into an xstate/persisted state machine. This card is a **coverage ledger + one pure derivation**, not a control surface. The genuinely product-state-gated rows (`active-review-set` affordances, freestyle-vs-structured `turn-mode`) must stay tripwired in the ledger, not built. @@ -25,13 +26,14 @@ Frontier-level cross-cutting obligations this slice carries: ### Target Behavior -A single Brunch-owned `affordances(resolvedState)` derivation reports, per posture axis (goal / strategy / lens), the legal options and the default-on-switch value from the existing shared legality/default tables, and a closed coverage ledger in `src/session/README.md` records which affordance rows each consumer (agent, RPC, web) requires versus defers, guarded by a drift test. +A single Brunch-owned `affordances(resolvedState, readinessGrade)` derivation reports, per posture axis (goal / strategy / lens), the legal options and the default-on-switch value from the existing shared legality/default tables (including the grade gate), and a closed coverage ledger in `src/session/README.md` records which affordance rows each consumer (agent, RPC, web) requires versus defers, guarded by a drift test. ### Boundary Crossings ``` → resolved runtime state (ResolvedBrunchAgentState from src/projections/session/runtime-policy.ts) -→ shared affordance derivation (new pure function over allowed lists + defaults + AUTO/grade rules) + + readiness grade (ReadinessGrade — the separate grade input, cf. 
BrunchPostureToolPolicyInput) +→ shared affordance derivation (new pure function over allowed lists + defaults + AUTO + grade rules) → coverage ledger (src/session/README.md): required vs deferred affordance rows per consumer → drift guard test (asserts the ledger's required subset against the real derivation + RPC schema) ``` @@ -57,16 +59,17 @@ A single Brunch-owned `affordances(resolvedState)` derivation reports, per postu ### Posture check -Proving slice. It scores on **invariants** (locates and stabilizes the affordance-derivation seam as the single owner of legality/default truth across transports) and **proof of life** (a shared `affordances(resolvedState)` derivation exists and is consumed where legality was previously implicit). It retires the fog that runtime affordances are unbuildable until a UI pass: the ledger proves how much is derivable now. No high-impact assumption is left unretired — rows that cannot be derived become explicit tripwired deferrals. +Proving slice. It scores on **invariants** (locates and stabilizes the affordance-derivation seam as the single owner of legality/default truth across transports) and **proof of life** (a shared `affordances(resolvedState, readinessGrade)` derivation exists and is consumed where legality was previously implicit). It retires the fog that runtime affordances are unbuildable until a UI pass: the ledger proves how much is derivable now. No high-impact assumption is left unretired — rows that cannot be derived become explicit tripwired deferrals. ### Acceptance Criteria ``` -✓ affordances-derivation.test.ts — affordances(resolvedState) returns, per axis (goal/strategy/lens), - the legal option set and the default-on-switch value, matching runtime-policy.ts defaults. +✓ affordances-derivation.test.ts — affordances(resolvedState, readinessGrade) returns, per axis + (goal/strategy/lens), the legal option set and the default-on-switch value, matching runtime-policy.ts defaults. 
✓ affordances-derivation.test.ts — under AUTO the strategy options exclude `freestyle` (parity with AUTO_EXCLUDED_STRATEGIES); under an explicit pin the pinned legal value is reported. -✓ affordances-derivation.test.ts — grade-illegal options are excluded, matching isGradeLegal. +✓ affordances-derivation.test.ts — varying readinessGrade changes the legal option set exactly as + isGradeLegal dictates (grade-illegal options excluded); proves the grade input is load-bearing, not ignored. ✓ runtime-affordances-coverage.test.ts — the ledger's required affordance rows per consumer (agent/RPC/web) are covered by the derivation and the RPC session schema; deferred rows are not forced. ✓ runtime-affordances-coverage.test.ts — `active-review-set` and `turn-mode` rows are present as @@ -98,7 +101,7 @@ Proving slice. It scores on **invariants** (locates and stabilizes the affordanc ```pseudo src/projections/session/ -├── affordances.ts + # affordances(resolvedState): legal options + default-on-switch +├── affordances.ts + # affordances(resolvedState, readinessGrade): legal options + default-on-switch ├── affordances.test.ts + ├── runtime-policy.ts ? # may export shared legality/default helpers if lifted here └── runtime-state.ts ? 
# if RuntimeStateProjection gains a required affordance row From 7a68aa5965caa256cae42f973a44f852b14cc3f4 Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Mon, 8 Jun 2026 11:45:45 +0200 Subject: [PATCH 14/17] Harden coverage-frontier protocol Amp-Thread-ID: https://ampcode.com/threads/T-019ea2ec-7506-74ed-a4e0-99b8d800442f Co-authored-by: Amp --- .agents/skills/ln-build/SKILL.md | 18 ++-- .agents/skills/ln-consult/SKILL.md | 2 +- .agents/skills/ln-plan/SKILL.md | 8 +- .agents/skills/ln-plan/references/coverage.md | 85 +++++++++++++++++++ .agents/skills/ln-scope/SKILL.md | 16 +++- .agents/skills/ln-sync/SKILL.md | 10 +++ docs/praxis/ln-skills.md | 2 +- 7 files changed, 130 insertions(+), 11 deletions(-) create mode 100644 .agents/skills/ln-plan/references/coverage.md diff --git a/.agents/skills/ln-build/SKILL.md b/.agents/skills/ln-build/SKILL.md index 472d8299e..167effa53 100644 --- a/.agents/skills/ln-build/SKILL.md +++ b/.agents/skills/ln-build/SKILL.md @@ -91,17 +91,23 @@ Never silently continue past a stale-downstream signal. Never silently delete a When a scope file is `Mode: coverage` (see [`ln-scope`](../ln-scope/SKILL.md) §Coverage scope files), it holds a closed enumerated ledger of one capability layer rather than a sequence of full cards. The build loop is row-driven: +Before taking a row, reload [`../ln-plan/references/coverage.md`](../ln-plan/references/coverage.md) if you have not read it in this thread. + 1. take the next open required (`●`) row — one whose Status is `spec`, `new`, or `partial` -2. build it under the **fill mode declared in that row** (`proving` → tracer that retires the row's unknown; `earned` → land and lock the settled capability). A `new` row needs its micro-decision resolved (`ln-disambiguate` / `ln-spec`) before it can be built -3. run red → green → refactor and the verification harness for that row -4. flip the row's Status to `built` in the ledger and reconcile canonical state -5. commit the row-sized change -6. 
continue until **no `●` row remains in `spec` / `new` / `partial`** — that aggregate DoD, not any single row, completes the coverage frontier +2. **coverage re-orient checkpoint** — verify the row still fits the declared layer boundary, that its named owner is still the right owner, and that its promised behavior is derivable from the row's source-of-truth inputs. If any of those fail, stop and route back through `ln-scope` / `ln-plan` +3. build it under the **fill mode declared in that row** (`proving` → tracer that retires the row's unknown; `earned` → land and lock the settled capability). A `new` row needs its micro-decision resolved (`ln-disambiguate` / `ln-spec`) before it can be built +4. run red → green → refactor and the verification harness for that row +5. flip the row's Status to `built` in the ledger and reconcile canonical state +6. commit the row-sized change +7. continue until **no `●` row remains in `spec` / `new` / `partial`** — that aggregate DoD, not any single row, completes the coverage frontier -The chain stop conditions and Stale-downstream re-orient apply per row. Two coverage-specific rules: +The chain stop conditions and Stale-downstream re-orient apply per row. Coverage-specific rules: - **Do not add rows as you go** except to record a genuinely-missing capability (Status `new`, one-line justification). The ledger is a closed list; filling it never means "do everything that rhymes" (global `AGENTS.md` §completionist sprawl). +- **One new row maximum.** If implementation discovers a second new row or a new sub-seam, the inventory was not actually closed; stop and route back through `ln-plan`. - **A row that grows past ledger-row size** spawns its own `single` scope file; replace the row's Owner / next cell with a pointer rather than fattening the ledger. 
+- **Do not silently change frontier class.** If the row turns out to be evidence-gated or wait-gated rather than buildable-now, stop and reconcile the classification instead of forcing a ceremonial build. +- **Do not launder ownership.** If the build wants to move single-owner logic into a shared layer (or pull shared logic back into a single owner), stop and re-scope the row explicitly rather than smuggling a topology decision through coverage execution. ## Red diff --git a/.agents/skills/ln-consult/SKILL.md b/.agents/skills/ln-consult/SKILL.md index 5c1027533..02012504f 100644 --- a/.agents/skills/ln-consult/SKILL.md +++ b/.agents/skills/ln-consult/SKILL.md @@ -102,7 +102,7 @@ Spikes are the escape hatch, not the default. | Plausible interpretations diverge; examples would clarify faster than open-ended questioning | structural | `ln-disambiguate` | | Understanding exists, needs a written spec | structural | `ln-spec` | | Spec exists, needs work sequencing | structural | `ln-plan` | -| A capability layer is load-bearing as a whole but vertical slices keep leaving it shallow | structural | `ln-plan` — author a coverage frontier (see `ln-plan` §Horizontal coverage frontiers) | +| A capability layer is load-bearing as a whole but vertical slices keep leaving it shallow | structural | `ln-plan` — author a coverage frontier only if the admission gate in `ln-plan/references/coverage.md` passes | | Verification strategy is the main uncertainty | structural | `ln-oracles` | | Next work item needs precise boundaries | structural or bounded | `ln-scope` | | One settled frontier item needs several small verified commits in sequence | bounded, hardening | `ln-scope` then serial `ln-build` loop, optionally via a `Mode: chain` scope file under `memory/cards/` | diff --git a/.agents/skills/ln-plan/SKILL.md b/.agents/skills/ln-plan/SKILL.md index 1c6d53bbd..92188054b 100644 --- a/.agents/skills/ln-plan/SKILL.md +++ b/.agents/skills/ln-plan/SKILL.md @@ -126,6 +126,8 @@ A plan 
may contain a mix of postures across its `Active` / `Next` frontiers. Loa ### Horizontal coverage frontiers (frontier *shape*, not a posture) +Load [`references/coverage.md`](references/coverage.md) whenever a candidate frontier might be coverage, or when reclassifying a live coverage frontier. + Posture answers *how to rank the next vertical slice*; it carries **no completeness test**. Vertical tracers touch a horizontal capability layer (for example "the agent's READ tools as a whole") only as far as each claim needs, so a load-bearing layer can stay permanently shallow while every individual slice is still "done." A **coverage frontier** fills that gap. It is a different frontier *shape*, not a third posture: it adds no row-level execution mechanics — each row is still built under `proving` or `earned`. What it adds is a layer-level **aggregate definition of done**: *no required row in a closed enumerated inventory is left open.* @@ -146,11 +148,13 @@ If you cannot close the enumeration, it is not a coverage frontier — stay trac Each ledger row declares its own **fill mode** — `proving` if the row still carries an unknown, `earned` if it is settled-but-unbuilt. `ln-build` closes rows; the frontier completes when no `●` row remains in a `spec` / `new` / `partial` state — the ledger DoD, not a single tracer claim. -**Maturity gate.** The coverage shape is young. Treat it as a recognized scope-file mode, **not** a canonical posture or doc type. Promote it to first-class (a `references/coverage.md` posture, a canonical coverage store) only on rule-of-three — at least three real coverage cases *and* a recurring need for row-level mechanics beyond "closed ledger + per-row proving/earned." Until then, do not add a third posture reference or an alternate planning store. +**Maturity gate.** The rule-of-three is now met in this repo: the elicitor cross-cut, graph observed-shapes, and the current runtime/exchange follow-ons exposed recurring row-level failure modes. 
Coverage therefore now has a dedicated planning reference, but it remains a **frontier shape**, not a third certainty posture or an alternate planning store. + +**Sequencing precedence.** If a temporary coverage ledger remains open only because a required row has been promoted into `PLAN`, that promoted frontier outranks new unrelated coverage frontiers by default. Do not let "new breadth we could also do" preempt "the last required row that closes the still-live ledger" unless the user explicitly chooses that deprioritization. ## Procedure -0. Read `.pi/POSTURE.md` if present for the project's default certainty posture. For each `Active` / `Next` frontier, check for an explicit `Certainty:` override and load the matching reference (`references/proving.md` or `references/earned.md`). Load both when the plan is mixed. +0. Read `.pi/POSTURE.md` if present for the project's default certainty posture. For each `Active` / `Next` frontier, check for an explicit `Certainty:` override and load the matching reference (`references/proving.md` or `references/earned.md`). Load both when the plan is mixed. If any frontier candidate is or may be coverage-shaped, also load `references/coverage.md`. 1. Read `memory/PLAN.md` if it exists. Identify existing frontier ids and retire/archive stale completed material into `docs/archive/PLAN_HISTORY.md`. 2. Read `memory/SPEC.md` if it exists. Pull only the live requirements, assumptions, decisions, and invariants that still constrain forward work. 3. Explore the codebase enough to understand real boundaries. 
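The aggregate definition of done and the sequencing-precedence rule above are judgments an agent makes while reading `memory/PLAN.md`. Their core logic is still small enough to state as predicates. The TypeScript below is a minimal, hypothetical sketch. `LedgerRow`, `isLedgerExhausted`, and `precedenceFrontier` are illustrative names, not repo code. It assumes the ledger's `Req` column maps to a `required` flag, and that a promoted row records the id of its `PLAN` frontier.

```typescript
// Hypothetical ledger model: `have` / `built` close a row;
// `spec` / `new` / `partial` leave it open (per the ledger DoD above).
type RowStatus = "have" | "built" | "spec" | "new" | "partial";

interface LedgerRow {
  capability: string;
  required: boolean; // ● = true, ○ (deferred) = false
  status: RowStatus;
  // Assumed field: set when the row escaped row-sized work into a PLAN frontier.
  promotedFrontierId?: string;
}

const OPEN_STATUSES: ReadonlySet<RowStatus> = new Set<RowStatus>(["spec", "new", "partial"]);

// Required rows that still block the aggregate DoD. A promoted row stays in
// this set until its status actually flips; gaining an owner does not close it.
function openRequiredRows(rows: readonly LedgerRow[]): LedgerRow[] {
  return rows.filter((row) => row.required && OPEN_STATUSES.has(row.status));
}

function isLedgerExhausted(rows: readonly LedgerRow[]): boolean {
  return openRequiredRows(rows).length === 0;
}

// Sequencing precedence: when a ledger stays open *only* because its remaining
// required rows were promoted into PLAN, return the promoted frontier that must
// be sequenced ahead of new unrelated coverage frontiers. Returns undefined
// when the rule does not apply or the user explicitly reprioritized.
function precedenceFrontier(
  rows: readonly LedgerRow[],
  userReprioritized = false,
): string | undefined {
  if (userReprioritized) return undefined;
  const open = openRequiredRows(rows);
  if (open.length === 0) return undefined;
  if (!open.every((row) => row.promotedFrontierId !== undefined)) return undefined;
  return open[0].promotedFrontierId;
}
```

Keeping promoted rows in the open set is the point of the design: a ledger cannot be declared exhausted while its closing row is merely relocated to `PLAN`, so fresh breadth work cannot start by inertia.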
diff --git a/.agents/skills/ln-plan/references/coverage.md b/.agents/skills/ln-plan/references/coverage.md new file mode 100644 index 000000000..2c421f7b8 --- /dev/null +++ b/.agents/skills/ln-plan/references/coverage.md @@ -0,0 +1,85 @@ +# Planning shape: coverage frontier + +Load this reference whenever a frontier candidate is being classified as a **coverage frontier**, when scoping a `Mode: coverage` ledger, or when syncing a live coverage frontier against code and temporary ledgers. + +Coverage is a **frontier shape**, not a third certainty posture. Each row still executes under `proving` or `earned`. + +## Objective function + +Optimize for **breadth closure** across one named load-bearing layer without widening the layer. A coverage frontier is valuable when the layer's value *is* its closed inventory, and vertical tracers keep leaving that inventory permanently shallow even though each tracer is locally correct. + +## Admission gate + +A frontier is coverage **only when all of these hold**: + +1. **Named layer, load-bearing as a whole.** The thing being planned is a real layer or capability family whose value depends on its breadth (for example: an observed-shape inventory, a renderer family, a tool surface) rather than one vertical claim. +2. **Closeable inventory.** You can enumerate the layer up front without reading future implementation tea leaves. If the list is expected to keep growing as you build, it is not coverage. +3. **Required vs deferred marking.** Rows can be marked `●` vs `○` honestly. +4. **Owner + oracle per required row.** Every required row has one canonical owner and one closure oracle. If you cannot say who owns the row or how you would know it is closed, the row is still fog. +5. **Authority split is explicit.** If a temporary ledger exists outside `memory/PLAN.md`, it inventories rows only. `memory/PLAN.md` still owns frontier ids, sequencing, and promoted work. + +If any gate fails, do **not** use coverage mode. 
Stay tracer-shallow, or route to `ln-spec`, `ln-design`, `ln-spike`, or ordinary `ln-plan` work first. + +## Buildability classes + +Every coverage frontier must be classified as exactly one of: + +- **Buildable-now** — required rows are derivable from product state and source-of-truth inputs that already exist. +- **Evidence-gated** — the inventory is enumerable, but one or more required rows need a spike, measurement pass, or probe verdict before the frontier can honestly widen. +- **Wait-gated** — the inventory is enumerable, but one or more required rows depend on product state or a forcing function that does not exist yet. Do not scope cold. + +Do not blur these classes. + +- If the frontier needs measurement before widening, it is **evidence-gated**, not buildable-now. +- If the frontier needs a future UI/control/product-state seam to exist before rows can be derived honestly, it is **wait-gated**, not buildable-now. +- A ledger may carry **tripwired deferred rows** inside a buildable-now frontier, but those rows stay `○` and explicitly gated; they do not count as hidden required work. + +## Required frontier content + +Every coverage frontier definition must make these things explicit: + +- **Boundary** — what is in the layer, and what is explicitly out. +- **Aggregate DoD** — usually "no required row remains in `spec` / `new` / `partial`." +- **Inventory authority** — where the closed ledger lives. +- **Classification** — buildable-now, evidence-gated, or wait-gated. +- **Why now / unlocks** — why this breadth pass belongs in sequence now. +- **Promotion / disposal rule** — how temporary-ledger rows escape into `PLAN`, and when the temporary ledger is actually exhausted. + +## Row discipline + +Each row is still a thin vertical fill, not a mini-frontier. Keep rows honest: + +- **One row = one capability.** Not a grab-bag, not "and", not a disguised refactor plan. +- **Declare the canonical owner.** If the logic is single-owner, keep it in the owning domain. 
Shared layers earn existence only when the row is genuinely reusable or carries shared semantics. +- **Name the source-of-truth inputs.** If the proposed derivation or legality decision needs inputs the row does not actually have, the row is wrongly scoped. +- **Name the closure oracle.** Coverage without a closure oracle is category theatre. +- **Tripwire real product-state gates.** If a row depends on missing product state, mark it deferred/tripwired; do not smuggle it into required work. + +Adding a missing row mid-flight is allowed only when it records a genuinely omitted capability with a one-line justification. If you discover **more than one** new row, or a new sub-seam, the inventory was not actually closed — stop and route back through `ln-plan`. + +## Temporary-ledger protocol + +Temporary ledgers are allowed for a bounded cross-cut, but their authority is narrow. + +- `memory/PLAN.md` owns frontier ids, ordering, and dependency judgment. +- The temporary ledger owns only the row inventory and its aggregate DoD. +- A row that escapes row-sized work gets **promoted** into `PLAN`, but the row stays open in the temporary ledger until that promoted frontier actually lands. +- A temporary ledger is **not exhausted** while any required row is still `spec`, `new`, or `partial` — including a row whose owner cell says "promoted → PLAN ". +- If the last open required row has been promoted into `PLAN`, that promoted frontier gets **sequencing precedence** over new unrelated coverage frontiers unless the user explicitly chooses otherwise. Do not declare the temporary ledger "handled enough" and start fresh breadth work by inertia. + +## Anti-patterns + +- **Category laundering.** Calling something "coverage" because it feels broad, even though the inventory is not actually closeable. +- **Shape laundering.** Smuggling a new abstraction or topology decision under the safer-sounding label of "coverage ledger." 
+- **Consumer bleed-through.** Promoting a shape to every consumer because one consumer needs it. +- **Wrong-input derivation.** Scoping a shared derivation whose declared inputs cannot possibly justify the promised legality, ranking, or selection behavior. +- **Residue denial.** Declaring a cross-cut or temporary ledger exhausted while a required row is still open, merely because it has an owner now. +- **Sequencing leakage.** Opening a new coverage frontier while the previous temporary ledger's closing row is still the last open required work. +- **Symmetry regrowth.** Reintroducing deleted stubs or families because the layer "ought to have one of those," without a row that earned it. + +## Skill handoffs + +- **`ln-plan`** decides whether the coverage admission gate really passes, classifies the frontier, and sequences promoted rows honestly. +- **`ln-scope`** must name the row boundary, canonical owner, source-of-truth inputs, closure oracle, and any tripwire or gate before writing the ledger or a row-sized slice. +- **`ln-build`** must stop when a row changes class (buildable-now ↔ evidence-gated ↔ wait-gated), needs wider inputs than scoped, or discovers that the inventory was not actually closed. +- **`ln-sync`** must reconcile contradictions between `PLAN`, temporary ledgers, and code reality in the same pass — especially exhaustion claims, promoted-row ownership, and sequencing precedence. diff --git a/.agents/skills/ln-scope/SKILL.md b/.agents/skills/ln-scope/SKILL.md index 2413e780b..7a4c1630b 100644 --- a/.agents/skills/ln-scope/SKILL.md +++ b/.agents/skills/ln-scope/SKILL.md @@ -104,7 +104,19 @@ Chain discipline: A `Mode: coverage` scope file is the execution artifact for a **horizontal coverage frontier** (see [`ln-plan`](../ln-plan/SKILL.md) §Horizontal coverage frontiers). 
Where `single` / `chain` files group *vertical* slices, a coverage file holds a **closed enumerated ledger** of one capability layer, and its definition of done is *aggregate*: every required row closed. -Write one only when `ln-plan` has established a coverage frontier whose three-part gate is satisfied — a named layer that is load-bearing as a whole, a closeable enumeration, and required-vs-deferred marking. If you cannot close the enumeration, do not use this mode; write ordinary vertical cards instead. +Before writing or revising a coverage file, load [`../ln-plan/references/coverage.md`](../ln-plan/references/coverage.md). + +Write one only when `ln-plan` has established a coverage frontier whose admission gate is satisfied. If you cannot close the enumeration, do not use this mode; write ordinary vertical cards instead. + +### Coverage preflight + +Before you write the ledger or scope one row-sized fill, answer these explicitly: + +1. **What is the boundary?** Name what belongs in the layer and what explicitly does not. +2. **What are the source-of-truth inputs for each open required row?** If the row's promised derivation/ranking/legality cannot be justified from those inputs, the row is wrongly scoped. +3. **Who owns each required row, and what closes it?** Name the canonical owner and the closure oracle. +4. **What class is this frontier?** Buildable-now, evidence-gated, or wait-gated. Rows that depend on missing product state stay deferred/tripwired; they are not hidden required work. +5. **Is the inventory still closed?** If scoping reveals more than one genuinely-missing row or a new sub-seam, stop and route back through `ln-plan` instead of quietly growing the ledger. ### Ledger shape @@ -118,6 +130,8 @@ The file body is a coverage ledger — one table per sub-seam if the layer split - **Req:** `●` required for the DoD · `○` deferred. The DoD is "every `●` row is `have` or `built`." 
- **Fill:** the posture each row's build inherits — `proving` if the row still carries an unknown, `earned` if it is settled-but-unbuilt. A `new` row usually needs a micro-decision (`ln-disambiguate` / `ln-spec`) before it can be filled. +`Owner / next` must point to a real owner — module, card, frontier, or decision — not a vague intention. Use `Notes` to record the source-of-truth inputs and closure oracle when they are not obvious from the row label. For non-buildable rows, `Notes` must also name the evidence gate or wait-state tripwire. + ### Each row is still a vertical fill The file is horizontal; each **row** is built as an ordinary thin slice under its declared fill posture. `ln-build` implements rows and flips their Status to `built`; the row's target *is* the acceptance criterion. A row whose scope turns out to need its own full card may spawn a sibling `single` file — leave a pointer in that row's Owner / next cell rather than fattening the ledger. diff --git a/.agents/skills/ln-sync/SKILL.md b/.agents/skills/ln-sync/SKILL.md index ab302e9cd..4665b2389 100644 --- a/.agents/skills/ln-sync/SKILL.md +++ b/.agents/skills/ln-sync/SKILL.md @@ -145,6 +145,12 @@ Scan recent code / commits for: - active work not represented in `memory/PLAN.md` sequencing or frontier definitions - stale references between `memory/PLAN.md` and `memory/SPEC.md`, especially PLAN links to retired assumptions / decisions / invariants - equivalent facts that should merge instead of coexisting +- coverage frontiers whose class (`buildable-now`, `evidence-gated`, `wait-gated`) no longer matches code reality or the live cards +- coverage rows missing a named owner, closure oracle, or source-of-truth inputs where the row's behavior is not otherwise self-evident +- temporary ledgers declared exhausted while a required row is still `spec` / `new` / `partial`, including rows that have merely been promoted into `PLAN` +- promoted last-open coverage rows that are sequenced behind unrelated new 
coverage frontiers without an explicit user reprioritization +- coverage cards whose promised derivation or legality logic cannot be justified from the source-of-truth inputs named in the card +- coverage ledgers that grew multiple `new` rows mid-flight, signaling that the inventory was not actually closed - prepared cards in scope files under `memory/cards/` that should be retired, re-scoped, or reconciled into the next thread's live state - stale derivative artifacts that should be deleted after reconciliation - cross-cutting subsystems that appear only in glossary/design-doc links but are required by multiple active/next frontiers @@ -180,6 +186,9 @@ Produce a concise sync report and make the edits. ### Drift fixed - [concept / decision / frontier / traceability updates made] +### Coverage protocol audit +- [classification repairs, temporary-ledger contradictions, promotion/ordering fixes, or `none`] + ### Retirement assessment - [whether embedded items were sufficiently retired, or whether a stronger protocol / follow-up frontier is needed] @@ -192,6 +201,7 @@ Before finishing, perform a cross-skill preservation check: - If a later agent read only `memory/SPEC.md` and `memory/PLAN.md`, what durable design choices from `ln-design` would they miss? - What verification architecture or loop-tier strategy from `ln-oracles` or canonical docs would they miss? - What cross-cutting obligations would disappear because they are carried only by links, not by live rows or frontier definitions? +- Would they know which temporary coverage ledgers are still live, which promoted rows still keep those ledgers open, and why those rows sequence where they do? - Do any topology READMEs under `src/**/` still cite SPEC IDs or describe topology this sync just changed? Reconcile those READMEs as part of the sync, not as a follow-up. If any answer is non-empty, sync is incomplete. 
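Several of the new `ln-sync` checks above are mechanical contradictions inside a single ledger, so they can be sketched as a lint pass. This is a hypothetical TypeScript illustration, not Brunch code. The `AuditRow` fields (`owner`, `closureOracle`, `gate`, `addedMidFlight`) and the `declaredExhausted` flag assume a parsed ledger that the markdown scope files do not currently provide. The sketch also simplifies the owner/oracle check to every required row, rather than only rows whose behavior is not self-evident.

```typescript
// Hypothetical parsed-ledger model; real ledgers are markdown tables.
type FrontierClass = "buildable-now" | "evidence-gated" | "wait-gated";
type RowStatus = "have" | "built" | "spec" | "new" | "partial";

interface AuditRow {
  capability: string;
  required: boolean; // ● vs ○
  status: RowStatus;
  owner?: string; // module, card, frontier, or decision
  closureOracle?: string; // how we would know the row is closed
  gate?: string; // evidence gate or wait-state tripwire, if any
  addedMidFlight?: boolean; // recorded after the inventory was declared closed
}

interface AuditLedger {
  frontierClass: FrontierClass;
  declaredExhausted: boolean;
  rows: readonly AuditRow[];
}

const OPEN = new Set<RowStatus>(["spec", "new", "partial"]);

// Returns one finding per contradiction; an empty array means no findings.
function auditCoverageLedger(ledger: AuditLedger): string[] {
  const findings: string[] = [];
  const required = ledger.rows.filter((row) => row.required);

  // Exhaustion claimed while required rows (promoted or not) remain open.
  const openRequired = required.filter((row) => OPEN.has(row.status));
  if (ledger.declaredExhausted && openRequired.length > 0) {
    findings.push(`residue-denial: ${openRequired.map((row) => row.capability).join(", ")}`);
  }

  for (const row of required) {
    if (!row.owner) findings.push(`missing-owner: ${row.capability}`);
    if (!row.closureOracle) findings.push(`missing-oracle: ${row.capability}`);
    // In a buildable-now frontier, a gated row must stay deferred (○).
    if (ledger.frontierClass === "buildable-now" && row.gate) {
      findings.push(`class-mismatch: ${row.capability} is required but gated by ${row.gate}`);
    }
  }

  // One mid-flight addition is allowed; more means the inventory was never closed.
  const midFlight = ledger.rows.filter((row) => row.addedMidFlight === true);
  if (midFlight.length > 1) {
    findings.push(`inventory-not-closed: ${midFlight.length} rows added mid-flight`);
  }
  return findings;
}
```

The real sync pass also compares ledgers against code reality and `PLAN` ordering, which a single-ledger lint cannot see. The sketch covers only the intra-ledger contradictions.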
diff --git a/docs/praxis/ln-skills.md b/docs/praxis/ln-skills.md index cb006bfbd..6e9362c6e 100644 --- a/docs/praxis/ln-skills.md +++ b/docs/praxis/ln-skills.md @@ -75,7 +75,7 @@ Regression earned → proving is a state transition, not a third mode: downgrade #### Coverage frontiers (a frontier shape, not a posture) -Posture ranks the next *vertical* slice; it has no completeness test, so vertical tracers can leave a horizontal capability layer permanently shallow while every slice is "done." A **coverage frontier** closes that gap with a layer-level **aggregate DoD** — "no required row in a closed enumerated inventory is left open" — while each row still builds under `proving` or `earned`. It is therefore a different frontier *shape*, not a third posture, and it does not relax the anti-sprawl norm: it fires only over a **closed, enumerated** surface (named load-bearing layer, up-front inventory, required-vs-deferred marking). `ln-plan` recognizes and bounds it; the row ledger lives in a `Mode: coverage` scope file under `memory/cards/` (authored via `ln-scope`); `ln-build` closes rows. The shape is young: do not promote it to a canonical posture or doc type before rule-of-three. +Posture ranks the next *vertical* slice; it has no completeness test, so vertical tracers can leave a horizontal capability layer permanently shallow while every slice is "done." A **coverage frontier** closes that gap with a layer-level **aggregate DoD** — "no required row in a closed enumerated inventory is left open" — while each row still builds under `proving` or `earned`. It is therefore a different frontier *shape*, not a third posture. 
The rule-of-three is now met in this repo, so coverage has a first-class planning reference at `.agents/skills/ln-plan/references/coverage.md`: use it for the admission gate, buildability classes (`buildable-now` / `evidence-gated` / `wait-gated`), temporary-ledger protocol, and anti-patterns (`category laundering`, `wrong-input derivation`, `residue denial`, `sequencing leakage`, `symmetry regrowth`). `ln-plan` recognizes and bounds the frontier; the row ledger lives in a `Mode: coverage` scope file under `memory/cards/` (authored via `ln-scope`); `ln-build` closes rows; `ln-sync` audits the contradictions coverage mode tends to create. #### Posture distribution across skills From e9a9fd458b5bb7fd0ea2ad6f6fae3421ab5a4546 Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Mon, 8 Jun 2026 11:47:59 +0200 Subject: [PATCH 15/17] Add runtime affordances coverage ledger --- memory/PLAN.md | 7 +- memory/SPEC.md | 6 +- src/.pi/agents/state.ts | 43 +---- src/projections/session/affordances.test.ts | 58 +++++++ src/projections/session/affordances.ts | 51 ++++++ src/projections/session/runtime-policy.ts | 89 ++++++++++ src/session/README.md | 26 +++ .../runtime-affordances-coverage.test.ts | 161 ++++++++++++++++++ 8 files changed, 396 insertions(+), 45 deletions(-) create mode 100644 src/projections/session/affordances.test.ts create mode 100644 src/projections/session/affordances.ts create mode 100644 src/session/runtime-affordances-coverage.test.ts diff --git a/memory/PLAN.md b/memory/PLAN.md index d972f4c8c..49e9009e4 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -43,8 +43,7 @@ The remaining coverage frontiers are being deliberately de-fogged rather than le 1. `poc-live-ship-gate` — final fresh-cwd runbook remains the delivery gate, but its prepared live-mention-autocomplete slice is currently parked off the critical path. 2. 
`elicitation-driver` — **first coverage follow-on**: it closes the last open required cross-cut row (Seam 3a `"what to ask next" driver`) and retires the temporary dual-plan state, so it sequences ahead of any fresh coverage frontier. Buildable-now on the FE-823 substrate; not POC-ship-critical. -3. `runtime-affordances-and-legality` — coverage frontier for shared posture legality/default surfaces; **buildable-now** and scoped as an inventory ledger (`memory/cards/runtime-affordances--coverage-ledger.md`). It is **parallel-eligible but must not preempt closing the cross-cut**: do not let it pull planning back into a second open frontier before `elicitation-driver` lands. (It writes disjoint paths from `elicitation-driver`, so it may run as a concurrent worktree stream — just not *instead of* the cross-cut closer.) -4. `capture-quality-spike` — evidence spike that measures generalized-capture fitness (A22-L) so `exchanges-and-generalized-capture` can graduate from horizon on real evidence rather than waiting (`memory/cards/capture-quality--fitness-spike.md`). +3. `capture-quality-spike` — evidence spike that measures generalized-capture fitness (A22-L) so `exchanges-and-generalized-capture` can graduate from horizon on real evidence rather than waiting (`memory/cards/capture-quality--fitness-spike.md`). ### Parallel / Low-conflict @@ -212,7 +211,7 @@ The remaining coverage frontiers are being deliberately de-fogged rather than le - **Name:** Runtime affordances and legality surface - **Linear:** unassigned - **Kind:** structural -- **Status:** next +- **Status:** done - **Certainty:** proving - **Lights up:** A shared affordance/default-on-switch projection across TUI, web, and RPC if runtime posture controls widen again. - **Stabilizes:** D40-L's projection-as-truth model and the shared legality/default semantics over goal/strategy/lens. 
@@ -226,7 +225,7 @@ The remaining coverage frontiers are being deliberately de-fogged rather than le - **Cross-cutting obligations:** Keep truth append-only in `brunch.agent_runtime_state`; affordances are pure derivations over shared tables. Do not add xstate or a persisted machine without new evidence. - **Traceability:** D25-L, D40-L, D59-L, D66-L. - **Design docs:** `memory/SPEC.md` D40-L/D59-L; `src/projections/README.md`; `src/session/README.md`. -- **Current execution pointer:** Being scoped as a coverage ledger in `memory/cards/runtime-affordances--coverage-ledger.md`. The classification is **buildable-now, not parked**: the core is one Brunch-owned `affordances(resolvedState)` derivation over legality/default tables that already exist in `src/projections/session/runtime-policy.ts` and `src/.pi/agents/state.ts`. Only the `active-review-set` and freestyle-vs-structured `turn-mode` rows are genuinely product-state-gated; they stay tripwired in the ledger, not built speculatively. +- **Current execution pointer:** Done 2026-06-08. `src/projections/session/affordances.ts` now owns the shared `(resolvedState, readinessGrade)` derivation for legal goal/strategy/lens options plus default-on-switch values, reusing the same grade/AUTO legality source consumed by `.pi/agents/state.ts`; `src/session/README.md` owns the closed coverage ledger and `src/session/runtime-affordances-coverage.test.ts` guards required agent/RPC rows while leaving `active-review-set` and `turn-mode` as explicit product-state-gated deferrals. 
### exchanges-and-generalized-capture diff --git a/memory/SPEC.md b/memory/SPEC.md index 2246d8089..bb98d95c3 100644 --- a/memory/SPEC.md +++ b/memory/SPEC.md @@ -128,7 +128,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c - **D2-L — Brunch is an opinionated product, not a pi platform shell.** The POC hardcodes its toolset, system prompt, and policy doctrine; scopes state to `.brunch/`; and hides pi's generic extension surface from end users. Depends on: A1-L. Supersedes: —. - **D39-L — Brunch owns sealed Pi settings plus an explicit Brunch extension bundle around the embedded harness.** Product behavior must come from Brunch-owned programmatic policy, not ambient Pi discovery. `src/.pi/brunch-pi-settings.ts` owns settings policy, resource-loader policy, and offline defaults: it creates an in-memory Brunch-owned `SettingsManager` policy instead of reading ambient global/project `.pi/settings.json`, disables ambient context files, extensions, prompt templates, skills, and themes, and defaults Brunch-launched Pi to offline mode; Pi source confirms extension `resources_discover` can still inject explicit Brunch-owned skill/prompt/theme paths even when `noSkills`/`noPromptTemplates`/`noThemes` disable ambient discovery. `src/.pi/brunch-pi-extensions.ts` owns the explicit Brunch extension factory: it statically imports product extension registrars and registers them from a fixed ordered list rather than ambient discovery. That explicit list must not filesystem-discover or dynamically `import()` extension modules at runtime, because a Brunch-internal discovery layer is itself the discovery this decision rejects. Each product extension exposes one registrar taking explicit dependencies, and the extension bundle wires those dependencies at the call site; the `default` exports under `src/.pi/extensions/*` exist only for dev `/reload` iteration, not as a product load path. 
Product extension modules live under `src/.pi/extensions/*`, and reusable Pi TUI components live under `src/.pi/components/*`, so they can also be iterated by launching Pi from `src/` and using `/reload`; the root project-local `.pi/` probe runtime files are retired and must not be treated as product configuration. Test files must not live directly under auto-discovered `.pi/extensions` or `.pi/components` resource directories; extension/component tests live under `src/.pi/__tests__/`. The settings boundary owns the audited behavior-shaping settings list in code (`BRUNCH_SETTINGS_POLICY` / `BRUNCH_SETTINGS_AUDITED_GETTERS`), with hostile ambient settings and reload-resilience tests covering shell path/prefix, npm command, ambient resources, skill commands, double-escape behavior, compaction/retry, image/terminal/UI, transport/theme/changelog, and telemetry settings. Remaining sealed-Pi work is runtime-state/prompt/tool posture, not ambient settings file leakage. Depends on: D1-L, D2-L, A19-L. Supersedes: treating `noSkills: true` as full profile isolation, relying on user/project `.pi/` defaults to be harmless, nesting Brunch's product extension modules under `src/.pi/extensions/brunch/`, or replacing the explicit static extension list with a Brunch-internal filesystem-discovery / `brunchExtensionMeta` / `loadOrder` mechanism as the product runtime load path. - Tooling exception: the worktree helper extension now lives outside this repository under the user Pi agent tree (`~/.pi/agent/extensions/worktree/index.ts`) for direct Pi sessions only. It is not a Brunch product extension, is not imported by `src/.pi/brunch-pi-extensions.ts`, and does not weaken the sealed Brunch Pi settings/extensions boundary; Brunch-launched product sessions continue to disable ambient `.pi/` discovery unless deliberately imported. 
The extension may register direct-Pi `/worktree:switch` / `switch_worktree` and `/worktree:create` / `create_worktree` affordances, but Brunch does not test, package, or document it as a product extension. -- **D40-L — Runtime state is transcript-backed Brunch session-agent state, not hidden extension memory.** `src/session/runtime-state.ts` owns the transcript entry facts (`brunch.agent_runtime_state` schema, parser, and init/switch append helpers); `src/projections/session/runtime-state.ts` owns the pure reusable projection and `src/projections/session/runtime-policy.ts` owns operational-mode/role policy definitions. The projection reconstructs agent posture from linear `brunch.agent_runtime_state` entries (`reason: "init" | "switch"`), last-writer-wins at turn preparation and over `session.runtimeState`; default/empty slots are explicit when no entry family exists. Runtime-state entries are Pi JSONL state-change facts, not assistant/user chat content: init and switch entries should render, when visible, as dim non-chat state rows analogous to Pi thinking/model-change rows, and must not enter LLM context as ordinary conversation. Its axes are `op_mode` (`elicit`, future `execute`) plus optional, AUTO-able objective axes `strategy`, `lens`, and `goal` (D25-L, D59-L). **Posture switches (durable `reason: "switch"` entries) are a user/system authority: the foreground agent never emits a posture switch.** The agent's only in-axis freedom is `AUTO` (per-turn implicit selection from the D58-L manifest); what it actually chose each turn is legible downstream via per-emission facet stamping (D25-L), not via runtime-state — so runtime-state is the *frame/constraints* while emitted facets carry the agent's per-turn choice. User-mutable axes are `op_mode`, `strategy`, and `lens`; `goal` is internal/grade-derived and not part of the user posture-change surface for now (D59-L). On a parent switch that invalidates a child axis, the child defaults to `AUTO`. 
The `source: "agent"` entry value is reserved — no current path emits it; it is parked for a future execute-mode orchestrator that might legitimately steer sub-postures. `session.runtimeState` also exposes shaped mention slots, world-update watermarks (latest graph LSN and optional git head, without raw transcript detail bags), and lifecycle facts when transcript-backed entries make them computable; this is a projection contract, not a mutable state table. The **foreground session agent** (`elicitor` now, future `executor`) is *derived* from `op_mode`, not stored; the other agent roles (`reviewer`, `reconciler`, future `scout`/`researcher`) are async sub-agent/side-chain workers (D29-L, D44-L) invoked out-of-band, never part of the session state machine. `op_mode` gates tool authority, applied by `src/.pi/extensions/runtime/index.ts` (current `elicit` policy denies side-effecting `bash`/`edit`/`write` plus user-shell interception) while `.pi` reuses session-owned entry definitions and projected policy. Prompt composition is a separate concern (D58-L). Depends on: D17-L, D23-L, D25-L, D39-L, D58-L, D59-L. Supersedes: mode-only vocabulary, extension-local mutable state as authority, storing the foreground role as independent session state, the "runtime bundle / role preset" as one knob deriving model/thinking/resources, and binding prompt-resource location to `src/.pi/context/`. 
+- **D40-L — Runtime state is transcript-backed Brunch session-agent state, not hidden extension memory.** `src/session/runtime-state.ts` owns the transcript entry facts (`brunch.agent_runtime_state` schema, parser, and init/switch append helpers); `src/projections/session/runtime-state.ts` owns the pure reusable projection, `src/projections/session/runtime-policy.ts` owns operational-mode/role policy plus shared grade legality tables, and `src/projections/session/affordances.ts` owns the pure `(resolvedState, readinessGrade) → legal options + default-on-switch` derivation for goal/strategy/lens. The projection reconstructs agent posture from linear `brunch.agent_runtime_state` entries (`reason: "init" | "switch"`), last-writer-wins at turn preparation and over `session.runtimeState`; default/empty slots are explicit when no entry family exists. Runtime-state entries are Pi JSONL state-change facts, not assistant/user chat content: init and switch entries should render, when visible, as dim non-chat state rows analogous to Pi thinking/model-change rows, and must not enter LLM context as ordinary conversation. Its axes are `op_mode` (`elicit`, future `execute`) plus optional, AUTO-able objective axes `strategy`, `lens`, and `goal` (D25-L, D59-L). **Posture switches (durable `reason: "switch"` entries) are a user/system authority: the foreground agent never emits a posture switch.** The agent's only in-axis freedom is `AUTO` (per-turn implicit selection from the D58-L manifest); what it actually chose each turn is legible downstream via per-emission facet stamping (D25-L), not via runtime-state — so runtime-state is the *frame/constraints* while emitted facets carry the agent's per-turn choice. User-mutable axes are `op_mode`, `strategy`, and `lens`; `goal` is internal/grade-derived and not part of the user posture-change surface for now (D59-L). On a parent switch that invalidates a child axis, the child defaults to `AUTO`. 
The `source: "agent"` entry value is reserved — no current path emits it; it is parked for a future execute-mode orchestrator that might legitimately steer sub-postures. `session.runtimeState` also exposes shaped mention slots, world-update watermarks (latest graph LSN and optional git head, without raw transcript detail bags), and lifecycle facts when transcript-backed entries make them computable; this is a projection contract, not a mutable state table. The **foreground session agent** (`elicitor` now, future `executor`) is *derived* from `op_mode`, not stored; the other agent roles (`reviewer`, `reconciler`, future `scout`/`researcher`) are async sub-agent/side-chain workers (D29-L, D44-L) invoked out-of-band, never part of the session state machine. `op_mode` gates tool authority, applied by `src/.pi/extensions/runtime/index.ts` (current `elicit` policy denies side-effecting `bash`/`edit`/`write` plus user-shell interception) while `.pi` reuses session-owned entry definitions and projected policy. Prompt composition is a separate concern (D58-L). Depends on: D17-L, D23-L, D25-L, D39-L, D58-L, D59-L. Supersedes: mode-only vocabulary, extension-local mutable state as authority, storing the foreground role as independent session state, the "runtime bundle / role preset" as one knob deriving model/thinking/resources, and binding prompt-resource location to `src/.pi/context/`. - **D34-L — Command containment separates visibility suppression from effect blocking.** Current Pi extension seams can hide unsupported slash suggestions with autocomplete wrapping and can cancel branch/session effects through lifecycle hooks, but they cannot strictly suppress exact interactive built-in commands before `InteractiveMode` dispatches them. Brunch-owned commands must use product-specific names and route writes through Brunch handlers/`CommandExecutor`; extension command collisions are not an override mechanism. 
Strict built-in command/keybinding policy is a Pi upstream/API ask, while POC safety relies on hiding generic affordances, blocking dangerous effects (`/fork`, `/clone`, `/tree`, raw session replacement), and failing fast on branched transcripts. Brunch's command-policy code should live in `src/.pi/extensions/commands/policy.ts`, merging branch/session-effect blocking with any product command allow/deny behavior instead of preserving a branch-only module. Depends on: D2-L, D24-L, A18-L. Supersedes: treating extension `input` handlers or command-name collisions as built-in command allowlisting. - **D35-L — Dynamic TUI chrome is a Brunch projection wrapper over Pi UI primitives.** Downstream TUI affordances should call a Brunch-owned renderer (`renderBrunchChrome` or its successor) with one activated product-state value rather than scattering raw `ctx.ui.setHeader`, `setFooter`, `setWidget`, title, or working-indicator calls. The wrapper is stateless projection over canonical workspace/session/graph facts, including the discovered project name, selected spec, and real activated session id/label, while its TUI footer compositor may read Pi footer telemetry (`getGitBranch`, foreign `getExtensionStatuses`) at render time. Brunch chrome and startup dialog are project-first shell surfaces with selected-spec context: the project name labels the cwd container, the spec title labels the selected graph, and the session label distinguishes transcript instances. Brunch chrome does not publish a `brunch.chrome` status key; `ctx.ui.setStatus(key, text)` remains a lateral contribution channel for other extensions and future dynamic Brunch state. RPC clients should rely only on surfaces Pi actually emits for the wrapper (currently diagnostic widget/title, plus any future explicit status adapter) because header/footer/working-indicator are TUI-only in current Pi RPC mode. 
Session display names are product projections over Pi session metadata: every Brunch-created session should immediately receive a neutral workspace-global `Untitled Session N` `session_info` label, and later user/generated names may characterize the transcript without replacing spec identity or graph truth. Depends on: D2-L, D21-L, D34-L, A18-L. Supersedes: treating Pi UI methods as direct downstream affordance APIs, rendering placeholder session state such as `unbound` after a session is activated, consuming the status-key namespace for chrome's own static summary, using spec title as the default session label, or allowing two unchanged Brunch-created default names to collide in one cwd. - **D52-L — Source topology targets `src/{app, workspace, scripts, .pi, db, graph, session, projections, renderers, rpc, web}` with directed layer dependencies.** Product entrypoints live under `src/app/`, local executable utility ownership is reserved under `src/scripts/`, package/workspace identity tests live under `src/workspace/`, and reusable projection/rendering modules live under top-level `src/projections/` and `src/renderers/` rather than whichever domain or adapter first needed them. `app/` owns product host entrypoints and wiring. `workspace/` owns cwd/package/workspace identity helpers. `scripts/` owns local executable utilities. `.pi/` is the sealed Pi-harness runtime surface: `agents/` owns runtime prompt assembly, role definitions, legal resource manifests, and agent-context orchestration; `skills/` owns goal/strategy/lens/method markdown resources read on demand; `components/` owns reusable Pi TUI/message components; `extensions/` owns Pi registrars for tools, hooks, commands, chrome, context tools, system-prompt append, exchanges, graph tools, workspace dialogs, runtime policy, and session lifecycle. 
`graph/` is the domain layer: CommandExecutor, readers, policy, validators, query bucketing, change-log replay, reconciliation-need substrate; it imports from `db/` (Drizzle schema, migrations, connection lifecycle) and no other layer imports `db/` directly. `session/` owns transcript projection, exchange extraction, workspace coordination, session binding, runtime-state transcript entries, and LSN staleness tracking over Pi JSONL. `projections/` owns structured DTOs derived from graph/session/workspace/tool facts; it must not render lossy text and must not import adapters, transports, app entrypoints, or web code. `renderers/` owns lossy text/markdown/toon/tool-content rendering over domain or projection inputs; it may import input types from `graph/`, `session/`, or `projections/` as needed, but must not import adapters, transports, app entrypoints, or web code. `rpc/` owns Brunch JSON-RPC handlers. `web/` owns the React client. Dependency direction: `.pi/`, `rpc/`, and `app/` may import from `graph/`, `session/`, `projections/`, and `renderers/`; `.pi/agents/` may import from `graph/`, `session/`, `projections/`, and `renderers/` to build agent context; `.pi/extensions/` may import from `.pi/agents/` and `.pi/components/`; `projections/` may import from `graph/`, `session/`, and `workspace/`; `renderers/` may import from `projections/`, `graph/`, and `session/`; `graph/` imports from `db/`; `web/` is a standalone build target. Depends on: D2-L, D4-L, D39-L, D40-L. Supersedes: scattering session domain files at `src/` root; treating Pi-only agents as a host-independent top-level `src/.pi/` layer; nesting prompt composition under `src/.pi/context/`; treating reusable `project` / `format` helpers as owned by whichever adapter first needed them. 
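The revised D40-L describes two pure projections:

- Agent posture is rebuilt last-writer-wins from linear `brunch.agent_runtime_state` entries.
- Legal options per objective axis are derived from that resolved state plus the readiness grade.

Below is a minimal, self-contained sketch of both. Only the grade ranks, the per-id minimum grades, and the `freestyle` AUTO exclusion come from this patch. The following are illustrative assumptions, not Brunch's real API:

- the entry shape (`RuntimeStateEntry`);
- the names `DEFAULT_POSTURE`, `projectPosture`, and `legalOptions`;
- the rule that the AUTO exclusion applies only while an axis is on AUTO. This last point is inferred from the patch's affordance test expectations.

```typescript
// Hypothetical sketch of D40-L's pure projections. RuntimeStateEntry,
// DEFAULT_POSTURE, projectPosture, and legalOptions are illustrative names,
// not Brunch's actual API; grade tables mirror the values in this patch.

type ReadinessGrade =
  | 'grounding_onboarding'
  | 'elicitation_ready'
  | 'commitments_ready'
  | 'planning_ready';

type Axis = 'goal' | 'strategy' | 'lens';
type Posture = Record<Axis, string>;

const AUTO = 'auto';

const GRADE_RANK: Record<ReadinessGrade, number> = {
  grounding_onboarding: 0,
  elicitation_ready: 1,
  commitments_ready: 2,
  planning_ready: 3,
};

// Minimum readiness grade at which each axis id becomes legal.
const MIN_GRADE: Record<Axis, Record<string, ReadinessGrade>> = {
  goal: {
    'grounding-advance': 'grounding_onboarding',
    'elicit-expand': 'elicitation_ready',
    'commit-converge': 'commitments_ready',
    'capture-posture': 'grounding_onboarding',
  },
  strategy: {
    freestyle: 'grounding_onboarding',
    'step-wise-decision-tree': 'grounding_onboarding',
    'step-wise-disambiguate': 'grounding_onboarding',
    'propose-graph': 'elicitation_ready',
    'project-graph': 'commitments_ready',
  },
  lens: {
    intent: 'grounding_onboarding',
    design: 'elicitation_ready',
    oracle: 'elicitation_ready',
  },
};

// Ids never offered to AUTO selection (assumption: only while the axis is AUTO).
const AUTO_EXCLUDED: Record<Axis, ReadonlySet<string>> = {
  goal: new Set<string>(),
  strategy: new Set(['freestyle']),
  lens: new Set<string>(),
};

// Hypothetical transcript fact: an init/switch entry carries only the axes it sets.
interface RuntimeStateEntry {
  reason: 'init' | 'switch';
  posture: Partial<Posture>;
}

const DEFAULT_POSTURE: Posture = { goal: 'grounding-advance', strategy: AUTO, lens: AUTO };

// Replay entries in transcript order; a later value for an axis overwrites earlier ones.
function projectPosture(entries: readonly RuntimeStateEntry[]): Posture {
  const posture: Posture = { ...DEFAULT_POSTURE };
  for (const entry of entries) {
    for (const axis of Object.keys(entry.posture) as Axis[]) {
      const value = entry.posture[axis];
      if (value !== undefined) posture[axis] = value;
    }
  }
  return posture;
}

// Legal options: ids whose minimum grade is met, minus AUTO-excluded ids when
// the axis is currently AUTO. Nothing here is persisted; it is recomputed per call.
function legalOptions(axis: Axis, selection: string, grade: ReadinessGrade): string[] {
  return Object.keys(MIN_GRADE[axis]).filter(
    (id) =>
      GRADE_RANK[grade] >= GRADE_RANK[MIN_GRADE[axis][id]] &&
      !(selection === AUTO && AUTO_EXCLUDED[axis].has(id)),
  );
}
```

The exclusion is conditional on AUTO. As a result, a user who pinned `freestyle` still sees it as a legal option, while AUTO never picks it, which matches the pinned-versus-AUTO expectations in `affordances.test.ts`.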
@@ -281,7 +281,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c | I22-L | Brunch TUI startup must not render prior session transcript entries or enter an agent loop until the user has explicitly activated a spec/session decision; creating a new spec implicitly creates its first session, creating a new session for an existing spec lands in a binding-only session, resuming a prior transcript is opt-in, and RPC/headless startup exposes structured initial-selection state rather than invoking TUI picker code. | covered (FE-744 coordinator tests; hierarchical spec/session picker model + component tests; `workspace.selectionState` / `workspace.activate` JSON-RPC contract tests with source assertion that RPC does not import TUI picker code; `src/probes/scripts/verify-startup-no-resume.sh` pty/ANSI-stripped TUI probe oracle proving stale transcript text is absent before explicit activation) | D11-L, D21-L, D22-L, D36-L | | I23-L | Every structured elicitation interaction that owns the response surface persists durable semantic display only through Pi `toolResult` rows rendered by `renderResult`; `renderCall` and live `ctx.ui.*` surfaces are transient. A structured-exchange tuple has a recoverable `present_*` result and, when required, exactly one matching terminal `request_*` result before the next agent turn consumes it. The target details model is checked by `schema` + `v`, `exchange_id`, and `tool_meta`; request outcomes are an exactly-one property-presence union; user-authored text is `comment` and runtime-authored text is `message`; present-side status/kind/expected-request aliases and capture graph payloads are invalid in the Zod-authored schema layer. `toolResult.content` is rich markdown suitable for both TUI transcript display and model context; `toolResult.details` carries structured projection/recovery data. 
| covered for current structured-exchange tools (registered sequential `present_question`, `present_options`, `present_review_set`, `request_answer`, `request_choice`, `request_choices`, and `request_review`; runtime details are emitted from canonical `schema`/`v`/snake_case Zod shapes; tests cover non-semantic `renderCall`, markdown `renderResult`, present/request details, unmatched-present recovery, active-vs-stub registry, JSON-editor fallback for multi-choice, terminal `answered`/`cancelled`/`unavailable` projection closure, option content/rationale parity, review-set `nodes`/`edges` details parity, invalid review proposal non-recovery, review pending-exchange recovery, public-RPC deterministic permutations, capture response-to-graph proof, and same-assistant-message `present_options → request_choice` ordering over a real Pi RPC run. The Zod-authored schema layer is covered by JSON Schema export, drift-rejection, and source-boundary tests for present/request/capture details. `present_candidates` remains a named stub and intentionally unregistered.) | D12-L, D13-L, D17-L, D37-L, D38-L, D41-L | | I24-L | A Brunch-launched Pi runtime does not load ambient user/project Pi context files, extensions, skills, prompt templates, themes, or behavior-shaping settings unless Brunch's sealed Pi settings/extension boundary explicitly allows them; Brunch-owned extension-discovered resources are identified as intentional product resources. | covered for TUI-launch settings/extension boundary by contract tests: ambient resource flags and explicit extension factories are preserved; hostile ambient global/project settings are ignored by the in-memory Brunch settings policy before and after reload; audited Pi settings getters are tracked in `src/.pi/brunch-pi-settings.ts`. Subagent subprocess inheritance remains future coverage under I29-L. 
| D2-L, D39-L | -| I25-L | The active `op_mode`, `strategy`, `lens`, and `goal` are reconstructable from linear `brunch.agent_runtime_state` entries at turn start and through `session.runtimeState`; concrete axis ids stay separate from the `auto` selection sentinel; the foreground session-agent role is derived from `op_mode`, not separately stored; tool gating follows the reconstructed `op_mode` so `elicit` cannot use execute/dangerous tools such as raw `bash`/`write` unless explicitly permitted. Runtime-state projection remains transcript-backed and exposes empty/default mention, world-watermark, and lifecycle slots without inventing hidden extension memory. | covered (`src/session/runtime-state.test.ts` covers default state, cumulative last-writer-wins posture, mention/world/lifecycle slot projection, and non-linear rejection; `src/rpc/handlers.test.ts` covers explicit-target `session.runtimeState` discovery/params/spec validation; `src/.pi/__tests__/operational-mode.test.ts` covers append/project/switch helpers over the reconciled axis vocabulary, AUTO selection for every objective axis, init idempotence, previous-state values, malformed/illegal tuple rejection, role derivation from `op_mode`, and Pi JSONL reload projection; `prompting.test.ts` covers prompt/tool-policy projection from the same transcript-backed runtime state, including selected-spec grade activation for commitment-grade `present_review_set` / `request_review` proposal tools; `src/.pi/extensions/runtime/authority-matrix.test.ts` covers the current POC authority matrix for `elicit-read-only`, blocking `bash`/`edit`/`write`, and structured `needs_human` result representability while leaving A18-L strict built-in suppression as residue). 
| D17-L, D23-L, D40-L, D58-L, D59-L | +| I25-L | The active `op_mode`, `strategy`, `lens`, and `goal` are reconstructable from linear `brunch.agent_runtime_state` entries at turn start and through `session.runtimeState`; concrete axis ids stay separate from the `auto` selection sentinel; the foreground session-agent role is derived from `op_mode`, not separately stored; tool gating follows the reconstructed `op_mode` so `elicit` cannot use execute/dangerous tools such as raw `bash`/`write` unless explicitly permitted. Runtime-state projection remains transcript-backed and exposes empty/default mention, world-watermark, and lifecycle slots without inventing hidden extension memory; legal option/default affordances are pure projections over resolved runtime state plus readiness grade, not persisted state. | covered (`src/session/runtime-state.test.ts` covers default state, cumulative last-writer-wins posture, mention/world/lifecycle slot projection, and non-linear rejection; `src/rpc/handlers.test.ts` covers explicit-target `session.runtimeState` discovery/params/spec validation; `src/.pi/__tests__/operational-mode.test.ts` covers append/project/switch helpers over the reconciled axis vocabulary, AUTO selection for every objective axis, init idempotence, previous-state values, malformed/illegal tuple rejection, role derivation from `op_mode`, and Pi JSONL reload projection; `prompting.test.ts` covers prompt/tool-policy projection from the same transcript-backed runtime state, including selected-spec grade activation for commitment-grade `present_review_set` / `request_review` proposal tools; `src/.pi/extensions/runtime/authority-matrix.test.ts` covers the current POC authority matrix for `elicit-read-only`, blocking `bash`/`edit`/`write`, and structured `needs_human` result representability while leaving A18-L strict built-in suppression as residue; `src/projections/session/affordances.test.ts` covers shared goal/strategy/lens legal options, defaults, AUTO freestyle 
exclusion, pinned freestyle, and grade-sensitive legality; `src/session/runtime-affordances-coverage.test.ts` guards the required-vs-deferred affordance ledger). | D17-L, D23-L, D40-L, D58-L, D59-L, D66-L | | I27-L | Session display names are presentation metadata only: every Brunch-created session gets a neutral workspace-global default `session_info` label (`Untitled Session N`) at creation, unchanged defaults do not collide across specs in one cwd, later user/generated names may replace the default, and no naming path mutates spec identity, session binding, or graph truth. | planned (creation/boundary tests for workspace-global default allocation across specs and replacement sessions; session-lifecycle naming tests with empty transcript/auth failure/success paths; picker/chrome projection tests read session names when present) | D6-L, D21-L, D35-L, D42-L | | I26-L | Runtime schema-library imports stay deliberately scoped: Zod may appear only in D41-L-acknowledged product/protocol schema seams such as `src/.pi/extensions/exchanges/schemas/`; TypeBox remains valid for unrelated Pi tool parameters, small config/frontmatter contracts, and future Drizzle-derived row schemas; no boundary may hand-author parallel Zod and TypeBox sources for the same shape. Drizzle row/insert/update schemas are not hand-authored alongside their target tables. 
| covered (structured-exchange schema tests prove Zod parse/export and assert semantic details contracts stay in `src/.pi/extensions/exchanges/schemas/`; the legacy `shared/model.ts` details interface is retired; structured-exchange TypeBox usage is quarantined to the single Pi `TSchema` cast adapter in `src/.pi/extensions/exchanges/pi-schema.ts`; grep-based architectural boundary test in `architecture.test.ts` enforces no direct `db/` imports outside `graph/`; Drizzle derivation via `drizzle-typebox` in `row-schemas.ts`) | D41-L | | I28-L | Auto-compaction output preserves the configured anchor set byte-stable: every entry kind listed in [src/.pi/extensions/compaction/index.ts](file:///Users/lunelson/Code/hashintel/brunch-next/src/.pi/extensions/compaction/index.ts) is reconstructable post-compaction according to its `select` rule (`first | latest | active-leaves | all-unresolved`); LLM-generated narrative summary never replaces or rephrases preserved-anchor content; extension failure falls through to Pi default compaction rather than dropping anchors silently. | planned (compaction round-trip property tests at M9 plus inner-loop anchor-rendering unit tests and TypeBox schema validation of the anchor contract) | D43-L; R15, R13; I3-L, I4-L, I8-L, I12-L | @@ -294,7 +294,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c | I35-L | Graph context reads support multiple detail levels: a cursory/compact full-graph overview for orientation, and detailed node-neighborhood context with configurable hop depth for focused work. Context builders in `.pi/agents/contexts/` orchestrate which level to inject or advertise based on mode/goal/strategy/lens/grade. 
| covered for current POC push path (`getGraphOverview` + `getNodeNeighborhood` in `queries.ts` with 10 tests; `src/.pi/agents/contexts/{graph,node,cwd}.test.ts` covers lens-shaped overview rendering, bounded node-neighborhood rendering, and selected-spec cwd/session/posture context; `src/.pi/__tests__/prompting.test.ts` proves the explicit shell/product prompt path supplies selected-spec-bound graph context to `composeAgentPrompt()`). Pulled context tools are part of the live read surface. | D52-L, D53-L, D58-L | | I36-L | Node `kind` is drawn from a per-plane closed enum structurally validated by the `CommandExecutor`; the intent kind category (basic / structural / reasoning) is a pure function of `kind` and is never stored on the node. | covered (CommandExecutor rejects invalid kind-for-plane; `intentKindCategory` is pure derivation with exhaustive switch; tests in `command-executor.test.ts`) | D54-L, D56-L | | I37-L | `detail` is per-kind validated by the `CommandExecutor`: `decision` and `term` nodes REQUIRE `detail` with their respective sub-schemas; all other kinds must omit `detail`; unknown fields in `detail` are rejected. | covered (detail-required/prohibited/shape tests in `command-executor.test.ts`) | D54-L | -| I38-L | Every Brunch prompt-resource manifest injected for an agent turn is generated from projected runtime state and spec/workspace gates: listed resources are Brunch-owned, readable under the active tool policy, legal for the current `(op_mode × goal × strategy × lens)` / grade / agent allow-list, and off-list resources are not advertised as available. AUTO axes never list illegal choices; pinned axes point to the pinned resource. 
| covered for current P0 manifest families (`src/.pi/agents/compose.test.ts` covers default header/context/manifest output, AUTO grade/allow-list filtering, pinned singleton resources, illegal pinned grade rejection, and readable `src/.pi/` locations; `src/.pi/__tests__/prompting.test.ts` covers the explicit shell `before_agent_start` product path appending `agents/compose()` output from transcript-projected runtime state and no legacy composer import/resource discovery. Probe fitness may still track whether the agent reads selected resources before use.) | D39-L, D40-L, D58-L, D59-L | +| I38-L | Every Brunch prompt-resource manifest injected for an agent turn is generated from projected runtime state and spec/workspace gates: listed resources are Brunch-owned, readable under the active tool policy, legal for the current `(op_mode × goal × strategy × lens)` / grade / agent allow-list, and off-list resources are not advertised as available. AUTO axes never list illegal choices; pinned axes point to the pinned resource. The shared affordance derivation and prompt manifest filtering use the same grade/AUTO legality source. | covered for current P0 manifest families (`src/.pi/agents/compose.test.ts` covers default header/context/manifest output, AUTO grade/allow-list filtering, pinned singleton resources, illegal pinned grade rejection, and readable `src/.pi/` locations; `src/.pi/__tests__/prompting.test.ts` covers the explicit shell `before_agent_start` product path appending `agents/compose()` output from transcript-projected runtime state and no legacy composer import/resource discovery; `src/.pi/agents/state.test.ts` plus `src/projections/session/affordances.test.ts` cover shared legality/default behavior, including AUTO excluding `freestyle`. Probe fitness may still track whether the agent reads selected resources before use.) 
| D39-L, D40-L, D58-L, D59-L, D66-L | | I39-L | Every graph node in a spec has exactly one stable projected human reference code derived from `kind` + `kind_ordinal`; `(spec_id, plane, kind, kind_ordinal)` is unique; ordinals are monotonic per `(spec_id, plane, kind)` and are not reused after deletion or supersession. | partially covered (`graph-tool-resilience` added `nodes.kind_ordinal`, `node_kind_counters`, DB uniqueness, CommandExecutor allocation for single-node/batch writes, rollback protection, `GraphNode.kindOrdinal` row mapping, globally unique 1–3 letter labels with readiness-band metadata, projected-code parsing, selected-spec adapter resolution before `CommandExecutor`, code-only `commit_graph` / `read_graph` schemas, and code-primary prompt/tool rendering; remaining slice still needs deletion/supersession no-reuse coverage) | D54-L, D62-L; I1-L, I11-L | | I40-L | Accepted graph nodes and edges use only `basis ∈ explicit | implicit`; review-set approval and direct user statements produce `explicit`, `propose-graph` concept-level materialization produces `implicit`, and the mutation path is recoverable from `change_log` rather than from a persisted basis enum value such as `accepted_review_set`. 
| covered (`graph-tool-resilience` replaced the persisted basis enum with `explicit | implicit`, made `commitGraph` apply one batch approval basis to all created nodes/edges, made single-node `createNode` reject retired basis values before LSN/counter/node/change-log allocation, made `propose-graph` adapter commits implicit, made review-set translation explicit, rejected retired `accepted_review_set`, and records `change_log.operation` independently; `capture-response-to-graph` proves direct structured text responses commit explicit-basis graph nodes through `CommandExecutor`; `.fixtures/runs/project-graph-review-cycle/2026-06-06-project-graph-review-cycle/` proves full review-cycle approval creates explicit-basis graph truth) | D26-L, D27-L, D53-L, D63-L | | I41-L | Same-spec `supersession` edges form an acyclic directed graph; every edge-creation path validates proposed supersession edges together with existing supersession edges before committing. | covered (`command-executor/commit-graph-batch.test.ts` rejects existing-cycle closure, intra-batch cycles, and mixed existing+batch cycles through the shared dry-run/commit planner before batch writes; rejected cycles roll back or avoid batch nodes/edges/change_log; acyclic supersession commits remain covered by query/CommandExecutor success paths) | D51-L, D53-L; I34-L | diff --git a/src/.pi/agents/state.ts b/src/.pi/agents/state.ts index 40e6e8018..8a4dde3c1 100644 --- a/src/.pi/agents/state.ts +++ b/src/.pi/agents/state.ts @@ -2,6 +2,11 @@ import { fileURLToPath } from 'node:url'; import type { ReadinessGrade } from '../../graph/index.js'; import { + AUTO_EXCLUDED_STRATEGIES, + GOAL_MIN_GRADE, + LENS_MIN_GRADE, + STRATEGY_MIN_GRADE, + isGradeLegal, toolPolicyForRuntimeState, type ResolvedBrunchAgentState, } from '../../projections/session/runtime-policy.js'; @@ -48,36 +53,6 @@ export interface BrunchPostureToolPolicyInput { readinessGrade: ReadinessGrade; } -const GRADE_RANK: Record = { - grounding_onboarding: 0, 
- elicitation_ready: 1, - commitments_ready: 2, - planning_ready: 3, -}; - -const GOAL_MIN_GRADE: Record = { - 'grounding-advance': 'grounding_onboarding', - 'elicit-expand': 'elicitation_ready', - 'commit-converge': 'commitments_ready', - 'capture-posture': 'grounding_onboarding', -}; - -const STRATEGY_MIN_GRADE: Record = { - freestyle: 'grounding_onboarding', - 'step-wise-decision-tree': 'grounding_onboarding', - 'step-wise-disambiguate': 'grounding_onboarding', - 'propose-graph': 'elicitation_ready', - 'project-graph': 'commitments_ready', -}; - -const AUTO_EXCLUDED_STRATEGIES = new Set(['freestyle']); - -const LENS_MIN_GRADE: Record = { - intent: 'grounding_onboarding', - design: 'elicitation_ready', - oracle: 'elicitation_ready', -}; - const METHOD_MIN_GRADE: Record = { 'run-structured-exchange': 'grounding_onboarding', 'infer-and-capture': 'grounding_onboarding', @@ -332,14 +307,6 @@ function selectAxisResources({ return [resources[selection]]; } -function isGradeLegal( - id: TId, - readinessGrade: ReadinessGrade, - minGrades: Record, -): boolean { - return GRADE_RANK[readinessGrade] >= GRADE_RANK[minGrades[id]]; -} - function promptResourceLocation(family: PromptResourceFamily, id: string): string { const root = family === 'definitions' ? 
'./agents' : './skills'; return fileURLToPath(new URL(`../${root}/${family}/${id}.md`, import.meta.url)); diff --git a/src/projections/session/affordances.test.ts b/src/projections/session/affordances.test.ts new file mode 100644 index 000000000..0947d11fe --- /dev/null +++ b/src/projections/session/affordances.test.ts @@ -0,0 +1,58 @@ +import { describe, expect, it } from 'vitest'; + +import { DEFAULT_BRUNCH_AGENT_STATE } from '../../session/runtime-state.js'; +import { affordances } from './affordances.js'; +import { resolveBrunchAgentState } from './runtime-state.js'; + +function resolved(overrides: Partial = {}) { + return resolveBrunchAgentState({ ...DEFAULT_BRUNCH_AGENT_STATE, ...overrides }); +} + +describe('runtime affordances derivation', () => { + it('reports legal options and default-on-switch values for every posture axis', () => { + expect(affordances(resolved(), 'commitments_ready')).toEqual({ + goal: { + selection: 'grounding-advance', + legalOptions: ['grounding-advance', 'elicit-expand', 'commit-converge', 'capture-posture'], + defaultOnSwitch: 'grounding-advance', + }, + strategy: { + selection: 'auto', + legalOptions: ['step-wise-decision-tree', 'step-wise-disambiguate', 'propose-graph', 'project-graph'], + defaultOnSwitch: 'auto', + }, + lens: { + selection: 'auto', + legalOptions: ['intent', 'design', 'oracle'], + defaultOnSwitch: 'auto', + }, + }); + }); + + it('excludes freestyle from AUTO strategy affordances but reports a pinned legal strategy', () => { + expect(affordances(resolved(), 'planning_ready').strategy.legalOptions).not.toContain('freestyle'); + + expect(affordances(resolved({ agentStrategy: 'freestyle' }), 'grounding_onboarding').strategy).toEqual({ + selection: 'freestyle', + legalOptions: ['freestyle', 'step-wise-decision-tree', 'step-wise-disambiguate'], + defaultOnSwitch: 'auto', + }); + }); + + it('uses readiness grade as a load-bearing legality input', () => { + const grounding = affordances(resolved(), 
'grounding_onboarding'); + const elicitation = affordances(resolved(), 'elicitation_ready'); + const commitments = affordances(resolved(), 'commitments_ready'); + + expect(grounding.goal.legalOptions).toEqual(['grounding-advance', 'capture-posture']); + expect(grounding.strategy.legalOptions).toEqual(['step-wise-decision-tree', 'step-wise-disambiguate']); + expect(grounding.lens.legalOptions).toEqual(['intent']); + + expect(elicitation.goal.legalOptions).toContain('elicit-expand'); + expect(elicitation.strategy.legalOptions).toContain('propose-graph'); + expect(elicitation.lens.legalOptions).toEqual(['intent', 'design', 'oracle']); + + expect(commitments.goal.legalOptions).toContain('commit-converge'); + expect(commitments.strategy.legalOptions).toContain('project-graph'); + }); +}); diff --git a/src/projections/session/affordances.ts b/src/projections/session/affordances.ts new file mode 100644 index 000000000..e6e228891 --- /dev/null +++ b/src/projections/session/affordances.ts @@ -0,0 +1,51 @@ +import type { ReadinessGrade } from '../../graph/index.js'; +import type { + AgentGoalId, + AgentGoalSelection, + AgentLensId, + AgentLensSelection, + AgentStrategyId, + AgentStrategySelection, +} from '../../session/runtime-state.js'; +import { + axisOptionsForRuntimeState, + defaultGoalForRuntimeState, + defaultLensForRuntimeState, + defaultStrategyForRuntimeState, + type ResolvedBrunchAgentState, +} from './runtime-policy.js'; + +export interface AxisAffordance { + readonly selection: TSelection; + readonly legalOptions: readonly TId[]; + readonly defaultOnSwitch: TSelection; +} + +export interface RuntimeAffordances { + readonly goal: AxisAffordance; + readonly strategy: AxisAffordance; + readonly lens: AxisAffordance; +} + +export function affordances( + state: ResolvedBrunchAgentState, + readinessGrade: ReadinessGrade, +): RuntimeAffordances { + return { + goal: { + selection: state.agentGoal, + legalOptions: axisOptionsForRuntimeState('goal', state, 
readinessGrade), + defaultOnSwitch: defaultGoalForRuntimeState(state), + }, + strategy: { + selection: state.agentStrategy, + legalOptions: axisOptionsForRuntimeState('strategy', state, readinessGrade), + defaultOnSwitch: defaultStrategyForRuntimeState(state), + }, + lens: { + selection: state.agentLens, + legalOptions: axisOptionsForRuntimeState('lens', state, readinessGrade), + defaultOnSwitch: defaultLensForRuntimeState(state), + }, + }; +} diff --git a/src/projections/session/runtime-policy.ts b/src/projections/session/runtime-policy.ts index 15dc9dbf6..dec6ff9f8 100644 --- a/src/projections/session/runtime-policy.ts +++ b/src/projections/session/runtime-policy.ts @@ -1,3 +1,4 @@ +import type { ReadinessGrade } from '../../graph/index.js'; import type { AgentGoalId, AgentGoalSelection, @@ -86,6 +87,94 @@ export const TOOL_POLICY_DEFINITIONS: Record }, }; +export const GRADE_RANK: Record<ReadinessGrade, number> = { + grounding_onboarding: 0, + elicitation_ready: 1, + commitments_ready: 2, + planning_ready: 3, +}; + +export const GOAL_MIN_GRADE: Record<AgentGoalId, ReadinessGrade> = { + 'grounding-advance': 'grounding_onboarding', + 'elicit-expand': 'elicitation_ready', + 'commit-converge': 'commitments_ready', + 'capture-posture': 'grounding_onboarding', +}; + +export const STRATEGY_MIN_GRADE: Record<AgentStrategyId, ReadinessGrade> = { + freestyle: 'grounding_onboarding', + 'step-wise-decision-tree': 'grounding_onboarding', + 'step-wise-disambiguate': 'grounding_onboarding', + 'propose-graph': 'elicitation_ready', + 'project-graph': 'commitments_ready', +}; + +export const AUTO_EXCLUDED_STRATEGIES = new Set(['freestyle']); + +export const LENS_MIN_GRADE: Record<AgentLensId, ReadinessGrade> = { + intent: 'grounding_onboarding', + design: 'elicitation_ready', + oracle: 'elicitation_ready', +}; + +export type RuntimeAffordanceAxis = 'goal' | 'strategy' | 'lens'; + +export function isGradeLegal<TId extends string>( + id: TId, + readinessGrade: 
ReadinessGrade, + minGrades: Record<TId, ReadinessGrade>, +): boolean { + return GRADE_RANK[readinessGrade] >= GRADE_RANK[minGrades[id]]; +} + +export function
axisOptionsForRuntimeState( + axis: 'goal', + state: ResolvedBrunchAgentState, + readinessGrade: ReadinessGrade, +): readonly AgentGoalId[]; +export function axisOptionsForRuntimeState( + axis: 'strategy', + state: ResolvedBrunchAgentState, + readinessGrade: ReadinessGrade, +): readonly AgentStrategyId[]; +export function axisOptionsForRuntimeState( + axis: 'lens', + state: ResolvedBrunchAgentState, + readinessGrade: ReadinessGrade, +): readonly AgentLensId[]; +export function axisOptionsForRuntimeState( + axis: RuntimeAffordanceAxis, + state: ResolvedBrunchAgentState, + readinessGrade: ReadinessGrade, +): readonly (AgentGoalId | AgentStrategyId | AgentLensId)[] { + if (axis === 'goal') { + return state.agentRoleDefinition.allowedGoals.filter((id) => + isGradeLegal(id, readinessGrade, GOAL_MIN_GRADE), + ); + } + if (axis === 'strategy') { + const legal = state.agentRoleDefinition.allowedStrategies.filter((id) => + isGradeLegal(id, readinessGrade, STRATEGY_MIN_GRADE), + ); + return state.agentStrategy === 'auto' ? 
legal.filter((id) => !AUTO_EXCLUDED_STRATEGIES.has(id)) : legal; + } + return state.agentRoleDefinition.allowedLenses.filter((id) => + isGradeLegal(id, readinessGrade, LENS_MIN_GRADE), + ); +} + +export function defaultGoalForRuntimeState(state: ResolvedBrunchAgentState): AgentGoalSelection { + return state.agentRoleDefinition.defaultGoal; +} + +export function defaultStrategyForRuntimeState(state: ResolvedBrunchAgentState): AgentStrategySelection { + return state.agentRoleDefinition.defaultStrategy; +} + +export function defaultLensForRuntimeState(state: ResolvedBrunchAgentState): AgentLensSelection { + return state.agentRoleDefinition.defaultLens; +} + export function toolPolicyForRuntimeState(state: ResolvedBrunchAgentState): ToolPolicyDefinition { return TOOL_POLICY_DEFINITIONS[state.operationalModeDefinition.toolPolicyId]; } diff --git a/src/session/README.md b/src/session/README.md index 5cbd1af48..ef4fb716c 100644 --- a/src/session/README.md +++ b/src/session/README.md @@ -39,6 +39,32 @@ plus the coordination logic for workspace/spec/session lifecycle. start, checks at `prepareNextTurn`, injects `worldUpdate` with optional context refresh when stale. +## Runtime affordance coverage ledger + +Runtime posture affordances are pure derivations over projected runtime state plus +spec readiness grade. `projections/session/affordances.ts` owns legal option sets +and default-on-switch values; `session.runtimeState` currently exposes only the +selected value per axis. Deferred means eligible or known but not currently +transported for that consumer. + +| Row | Canonical owner | Agent | RPC | Web | Reason for deferred | +| --- | --- | --- | --- | --- | --- | +| `goal.options` | `affordances.goal.legalOptions` | required | deferred | deferred | Transport follows a concrete UI/client need; agent already needs legality. 
| +| `goal.default_on_switch` | `affordances.goal.defaultOnSwitch` | required | deferred | deferred | Transport follows a concrete posture-switch surface. | +| `goal.selection` | `session.runtimeState.agent.goal` | required | required | deferred | RPC already reports current posture; web has no posture UI yet. | +| `strategy.options` | `affordances.strategy.legalOptions` | required | deferred | deferred | Transport follows a concrete UI/client need; AUTO excludes `freestyle`. | +| `strategy.default_on_switch` | `affordances.strategy.defaultOnSwitch` | required | deferred | deferred | Transport follows a concrete posture-switch surface. | +| `strategy.selection` | `session.runtimeState.agent.strategy` | required | required | deferred | RPC already reports current posture; web has no posture UI yet. | +| `lens.options` | `affordances.lens.legalOptions` | required | deferred | deferred | Transport follows a concrete UI/client need. | +| `lens.default_on_switch` | `affordances.lens.defaultOnSwitch` | required | deferred | deferred | Transport follows a concrete posture-switch surface. | +| `lens.selection` | `session.runtimeState.agent.lens` | required | required | deferred | RPC already reports current posture; web has no posture UI yet. | +| `active-review-set` | product-state-gated review-cycle surface | deferred | deferred | deferred | Needs current review-set product state; not derivable from runtime policy alone. | +| `turn-mode` | product-state-gated freestyle-vs-structured turn surface | deferred | deferred | deferred | Needs current turn/exchange mode state; not derivable from runtime policy alone. | + +`runtime-affordances-coverage.test.ts` guards the required subsets: agent rows +must remain covered by the shared derivation, RPC rows by the public session +schema, and the product-state-gated rows must stay explicit deferred tripwires. + ## Does NOT own - Graph state, CommandExecutor, graph queries — those live in `graph/`. 
diff --git a/src/session/runtime-affordances-coverage.test.ts b/src/session/runtime-affordances-coverage.test.ts new file mode 100644 index 000000000..bffb1dc69 --- /dev/null +++ b/src/session/runtime-affordances-coverage.test.ts @@ -0,0 +1,161 @@ +import { describe, expect, it } from 'vitest'; + +import { affordances } from '../projections/session/affordances.js'; +import { resolveBrunchAgentState } from '../projections/session/runtime-state.js'; +import { sessionRpcMethods } from '../rpc/methods/session.js'; +import { DEFAULT_BRUNCH_AGENT_STATE } from './runtime-state.js'; + +const runtimeAffordanceLedger = [ + { + row: 'goal.options', + owner: 'affordances.goal.legalOptions', + agent: 'required', + rpc: 'deferred', + web: 'deferred', + }, + { + row: 'goal.default_on_switch', + owner: 'affordances.goal.defaultOnSwitch', + agent: 'required', + rpc: 'deferred', + web: 'deferred', + }, + { + row: 'goal.selection', + owner: 'session.runtimeState.agent.goal', + agent: 'required', + rpc: 'required', + web: 'deferred', + }, + { + row: 'strategy.options', + owner: 'affordances.strategy.legalOptions', + agent: 'required', + rpc: 'deferred', + web: 'deferred', + }, + { + row: 'strategy.default_on_switch', + owner: 'affordances.strategy.defaultOnSwitch', + agent: 'required', + rpc: 'deferred', + web: 'deferred', + }, + { + row: 'strategy.selection', + owner: 'session.runtimeState.agent.strategy', + agent: 'required', + rpc: 'required', + web: 'deferred', + }, + { + row: 'lens.options', + owner: 'affordances.lens.legalOptions', + agent: 'required', + rpc: 'deferred', + web: 'deferred', + }, + { + row: 'lens.default_on_switch', + owner: 'affordances.lens.defaultOnSwitch', + agent: 'required', + rpc: 'deferred', + web: 'deferred', + }, + { + row: 'lens.selection', + owner: 'session.runtimeState.agent.lens', + agent: 'required', + rpc: 'required', + web: 'deferred', + }, + { + row: 'active-review-set', + owner: 'product-state-gated: review-cycle surface', + agent: 'deferred', + 
rpc: 'deferred', + web: 'deferred', + }, + { + row: 'turn-mode', + owner: 'product-state-gated: freestyle-vs-structured turn surface', + agent: 'deferred', + rpc: 'deferred', + web: 'deferred', + }, +] as const; + +type Consumer = 'agent' | 'rpc' | 'web'; + +function requiredRowsFor(consumer: Consumer): string[] { + return runtimeAffordanceLedger + .filter((row) => row[consumer] === 'required') + .map((row) => row.row) + .sort(); +} + +function runtimeStateSchemaAgentFields(): string[] { + const runtimeState = sessionRpcMethods.find((method) => method.method === 'session.runtimeState'); + if (!runtimeState) throw new Error('session.runtimeState RPC method is not registered.'); + const agentProperties = (runtimeState.resultSchema as any).properties.agent.properties; + return Object.keys(agentProperties) + .filter((field) => field === 'goal' || field === 'strategy' || field === 'lens') + .map((field) => `${field}.selection`) + .sort(); +} + +describe('runtime affordances coverage ledger', () => { + it('keeps the closed ledger focused on derived posture axes plus tripwired deferred rows', () => { + expect(runtimeAffordanceLedger.map((row) => row.row)).toEqual([ + 'goal.options', + 'goal.default_on_switch', + 'goal.selection', + 'strategy.options', + 'strategy.default_on_switch', + 'strategy.selection', + 'lens.options', + 'lens.default_on_switch', + 'lens.selection', + 'active-review-set', + 'turn-mode', + ]); + }); + + it('covers all agent-required rows through the shared affordances derivation', () => { + const derived = affordances(resolveBrunchAgentState(DEFAULT_BRUNCH_AGENT_STATE), 'commitments_ready'); + const derivedRows = Object.entries(derived).flatMap(([axis, axisAffordance]) => + Object.keys(axisAffordance).map((field) => + field === 'legalOptions' ? 
`${axis}.options` : `${axis}.default_on_switch`, + ), + ); + + expect(new Set(derivedRows)).toEqual( + new Set(requiredRowsFor('agent').filter((row) => !row.endsWith('.selection'))), + ); + }); + + it('keeps the required RPC affordance subset to current posture selections', () => { + expect(runtimeStateSchemaAgentFields()).toEqual(requiredRowsFor('rpc')); + }); + + it('keeps product-state-gated affordances deferred instead of certifying unbuilt state', () => { + expect( + runtimeAffordanceLedger.filter((row) => row.row === 'active-review-set' || row.row === 'turn-mode'), + ).toEqual([ + { + row: 'active-review-set', + owner: 'product-state-gated: review-cycle surface', + agent: 'deferred', + rpc: 'deferred', + web: 'deferred', + }, + { + row: 'turn-mode', + owner: 'product-state-gated: freestyle-vs-structured turn surface', + agent: 'deferred', + rpc: 'deferred', + web: 'deferred', + }, + ]); + }); +}); From 82918a5c0ac77a8850f840e9d578f690ee9f5ecc Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Mon, 8 Jun 2026 11:48:34 +0200 Subject: [PATCH 16/17] Add capture quality spike probe --- .../extractions.json | 87 ++++ .../report.json | 141 ++++++ .../scenarios.json | 96 ++++ .../verdict.md | 9 + .../sample-llm-extractions.json | 87 ++++ memory/PLAN.md | 23 +- memory/SPEC.md | 2 +- .../cards/capture-quality--fitness-spike.md | 82 ---- src/probes/capture-quality-loop.test.ts | 195 ++++++++ src/probes/capture-quality-loop.ts | 421 ++++++++++++++++++ 10 files changed, 1049 insertions(+), 94 deletions(-) create mode 100644 .fixtures/runs/capture-quality/2026-06-08-capture-quality-sample/extractions.json create mode 100644 .fixtures/runs/capture-quality/2026-06-08-capture-quality-sample/report.json create mode 100644 .fixtures/runs/capture-quality/2026-06-08-capture-quality-sample/scenarios.json create mode 100644 .fixtures/runs/capture-quality/2026-06-08-capture-quality-sample/verdict.md create mode 100644 .fixtures/runs/capture-quality/sample-llm-extractions.json delete 
mode 100644 memory/cards/capture-quality--fitness-spike.md create mode 100644 src/probes/capture-quality-loop.test.ts create mode 100644 src/probes/capture-quality-loop.ts diff --git a/.fixtures/runs/capture-quality/2026-06-08-capture-quality-sample/extractions.json b/.fixtures/runs/capture-quality/2026-06-08-capture-quality-sample/extractions.json new file mode 100644 index 000000000..8a89fc624 --- /dev/null +++ b/.fixtures/runs/capture-quality/2026-06-08-capture-quality-sample/extractions.json @@ -0,0 +1,87 @@ +[ + { + "scenarioId": "free-prose-launch-goal", + "facts": [ + { + "expectedId": "workspace-for-solo-developers", + "kind": "context", + "title": "The product is for solo developers working in a local spec workspace.", + "confidence": "high", + "evidence": "local spec workspace for solo developers" + }, + { + "expectedId": "capture-goals-without-template", + "kind": "goal", + "title": "Capture project goals without forcing a rigid template.", + "confidence": "high", + "evidence": "should help capture project goals without forcing people into a rigid template" + }, + { + "expectedId": "new-contributor-explains-problem", + "kind": "criterion", + "title": "A new contributor can read the graph and explain what problem the project solves.", + "confidence": "high", + "evidence": "Success means a new contributor can read the graph and explain what problem the project solves." + } + ] + }, + { + "scenarioId": "file-ref-bearing-answer", + "facts": [ + { + "expectedId": "prd-is-product-frame", + "kind": "context", + "title": "docs/architecture/prd.md is the product frame for this answer.", + "confidence": "high", + "evidence": "Use docs/architecture/prd.md as the product frame." 
+ }, + { + "expectedId": "graph-truth-sqlite-brunch", + "kind": "constraint", + "title": "Graph truth must stay in SQLite under .brunch.", + "confidence": "high", + "evidence": "The non-negotiable is that graph truth must stay in SQLite under .brunch" + }, + { + "expectedId": "jsonl-ok-if-replay-recovers-exchanges", + "kind": "criterion", + "title": "JSONL transcript evidence is acceptable only if replay recovers structured exchange results.", + "confidence": "high", + "evidence": "transcript evidence can remain JSONL as long as replay can recover the structured exchange results" + }, + { + "expectedId": "must-build-full-replay-engine-now", + "kind": "requirement", + "title": "Build a full replay engine immediately.", + "confidence": "low", + "evidence": "The answer names a replay condition, not an immediate build requirement." + } + ] + }, + { + "scenarioId": "implication-heavy-no-overcommit", + "facts": [ + { + "expectedId": "terminal-demo-preference-conditional", + "kind": "assumption", + "title": "The user may prefer the terminal view if the browser observer is confusing.", + "confidence": "low", + "evidence": "If the browser observer gets confusing, I might prefer the terminal view for the demo." + }, + { + "expectedId": "web-helpful-if-fast", + "kind": "criterion", + "title": "The web graph is helpful only if it keeps up quickly enough.", + "confidence": "high", + "evidence": "The web graph is helpful, but only if it keeps up quickly enough." + }, + { + "expectedId": "review-sets-in-poc", + "kind": "requirement", + "title": "Review sets belong in the POC story.", + "confidence": "low", + "evidence": "I have not decided whether review sets belong in the POC story." 
+ } + ] + } +] diff --git a/.fixtures/runs/capture-quality/2026-06-08-capture-quality-sample/report.json b/.fixtures/runs/capture-quality/2026-06-08-capture-quality-sample/report.json new file mode 100644 index 000000000..63e39af36 --- /dev/null +++ b/.fixtures/runs/capture-quality/2026-06-08-capture-quality-sample/report.json @@ -0,0 +1,141 @@ +{ + "schemaVersion": 1, + "probeId": "capture-quality", + "runId": "2026-06-08-capture-quality-sample", + "generatedAt": "2026-06-08T09:44:38.831Z", + "cwd": "/Users/lunelson/Code/hashintel/brunch-next-chi", + "extractorName": "sample-llm-output", + "scenarioCount": 3, + "totals": { + "shouldCommitCount": 7, + "truePositiveCount": 7, + "missedShouldCommitCount": 0, + "falseCommitCount": 0, + "lowConfidenceImplicationCount": 3, + "precision": 1, + "recall": 1 + }, + "scenarioResults": [ + { + "scenarioId": "free-prose-launch-goal", + "label": "Free prose with explicit acceptance facts", + "category": "free_prose", + "shouldCommitCount": 3, + "truePositiveCount": 3, + "missedShouldCommit": [], + "falseCommitCount": 0, + "falseCommits": [], + "lowConfidenceImplicationCount": 0, + "extractedFacts": [ + { + "expectedId": "workspace-for-solo-developers", + "kind": "context", + "title": "The product is for solo developers working in a local spec workspace.", + "confidence": "high", + "evidence": "local spec workspace for solo developers" + }, + { + "expectedId": "capture-goals-without-template", + "kind": "goal", + "title": "Capture project goals without forcing a rigid template.", + "confidence": "high", + "evidence": "should help capture project goals without forcing people into a rigid template" + }, + { + "expectedId": "new-contributor-explains-problem", + "kind": "criterion", + "title": "A new contributor can read the graph and explain what problem the project solves.", + "confidence": "high", + "evidence": "Success means a new contributor can read the graph and explain what problem the project solves." 
+ } + ] + }, + { + "scenarioId": "file-ref-bearing-answer", + "label": "Answer grounded in a referenced file", + "category": "file_ref", + "shouldCommitCount": 3, + "truePositiveCount": 3, + "missedShouldCommit": [], + "falseCommitCount": 0, + "falseCommits": [], + "lowConfidenceImplicationCount": 1, + "extractedFacts": [ + { + "expectedId": "prd-is-product-frame", + "kind": "context", + "title": "docs/architecture/prd.md is the product frame for this answer.", + "confidence": "high", + "evidence": "Use docs/architecture/prd.md as the product frame." + }, + { + "expectedId": "graph-truth-sqlite-brunch", + "kind": "constraint", + "title": "Graph truth must stay in SQLite under .brunch.", + "confidence": "high", + "evidence": "The non-negotiable is that graph truth must stay in SQLite under .brunch" + }, + { + "expectedId": "jsonl-ok-if-replay-recovers-exchanges", + "kind": "criterion", + "title": "JSONL transcript evidence is acceptable only if replay recovers structured exchange results.", + "confidence": "high", + "evidence": "transcript evidence can remain JSONL as long as replay can recover the structured exchange results" + }, + { + "expectedId": "must-build-full-replay-engine-now", + "kind": "requirement", + "title": "Build a full replay engine immediately.", + "confidence": "low", + "evidence": "The answer names a replay condition, not an immediate build requirement." 
+ } + ] + }, + { + "scenarioId": "implication-heavy-no-overcommit", + "label": "Implication-heavy answer that should not over-commit", + "category": "implication_heavy", + "shouldCommitCount": 1, + "truePositiveCount": 1, + "missedShouldCommit": [], + "falseCommitCount": 0, + "falseCommits": [], + "lowConfidenceImplicationCount": 2, + "extractedFacts": [ + { + "expectedId": "terminal-demo-preference-conditional", + "kind": "assumption", + "title": "The user may prefer the terminal view if the browser observer is confusing.", + "confidence": "low", + "evidence": "If the browser observer gets confusing, I might prefer the terminal view for the demo." + }, + { + "expectedId": "web-helpful-if-fast", + "kind": "criterion", + "title": "The web graph is helpful only if it keeps up quickly enough.", + "confidence": "high", + "evidence": "The web graph is helpful, but only if it keeps up quickly enough." + }, + { + "expectedId": "review-sets-in-poc", + "kind": "requirement", + "title": "Review sets belong in the POC story.", + "confidence": "low", + "evidence": "I have not decided whether review sets belong in the POC story." + } + ] + } + ], + "verdict": { + "a22ConfidenceShift": "positive: high-confidence capture separated commit-worthy facts from implications", + "recommendation": "graduate", + "summary": "A22-L is fit to graduate into a narrow generalized-capture frontier, preserving an explicit false-commit guard." 
+ }, + "artifacts": { + "runDir": "runs/capture-quality/2026-06-08-capture-quality-sample", + "scenariosJson": "runs/capture-quality/2026-06-08-capture-quality-sample/scenarios.json", + "extractionsJson": "runs/capture-quality/2026-06-08-capture-quality-sample/extractions.json", + "reportJson": "runs/capture-quality/2026-06-08-capture-quality-sample/report.json", + "verdictMarkdown": "runs/capture-quality/2026-06-08-capture-quality-sample/verdict.md" + } +} diff --git a/.fixtures/runs/capture-quality/2026-06-08-capture-quality-sample/scenarios.json b/.fixtures/runs/capture-quality/2026-06-08-capture-quality-sample/scenarios.json new file mode 100644 index 000000000..eb96be24e --- /dev/null +++ b/.fixtures/runs/capture-quality/2026-06-08-capture-quality-sample/scenarios.json @@ -0,0 +1,96 @@ +[ + { + "id": "free-prose-launch-goal", + "label": "Free prose with explicit acceptance facts", + "category": "free_prose", + "input": "We are building a local spec workspace for solo developers. The first useful outcome is that it should help capture project goals without forcing people into a rigid template. Success means a new contributor can read the graph and explain what problem the project solves.", + "expectedFacts": [ + { + "id": "workspace-for-solo-developers", + "kind": "context", + "title": "The product is for solo developers working in a local spec workspace.", + "shouldCommit": true, + "rationale": "Direct statement of audience and workspace setting." + }, + { + "id": "capture-goals-without-template", + "kind": "goal", + "title": "Capture project goals without forcing a rigid template.", + "shouldCommit": true, + "rationale": "Directly stated useful outcome." + }, + { + "id": "new-contributor-explains-problem", + "kind": "criterion", + "title": "A new contributor can read the graph and explain the problem solved.", + "shouldCommit": true, + "rationale": "Explicit success criterion." 
+ } + ] + }, + { + "id": "file-ref-bearing-answer", + "label": "Answer grounded in a referenced file", + "category": "file_ref", + "input": "Use docs/architecture/prd.md as the product frame. The non-negotiable is that graph truth must stay in SQLite under .brunch, while transcript evidence can remain JSONL as long as replay can recover the structured exchange results.", + "expectedFacts": [ + { + "id": "prd-is-product-frame", + "kind": "context", + "title": "docs/architecture/prd.md is the product frame for this answer.", + "shouldCommit": true, + "rationale": "Direct source/reference grounding." + }, + { + "id": "graph-truth-sqlite-brunch", + "kind": "constraint", + "title": "Graph truth must stay in SQLite under .brunch.", + "shouldCommit": true, + "rationale": "Directly labeled as non-negotiable." + }, + { + "id": "jsonl-ok-if-replay-recovers-exchanges", + "kind": "criterion", + "title": "JSONL transcript evidence is acceptable only if replay recovers structured exchange results.", + "shouldCommit": true, + "rationale": "Explicit conditional acceptance criterion." + }, + { + "id": "must-build-full-replay-engine-now", + "kind": "requirement", + "title": "Build a full replay engine immediately.", + "shouldCommit": false, + "rationale": "This is an implication beyond the stated condition." + } + ] + }, + { + "id": "implication-heavy-no-overcommit", + "label": "Implication-heavy answer that should not over-commit", + "category": "implication_heavy", + "input": "If the browser observer gets confusing, I might prefer the terminal view for the demo. The web graph is helpful, but only if it keeps up quickly enough. I have not decided whether review sets belong in the POC story.", + "expectedFacts": [ + { + "id": "terminal-demo-preference-conditional", + "kind": "assumption", + "title": "The user may prefer the terminal view if the browser observer is confusing.", + "shouldCommit": false, + "rationale": "Conditional preference, not settled graph truth." 
+ }, + { + "id": "web-helpful-if-fast", + "kind": "criterion", + "title": "The web graph is helpful only if it keeps up quickly enough.", + "shouldCommit": true, + "rationale": "Clear acceptance condition for web observer usefulness." + }, + { + "id": "review-sets-in-poc", + "kind": "requirement", + "title": "Review sets belong in the POC story.", + "shouldCommit": false, + "rationale": "Explicitly undecided; should stay out of graph truth." + } + ] + } +] diff --git a/.fixtures/runs/capture-quality/2026-06-08-capture-quality-sample/verdict.md b/.fixtures/runs/capture-quality/2026-06-08-capture-quality-sample/verdict.md new file mode 100644 index 000000000..92c3f8ad2 --- /dev/null +++ b/.fixtures/runs/capture-quality/2026-06-08-capture-quality-sample/verdict.md @@ -0,0 +1,9 @@ +# Capture-quality verdict + +- A22-L confidence shift: positive: high-confidence capture separated commit-worthy facts from implications +- Recommendation: graduate +- Precision: 1 +- Recall: 1 +- False commits: 0 + +A22-L is fit to graduate into a narrow generalized-capture frontier, preserving an explicit false-commit guard. 
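The report's totals can be reproduced with a simple scoring rule. The sketch below works under stated assumptions and is not the probe's implementation in `src/probes/capture-quality-loop.ts`. It assumes three things: only `high`-confidence extractions count as commits, each expected fact is extracted at most once, and the shapes are simplified. The names `ExpectedFact`, `ExtractedFact`, and `score` are hypothetical.

```typescript
// Hypothetical, simplified shapes modeled on scenarios.json and extractions.json.
interface ExpectedFact {
  id: string;
  shouldCommit: boolean;
}

interface ExtractedFact {
  expectedId: string;
  confidence: 'high' | 'low';
}

interface Totals {
  shouldCommitCount: number;
  truePositiveCount: number;
  missedShouldCommitCount: number;
  falseCommitCount: number;
  lowConfidenceImplicationCount: number;
  precision: number;
  recall: number;
}

// Assumption: a high-confidence extraction is a commit; at most one extraction per expected fact.
function score(expected: ExpectedFact[], extracted: ExtractedFact[]): Totals {
  const shouldCommit = new Set(expected.filter((f) => f.shouldCommit).map((f) => f.id));
  const committed = extracted.filter((f) => f.confidence === 'high');
  const truePositiveCount = committed.filter((f) => shouldCommit.has(f.expectedId)).length;
  return {
    shouldCommitCount: shouldCommit.size,
    truePositiveCount,
    missedShouldCommitCount: shouldCommit.size - truePositiveCount,
    // Any committed fact that should not be graph truth is a false commit.
    falseCommitCount: committed.length - truePositiveCount,
    lowConfidenceImplicationCount: extracted.length - committed.length,
    // Assumption: with nothing committed (or nothing to commit), the ratio is treated as 1.
    precision: committed.length === 0 ? 1 : truePositiveCount / committed.length,
    recall: shouldCommit.size === 0 ? 1 : truePositiveCount / shouldCommit.size,
  };
}

// The implication-heavy scenario from the fixtures.
const implicationExpected: ExpectedFact[] = [
  { id: 'terminal-demo-preference-conditional', shouldCommit: false },
  { id: 'web-helpful-if-fast', shouldCommit: true },
  { id: 'review-sets-in-poc', shouldCommit: false },
];
const implicationExtracted: ExtractedFact[] = [
  { expectedId: 'terminal-demo-preference-conditional', confidence: 'low' },
  { expectedId: 'web-helpful-if-fast', confidence: 'high' },
  { expectedId: 'review-sets-in-poc', confidence: 'low' },
];

const totals = score(implicationExpected, implicationExtracted);
console.log(totals);
```

For this scenario the sketch yields one true positive, no false commits, and two low-confidence implications, matching the per-scenario row in `report.json`. Upgrading the two undecided facts to `high` confidence produces two false commits. That is the over-commit behavior the false-commit guard is meant to catch.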
diff --git a/.fixtures/runs/capture-quality/sample-llm-extractions.json b/.fixtures/runs/capture-quality/sample-llm-extractions.json new file mode 100644 index 000000000..8a89fc624 --- /dev/null +++ b/.fixtures/runs/capture-quality/sample-llm-extractions.json @@ -0,0 +1,87 @@ +[ + { + "scenarioId": "free-prose-launch-goal", + "facts": [ + { + "expectedId": "workspace-for-solo-developers", + "kind": "context", + "title": "The product is for solo developers working in a local spec workspace.", + "confidence": "high", + "evidence": "local spec workspace for solo developers" + }, + { + "expectedId": "capture-goals-without-template", + "kind": "goal", + "title": "Capture project goals without forcing a rigid template.", + "confidence": "high", + "evidence": "should help capture project goals without forcing people into a rigid template" + }, + { + "expectedId": "new-contributor-explains-problem", + "kind": "criterion", + "title": "A new contributor can read the graph and explain what problem the project solves.", + "confidence": "high", + "evidence": "Success means a new contributor can read the graph and explain what problem the project solves." + } + ] + }, + { + "scenarioId": "file-ref-bearing-answer", + "facts": [ + { + "expectedId": "prd-is-product-frame", + "kind": "context", + "title": "docs/architecture/prd.md is the product frame for this answer.", + "confidence": "high", + "evidence": "Use docs/architecture/prd.md as the product frame." 
+ }, + { + "expectedId": "graph-truth-sqlite-brunch", + "kind": "constraint", + "title": "Graph truth must stay in SQLite under .brunch.", + "confidence": "high", + "evidence": "The non-negotiable is that graph truth must stay in SQLite under .brunch" + }, + { + "expectedId": "jsonl-ok-if-replay-recovers-exchanges", + "kind": "criterion", + "title": "JSONL transcript evidence is acceptable only if replay recovers structured exchange results.", + "confidence": "high", + "evidence": "transcript evidence can remain JSONL as long as replay can recover the structured exchange results" + }, + { + "expectedId": "must-build-full-replay-engine-now", + "kind": "requirement", + "title": "Build a full replay engine immediately.", + "confidence": "low", + "evidence": "The answer names a replay condition, not an immediate build requirement." + } + ] + }, + { + "scenarioId": "implication-heavy-no-overcommit", + "facts": [ + { + "expectedId": "terminal-demo-preference-conditional", + "kind": "assumption", + "title": "The user may prefer the terminal view if the browser observer is confusing.", + "confidence": "low", + "evidence": "If the browser observer gets confusing, I might prefer the terminal view for the demo." + }, + { + "expectedId": "web-helpful-if-fast", + "kind": "criterion", + "title": "The web graph is helpful only if it keeps up quickly enough.", + "confidence": "high", + "evidence": "The web graph is helpful, but only if it keeps up quickly enough." + }, + { + "expectedId": "review-sets-in-poc", + "kind": "requirement", + "title": "Review sets belong in the POC story.", + "confidence": "low", + "evidence": "I have not decided whether review sets belong in the POC story." 
+ } + ] + } +] diff --git a/memory/PLAN.md b/memory/PLAN.md index 49e9009e4..bc2b1790b 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -53,7 +53,6 @@ The remaining coverage frontiers are being deliberately de-fogged rather than le ### Horizon -- `exchanges-and-generalized-capture` — exchange topology is enumerable now, but generalized-capture breadth is gated on the `capture-quality-spike` evidence (above), not on waiting; graduates to a coverage frontier once the spike closes the inventory honestly. - `turn-boundary-reconciliation` — M7; graph revisions, `worldUpdate`, mention staleness, side-task/reviewer drains. - `coherence-first-class` — M8; bounded coherence verdicts backed by reconciliation needs. - `compaction-and-conflict-widening` — M9; long-horizon continuity through compaction. @@ -232,17 +231,17 @@ The remaining coverage frontiers are being deliberately de-fogged rather than le - **Name:** Exchange surface and generalized capture inventory - **Linear:** unassigned - **Kind:** structural -- **Status:** horizon +- **Status:** next - **Certainty:** proving -- **Blocked by:** An honest, closeable exchange/capture inventory. The forcing function is now named and active: `capture-quality-spike` (`memory/cards/capture-quality--fitness-spike.md`) must produce A22-L fitness evidence over free text/files/refs before this graduates. This is evidence-gated, not wait-gated; do not start the breadth frontier while it still depends on deleted-stub symmetry or speculative breadth. +- **Unblocked by:** `capture-quality-spike` (2026-06-08) measured fixed free-prose, file/ref-bearing, and implication-heavy scenarios, reached precision 1.0 / recall 1.0 with zero false commits in the sample extraction report, and recommended graduating a narrow generalized-capture frontier with an explicit false-commit guard. 
- **Stabilizes:** The ownership split between `.pi/extensions/exchanges`, `projections/exchanges`, `renderers/exchanges`, and `session/structured-exchange-loop.ts`. -- **Objective:** Revisit richer exchange payload families and generalized capture breadth only after the surviving surface is clear enough to enumerate. -- **Why now / unlocks:** Recording this frontier here prevents the deleted `capture-*` topology from silently regrowing while preserving the likely future concern once capture breadth becomes honest. +- **Objective:** Enumerate the surviving exchange/capture families and scope generalized capture narrowly around high-confidence extractive facts; keep implication-heavy material out of graph truth unless a later slice proves a safe commitment path. +- **Why now / unlocks:** The capture-quality spike closed the evidence gate enough to scope the next inventory. The frontier should still start with enumeration and false-commit protection rather than regrowing deleted `capture-*` topology or broad LLM commitment behavior. - **Acceptance:** - - Work does not start until the surviving exchange/capture families can be enumerated with required vs deferred marking. + - The surviving exchange/capture families are enumerated with required vs deferred marking. - Reusable exchange details justify `projections/exchanges`; single-owner reads or orchestration state stay in their owning domains. - - Capture beyond directly labeled facts is driven by real evidence, not symmetry with removed stubs. -- **Verification:** Likely probe-backed transcript and capture read-back oracles rather than purely unit tests; define when the frontier is actually scoped. + - Capture beyond directly labeled facts starts with high-confidence extractive facts and carries an explicit false-commit oracle for implication-heavy text. +- **Verification:** Probe-backed transcript and capture read-back oracles; include the capture-quality false-commit scenario family as a regression guard. 
- **Cross-cutting obligations:** Keep `renderers/exchanges` for durable markdown/text/toon only, keep TUI presenters local, and do not reintroduce `snapshot` as an architecture noun. - **Traceability:** D27-L, D65-L, D66-L. - **Design docs:** `memory/SPEC.md` D65-L/D66-L; `src/projections/README.md`; `src/renderers/README.md`. @@ -300,6 +299,8 @@ The remaining coverage frontiers are being deliberately de-fogged rather than le - **Design docs:** `.fixtures/seeds/bilal-port/README.md`; `docs/design/GRAPH_MODEL.md`; `docs/praxis/manual-testing.md`. ## Recently Completed +- 2026-06-08 `capture-quality-spike` — Done: added `src/probes/capture-quality-loop.ts` and a deterministic report test over free-prose, file/ref-bearing, and implication-heavy capture scenarios. The run artifact `.fixtures/runs/capture-quality/2026-06-08-capture-quality-sample/` records precision 1.0 / recall 1.0 with zero false commits from the sample extraction set and recommends graduating `exchanges-and-generalized-capture` narrowly, preserving a false-commit oracle for implication-heavy text. Verified: `src/probes/capture-quality-loop.test.ts` and `npm run verify`. + - 2026-06-08 `minimal-authority-shell` (FE-810) — Done: added the authority-matrix guard test over the current POC authority seam. The guard locks `CommandExecutor` mutation-result discriminants as the graph outcome vocabulary, proves `needs_human` is structured data rather than a TUI-only dialog, and asserts `elicit` tool authority comes from the shared projected runtime policy while blocking the identified side-effecting tools (`bash`, `edit`, `write`). No new authority service; `src/.pi/agents/state.ts` untouched; A18-L strict built-in suppression remains accepted Pi-upstream/API residue. Verified: `src/.pi/extensions/runtime/authority-matrix.test.ts` and `npm run verify`. 
- 2026-06-08 cross-cut prompt-resource body-depth pass (Seam 3a/3b) — Done (1ca02e38): deepened every thin `src/.pi/skills/{goals,strategies,lenses,methods}` body to carry its per-axis facet guidance (goals→D59-L, strategies/lenses→README+D25-L, methods→D58-L tool-routing role), and added a manifest-wide readability/depth test in `src/.pi/agents/compose.test.ts` asserting every `{GOAL,STRATEGY,LENS,METHOD}_RESOURCES` location resolves and clears a ≥700-char floor. `state.ts` untouched. This closed the prompt-resource body-depth row, but the cross-cut is **not** exhausted: its Seam 3a `"what to ask next" driver` row (`partial · ●`) remains the last required row, now promoted to the `elicitation-driver` frontier. Verified: `npm run verify` (551 tests, build). @@ -329,7 +330,7 @@ nodes: graph-observed-shapes [done · proving] ratified consumer-specific observed-shape ledger + drift guard; no transport shape shipped runtime-affordances-and-legality [next · proving] buildable-now affordance(resolvedState) coverage ledger; review-set/turn-mode rows tripwired elicitation-driver [next · proving] live per-turn what-to-ask-next driver on FE-823 substrate; closes cross-cut Seam 3a - capture-quality-spike [next · spike] A22-L fitness evidence to graduate exchanges-and-generalized-capture + capture-quality-spike [done · spike] A22-L fitness evidence graduated a narrow exchanges-and-generalized-capture scope probes-and-transcripts-evolution [parallel] continuous evidence substrate topology-readmes-and-boundaries [parallel] attach-to-frontier topology hardening dev-seed-fixtures [parallel] rich seed data substrate for dev/observer testing @@ -362,10 +363,10 @@ horizon: notes: - `elicitation-backlog` was the promoted D65-L *substrate* row from `memory/CROSS_CUT_PLAN.md`; the prompt-resource body-depth pass landed in 1ca02e38. The cross-cut is **not** exhausted: its Seam 3a `"what to ask next" driver` row is still `partial · ●`, which by the seam DoD keeps the seam open. 
That row is now disposed as the `elicitation-driver` frontier (not residue), so the remaining cross-cut obligation has a named owner in `PLAN.md`. - - Parallel worktree streams (2026-06-08): all three landed — (A) `crosscut-know--resource-body-depth` (1ca02e38), (B) `graph-observed-shapes--coverage-ledger` (85e73ba7), (C) `minimal-authority-shell--audit-and-guard` (68474e3f); each kept to its declared write paths and left `src/.pi/agents/state.ts` untouched, so the parallel run produced no collisions. `poc-live-ship-gate` is now unblocked (its hard dependency `minimal-authority-shell` is done). The next coverage frontiers are de-fogged rather than parked: `runtime-affordances-and-legality` (buildable-now ledger) and `elicitation-driver` (buildable-now on the FE-823 substrate) are cold-startable worktree streams; `capture-quality-spike` is an evidence spike that gates `exchanges-and-generalized-capture`. + - Parallel worktree streams (2026-06-08): all three landed — (A) `crosscut-know--resource-body-depth` (1ca02e38), (B) `graph-observed-shapes--coverage-ledger` (85e73ba7), (C) `minimal-authority-shell--audit-and-guard` (68474e3f); each kept to its declared write paths and left `src/.pi/agents/state.ts` untouched, so the parallel run produced no collisions. `poc-live-ship-gate` is now unblocked (its hard dependency `minimal-authority-shell` is done). The next coverage frontiers are de-fogged rather than parked: `runtime-affordances-and-legality` (buildable-now ledger), `elicitation-driver` (buildable-now on the FE-823 substrate), and the now-graduated narrow `exchanges-and-generalized-capture` inventory are cold-startable worktree streams. - Completed prerequisites: `agents-composition-layer` supplies runtime prompt/resource posture, and `live-graph-observer` supplies the read-only web observer path expected by `capture-response-to-graph` and `poc-live-ship-gate`. 
- `graph-observed-shapes` is intentionally consumer-specific: do not assume every agent read shape belongs on the web observer. - - `exchanges-and-generalized-capture` stays deferred until the surviving inventory is honest enough to close; do not regrow deleted `capture-*` symmetry in the meantime. + - `exchanges-and-generalized-capture` is now graduated only narrowly: scope high-confidence extractive capture with a false-commit guard, and do not regrow deleted `capture-*` symmetry. - `project-graph-review-cycle` is complete evidence for the optional batch proposal/review story; keep future review-quality work as follow-up, not FE-809 completion debt. - `topology-readmes-and-boundaries` is not a license for abstract cleanup; it rides with concrete delivery seams. - Multi-spec workspace discipline applies throughout: target the selected/current spec explicitly; no workspace-global graph truth in the POC. diff --git a/memory/SPEC.md b/memory/SPEC.md index bb98d95c3..d59b43dad 100644 --- a/memory/SPEC.md +++ b/memory/SPEC.md @@ -117,7 +117,7 @@ The POC's purpose is to prove three things: (a) that pi's coding-agent harness c | A19-L | Pi's current settings/resource lifecycle can be made product-safe through a sealed Brunch Pi Profile without forking Pi: ambient discovery remains disabled, Brunch-owned extension factories may inject explicit resources, and remaining settings/keybinding leakage can be eliminated through programmatic policy or a narrow upstream seam. | medium | open | D39-L | FE-744/profile audit: source-backed resource-loader/settings audit, tests proving no ambient `.pi/` skills/prompts/themes/extensions/context files affect Brunch, and product-owned resources still load when intentionally injected. 
| | A20-L | The chosen Drizzle line and row-schema derivation path can be settled during the prep envelope without forcing later M4 rework: Brunch can prove migrations, SQLite fidelity, monotonic counter allocation, change-log writes, and runtime-schema derivation on one representative persistence slice before CRUD proper starts. | high | **validated** | D16-L, D41-L | **Validated by A20-L spike (2026-06-01).** Stack: `drizzle-orm@0.45.2` + `drizzle-kit@0.31.10` + `better-sqlite3@12.8.0` + `drizzle-typebox@0.3.3` + `@sinclair/typebox@0.34.14`. Proved: (1) `drizzle-typebox` derives valid TypeBox insert/select schemas from Drizzle tables; `Value.Check` validates/rejects correctly. (2) Batch `commitGraph`-shaped transaction (multi-node → intra-batch ref resolution → multi-edge → LSN allocation → change-log append) works atomically; full rollback on FK violation or domain-validation throw. (3) `update().returning()` works for atomic monotonic counter increment; `insert().returning()` gives auto-increment IDs for ref resolution; JSON detail column round-trips cleanly. (4) Pi tool parameters (`typebox` v1.x) and Drizzle row schemas (`@sinclair/typebox` v0.34 via `drizzle-typebox`) serve different roles and never cross — shared enum `const` arrays bridge both. | | A21-L | The POC can treat coherence as a bounded product verdict over structural legality plus explicitly detected contradictions, gaps, and unresolved reconciliation needs, without solving a general theory of “spec coherence.” | low | open | D8-L | M8 must sharpen the coherence rubric before implementation: known-bad adversarial briefs should show what counts as incoherent, what is merely immature/underspecified, and what should become a reconciliation need. 
| -| A22-L | The elicitor can perform synchronous post-exchange capture well enough for the POC: high-confidence extractive facts and readiness-grade updates can be committed immediately, while low-confidence implications can be kept out of graph truth and used as disambiguation material. | medium | partially validated | D18-L, D26-L, D45-L, I30-L | 2026-06-05 `capture-response-to-graph` validated the product wiring for narrow labeled text facts (`Goal:`, `Context:`, `Constraint:`, `Criterion:`) on `session.submitExchangeResponse`. 2026-06-07 generalized the same explicit-text capture core onto `session.submitMessage`: ordinary labeled user text now appends to transcript truth, commits through `graph/capture` → `CommandExecutor.commitGraph({basis: explicit})`, targets the transcript binding's spec, and publishes graph invalidations; explicit interruptions are transcript-visible but do not capture or silently answer a pending exchange. Broader LLM capture quality and readiness-grade updates remain fitness evidence. | +| A22-L | The elicitor can perform synchronous post-exchange capture well enough for the POC: high-confidence extractive facts and readiness-grade updates can be committed immediately, while low-confidence implications can be kept out of graph truth and used as disambiguation material. | medium | partially validated | D18-L, D26-L, D45-L, I30-L | 2026-06-05 `capture-response-to-graph` validated the product wiring for narrow labeled text facts (`Goal:`, `Context:`, `Constraint:`, `Criterion:`) on `session.submitExchangeResponse`. 2026-06-07 generalized the same explicit-text capture core onto `session.submitMessage`: ordinary labeled user text now appends to transcript truth, commits through `graph/capture` → `CommandExecutor.commitGraph({basis: explicit})`, targets the transcript binding's spec, and publishes graph invalidations; explicit interruptions are transcript-visible but do not capture or silently answer a pending exchange. 
2026-06-08 `capture-quality-spike` added a fixed scenario measurement over free prose, file/ref-bearing prose, and implication-heavy prose; the sample extraction report reached precision 1.0 / recall 1.0 with zero false commits, moving generalized capture from parked evidence-gate to a narrow graduate recommendation with an explicit false-commit guard. Readiness-grade capture remains open fitness evidence. | | A24-L | A flat `elicitation_backlog` table (prospective memory) is sufficient to drive elicitor questioning and seed grounding without graph structure — no `unknown` plane/node and no unknown→unknown edges; apparent dependency among open questions is mediated by the claims their resolution produces. | medium | partially validated | D65-L | 2026-06-08 FE-823 materialized the flat table, `createSpec` seed set, `CommandExecutor` create/close mutations, and graph-owned per-spec read-back on the real LSN/change-log seam. Remaining proof is the live per-turn driver plus capture-reflection across elicitation fixtures; if genuine unknown→unknown dependency or rich traversal emerges, promote the table to a plane (rows→nodes, FK pointers→edges). | ### Active Decisions diff --git a/memory/cards/capture-quality--fitness-spike.md b/memory/cards/capture-quality--fitness-spike.md deleted file mode 100644 index 0a57c32de..000000000 --- a/memory/cards/capture-quality--fitness-spike.md +++ /dev/null @@ -1,82 +0,0 @@ -# Capture-quality fitness spike - -Frontier: capture-quality-spike (gates exchanges-and-generalized-capture) -Status: active -Mode: single -Created: 2026-06-08 - -## Orientation - -- **Containing seam:** post-exchange / ordinary-message capture. 
The production path commits only **directly-labeled** high-confidence facts today: `captureExplicitTextFacts` in `src/graph/capture/structured-response.ts` accepts `Goal:`/`Context:`/`Constraint:`/`Criterion:` lines and routes them through `CommandExecutor.commitGraph({basis: explicit})` (wired on `session.submitExchangeResponse` and `session.submitMessage`). Capture beyond labeled facts is unbuilt. -- **Relevant frontier item:** this spike is the **named forcing function** for the horizon frontier `exchanges-and-generalized-capture`. That frontier is *evidence-gated, not wait-gated* (PLAN.md): it cannot graduate until we have real measurement of capture fitness over free text/files/refs. The output of this card is **knowledge + evidence artifacts**, not production capture code. -- **Volatile handoff state:** no `HANDOFF.md`. The `capture-*` projector/renderer stubs were deliberately deleted in the snapshot migration (35eff395) precisely because the capture inventory was not honest yet; **do not** recreate them. The probe precedent is `src/probes/fixture-curation-loop.ts` (an LLM-driven measurement probe that emits report artifacts under `.fixtures/runs/`). -- **Main open risk:** the spike quietly turning into production capture work — adding LLM extraction into `src/graph/capture/` or materializing broad runtime/product seams. It must stay throwaway: measure fitness, record a confidence shift on A22-L, and recommend whether/how the frontier graduates. - -Posture: proving (this is a spike; output is evidence and a confidence shift, not a tracer). - -## Light scope card (spike) - -### Objective - -Produce real evidence of how reliably an LLM-driven capture step can extract high-confidence graph facts from free prose / files / refs **beyond** directly-labeled lines, so `exchanges-and-generalized-capture` can graduate (or stay parked) on measurement rather than guesswork. 
- -### Acceptance Criteria - -``` -✓ A spike probe under src/probes/ runs a capture-quality measurement over a small fixed scenario set - (free-prose answers, file/ref-bearing answers, implication-heavy answers) and emits a report artifact - under .fixtures/runs/capture-quality/ with per-scenario extraction vs expected-fact comparison. -✓ The report quantifies fitness against the A22-L split: high-confidence facts that SHOULD commit vs - low-confidence implications that should STAY OUT of graph truth (precision/recall or false-commit count). -✓ A short verdict is written (in the run artifact and/or a spike note) recording the confidence shift on - A22-L and a concrete recommendation: graduate the frontier, narrow it, or keep it parked with the next gate. -✓ No production capture behavior changes: src/graph/capture/ logic is not extended, and no capture-* - projector/renderer stubs are reintroduced. -``` - -### Verification Approach - -``` -- Inner: a deterministic harness test (like src/probes/fixture-curation-loop.test.ts) that proves the - probe's report/summarization mechanics WITHOUT requiring a live LLM (fixture-fed transcript in → summary out). -- Outer: the real LLM measurement run, recorded as artifacts under .fixtures/runs/capture-quality/ - (mixed-basis output stays in runs/, never registered as a reusable seed). -``` - -### Cross-cutting obligations - -``` -- Throwaway investigation: knowledge + evidence, not production capture code. -- Do not regrow deleted capture-* topology; do not reintroduce `snapshot` as an architecture noun. -- Any commit the probe demonstrates still routes through CommandExecutor with basis: explicit (D63-L); - the probe must not invent a side channel into graph truth. -- Keep src/renderers/ for durable text only; measurement output is run-artifact data, not a renderer. -``` - -### Assumption dependency - -Depends on: A22-L (capture is "partially validated" — labeled facts proven; broader fitness explicitly open). 
This spike exists precisely to move A22-L's evidence; building against it is sound because the spike's job is to test it, not assume it. - -### Expected touched paths (tentative) - -```pseudo -src/probes/ -├── capture-quality-loop.ts + # LLM measurement probe + report summarizer -└── capture-quality-loop.test.ts + # deterministic harness mechanics (no live LLM) - -.fixtures/runs/capture-quality/ + # real-run evidence artifacts (transcript, extraction, verdict) - -memory/SPEC.md ? # update A22-L evidence/status after the verdict (reconciliation) -``` - -### Promotion checklist - -- [ ] Does this change a requirement? -- [ ] Does this create, retire, or invalidate an assumption? — *expected:* it will shift A22-L evidence; reconcile SPEC after the verdict. -- [ ] Does this slice depend on an unvalidated high-impact assumption? — it tests one; that is the point of a spike. -- [ ] Does this make or reverse a non-trivial design decision? -- [ ] Does this establish a new seam-level invariant? -- [ ] Does this change a frontier-level cross-cutting obligation or verification architecture layer? -- [ ] Does it cross more than two major seams? -- [ ] Is this the first touch in an unfamiliar seam from a fresh thread? -- [ ] Can you not name the containing seam or current rationale from the live docs? 
diff --git a/src/probes/capture-quality-loop.test.ts b/src/probes/capture-quality-loop.test.ts new file mode 100644 index 000000000..e72fcdc7d --- /dev/null +++ b/src/probes/capture-quality-loop.test.ts @@ -0,0 +1,195 @@ +import { mkdtemp, readFile } from 'node:fs/promises'; +import { tmpdir } from 'node:os'; +import { join } from 'node:path'; + +import { describe, expect, it } from 'vitest'; + +import { + CAPTURE_QUALITY_SCENARIOS, + summarizeCaptureQualityRun, + writeCaptureQualityArtifacts, + type CaptureQualityScenarioExtraction, +} from './capture-quality-loop.js'; + +const goodExtractions: CaptureQualityScenarioExtraction[] = [ + { + scenarioId: 'free-prose-launch-goal', + facts: [ + { + expectedId: 'workspace-for-solo-developers', + kind: 'context', + title: 'The product is for solo developers working in a local spec workspace.', + confidence: 'high', + evidence: 'for solo developers / local spec workspace', + }, + { + expectedId: 'capture-goals-without-template', + kind: 'goal', + title: 'Capture project goals without forcing a rigid template.', + confidence: 'high', + evidence: 'should help capture project goals without forcing people into a rigid template', + }, + { + expectedId: 'new-contributor-explains-problem', + kind: 'criterion', + title: 'A new contributor can read the graph and explain the problem solved.', + confidence: 'high', + evidence: 'Success means...', + }, + ], + }, + { + scenarioId: 'file-ref-bearing-answer', + facts: [ + { + expectedId: 'prd-is-product-frame', + kind: 'context', + title: 'docs/architecture/prd.md is the product frame for this answer.', + confidence: 'high', + evidence: 'Use docs/architecture/prd.md as the product frame.', + }, + { + expectedId: 'graph-truth-sqlite-brunch', + kind: 'constraint', + title: 'Graph truth must stay in SQLite under .brunch.', + confidence: 'high', + evidence: 'The non-negotiable is...', + }, + { + expectedId: 'jsonl-ok-if-replay-recovers-exchanges', + kind: 'criterion', + title: 'JSONL 
transcript evidence is acceptable only if replay recovers structured exchange results.', + confidence: 'high', + evidence: 'can remain JSONL as long as...', + }, + { + expectedId: 'must-build-full-replay-engine-now', + kind: 'requirement', + title: 'Build a full replay engine immediately.', + confidence: 'low', + evidence: 'Not directly stated; only a possible implication.', + }, + ], + }, + { + scenarioId: 'implication-heavy-no-overcommit', + facts: [ + { + expectedId: 'terminal-demo-preference-conditional', + kind: 'assumption', + title: 'The user may prefer the terminal view if the browser observer is confusing.', + confidence: 'low', + evidence: 'If the browser observer gets confusing, I might prefer...', + }, + { + expectedId: 'web-helpful-if-fast', + kind: 'criterion', + title: 'The web graph is helpful only if it keeps up quickly enough.', + confidence: 'high', + evidence: 'only if it keeps up quickly enough', + }, + { + expectedId: 'review-sets-in-poc', + kind: 'requirement', + title: 'Review sets belong in the POC story.', + confidence: 'low', + evidence: 'I have not decided...', + }, + ], + }, +]; + +describe('capture quality report', () => { + it('quantifies high-confidence recall while keeping low-confidence implications out of graph truth', () => { + const report = summarizeCaptureQualityRun({ + runId: 'capture-quality-test', + generatedAt: '2026-06-08T00:00:00.000Z', + cwd: '/tmp/capture-quality-test', + extractorName: 'fixture-fed', + scenarios: CAPTURE_QUALITY_SCENARIOS, + extractions: goodExtractions, + }); + + expect(report.totals).toMatchObject({ + shouldCommitCount: 7, + truePositiveCount: 7, + missedShouldCommitCount: 0, + falseCommitCount: 0, + lowConfidenceImplicationCount: 3, + precision: 1, + recall: 1, + }); + expect(report.verdict.recommendation).toBe('graduate'); + }); + + it('fails the verdict when a low-confidence implication is marked high-confidence', () => { + const report = summarizeCaptureQualityRun({ + runId: 
'capture-quality-test', + generatedAt: '2026-06-08T00:00:00.000Z', + cwd: '/tmp/capture-quality-test', + extractorName: 'fixture-fed', + scenarios: CAPTURE_QUALITY_SCENARIOS, + extractions: [ + ...goodExtractions.slice(0, 2), + { + scenarioId: 'implication-heavy-no-overcommit', + facts: [ + { + expectedId: 'web-helpful-if-fast', + kind: 'criterion', + title: 'The web graph is helpful only if it keeps up quickly enough.', + confidence: 'high', + evidence: 'only if it keeps up quickly enough', + }, + { + expectedId: 'review-sets-in-poc', + kind: 'requirement', + title: 'Review sets belong in the POC story.', + confidence: 'high', + evidence: 'I have not decided whether review sets belong in the POC story.', + }, + ], + }, + ], + }); + + expect(report.totals.falseCommitCount).toBe(1); + expect(report.verdict).toMatchObject({ + recommendation: 'keep_parked', + a22ConfidenceShift: expect.stringContaining('negative'), + }); + }); + + it('writes portable scenario, extraction, report, and verdict artifacts', async () => { + const fixtureRoot = await mkdtemp(join(tmpdir(), 'brunch-capture-quality-artifacts-')); + const report = summarizeCaptureQualityRun({ + runId: 'capture-quality-test', + generatedAt: '2026-06-08T00:00:00.000Z', + cwd: fixtureRoot, + extractorName: 'fixture-fed', + scenarios: CAPTURE_QUALITY_SCENARIOS, + extractions: goodExtractions, + }); + + const artifacts = await writeCaptureQualityArtifacts({ + fixtureRoot, + report, + scenarios: CAPTURE_QUALITY_SCENARIOS, + extractions: goodExtractions, + }); + + expect(artifacts).toEqual({ + runDir: 'runs/capture-quality/capture-quality-test', + scenariosJson: 'runs/capture-quality/capture-quality-test/scenarios.json', + extractionsJson: 'runs/capture-quality/capture-quality-test/extractions.json', + reportJson: 'runs/capture-quality/capture-quality-test/report.json', + verdictMarkdown: 'runs/capture-quality/capture-quality-test/verdict.md', + }); + await expect(readFile(join(fixtureRoot, artifacts.reportJson), 
'utf8')).resolves.toContain( + '"cwd": ""', + ); + await expect(readFile(join(fixtureRoot, artifacts.verdictMarkdown), 'utf8')).resolves.toContain( + 'Recommendation: graduate', + ); + }); +}); diff --git a/src/probes/capture-quality-loop.ts b/src/probes/capture-quality-loop.ts new file mode 100644 index 000000000..d05e22f11 --- /dev/null +++ b/src/probes/capture-quality-loop.ts @@ -0,0 +1,421 @@ +import { mkdir, readFile, writeFile } from 'node:fs/promises'; +import { dirname, join, resolve } from 'node:path'; +import process from 'node:process'; +import { fileURLToPath } from 'node:url'; + +import { portableCwd } from './portable-report.js'; + +const PROBE_ID = 'capture-quality' as const; + +export type CaptureFactKind = 'goal' | 'context' | 'constraint' | 'criterion' | 'requirement' | 'assumption'; +export type CaptureRecommendation = 'graduate' | 'narrow' | 'keep_parked'; + +export interface CaptureQualityExpectedFact { + readonly id: string; + readonly kind: CaptureFactKind; + readonly title: string; + readonly shouldCommit: boolean; + readonly rationale: string; +} + +export interface CaptureQualityScenario { + readonly id: string; + readonly label: string; + readonly category: 'free_prose' | 'file_ref' | 'implication_heavy'; + readonly input: string; + readonly expectedFacts: readonly CaptureQualityExpectedFact[]; +} + +export interface CaptureQualityExtractedFact { + readonly expectedId?: string; + readonly kind: CaptureFactKind; + readonly title: string; + readonly confidence: 'high' | 'low'; + readonly evidence: string; +} + +export interface CaptureQualityScenarioExtraction { + readonly scenarioId: string; + readonly facts: readonly CaptureQualityExtractedFact[]; +} + +export interface CaptureQualityScenarioResult { + readonly scenarioId: string; + readonly label: string; + readonly category: CaptureQualityScenario['category']; + readonly shouldCommitCount: number; + readonly truePositiveCount: number; + readonly missedShouldCommit: readonly 
CaptureQualityExpectedFact[]; + readonly falseCommitCount: number; + readonly falseCommits: readonly CaptureQualityExtractedFact[]; + readonly lowConfidenceImplicationCount: number; + readonly extractedFacts: readonly CaptureQualityExtractedFact[]; +} + +export interface CaptureQualityReport { + readonly schemaVersion: 1; + readonly probeId: typeof PROBE_ID; + readonly runId: string; + readonly generatedAt: string; + readonly cwd: string; + readonly extractorName: string; + readonly scenarioCount: number; + readonly totals: { + readonly shouldCommitCount: number; + readonly truePositiveCount: number; + readonly missedShouldCommitCount: number; + readonly falseCommitCount: number; + readonly lowConfidenceImplicationCount: number; + readonly precision: number; + readonly recall: number; + }; + readonly scenarioResults: readonly CaptureQualityScenarioResult[]; + readonly verdict: { + readonly a22ConfidenceShift: string; + readonly recommendation: CaptureRecommendation; + readonly summary: string; + }; + readonly artifacts?: CaptureQualityArtifacts; +} + +export interface CaptureQualityArtifacts { + readonly runDir: string; + readonly scenariosJson: string; + readonly extractionsJson: string; + readonly reportJson: string; + readonly verdictMarkdown: string; +} + +export const CAPTURE_QUALITY_SCENARIOS: readonly CaptureQualityScenario[] = [ + { + id: 'free-prose-launch-goal', + label: 'Free prose with explicit acceptance facts', + category: 'free_prose', + input: + 'We are building a local spec workspace for solo developers. The first useful outcome is that it should help capture project goals without forcing people into a rigid template. 
Success means a new contributor can read the graph and explain what problem the project solves.', + expectedFacts: [ + { + id: 'workspace-for-solo-developers', + kind: 'context', + title: 'The product is for solo developers working in a local spec workspace.', + shouldCommit: true, + rationale: 'Direct statement of audience and workspace setting.', + }, + { + id: 'capture-goals-without-template', + kind: 'goal', + title: 'Capture project goals without forcing a rigid template.', + shouldCommit: true, + rationale: 'Directly stated useful outcome.', + }, + { + id: 'new-contributor-explains-problem', + kind: 'criterion', + title: 'A new contributor can read the graph and explain the problem solved.', + shouldCommit: true, + rationale: 'Explicit success criterion.', + }, + ], + }, + { + id: 'file-ref-bearing-answer', + label: 'Answer grounded in a referenced file', + category: 'file_ref', + input: + 'Use docs/architecture/prd.md as the product frame. The non-negotiable is that graph truth must stay in SQLite under .brunch, while transcript evidence can remain JSONL as long as replay can recover the structured exchange results.', + expectedFacts: [ + { + id: 'prd-is-product-frame', + kind: 'context', + title: 'docs/architecture/prd.md is the product frame for this answer.', + shouldCommit: true, + rationale: 'Direct source/reference grounding.', + }, + { + id: 'graph-truth-sqlite-brunch', + kind: 'constraint', + title: 'Graph truth must stay in SQLite under .brunch.', + shouldCommit: true, + rationale: 'Directly labeled as non-negotiable.', + }, + { + id: 'jsonl-ok-if-replay-recovers-exchanges', + kind: 'criterion', + title: 'JSONL transcript evidence is acceptable only if replay recovers structured exchange results.', + shouldCommit: true, + rationale: 'Explicit conditional acceptance criterion.', + }, + { + id: 'must-build-full-replay-engine-now', + kind: 'requirement', + title: 'Build a full replay engine immediately.', + shouldCommit: false, + rationale: 'This is an 
implication beyond the stated condition.', + }, + ], + }, + { + id: 'implication-heavy-no-overcommit', + label: 'Implication-heavy answer that should not over-commit', + category: 'implication_heavy', + input: + 'If the browser observer gets confusing, I might prefer the terminal view for the demo. The web graph is helpful, but only if it keeps up quickly enough. I have not decided whether review sets belong in the POC story.', + expectedFacts: [ + { + id: 'terminal-demo-preference-conditional', + kind: 'assumption', + title: 'The user may prefer the terminal view if the browser observer is confusing.', + shouldCommit: false, + rationale: 'Conditional preference, not settled graph truth.', + }, + { + id: 'web-helpful-if-fast', + kind: 'criterion', + title: 'The web graph is helpful only if it keeps up quickly enough.', + shouldCommit: true, + rationale: 'Clear acceptance condition for web observer usefulness.', + }, + { + id: 'review-sets-in-poc', + kind: 'requirement', + title: 'Review sets belong in the POC story.', + shouldCommit: false, + rationale: 'Explicitly undecided; should stay out of graph truth.', + }, + ], + }, +]; + +export async function runCaptureQualityMeasurement( + options: { + readonly fixtureRoot?: string; + readonly extractionFile?: string; + readonly runId?: string; + readonly cwd?: string; + readonly extractorName?: string; + } = {}, +): Promise<CaptureQualityReport> { + const fixtureRoot = resolve( + options.fixtureRoot ?? join(dirname(fileURLToPath(import.meta.url)), '../../.fixtures'), + ); + const extractionFile = + options.extractionFile ?? join(fixtureRoot, 'runs', PROBE_ID, 'sample-llm-extractions.json'); + const extractions = await readScenarioExtractions(extractionFile); + let report = summarizeCaptureQualityRun({ + runId: options.runId ?? defaultRunId(), + generatedAt: new Date().toISOString(), + cwd: options.cwd ?? process.cwd(), + extractorName: options.extractorName ??
'sample-llm-output', + scenarios: CAPTURE_QUALITY_SCENARIOS, + extractions, + }); + report = { + ...report, + artifacts: await writeCaptureQualityArtifacts({ + fixtureRoot, + report, + scenarios: CAPTURE_QUALITY_SCENARIOS, + extractions, + }), + }; + return report; +} + +export function summarizeCaptureQualityRun(input: { + readonly runId: string; + readonly generatedAt: string; + readonly cwd: string; + readonly extractorName: string; + readonly scenarios: readonly CaptureQualityScenario[]; + readonly extractions: readonly CaptureQualityScenarioExtraction[]; +}): CaptureQualityReport { + const extractionByScenario = new Map(input.extractions.map((entry) => [entry.scenarioId, entry])); + const scenarioResults = input.scenarios.map((scenario) => + summarizeScenario(scenario, extractionByScenario.get(scenario.id)?.facts ?? []), + ); + const totals = scenarioResults.reduce( + (acc, result) => ({ + shouldCommitCount: acc.shouldCommitCount + result.shouldCommitCount, + truePositiveCount: acc.truePositiveCount + result.truePositiveCount, + missedShouldCommitCount: acc.missedShouldCommitCount + result.missedShouldCommit.length, + falseCommitCount: acc.falseCommitCount + result.falseCommitCount, + lowConfidenceImplicationCount: acc.lowConfidenceImplicationCount + result.lowConfidenceImplicationCount, + }), + { + shouldCommitCount: 0, + truePositiveCount: 0, + missedShouldCommitCount: 0, + falseCommitCount: 0, + lowConfidenceImplicationCount: 0, + }, + ); + const precisionDenominator = totals.truePositiveCount + totals.falseCommitCount; + const precision = precisionDenominator === 0 ? 0 : round(totals.truePositiveCount / precisionDenominator); + const recall = + totals.shouldCommitCount === 0 ? 
0 : round(totals.truePositiveCount / totals.shouldCommitCount); + const verdict = verdictFor({ ...totals, precision, recall }); + + return { + schemaVersion: 1, + probeId: PROBE_ID, + runId: input.runId, + generatedAt: input.generatedAt, + cwd: input.cwd, + extractorName: input.extractorName, + scenarioCount: input.scenarios.length, + totals: { ...totals, precision, recall }, + scenarioResults, + verdict, + }; +} + +export async function writeCaptureQualityArtifacts(options: { + readonly fixtureRoot: string; + readonly report: CaptureQualityReport; + readonly scenarios: readonly CaptureQualityScenario[]; + readonly extractions: readonly CaptureQualityScenarioExtraction[]; +}): Promise { + const runDirRef = `runs/${PROBE_ID}/${options.report.runId}`; + const artifacts: CaptureQualityArtifacts = { + runDir: runDirRef, + scenariosJson: `${runDirRef}/scenarios.json`, + extractionsJson: `${runDirRef}/extractions.json`, + reportJson: `${runDirRef}/report.json`, + verdictMarkdown: `${runDirRef}/verdict.md`, + }; + const diskPath = (ref: string) => resolve(options.fixtureRoot, ref); + const report = { ...options.report, cwd: portableCwd(options.report.cwd), artifacts }; + + await mkdir(diskPath(artifacts.runDir), { recursive: true }); + await writeFile( + diskPath(artifacts.scenariosJson), + `${JSON.stringify(options.scenarios, null, 2)}\n`, + 'utf8', + ); + await writeFile( + diskPath(artifacts.extractionsJson), + `${JSON.stringify(options.extractions, null, 2)}\n`, + 'utf8', + ); + await writeFile(diskPath(artifacts.reportJson), `${JSON.stringify(report, null, 2)}\n`, 'utf8'); + await writeFile(diskPath(artifacts.verdictMarkdown), verdictMarkdown(report), 'utf8'); + + return artifacts; +} + +function summarizeScenario( + scenario: CaptureQualityScenario, + extractedFacts: readonly CaptureQualityExtractedFact[], +): CaptureQualityScenarioResult { + const expectedById = new Map(scenario.expectedFacts.map((fact) => [fact.id, fact])); + const highConfidence = 
extractedFacts.filter((fact) => fact.confidence === 'high'); + const truePositiveIds = new Set( + highConfidence.flatMap((fact) => { + const expected = fact.expectedId === undefined ? undefined : expectedById.get(fact.expectedId); + return expected?.shouldCommit === true ? [expected.id] : []; + }), + ); + const falseCommits = highConfidence.filter((fact) => { + if (fact.expectedId === undefined) return true; + return expectedById.get(fact.expectedId)?.shouldCommit !== true; + }); + const shouldCommitFacts = scenario.expectedFacts.filter((fact) => fact.shouldCommit); + const missedShouldCommit = shouldCommitFacts.filter((fact) => !truePositiveIds.has(fact.id)); + const lowConfidenceImplicationCount = extractedFacts.filter((fact) => { + if (fact.confidence !== 'low' || fact.expectedId === undefined) return false; + return expectedById.get(fact.expectedId)?.shouldCommit === false; + }).length; + + return { + scenarioId: scenario.id, + label: scenario.label, + category: scenario.category, + shouldCommitCount: shouldCommitFacts.length, + truePositiveCount: truePositiveIds.size, + missedShouldCommit, + falseCommitCount: falseCommits.length, + falseCommits, + lowConfidenceImplicationCount, + extractedFacts, + }; +} + +function verdictFor(totals: CaptureQualityReport['totals']): CaptureQualityReport['verdict'] { + if (totals.falseCommitCount > 0) { + return { + a22ConfidenceShift: + 'negative: the measured extractor committed at least one low-confidence implication', + recommendation: 'keep_parked', + summary: + 'Do not graduate generalized capture until the extraction prompt/model can keep undecided implications out of high-confidence graph truth.', + }; + } + if (totals.recall < 0.8) { + return { + a22ConfidenceShift: 'mixed: precision held, but recall missed too many directly stated facts', + recommendation: 'narrow', + summary: + 'Generalized capture can be narrowed to high-confidence directly extractive facts, but should not broaden until recall improves.', + }; + } + 
return { + a22ConfidenceShift: 'positive: high-confidence capture separated commit-worthy facts from implications', + recommendation: 'graduate', + summary: + 'A22-L is fit to graduate into a narrow generalized-capture frontier, preserving an explicit false-commit guard.', + }; +} + +async function readScenarioExtractions(path: string): Promise { + return JSON.parse(await readFile(path, 'utf8')) as CaptureQualityScenarioExtraction[]; +} + +function verdictMarkdown(report: CaptureQualityReport): string { + return `# Capture-quality verdict\n\n- A22-L confidence shift: ${report.verdict.a22ConfidenceShift}\n- Recommendation: ${report.verdict.recommendation}\n- Precision: ${report.totals.precision}\n- Recall: ${report.totals.recall}\n- False commits: ${report.totals.falseCommitCount}\n\n${report.verdict.summary}\n`; +} + +function round(value: number): number { + return Math.round(value * 1000) / 1000; +} + +function defaultRunId(): string { + return new Date().toISOString().replaceAll(':', '-').replaceAll('.', '-'); +} + +function parseCliArgs(argv: readonly string[]): Parameters[0] { + const options: Record = {}; + for (let index = 0; index < argv.length; index += 1) { + const arg = argv[index]; + if (arg !== undefined && arg.startsWith('--')) { + options[arg.slice(2)] = requiredValue(argv, (index += 1), arg); + } + } + return { + ...(options['fixture-root'] !== undefined ? { fixtureRoot: options['fixture-root'] } : {}), + ...(options['extraction-file'] !== undefined ? { extractionFile: options['extraction-file'] } : {}), + ...(options['run-id'] !== undefined ? { runId: options['run-id'] } : {}), + ...(options.cwd !== undefined ? { cwd: options.cwd } : {}), + ...(options['extractor-name'] !== undefined ? 
{ extractorName: options['extractor-name'] } : {}), + }; +} + +function requiredValue(argv: readonly string[], index: number, flag: string): string { + const value = argv[index]; + if (value === undefined) { + throw new Error(`${flag} requires a value`); + } + return value; +} + +async function main(): Promise { + const report = await runCaptureQualityMeasurement(parseCliArgs(process.argv.slice(2))); + process.stdout.write(`${JSON.stringify(report, null, 2)}\n`); + process.exitCode = report.verdict.recommendation === 'keep_parked' ? 1 : 0; +} + +if (process.argv[1] === fileURLToPath(import.meta.url)) { + main().catch((error: unknown) => { + console.error(error); + process.exitCode = 1; + }); +} From 68273e3519a4c5e02173ca55f7a70cf2a4ef49d0 Mon Sep 17 00:00:00 2001 From: Lu Nelson Date: Mon, 8 Jun 2026 12:07:48 +0200 Subject: [PATCH 17/17] Sync canonical memory after coverage build-outs and merge Reconcile runtime-affordances-and-legality to done (00105108), add its Recently Completed entry, archive the 06-05/06-06 completion bursts to PLAN_HISTORY, update the cross-cut capture-quality note, and add D40-L to the session topology README header for the new affordance ledger. Amp-Thread-ID: https://ampcode.com/threads/T-019ea6aa-0ce3-766e-8458-36e6a4450587 Co-authored-by: Amp --- docs/archive/PLAN_HISTORY.md | 12 ++++++++++++ memory/CROSS_CUT_PLAN.md | 2 +- memory/PLAN.md | 16 +++++----------- src/session/README.md | 2 +- 4 files changed, 19 insertions(+), 13 deletions(-) diff --git a/docs/archive/PLAN_HISTORY.md b/docs/archive/PLAN_HISTORY.md index 1da2cd99a..89837dcc7 100644 --- a/docs/archive/PLAN_HISTORY.md +++ b/docs/archive/PLAN_HISTORY.md @@ -3,6 +3,18 @@ This file is the active POC-line plan archive for `memory/PLAN.md`. Legacy pre-`next` history was moved out of the live docs tree with the old archived implementation. 
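The scoring rule in `summarizeCaptureQualityRun` and `verdictFor` can be checked by hand with a reduced sketch. This sketch assumes only the three aggregate counts matter. The `CaptureCounts` shape and the `precisionOf`/`recallOf`/`recommend` helpers are hypothetical names, not exports of the probe.

```typescript
// Hypothetical aggregate counts; the probe itself derives them per scenario.
interface CaptureCounts {
  readonly truePositiveCount: number; // high-confidence facts matching a shouldCommit expectation
  readonly falseCommitCount: number; // high-confidence facts that should not have been committed
  readonly shouldCommitCount: number; // expected facts marked shouldCommit: true
}

type Recommendation = 'keep_parked' | 'narrow' | 'graduate';

// Three-decimal rounding, mirroring the probe's round() helper.
function round3(value: number): number {
  return Math.round(value * 1000) / 1000;
}

// Precision = TP / (TP + false commits); 0 when nothing was committed at high confidence.
function precisionOf(c: CaptureCounts): number {
  const denominator = c.truePositiveCount + c.falseCommitCount;
  return denominator === 0 ? 0 : round3(c.truePositiveCount / denominator);
}

// Recall = TP / shouldCommit; 0 when no fact was expected to commit.
function recallOf(c: CaptureCounts): number {
  return c.shouldCommitCount === 0 ? 0 : round3(c.truePositiveCount / c.shouldCommitCount);
}

// Same precedence as verdictFor: any false commit parks; otherwise recall below 0.8 narrows.
function recommend(c: CaptureCounts): Recommendation {
  if (c.falseCommitCount > 0) return 'keep_parked';
  if (recallOf(c) < 0.8) return 'narrow';
  return 'graduate';
}
```

Consider seven expected facts. Five true positives with no false commits give recall 0.714 and a `narrow` recommendation. A single false commit parks the capture regardless of recall, which is the false-commit guard the verdict text describes.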
+## 2026-06-08 Sync archive + +Archived from `memory/PLAN.md` during the post-merge ln-sync once the 2026-06-08 coverage burst (`runtime-affordances-and-legality`, `capture-quality-spike`, `minimal-authority-shell`, cross-cut body-depth, `elicitation-backlog`) became the live completion window. These prior bursts move here. + +- 2026-06-06 `project-graph-review-cycle` (FE-809) — Done: `project-graph` now has active review tools at commitment readiness, real agent proposal generation reaches `present_review_set`, approval goes through public `session.submitExchangeResponse`, `CommandExecutor.acceptReviewSet` commits the exact reviewed batch with `basis: explicit`, and graph/session invalidations publish with `{specId, lsn}`. Verified: `src/.pi/agents/state.test.ts`, `src/.pi/__tests__/prompting.test.ts`, `src/probes/project-graph-review-cycle-proof.test.ts`, and real run `.fixtures/runs/project-graph-review-cycle/2026-06-06-project-graph-review-cycle/`. + +- 2026-06-06 `topology-readmes-and-boundaries` — Done: root product entrypoints moved to `app/`/`workspace/`/`scripts`; reusable graph/session/exchanges/workspace projection helpers moved to `projections/`; reusable markdown/text renderers moved to `renderers/`; `src/projections/topology-boundaries.test.ts` now guards the projection/renderer adapter boundary; and D40-L runtime-state policy now shares `elicit-read-only` tool-policy definitions from `projections/session/runtime-policy.ts` while `.pi/extensions/runtime` remains the Pi tool adapter. Verified: targeted topology/runtime tests and `npm run verify`. + +- 2026-06-05 `capture-response-to-graph` (FE-807) — Done: synchronous response-capture tracer. 
Added a narrow labeled-text translator for `Goal:`, `Context:`, `Constraint:`, and `Criterion:` facts; wired public `session.submitExchangeResponse` to capture through the transcript binding's spec and `CommandExecutor.commitGraph({basis: explicit})`; returned loud capture outcomes; published graph invalidations; and added a public-RPC proof that activation/trigger/submit/overview exposes captured projected codes. Verified: `src/graph/capture/structured-response.test.ts`, `src/rpc/handlers.test.ts`, `src/probes/capture-response-to-graph-proof.test.ts`. + +- 2026-06-05 `dev-seed-fixtures` — Done: first product-driven fixture curation tracer. Added deterministic `bilal-port-variants/macro-view-grounded-intent` explicit-only intent base, a `fixture-curation` probe runner/report summarizer, and run artifacts proving `gpt-5.5` used real `read_graph`/`commit_graph` product tools to persist two implicit requirement nodes plus six implicit edges through `CommandExecutor`. Verified: `src/probes/fixture-curation-loop.test.ts`, `src/graph/seed-fixtures.test.ts`, real run `.fixtures/runs/fixture-curation/fixture-curation-2026-06-05T104440Z/`. + ## 2026-06-06 Sync archive Archived from `memory/PLAN.md` during topology-chain sync so the live plan keeps only active/next/parallel definitions plus the last few completion summaries. diff --git a/memory/CROSS_CUT_PLAN.md b/memory/CROSS_CUT_PLAN.md index c88f14042..16e75fce1 100644 --- a/memory/CROSS_CUT_PLAN.md +++ b/memory/CROSS_CUT_PLAN.md @@ -127,7 +127,7 @@ DoD: every ● row is `have` or `built`. 
| --- | --- | --- | --- | --- | --- | | 6 method resources scaffolding | have | ● | — | — | run-structured-exchange, infer-and-capture, commit-graph, read-context, generate-proposal, review-for-gaps | | method **content depth** | built | ● | — | done — deepened bodies + manifest-wide depth test (1ca02e38) | each method gives tool-routing/sequencing guidance, not tool-description restatement | -| generalized capture (free text, files, refs; iterative passes) | built | ● | — | done — labeled-text core on `session.submitMessage` (5f5e6ac8) | `built` = the **POC bar only** (directly-labeled facts). Richer free-text/files/refs capture is **out of this row's scope by design**, not unfinished here: it is gated on the `capture-quality-spike` (A22-L) and owned by the PLAN frontier `exchanges-and-generalized-capture`. D66-L | +| generalized capture (free text, files, refs; iterative passes) | built | ● | — | done — labeled-text core on `session.submitMessage` (5f5e6ac8) | `built` = the **POC bar only** (directly-labeled facts). Richer free-text/files/refs capture is **out of this row's scope by design**, not unfinished here: the `capture-quality-spike` (A22-L) has since landed (2026-06-08, precision/recall 1.0, zero false commits) and narrowly graduated the PLAN frontier `exchanges-and-generalized-capture`, which now owns the richer capture with an explicit false-commit guard. 
D66-L | | exchange-tool `.description()` / `promptGuidelines` | built | ● | — | done — all 7 exchange tools carry both (drift correction 2026-06-07) | `src/.pi/extensions/exchanges/*` already match the `commit_graph` pattern | | skill-commands (`gap-review`, `arbitrary-enhance`) | new | ○ | proving | Q6 (deferred) | off critical path | diff --git a/memory/PLAN.md b/memory/PLAN.md index bc2b1790b..7bfd7c39d 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -299,6 +299,8 @@ The remaining coverage frontiers are being deliberately de-fogged rather than le - **Design docs:** `.fixtures/seeds/bilal-port/README.md`; `docs/design/GRAPH_MODEL.md`; `docs/praxis/manual-testing.md`. ## Recently Completed +- 2026-06-08 `runtime-affordances-and-legality` — Done (00105108): added `src/projections/session/affordances.ts` owning the pure `(resolvedState, readinessGrade) → legal goal/strategy/lens options + default-on-switch` derivation; lifted the shared grade/AUTO legality tables into `src/projections/session/runtime-policy.ts` and refactored `src/.pi/agents/state.ts` to reuse that single legality source (no client-local reimplementation); added the closed coverage ledger to `src/session/README.md` with `src/session/runtime-affordances-coverage.test.ts` guarding the required agent rows while tripwiring `active-review-set` / `turn-mode` as explicit product-state-gated deferrals. Reconciled D40-L. Verified: `src/projections/session/affordances.test.ts`, `src/session/runtime-affordances-coverage.test.ts`, and `npm run verify`. + - 2026-06-08 `capture-quality-spike` — Done: added `src/probes/capture-quality-loop.ts` and a deterministic report test over free-prose, file/ref-bearing, and implication-heavy capture scenarios. 
The run artifact `.fixtures/runs/capture-quality/2026-06-08-capture-quality-sample/` records precision 1.0 / recall 1.0 with zero false commits from the sample extraction set and recommends graduating `exchanges-and-generalized-capture` narrowly, preserving a false-commit oracle for implication-heavy text. Verified: `src/probes/capture-quality-loop.test.ts` and `npm run verify`. - 2026-06-08 `minimal-authority-shell` (FE-810) — Done: added the authority-matrix guard test over the current POC authority seam. The guard locks `CommandExecutor` mutation-result discriminants as the graph outcome vocabulary, proves `needs_human` is structured data rather than a TUI-only dialog, and asserts `elicit` tool authority comes from the shared projected runtime policy while blocking the identified side-effecting tools (`bash`, `edit`, `write`). No new authority service; `src/.pi/agents/state.ts` untouched; A18-L strict built-in suppression remains accepted Pi-upstream/API residue. Verified: `src/.pi/extensions/runtime/authority-matrix.test.ts` and `npm run verify`. @@ -307,15 +309,7 @@ The remaining coverage frontiers are being deliberately de-fogged rather than le - 2026-06-08 `elicitation-backlog` (FE-823) — Done: materialized `elicitation_backlog` as a flat spec-scoped table with generated migration, seeded the grounding agenda at `createSpec`, routed create/close entry mutations through `CommandExecutor` on the shared `{specId, lsn}` / `change_log` boundary, and added graph-owned per-spec open-entry read-back. Reconciled D65-L/A24-L and updated graph/db topology docs. Verified: `src/graph/command-executor.test.ts`, `src/graph/queries.test.ts`, and `npm run verify`. 
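The `runtime-affordances-and-legality` entry describes a single legality table consulted by a pure `(resolvedState, readinessGrade)` derivation, so the agent-state module and the affordance projection cannot drift apart. A minimal sketch of that shape follows. Everything in it is hypothetical:

- the grade names, the strategy options, and the `deriveAffordances` name are invented for illustration;
- goal/lens options are omitted;
- reading "default-on-switch" as "keep the current strategy if still legal, otherwise fall back to the first legal one" is an assumption.

The real tables in `src/projections/session/runtime-policy.ts` are not shown in this patch.

```typescript
// Hypothetical grades and strategies; not the project's real vocabulary.
type ReadinessGrade = 'grounding' | 'shaping' | 'commitment';
type Strategy = 'elicit' | 'propose' | 'review';

interface ResolvedState {
  readonly goal: string;
  readonly strategy: Strategy;
}

// One shared legality table: every consumer imports this instead of restating it.
const LEGAL_STRATEGIES: Readonly<Record<ReadinessGrade, readonly Strategy[]>> = {
  grounding: ['elicit'],
  shaping: ['elicit', 'propose'],
  commitment: ['elicit', 'propose', 'review'],
};

interface Affordances {
  readonly legalStrategies: readonly Strategy[];
  // Assumed meaning: the strategy to land on if the grade changes under the session.
  readonly defaultOnSwitch: Strategy;
}

// Pure derivation: same inputs give the same output, with no I/O or hidden state.
function deriveAffordances(state: ResolvedState, grade: ReadinessGrade): Affordances {
  const legal = LEGAL_STRATEGIES[grade];
  const defaultOnSwitch = legal.includes(state.strategy) ? state.strategy : (legal[0] ?? 'elicit');
  return { legalStrategies: legal, defaultOnSwitch };
}
```

Because the function only reads the shared table, a second consumer such as the agent-state module can call it, or read `LEGAL_STRATEGIES` directly, without a client-local copy of the rules.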
-- 2026-06-06 `project-graph-review-cycle` (FE-809) — Done: `project-graph` now has active review tools at commitment readiness, real agent proposal generation reaches `present_review_set`, approval goes through public `session.submitExchangeResponse`, `CommandExecutor.acceptReviewSet` commits the exact reviewed batch with `basis: explicit`, and graph/session invalidations publish with `{specId, lsn}`. Verified: `src/.pi/agents/state.test.ts`, `src/.pi/__tests__/prompting.test.ts`, `src/probes/project-graph-review-cycle-proof.test.ts`, and real run `.fixtures/runs/project-graph-review-cycle/2026-06-06-project-graph-review-cycle/`. - -- 2026-06-06 `topology-readmes-and-boundaries` — Done: root product entrypoints moved to `app/`/`workspace/`/`scripts`; reusable graph/session/exchanges/workspace projection helpers moved to `projections/`; reusable markdown/text renderers moved to `renderers/`; `src/projections/topology-boundaries.test.ts` now guards the projection/renderer adapter boundary; and D40-L runtime-state policy now shares `elicit-read-only` tool-policy definitions from `projections/session/runtime-policy.ts` while `.pi/extensions/runtime` remains the Pi tool adapter. Verified: targeted topology/runtime tests and `npm run verify`. - -- 2026-06-05 `capture-response-to-graph` (FE-807) — Done: synchronous response-capture tracer. Added a narrow labeled-text translator for `Goal:`, `Context:`, `Constraint:`, and `Criterion:` facts; wired public `session.submitExchangeResponse` to capture through the transcript binding's spec and `CommandExecutor.commitGraph({basis: explicit})`; returned loud capture outcomes; published graph invalidations; and added a public-RPC proof that activation/trigger/submit/overview exposes captured projected codes. Verified: `src/graph/capture/structured-response.test.ts`, `src/rpc/handlers.test.ts`, `src/probes/capture-response-to-graph-proof.test.ts`. 
- -- 2026-06-05 `dev-seed-fixtures` — Done: first product-driven fixture curation tracer. Added deterministic `bilal-port-variants/macro-view-grounded-intent` explicit-only intent base, a `fixture-curation` probe runner/report summarizer, and run artifacts proving `gpt-5.5` used real `read_graph`/`commit_graph` product tools to persist two implicit requirement nodes plus six implicit edges through `CommandExecutor`. Verified: `src/probes/fixture-curation-loop.test.ts`, `src/graph/seed-fixtures.test.ts`, real run `.fixtures/runs/fixture-curation/fixture-curation-2026-06-05T104440Z/`. - -Older history (including `graph-tool-resilience`, spec-scoped graph-clock hardening, `agents-composition-layer`, `live-graph-observer`, `agent-graph-integration`, `spec-persistence-and-startup`, `sealed-pi-profile-runtime-state`, `pi-ui-extension-patterns`, `web-shell`, `jsonl-session-viability`, `mode-shell-and-fixture-driver`, `walking-skeleton`): `docs/archive/PLAN_HISTORY.md` +Older history (including `project-graph-review-cycle`, `topology-readmes-and-boundaries`, `capture-response-to-graph`, `dev-seed-fixtures` first tracer, `graph-tool-resilience`, spec-scoped graph-clock hardening, `agents-composition-layer`, `live-graph-observer`, `agent-graph-integration`, `spec-persistence-and-startup`, `sealed-pi-profile-runtime-state`, `pi-ui-extension-patterns`, `web-shell`, `jsonl-session-viability`, `mode-shell-and-fixture-driver`, `walking-skeleton`): `docs/archive/PLAN_HISTORY.md` ## Dependencies @@ -328,7 +322,7 @@ nodes: minimal-authority-shell [done · P1] thin safety posture for current POC paths poc-live-ship-gate [next · P1] final fresh-cwd composed product runbook graph-observed-shapes [done · proving] ratified consumer-specific observed-shape ledger + drift guard; no transport shape shipped - runtime-affordances-and-legality [next · proving] buildable-now affordance(resolvedState) coverage ledger; review-set/turn-mode rows tripwired + runtime-affordances-and-legality [done · 
proving] shared affordance(resolvedState, grade) derivation + coverage ledger; review-set/turn-mode rows tripwired elicitation-driver [next · proving] live per-turn what-to-ask-next driver on FE-823 substrate; closes cross-cut Seam 3a capture-quality-spike [done · spike] A22-L fitness evidence graduated a narrow exchanges-and-generalized-capture scope probes-and-transcripts-evolution [parallel] continuous evidence substrate @@ -363,7 +357,7 @@ horizon: notes: - `elicitation-backlog` was the promoted D65-L *substrate* row from `memory/CROSS_CUT_PLAN.md`; the prompt-resource body-depth pass landed in 1ca02e38. The cross-cut is **not** exhausted: its Seam 3a `"what to ask next" driver` row is still `partial · ●`, which by the seam DoD keeps the seam open. That row is now disposed as the `elicitation-driver` frontier (not residue), so the remaining cross-cut obligation has a named owner in `PLAN.md`. - - Parallel worktree streams (2026-06-08): all three landed — (A) `crosscut-know--resource-body-depth` (1ca02e38), (B) `graph-observed-shapes--coverage-ledger` (85e73ba7), (C) `minimal-authority-shell--audit-and-guard` (68474e3f); each kept to its declared write paths and left `src/.pi/agents/state.ts` untouched, so the parallel run produced no collisions. `poc-live-ship-gate` is now unblocked (its hard dependency `minimal-authority-shell` is done). The next coverage frontiers are de-fogged rather than parked: `runtime-affordances-and-legality` (buildable-now ledger), `elicitation-driver` (buildable-now on the FE-823 substrate), and the now-graduated narrow `exchanges-and-generalized-capture` inventory are cold-startable worktree streams. 
+ - Parallel worktree streams (2026-06-08): all three landed — (A) `crosscut-know--resource-body-depth` (1ca02e38), (B) `graph-observed-shapes--coverage-ledger` (85e73ba7), (C) `minimal-authority-shell--audit-and-guard` (68474e3f); each kept to its declared write paths and left `src/.pi/agents/state.ts` untouched, so the parallel run produced no collisions. `poc-live-ship-gate` is now unblocked (its hard dependency `minimal-authority-shell` is done). `runtime-affordances-and-legality` has since landed (00105108), so the remaining de-fogged coverage frontiers are `elicitation-driver` (buildable-now on the FE-823 substrate) and the now-graduated narrow `exchanges-and-generalized-capture` inventory — both cold-startable worktree streams. - Completed prerequisites: `agents-composition-layer` supplies runtime prompt/resource posture, and `live-graph-observer` supplies the read-only web observer path expected by `capture-response-to-graph` and `poc-live-ship-gate`. - `graph-observed-shapes` is intentionally consumer-specific: do not assume every agent read shape belongs on the web observer. - `exchanges-and-generalized-capture` is now graduated only narrowly: scope high-confidence extractive capture with a false-commit guard, and do not regrow deleted `capture-*` symmetry. diff --git a/src/session/README.md b/src/session/README.md index ef4fb716c..25413fcc5 100644 --- a/src/session/README.md +++ b/src/session/README.md @@ -1,6 +1,6 @@ # session/ — Session domain layer -SPEC decisions: D6-L, D11-L, D12-L, D13-L, D21-L, D52-L +SPEC decisions: D6-L, D11-L, D12-L, D13-L, D21-L, D40-L, D52-L ## Owns