Merged
7 changes: 7 additions & 0 deletions .claude/settings.local.json
@@ -0,0 +1,7 @@
{
"permissions": {
"allow": [
"mcp__crypto-intelligence__research_crypto_project"
]
}
}
42 changes: 28 additions & 14 deletions .cursor/skills/example-readme-writer/SKILL.md
@@ -57,8 +57,9 @@ Short positioning paragraph:

## Quick Start

Fastest runnable path: .env setup, docker compose, curl. Include the
verification request here so there's no need for a separate Verification section.
Fastest runnable path: .env setup, docker compose, curl. Include the main
health and success path here, and mention `endpoints.http` when prebuilt
requests already exist.

## What You Get Back

@@ -85,10 +86,16 @@ copy, rewrite it shorter.

## Implementation Walkthrough

Link to source files so the reader can jump directly. Show code only when it's
the actual working snippet that teaches the architecture (e.g. the MCP tool
definition). For everything else, reference the file and explain the idea in
prose. Never show pseudo-code or comment-only code blocks.
Use numbered steps so the flow is easy to follow:
1. define the shared state
2. define the nodes
3. wire the graph / runtime
4. expose the public entry points

Link to source files so the reader can jump directly. Prefer short prose plus
file references over long code excerpts. Inline code only when it teaches the
architecture and the real working snippet is genuinely more helpful than prose.
Never show pseudo-code or comment-only code blocks.
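
The four-step structure above can be shown in miniature. This is a pure-Python stand-in for a graph runtime such as LangGraph; all names (`State`, `plan`, `summarize`, `run`) are hypothetical, not part of the repo:

```python
# Miniature of the four-step walkthrough structure (hypothetical names).
from typing import Callable, TypedDict


# 1. define the shared state
class State(TypedDict):
    query: str
    findings: list[str]


# 2. define the nodes
def plan(state: State) -> State:
    return {**state, "findings": [f"plan: {state['query']}"]}


def summarize(state: State) -> State:
    return {**state, "findings": state["findings"] + ["summary"]}


# 3. wire the graph / runtime (a linear chain here)
PIPELINE: list[Callable[[State], State]] = [plan, summarize]


# 4. expose the public entry point
def run(query: str) -> list[str]:
    state: State = {"query": query, "findings": []}
    for node in PIPELINE:
        state = node(state)
    return state["findings"]
```

A walkthrough that follows this order lets the reader map each numbered step to one file or function.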

## Connect Your MCP Client / Integration
(if applicable -- combine all client tools into one section, CLI first, GUI last)
@@ -97,13 +104,11 @@ prose. Never show pseudo-code or comment-only code blocks.

uv sync, test, lint commands.

## Exercises

2 items max. One simple extension, one architectural extension. One sentence each.

## Trade-offs
## What You Have Learned

Table of advantages vs. limitations. End with the bridge to the next pattern.
Short takeaway bullets, then:
- bridge to the next pattern with one sentence explaining what it adds
- call to action to star the GitHub repository

## Further Reading

@@ -112,8 +117,13 @@ Only link docs for technologies introduced in this pattern.

**Sections to skip:**

- **What You Should See** -- skip if Quick Start already shows expected output
- **When to Use / When Not to Use** -- skip unless the pattern needs a short,
non-obvious note that materially helps the reader choose the pattern
- **What You Should See** -- skip; console and Docker logs add noise unless they
teach something essential
- **Verification** -- never duplicate Quick Start with the same curl commands
- **Exercises** -- skip for example READMEs in this repo
- **Trade-offs** -- replace with `What You Have Learned`

## README Quality Rules

@@ -132,7 +142,10 @@ Only link docs for technologies introduced in this pattern.
- **Architecture explanation helps, not just describes**: when mentioning infrastructure (containers, ports, networks), explain WHY it's structured that way, not just WHAT exists.
- **Key Concepts are tight**: 4 bullets max. One line each with em-dash separators. Cut any bullet that restates the architecture diagram.
- **The Problem is concise**: 2-4 sentences stating the limitation. No comparison tables unless truly needed.
- **Exercises are short**: 2 items max. One sentence each. One simple extension, one architectural.
- **Implementation Walkthrough stays structural**: prefer numbered steps and file references over detailed tutorial prose. The goal is to show how things fit together, not restate the code line by line.
- **Quick Start does the heavy lifting**: put the runnable path there and point to `endpoints.http` instead of adding a separate verification section.
- **MCP setup stays practical**: if the example exposes MCP, include Claude Code, Cursor, and Claude Desktop setup in one section.
- **End strong**: use `What You Have Learned` for takeaways, then bridge to the next pattern and add the GitHub star CTA.
- **Further Reading is scoped**: only link docs for technologies introduced by this specific pattern.
- **Integration guides are combined**: don't split Claude Code / Cursor / Claude Desktop into separate sections. One section, multiple examples, developer-workflow order (CLI tools first, GUI apps last).
- **No AI tone**: avoid marketing-speak, over-explanation, and restating the obvious. If a sentence doesn't add information, delete it.
@@ -141,6 +154,7 @@

Before finalizing an example README, verify:
- the documented quick start matches the actual `docker compose` flow
- `endpoints.http` is mentioned when it exists and is useful
- repo-root `.env` dependencies are stated explicitly when present
- optional shortcuts are labeled as optional
- provider selection instructions match shared config defaults
30 changes: 30 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,36 @@

All notable changes to this project are documented here.

## [2026-04-13] Pattern 03: Checkpoint Recovery and Resilience

### Added
- PostgreSQL-backed checkpointer via `agent_common.persistence` (shared `create_postgres_pool`, `setup_checkpointer`, `close_checkpointer`)
- Project Verifier and Project Selector nodes -- CoinGecko match validation with `interrupt()` for ambiguous results
- `POST /run/resume` REST endpoint for human-in-the-loop resume after interrupt
- MCP tools: `get_research_status`, `list_research_threads`, `delete_research_thread` for thread inspection
- Service layer (`src/service.py`) separating retry-after-failure from resume-after-interrupt semantics
- Docker Compose with PostgreSQL container and health checks
- Full test suite: checkpoint recovery e2e, interrupt/resume e2e, MCP tool unit tests, API tests

### Changed
- Graph extends P02 fan-out/fan-in with verifier → selector before parallel branches
- `libs/common`: added `postgres_uri` to Settings, new `persistence` module, updated `__init__.py` exports
- Pattern progression revised: P03 renamed from "Persistent Memory" to "Checkpoint Recovery", P04 from "Memory Lifecycle" to "Agent Memory" (now on main path)
- P01 and P02 READMEs streamlined with walkthrough structure and "What You Have Learned" takeaways
- README skill template updated with new section conventions

### Architecture Decisions
- **Checkpointing is resilience, not memory**: `thread_id` resumes a failed or interrupted workflow but does not create cross-session knowledge -- that is P04's concern
- **Verifier + selector over silent best-guess**: ambiguous CoinGecko matches become explicit `interrupt()` calls, keeping the human in the loop rather than silently choosing the wrong coin
- **Thread status derived from checkpoints**: MCP tools inspect LangGraph state directly instead of maintaining a parallel status table
- **Two entry points, one service layer**: REST and MCP both delegate to `service.run_pipeline` / `service.resume_pipeline`, keeping execution semantics in one place
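
The last decision can be sketched as follows. The function bodies and the `ambiguous` flag are illustrative stand-ins, not the repo's actual `src/service.py`:

```python
# Hypothetical sketch of "two entry points, one service layer": the REST
# handler and the MCP tool both delegate to the same service functions,
# which keep retry-after-failure and resume-after-interrupt separate.
_threads: dict[str, str] = {}  # thread_id -> status


def run_pipeline(thread_id: str, query: str, ambiguous: bool = False) -> dict:
    # Re-invoking with the same thread_id continues from the last
    # checkpoint (retry-after-failure); an ambiguous match interrupts.
    _threads[thread_id] = "interrupted" if ambiguous else "completed"
    return {"thread_id": thread_id, "status": _threads[thread_id], "query": query}


def resume_pipeline(thread_id: str, answer: str) -> dict:
    # Resume-after-interrupt is only valid while a human answer is pending.
    if _threads.get(thread_id) != "interrupted":
        raise ValueError("nothing to resume for this thread")
    _threads[thread_id] = "completed"
    return {"thread_id": thread_id, "status": "completed", "answer": answer}


def rest_run_resume(payload: dict) -> dict:
    # Body of a POST /run/resume handler: pure delegation.
    return resume_pipeline(payload["thread_id"], payload["answer"])
```

Because both transports delegate to the same two functions, a semantic change (e.g. what counts as resumable) lands in exactly one place.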

### Dependencies
- langgraph-checkpoint-postgres (PostgreSQL checkpointer)
- psycopg[binary,pool] (async PostgreSQL driver with connection pooling)

---

## [2026-03-30] Pattern 02: MCP Tool Integration -- Architecture Redesign

### Added
20 changes: 10 additions & 10 deletions README.md
@@ -52,7 +52,7 @@ Teams that emerge as complexity demands them:
### Act 1 — One Team, Growing Capabilities
<sup>Patterns 01-04</sup>

You are **Team 1: Intelligence**. Three agents research crypto projects inside a single LangGraph pipeline. It works -- until you realize tools are hardcoded, every request starts from scratch, and memory grows unbounded. Each limitation drives the next pattern: MCP for standardized tools, PostgreSQL-backed checkpointers for persistence, a Memory Refiner for lifecycle management.
You are **Team 1: Intelligence**. Three agents research crypto projects inside a single LangGraph pipeline. It works -- until you realize tools are hardcoded, long-running workflows are fragile, and the agent forgets what users cared about across sessions. Each limitation drives the next pattern: MCP for standardized tools, PostgreSQL-backed checkpoint recovery for resilience, and real long-term memory for user and project knowledge.

### Act 2 &mdash; Teams Multiply, Protocols Emerge
<sup>Patterns 05-06</sup>
@@ -93,15 +93,15 @@ Team 2 moves to an external partner. Implicit trust is gone -- JWT authentication
</tr>
<tr>
<td>03</td>
<td><a href="examples/03-persistent-memory/">Persistent Memory</a></td>
<td>Remembering across conversations</td>
<td>Checkpointer, PostgreSQL, thread management</td>
<td><a href="examples/03-checkpoint-recovery/">Checkpoint Recovery</a></td>
<td>Recovering long-running workflows without restarting from scratch</td>
<td>PostgresSaver, thread_id, interrupts, resume semantics</td>
</tr>
<tr>
<td>04</td>
<td><a href="examples/04-memory-lifecycle/">Memory Lifecycle</a> <sup>optional</sup></td>
<td>Managing growing knowledge bases</td>
<td>Memory refiner, fact TTL, hierarchical memory</td>
<td><a href="examples/04-agent-memory/">Agent Memory</a></td>
<td>Remembering user interests and prior research across sessions</td>
<td>PostgresStore, Honcho, user preferences, incremental research</td>
</tr>
<tr><td colspan="4"><strong>Distribution Tier</strong> · Multi-service, multi-team, real distributed systems</td></tr>
<tr>
@@ -144,9 +144,9 @@ Every pattern exists because the previous one creates a real limitation:

```
P01 ─── Hardcoded tools can't be shared ──────────────── P02
P02 ─── Every request starts from scratch ────────────── P03
P03 ─┬─ Memory grows unbounded ──────────────────────── P04 (optional)
─ A second team arrives, can't import their code ─ P05
P02 ─── Long runs fail and lose completed work ──────── P03
P03 ─── Resilient threads still forget across sessions ─ P04
P04 ─── A second team arrives, can't import their code ─ P05
P05 ─── Third team needs both, sequential is too slow ── P06
P06 ─── Team 2 moves to external partner, no trust ──── P07
P07 ─── New agents appear, consumers need code changes ─ P08
83 changes: 46 additions & 37 deletions docs/curriculum.md
@@ -48,8 +48,8 @@ graph TD
subgraph foundation ["Foundation Tier"]
P01["P01: Orchestrator Pipeline"]
P02["P02: MCP Tool Integration"]
P03["P03: Persistent Memory"]
P04["P04: Memory Lifecycle\n(enrichment)"]
P03["P03: Checkpoint Recovery\nand Resilience"]
P04["P04: Agent Memory\nand Knowledge"]
end
subgraph distribution ["Distribution Tier"]
P05["P05: Distributed A2A"]
@@ -63,17 +63,14 @@
P01 --> P02
P02 --> P03
P03 --> P04
P03 --> P05
P04 -.-> P05
P04 --> P05
P05 --> P06
P06 --> P07
P07 --> P08
P08 --> P09
```

**Main path**: P01 -> P02 -> P03 -> P05 -> P06 -> P07 -> P08 -> P09

**Optional enrichment**: P04 branches off P03 (can be skipped without breaking the progression)
**Main path**: P01 -> P02 -> P03 -> P04 -> P05 -> P06 -> P07 -> P08 -> P09

**Team introduction timeline**:

@@ -195,65 +192,77 @@

---

### Pattern 03: Persistent Memory
### Pattern 03: Checkpoint Recovery and Resilience

**Folder:** `examples/03-persistent-memory/`
**Folder:** `examples/03-checkpoint-recovery/`

**Goal:** Add persistent state across conversations using LangGraph's checkpointer backed by PostgreSQL. When a user asks about a crypto project a second time, the system remembers previous research and provides incremental updates instead of starting from scratch.
**Goal:** Add durable execution to the Pattern 02 pipeline using LangGraph's PostgreSQL-backed checkpointer. When a long-running research run fails midway, the system resumes from the last successful checkpoint instead of starting over.

**What it solves:** In Patterns 01-02, every request starts fresh. For a research platform, this wastes tokens and time -- if you researched Arbitrum yesterday, you should build on that knowledge, not repeat it.
**What it solves:** Pattern 02 already has a realistic failure surface: three external API calls, multiple LLM invocations, and a fan-out/fan-in graph. If `project_profiler` times out after `news_scanner` and `community_analyst` succeed, you currently lose completed work and repay the token and latency cost on retry. Checkpointing addresses resilience, not memory.

**Team focus:** Team 1 (Intelligence) -- same 5 agents, now with persistent memory.
**Team focus:** Team 1 (Intelligence) -- same 5 agents, now with durable execution, thread continuity, and human checkpoints.

**Architecture:**

```mermaid
graph TD
User --> FastAPI["Agent Service\n(FastAPI :8000)"]
FastAPI --> Pipeline["LangGraph Pipeline\n+ Checkpointer"]
Pipeline --> PG["PostgreSQL\n(conversation state + research cache)"]
Pipeline --> CoinGecko["CoinGecko API"]
Pipeline --> DDG["DuckDuckGo\n(web search)"]
ClaudeDesktop["Claude Desktop"] -->|MCP| MCP["crypto-intelligence\nMCP (:8001)"]
MCP --> Pipeline
Pipeline --> PG["PostgreSQL\n(checkpoints)"]
Pipeline --> CoinGecko["CoinGecko API"]
Pipeline --> DDG["DuckDuckGo\n(web search)"]
Pipeline --> HITL["Human checkpoint\ninterrupt()/resume"]
```

**Key concepts:**

- LangGraph checkpointer with PostgreSQL backend
- Thread-based conversation management (each project = a thread)
- Research result caching and incremental updates
- State persistence across agent restarts
- Docker Compose with PostgreSQL container
- LangGraph `PostgresSaver` for durable checkpoints
- Stable `thread_id` as the resume handle for a research workflow
- Resume-after-failure semantics: retry only the failed node, not the full graph
- Human-in-the-loop with `interrupt()` and `Command(resume=...)`
- Idempotent node design and graceful degradation around external API failures
- Docker Compose with PostgreSQL as the durable workflow state store
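
The resume-after-failure semantics can be sketched conceptually (this is not LangGraph's actual API; results are checkpointed per `thread_id` in a plain dict, and only the node names follow the pattern's agents):

```python
# Conceptual sketch: each node's result is checkpointed under its
# thread_id, so a retry with the same thread_id re-runs only the node
# that failed, not the whole graph.
checkpoints: dict[str, dict[str, str]] = {}  # thread_id -> node -> result

PIPELINE = ["news_scanner", "community_analyst", "project_profiler"]


def _run_node(node: str, failing: set[str]) -> str:
    if node in failing:
        raise RuntimeError(f"{node} timed out")
    return f"{node}: ok"


def run_pipeline(thread_id: str, failing: set[str] = frozenset()) -> dict[str, str]:
    state = checkpoints.setdefault(thread_id, {})
    for node in PIPELINE:
        if node in state:  # already checkpointed -- skip on resume
            continue
        state[node] = _run_node(node, failing)
    return state
```

A first run with `failing={"project_profiler"}` checkpoints the two successful branches; the retry re-runs only the profiler. Idempotent node design matters here because a node may run again if it failed after a partial side effect.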

**libs/common additions:** `agent_common.memory` -- checkpointer setup utilities
**libs/common additions:** `agent_common.persistence` -- PostgreSQL pool and checkpointer helpers

**Builds on:** Pattern 02

---

### Pattern 04: Memory Lifecycle Management (Enrichment)
### Pattern 04: Agent Memory and Knowledge

**Folder:** `examples/04-memory-lifecycle/`
**Folder:** `examples/04-agent-memory/`

**Goal:** Manage growing agent memory with consolidation, expiration, and hierarchical organization. Introduce a Memory Refiner agent that runs periodically to keep the knowledge base accurate and compact.
**Goal:** Add actual cross-session memory using LangGraph `PostgresStore` plus a memory layer such as Honcho for richer user and project understanding. The system should remember which coins a user tracks, what they care about, and what was learned in previous research threads.

**What it solves:** After many research sessions, memory grows unbounded. Stale facts ("BTC price is $67k") pollute new analyses. The system needs to distinguish between ephemeral data (prices, news) and durable knowledge (project launch date, team composition).
**What it solves:** Pattern 03 makes the workflow resilient, but it is still amnesiac. A resumed thread is not the same thing as long-term memory. Users expect the agent to remember repeated interests ("I keep tracking Arbitrum and Base"), preferences ("focus on developer traction"), and prior research findings across separate sessions.

**Note:** This is an enrichment pattern. The main progression continues from Pattern 03 to Pattern 05. Skip this if your priority is distributed architecture.
**Team focus:** Team 1 (Intelligence) -- same 5 agents, now augmented with episodic and semantic memory.

**Architecture:**

```mermaid
graph TD
User --> FastAPI["Agent Service\n(FastAPI :8000)"]
FastAPI --> Pipeline["LangGraph Pipeline\n+ Checkpointer + Store"]
ClaudeDesktop["Claude Desktop"] -->|MCP| MCP["crypto-intelligence\nMCP (:8001)"]
MCP --> Pipeline
Pipeline --> PG["PostgreSQL\n(checkpoints + BaseStore)"]
Pipeline --> Honcho["Honcho\n(memory service)"]
Pipeline --> CoinGecko["CoinGecko API"]
Pipeline --> DDG["DuckDuckGo\n(web search)"]
```

**Key concepts:**

- Memory Refiner agent (consolidates and prunes the knowledge base)
- Fact TTL: timestamped facts with expiration policies
- Price data: 1-hour TTL
- News: 7-day TTL
- Project fundamentals: no expiration
- Hierarchical memory tiers:
- Working memory (current conversation context)
- Episodic memory (past research sessions)
- Semantic memory (consolidated, long-term knowledge)
- Memory compaction strategies
- `PostgresStore` / `BaseStore` for cross-thread memory
- User memory namespaces such as tracked coins, watchlists, and research preferences
- Project memory namespaces such as prior summaries, open risks, and last-reviewed timestamps
- Incremental research: query planning informed by previous findings
- Honcho as a production-oriented memory service for agent and user representations
- Memory freshness policies: separate stable facts from volatile market data

**Builds on:** Pattern 03

@@ -310,7 +319,7 @@

**libs/common additions:** `agent_common.a2a` -- A2A protocol client/server helpers

**Builds on:** Pattern 03
**Builds on:** Pattern 04

---
