6 changes: 6 additions & 0 deletions .gitignore
@@ -15,6 +15,9 @@ data/modes.json
Thumbs.db
.env

# Canonical shared context root — all AI tool configs consolidated here
.context/

# Per-contributor Claude Code config (preview launcher, etc.)
.claude/

@@ -65,3 +68,6 @@ out/
*.deb
__pycache__/
bench/results-latest.json
bench/artifacts/
bench/data/
bench/fixtures/skills/
3 changes: 3 additions & 0 deletions .prettierignore
@@ -6,3 +6,6 @@ CONTEXT.md
server.out.log
server.err.log
*.log
bench/artifacts
bench/data
bench/fixtures/skills
248 changes: 248 additions & 0 deletions docs/roadmap.md
@@ -0,0 +1,248 @@
# Context Engine — Comprehensive Roadmap

> Updated: 2026-05-18. Author: Jeremy.
> What we've built, where we stand, and what ships next. Two-month horizon.

---

## Why This Exists

Context quality is the bottleneck in every AI workflow. Models are good and getting better, but the context you feed them is fragmented, stale, and unstructured. Context Engine (CE) solves that: it ingests skills from anywhere, understands them semantically, deduplicates overlap, ranks them by quality, auto-selects the right subset per task, and delivers optimised context to any AI surface.

---

## Repo Layout — Canonical Context Root

All AI tool configs live in a single canonical location: `app/.context/`. Root-level dot-files and directories are NTFS junctions pointing into it.

```
app/.context/
├── claude/ → launch.json, settings.local.json
├── codex/ → instructions.md
├── continue/rules/ → context-engine.md
├── app-claude/ → launch.json (when running from app/)
├── instructions/ → AGENTS.md, CLAUDE.md
├── rules/ → cursorrules, windsurfrules, clinerules
└── kimi-system-prompt.md
```

Root-level junctions `.claude/`, `.codex/`, and `.continue/` all resolve to subdirectories of `app/.context/`. All file-based targets (`.clinerules`, `.cursorrules`, `.windsurfrules`, `.ampcoderc`, `.goosehints`, `.rules`, `AGENTS.md`, `CLAUDE.md`, `CONTEXT.md`, `CONVENTIONS.md`, `GEMINI.md`, `devin.md`, `.kimi-system-prompt.md`, `.github/copilot-instructions.md`) and the remaining directories (`.augment/`, `.junie/`, `.kiro/`, `.pearai/`, `.tmp/`, `.trae/`, `.void/`) are removed from the root. **`app/.context/` is the single source of truth for every AI tool config.**
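A minimal sketch of how one root-level junction could be created with Node builtins. The helper name `linkContextDir` and the temporary demo directory are assumptions for illustration, not code from this repo. `fs.symlinkSync` accepts `'junction'` as its type argument on Windows; other platforms ignore the type and create a regular symlink.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: expose `target` (a directory under app/.context/) at `linkPath`.
function linkContextDir(target, linkPath) {
  const absTarget = path.resolve(target);
  fs.mkdirSync(absTarget, { recursive: true });
  // 'junction' is honoured on Windows only; elsewhere the type argument is
  // ignored and a regular symlink is created.
  fs.symlinkSync(absTarget, linkPath, 'junction');
  return fs.realpathSync(linkPath);
}

// Demo in a throwaway directory so no real repo is touched.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'ce-junction-'));
const resolved = linkContextDir(
  path.join(root, 'app', '.context', 'claude'),
  path.join(root, '.claude')
);
console.log(resolved);
```

Junctions need no elevated privileges on Windows, which is one plausible reason to prefer them over directory symlinks there.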

---

## What We've Shipped (v0.3.1)

### Architecture: 31 server/lib modules, ~14k files total

The backend is deep and production-grade. Everything is written with zero dependencies (Node builtins + Ollama HTTP).

### Phase 0 — Stabilised v3 Base

- Modular DRAM CSS with token system and lint guard
- Cleanup modal for tidy/overlap review with apply flow
- Local update detection with clickable update toast
- Release checklist covering syntax, line limits, Git scope

### P0 — Electron + TypeScript Quality Gate

- Strict TypeScript typecheck across server and renderer — zero errors
- ESLint with TS-aware rules, no-floating-promises enforcement
- All source files under the line limits (500 soft, 700 absolute)
- Electron shell with main/preload/renderer boundary

### P0 — Windows Installer + Auto-Update

- NSIS installer + portable target via electron-builder
- Auto-update wired (8s after launch, then every 6h)
- Brand marks: icon/mono/simple SVGs, .ico, .icns, Linux set
- GitHub Actions release workflow (tag-driven + manual)
- First tagged release v0.2.1 cut and validated end-to-end
- **Blocked:** code signing certificate not procured (SmartScreen warnings)

### Phase 1 — Vector Foundation

- `chunker.js`: semantic heading/rule/knowledge/example parser with frontmatter
- `embeddings.js`: Ollama client for nomic-embed-text, batch embed, graceful fallback
- `vectorstore.js`: flat-file vector index, cosine search, <5ms for 500 chunks
- `POST /api/index`, `POST /api/index/skill/:id`, `POST /api/search`, `GET /api/index/status`
- UI: indexed chunk count, model, last indexed time
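The cosine search over a flat index can be sketched as below. This is an illustration under assumptions, not the contents of `vectorstore.js`: the entry shape `{ id, vector }`, the function names, and the three-dimensional toy vectors are all made up for the example.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector matches nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-k search over an in-memory index of { id, vector } entries.
function search(index, queryVector, k = 8) {
  return index
    .map((entry) => ({ id: entry.id, score: cosine(entry.vector, queryVector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy index with hypothetical chunk ids.
const index = [
  { id: 'skill-a#intro', vector: [1, 0, 0] },
  { id: 'skill-b#rules', vector: [0.9, 0.1, 0] },
  { id: 'skill-c#examples', vector: [0, 1, 0] },
];
const hits = search(index, [1, 0, 0], 2);
console.log(hits.map((h) => h.id));
```

A linear scan like this stays simple and dependency-free, which fits an index of a few hundred chunks.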

### Phase 2 — Dedup + Rank

- `dedup.js`: pairwise similarity clustering, Union-Find, duplicate/related thresholds
- `ranking.js`: specificity, coverage, source weight, freshness scoring
- `GET /api/dedup`, `POST /api/dedup/resolve` with reversible resolution history
- Quality Audit UI: duplicate clusters, low-specificity filler, side-by-side comparison
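Pairwise clustering with Union-Find can be sketched as follows. The function names, the caller-supplied `similarity(i, j)` callback, and the 0.92 threshold are assumptions for illustration; the real thresholds in `dedup.js` are not stated here.

```javascript
// Minimal Union-Find (disjoint set) with path halving.
function makeUnionFind(n) {
  const parent = Array.from({ length: n }, (_, i) => i);
  const find = (x) => {
    while (parent[x] !== x) {
      parent[x] = parent[parent[x]];
      x = parent[x];
    }
    return x;
  };
  const union = (a, b) => {
    parent[find(a)] = find(b);
  };
  return { find, union };
}

// Group ids whose pairwise similarity meets the threshold.
// 0.92 is an illustrative value, not the project's setting.
function clusterDuplicates(ids, similarity, threshold = 0.92) {
  const uf = makeUnionFind(ids.length);
  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      if (similarity(i, j) >= threshold) uf.union(i, j);
    }
  }
  const groups = new Map();
  ids.forEach((id, i) => {
    const root = uf.find(i);
    if (!groups.has(root)) groups.set(root, []);
    groups.get(root).push(id);
  });
  // Only clusters with more than one member count as duplicates.
  return [...groups.values()].filter((g) => g.length > 1);
}

const ids = ['a', 'b', 'c', 'd'];
const sim = [
  [1, 0.95, 0.1, 0.2],
  [0.95, 1, 0.93, 0.1],
  [0.1, 0.93, 1, 0.3],
  [0.2, 0.1, 0.3, 1],
];
const clusters = clusterDuplicates(ids, (i, j) => sim[i][j]);
console.log(clusters);
```

In this toy matrix `a` and `c` are not directly similar, yet they land in one cluster because both match `b`. That transitive merging is what Union-Find contributes over a plain pairwise list.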

### Phase 3 — Smart Compile

- `POST /api/compile/smart`: task embedding → vector search → expand → budget-fit → compile
- Project-aware stack detection (package.json, README, Cargo.toml)
- Before-and-after token comparison vs manual All On mode
- Modes moving toward presets (Smart Preview on dashboard)
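The budget-fit step of the pipeline can be sketched as a greedy selection. Everything here is an assumption for illustration: the candidate shape, the function names, and the four-characters-per-token estimate, which is a rough rule of thumb and not any model's tokenizer.

```javascript
// Rough token estimate: about 4 characters per token (rule of thumb only).
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Greedy budget fit: walk candidates from highest to lowest score and keep
// each one that still fits in the remaining token budget.
function fitToBudget(candidates, budgetTokens) {
  const selected = [];
  let used = 0;
  const ranked = [...candidates].sort((a, b) => b.score - a.score);
  for (const chunk of ranked) {
    const cost = estimateTokens(chunk.text);
    if (used + cost > budgetTokens) continue; // too big, a smaller chunk may still fit
    selected.push(chunk);
    used += cost;
  }
  return { selected, used };
}

// Hypothetical candidates returned by vector search and expansion.
const candidates = [
  { id: 'ts-strict', score: 0.91, text: 'x'.repeat(400) }, // 100 tokens
  { id: 'css-tokens', score: 0.42, text: 'x'.repeat(200) }, // 50 tokens
  { id: 'electron-ipc', score: 0.88, text: 'x'.repeat(800) }, // 200 tokens
];
const result = fitToBudget(candidates, 160);
console.log(result.selected.map((c) => c.id), result.used);
```

Skipping an oversized chunk instead of stopping lets lower-ranked but smaller chunks fill the remaining budget.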

### Phase 1.5 (Promoted) — MCP Bridge to Daily Apps

- `mcp-server.mjs`: stdio transport, 4-tool contract (search, list_skills, get_skill, status)
- Remote Streamable HTTP MCP adapter with bearer-token auth
- Claude Desktop `.mcpb` wrapper bundle
- One-paste config snippets for Claude Desktop, Codex CLI
- Lifecycle: spawned by host app, independent of Electron shell
- Claude Desktop + Codex CLI validated end-to-end
- **Blocked:** ChatGPT remote connector needs HTTPS tunnel/hosting choice
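Bearer-token checking for a remote adapter might look like the sketch below. The function name `isAuthorized` is hypothetical and this is not the adapter's actual code. It hashes both sides before calling `crypto.timingSafeEqual`, because that function throws when its inputs differ in length.

```javascript
const crypto = require('node:crypto');

// Returns true when the Authorization header carries the expected bearer token.
function isAuthorized(authorizationHeader, expectedToken) {
  if (typeof authorizationHeader !== 'string') return false;
  const match = /^Bearer (.+)$/.exec(authorizationHeader);
  if (!match) return false;
  // Hash both values so timingSafeEqual always compares equal-length buffers.
  const given = crypto.createHash('sha256').update(match[1]).digest();
  const expected = crypto.createHash('sha256').update(expectedToken).digest();
  return crypto.timingSafeEqual(given, expected);
}

console.log(isAuthorized('Bearer s3cret', 's3cret'));
console.log(isAuthorized('Bearer wrong', 's3cret'));
```

A constant-time comparison avoids leaking how many leading characters of a guessed token were correct.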

### Handoffs — Full Lifecycle

- Project handoffs (repo-bound), thread handoffs (topic-bound), dual-bound
- Git staleness detection (auto-archive at 5+ commits)
- Thread staleness (archive after 14 days idle), purge after 30 days
- 9 REST endpoints + MCP tool + admin UI (peer to Memory tab)
- Migrated existing llm-handoff.md entries into structured format
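The staleness rules above can be read as a small classifier. This sketch rests on several assumptions: the handoff shape (`repoBound`, `threadBound`, `commitsSince`, `lastTouched`) is hypothetical, "5+ commits" is taken to mean commits made after the handoff was written, and the 30-day purge is taken to count idle days.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Classify a handoff as 'active', 'archive', or 'purge'.
// lastTouched is a millisecond timestamp (assumed field).
function classifyHandoff(handoff, now = Date.now()) {
  const idleDays = (now - handoff.lastTouched) / DAY_MS;
  if (idleDays >= 30) return 'purge';
  if (handoff.repoBound && handoff.commitsSince >= 5) return 'archive';
  if (handoff.threadBound && idleDays >= 14) return 'archive';
  return 'active';
}

const now = Date.UTC(2026, 4, 18);
const examples = [
  { repoBound: true, threadBound: false, commitsSince: 6, lastTouched: now - 2 * DAY_MS },
  { repoBound: false, threadBound: true, commitsSince: 0, lastTouched: now - 15 * DAY_MS },
  { repoBound: true, threadBound: true, commitsSince: 1, lastTouched: now - 3 * DAY_MS },
  { repoBound: false, threadBound: true, commitsSince: 0, lastTouched: now - 31 * DAY_MS },
];
const states = examples.map((h) => classifyHandoff(h, now));
console.log(states);
```

Passing `now` as a parameter keeps the function deterministic, which makes the thresholds easy to cover in a smoke test.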

### Projects — CRUD + Scoped Context

- Project directories with seed memory.json + rules.json
- Collision-safe slugs, directory lifecycle, path management
- REST endpoints + 91-pass smoke test
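One way to get collision-safe slugs is to append a numeric suffix until the slug is free. The function names, the `-2`, `-3` suffix scheme, and the `project` fallback are assumptions for illustration, not the scheme the projects module necessarily uses.

```javascript
// Turn a display name into a filesystem-friendly slug.
function slugify(name) {
  const slug = name
    .toLowerCase()
    .normalize('NFKD') // split accented letters into base letter + mark
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
  return slug || 'project'; // fallback when nothing usable remains
}

// Append -2, -3, ... until the slug is not already taken.
function uniqueSlug(name, taken) {
  const base = slugify(name);
  let candidate = base;
  for (let n = 2; taken.has(candidate); n++) candidate = `${base}-${n}`;
  return candidate;
}

const taken = new Set(['my-project', 'my-project-2']);
console.log(uniqueSlug('My Project!', taken));
```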

### PR #2 (James Chapman) — Priority Rules, Auth, Security

- Priority-based rules model (hard/soft per section) with auto-migration
- API token auth (bearer, 48-hex, encrypted at rest, fully opt-in)
- Rule-files CRUD (`data/rules/*.json`)
- CI workflow (typecheck + lint + lint:css)
- 12 new smoke test suites (backup, crypto, mode-apply, mutex, projects, ranking, security, validation, etc.)
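A 48-hex token corresponds to 24 random bytes. The sketch below shows generation and a cheap shape check; the function names are hypothetical, and encryption at rest is deliberately left out.

```javascript
const crypto = require('node:crypto');

// 24 random bytes encode to 48 hexadecimal characters.
function generateApiToken() {
  return crypto.randomBytes(24).toString('hex');
}

// Cheap shape check before any lookup or comparison.
function looksLikeApiToken(value) {
  return typeof value === 'string' && /^[0-9a-f]{48}$/.test(value);
}

const token = generateApiToken();
console.log(token.length, looksLikeApiToken(token));
```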

### Benchmark v1.3 — Evidence Pack

- Report PDF + chart PNGs under `bench/charts/`
- Smart Compile token accounting fixed (same output surface comparison)
- Manifest chunking for skill name/description/trigger search
- **Not yet green:** quality gate (Recall@8, no-context paired comparison)
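For reference, Recall@8 for one query is the fraction of expected source ids that appear in the top 8 results. A minimal sketch, assuming each benchmark case has the hypothetical shape `{ retrieved, expected }`:

```javascript
// Recall@k for one query: share of expected ids found in the top k results.
function recallAtK(retrievedIds, expectedIds, k = 8) {
  if (expectedIds.length === 0) return 1; // nothing expected, nothing missed
  const topK = new Set(retrievedIds.slice(0, k));
  const found = expectedIds.filter((id) => topK.has(id)).length;
  return found / expectedIds.length;
}

// Mean recall across a set of { retrieved, expected } cases.
function meanRecallAtK(cases, k = 8) {
  const total = cases.reduce((sum, c) => sum + recallAtK(c.retrieved, c.expected, k), 0);
  return total / cases.length;
}

const cases = [
  { retrieved: ['a', 'b', 'c'], expected: ['a', 'c'] }, // recall 1
  { retrieved: ['x', 'y'], expected: ['y', 'z'] }, // recall 0.5
];
const mean = meanRecallAtK(cases);
console.log(mean);
```

A gate of 1.00 means every expected source must appear in the top 8 for every case, so a single miss fails the run.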

---

## Current State Summary

| Area | Status | Notes |
| ----------------------------- | --------------- | -------------------------------------------------------------------------- |
| Backend architecture | Shipped | 31 modules, zero-dependency |
| Vector index + search | Shipped | Works, needs quality tuning |
| Dedup + rank | Shipped | Exists but quality-gated |
| Smart Compile | Shipped | Token savings proven, quality unproven |
| MCP bridge (local) | Shipped | Claude Desktop + Codex validated |
| MCP bridge (remote) | Blocked | ChatGPT needs HTTPS tunnel decision |
| Handoffs | Shipped | |
| Projects | Shipped | |
| Priority rules | Shipped | PR #2 merged |
| API auth | Shipped | Opt-in, encrypted |
| Benchmark gates               | **Not green**   | Recall@8 < 1.00, no-context not compared                                   |
| System detection + onboarding | **Not started** | Broad spec written. Replaces old skill-sources + onboarding-redesign |
| `scanSystem()` backend | Exists (narrow) | Currently only SKILL.md dirs. Must broaden to rules/instructions/MCP/hosts |
| MCP discovery | **Not started** | |
| AIModelDB bridge | **Not started** | |
| Modes-as-presets | **Partial** | Smart Preview on dashboard, tab grid not replaced |
| Code signing cert | **Blocked** | SmartScreen warnings on installer |

---

## Two-Month Roadmap

### Week 1-2: Quality Gate (P0) — Make the Benchmark Honest

Before any new features, CE must prove it doesn't make answers worse.

| Task | Est. | Owner |
| -------------------------------------------------------------------------------------------- | ---- | ------ |
| Rebuild vector index + re-run v1.3 benchmark after manifest chunking | 2d | Jeremy |
| Add retrieval-quality smoke gate: expected-source Recall@8 = 1.00 | 2d | Jeremy |
| Add no-context paired quality gate: Smart/Search must beat or tie no-context | 2d | Jeremy |
| Hybrid reranking: vector score + lexical match on id/name/triggers/section | 3d | Jeremy |
| Fix any retrieval misses identified by the benchmark | 2d | Jeremy |
| Dashboard/reporting copy: "token reduction measured; quality gate pending" → remove warning | 1d | Jeremy |
| Promote multi-resolution packaging: manifest → relevant chunks → full skill only when needed | 3d | Jeremy |

**Gate:** Smoke CI fails if Recall@8 < 1.00 or quality drops below no-context.
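The hybrid reranking task in the table above could blend the two signals as in this sketch. The result shape, the function names, the substring-based lexical match, and the 0.7 weight are all assumptions for illustration.

```javascript
// Share of query terms that appear in the chunk's id, name, triggers, or section.
function lexicalScore(query, chunk) {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  if (terms.length === 0) return 0;
  const haystack = [chunk.id, chunk.name, chunk.section, ...(chunk.triggers || [])]
    .filter(Boolean)
    .join(' ')
    .toLowerCase();
  const hits = terms.filter((t) => haystack.includes(t)).length;
  return hits / terms.length;
}

// Blend vector similarity with lexical overlap. alpha = 0.7 is illustrative.
function hybridRerank(query, results, alpha = 0.7) {
  return results
    .map((r) => ({ ...r, hybrid: alpha * r.vectorScore + (1 - alpha) * lexicalScore(query, r) }))
    .sort((a, b) => b.hybrid - a.hybrid);
}

// Hypothetical search results: the better lexical match has the lower vector score.
const results = [
  { id: 'css-tokens', name: 'CSS tokens', vectorScore: 0.8 },
  { id: 'electron-ipc', name: 'Electron IPC', triggers: ['ipc', 'preload'], vectorScore: 0.75 },
];
const reranked = hybridRerank('electron ipc', results);
console.log(reranked.map((r) => r.id));
```

In the example the lexical signal lifts `electron-ipc` above a chunk with a slightly higher vector score, which is the kind of retrieval miss this task targets.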

### Week 3-4: System Detection Phase 1 + MCP Remote (P0/P1)

Two parallel streams. System Detection replaces the old "Skill Sources" and "Onboarding Redesign" plans, which are now unified under it.

**Stream A — MCP Remote (Jeremy):**
| Task | Est. |
|------|------|
| Choose HTTPS tunnel/hosting (e.g. Cloudflare Tunnel, ngrok, or $5 VPS) | 1d |
| Set `MCP_OAUTH_PASSWORD`, expose `/mcp` | 1d |
| Register URL in Claude/ChatGPT connector, validate `context_engine_status` | 1d |
| Document the setup for self-hosted users | 1d |

**Stream B — System Detection Backend (Jeremy or James):**
| Task | Est. |
|------|------|
| Rewrite `scanHostSkillPaths()` → `scanSystem()` with 5 categories (skills, rules, instructions, MCP, hosts) | 2d |
| Add probe functions: probeRuleFiles, probeInstructionFiles, probeMcpServers, probeHostConfigs | 2d |
| New endpoint: `GET /api/system/scan` — runs all probes, returns grouped results | 2d |
| New endpoints: `POST /api/system/link-all`, `POST /api/system/link`, `DELETE /api/system/link/:id` | 2d |
| New data model: `data/system-context.json` (tracks all linked sources with timestamps) | 1d |
| Refactor `server/lib/skills.js` → `findAllSkillDirs()` with sourceId for unified skill listing | 2d |
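A possible shape for the grouped scan, sketched under assumptions: the probes here are synchronous stubs that return fixed data, and the result format `{ ok, sources, error }` is hypothetical. Real probes would inspect the filesystem and host configs.

```javascript
// Stub probes, one per category, so the sketch runs anywhere.
const stubProbes = {
  skills: () => [{ id: 'skills:home', path: '/home/user/.skills' }],
  rules: () => [{ id: 'rules:cursor', path: '/repo/.cursorrules' }],
  instructions: () => [],
  mcp: () => {
    throw new Error('config unreadable');
  },
  hosts: () => [{ id: 'hosts:claude-desktop', path: '/home/user/.config/Claude' }],
};

// Run every probe and group results by category. A failing probe is recorded
// on its own category instead of aborting the whole scan.
function scanSystem(probes) {
  const result = {};
  for (const [category, probe] of Object.entries(probes)) {
    try {
      result[category] = { ok: true, sources: probe() };
    } catch (err) {
      result[category] = { ok: false, sources: [], error: err.message };
    }
  }
  return result;
}

const scan = scanSystem(stubProbes);
console.log(JSON.stringify(scan, null, 2));
```

Isolating failures per category means one unreadable config does not hide the other four groups from the caller.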

### Week 5-6: Onboarding + Import Pipeline (P1)

**Stream A — Onboarding Rewrite (Jeremy):**
| Task | Est. |
|------|------|
| 4-step scan → review → build → done flow. Step 1 runs `GET /api/system/scan` on open | 3d |
| Grouped result cards: skills count, rules count, instructions count, MCP count, hosts detected | 2d |
| "Link All" button links every unmanaged source in one click | 1d |
| Per-source Link/Unlink with inline feedback | 1d |
| Step 3: import + rebuild index with progress | 2d |
| CSS budget ≤ 250 lines, reuse DRAM tokens | 1d |

**Stream B — Link-Import Pipeline (James):**
| Task | Est. |
|------|------|
| Directory sources → NTFS junction into `app/.context/<category>/<id>` | 2d |
| File sources → copy into `app/.context/` | 1d |
| MCP server registration from detected configs | 2d |
| Auto-rebuild index after link/import | 1d |
| Per-source linking: collision-safe ID prefixing (`<sourceId>:<bareId>`) | 1d |
| Smoke tests: full scan → link → verify → unlink roundtrip | 2d |
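The `<sourceId>:<bareId>` prefixing can be sketched as a pair of helpers. The function names are hypothetical, and two choices are assumptions: a colon is rejected in the source id, and parsing splits on the first colon only.

```javascript
// Build a namespaced id of the form <sourceId>:<bareId>.
function prefixId(sourceId, bareId) {
  if (sourceId.includes(':')) {
    throw new Error(`sourceId must not contain ":": ${sourceId}`);
  }
  return `${sourceId}:${bareId}`;
}

// Split on the first colon only, so a bare id may itself contain colons.
function parseId(fullId) {
  const i = fullId.indexOf(':');
  if (i === -1) return { sourceId: null, bareId: fullId };
  return { sourceId: fullId.slice(0, i), bareId: fullId.slice(i + 1) };
}

const a = prefixId('claude-home', 'code-review');
const b = prefixId('cursor-repo', 'code-review');
console.log(a, b, parseId(a));
```

Two sources can then ship a skill with the same bare id without overwriting each other in the unified listing.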

### Week 7-8: Set & Forget + Polish (P1/P2)

**Jeremy:**
| Task | Est. |
|------|------|
| Periodic health check (24h timer re-scan, notify on new/changed sources) | 2d |
| Connections tab (post-onboarding source management UI, sibling to Skills/Memory/Handoffs) | 3d |
| Replace Modes tab grid with preset library | 2d |
| Smart Preview can promote selected skill sets into presets | 2d |
| Code signing certificate procurement + wiring | 2d |
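The periodic health check needs a way to tell which sources are new or changed between two scans. A minimal sketch, assuming each scanned source is reduced to a hypothetical `{ id, fingerprint }` pair, where the fingerprint could be a content hash or modification time:

```javascript
// Compare two scans (arrays of { id, fingerprint }) and report the differences.
function diffScans(previous, current) {
  const before = new Map(previous.map((s) => [s.id, s.fingerprint]));
  const after = new Map(current.map((s) => [s.id, s.fingerprint]));
  const added = [...after.keys()].filter((id) => !before.has(id));
  const removed = [...before.keys()].filter((id) => !after.has(id));
  const changed = [...after.keys()].filter(
    (id) => before.has(id) && before.get(id) !== after.get(id)
  );
  return { added, removed, changed };
}

const yesterday = [
  { id: 'rules:cursor', fingerprint: 'h1' },
  { id: 'skills:home', fingerprint: 'h2' },
];
const today = [
  { id: 'rules:cursor', fingerprint: 'h1-edited' },
  { id: 'mcp:local', fingerprint: 'h3' },
];
const diff = diffScans(yesterday, today);
console.log(diff);
```

A notification would fire only when `added` or `changed` is non-empty, so an unchanged system stays quiet.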

**James:**
| Task | Est. |
|------|------|
| CE becomes sole author — compile writes to `app/.context/` AND root junctions (where they exist) | 3d |
| CE writes compiled output to tool root paths when no junction exists (file fallback) | 2d |
| Handoff rate-limit-aware heartbeat/update API | 1d |
| `context_engine_dedup_report` MCP tool | 2d |
| Validation + edge-case hardening across all system detection endpoints | 2d |

---

## Beyond Two Months (Next)

- **Phase 5 — AIModelDB Bridge:** model-aware compile budget, dashboard display, model comparison MCP tool
- **`context_engine_model_lookup` MCP tool**
- **Multi-platform native installers** (macOS dmg, Linux deb/rpm) — runners exist in the release workflow, just need testing
- **Plugin/skill marketplace** — network effects, the real moat

---

## Key Principles

1. **Quality gate is the door.** Nothing ships to users until the benchmark proves it doesn't make answers worse.
2. **Daily use is the signal.** If Jeremy wouldn't reach for it in Claude Desktop or Codex tomorrow morning, it waits.
3. **James audits and hardens.** Jeremy drives the main roadmap. James catches edge cases, adds tests, and prevents regressions.
4. **Zero new deps.** Node builtins + Ollama HTTP. This is a product principle, not an accident.
5. **Under 500 lines.** Every module stays under the 500-line soft limit (700 absolute). New modules get split early.
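The line-limit principle is simple to check mechanically. This sketch is an assumption about how such a guard could work, not the release checklist's actual script. It treats exactly 500 lines as passing and anything above 700 as a failure.

```javascript
const SOFT_LIMIT = 500;
const HARD_LIMIT = 700;

// Count lines without treating a trailing newline as an extra line.
function countLines(source) {
  if (source === '') return 0;
  return source.replace(/\n$/, '').split('\n').length;
}

// 'ok' up to the soft limit, 'warn' between the limits, 'fail' past the hard limit.
function checkLineLimit(source) {
  const lines = countLines(source);
  if (lines > HARD_LIMIT) return 'fail';
  if (lines > SOFT_LIMIT) return 'warn';
  return 'ok';
}

console.log(checkLineLimit('const x = 1;\n'.repeat(620)));
```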