Beta → 1.0 hardening + data-loss repair (24 hardening items + C1-C6) #10
…ework B1: get_connection sets WAL journal mode and 10s busy_timeout for concurrent hook access. B2: PRAGMA user_version-based sequential migration mechanism (_MIGRATIONS, _run_migrations) replaces the ad-hoc ALTER TABLE; new DBs stamp the latest version directly, existing DBs apply pending migrations transactionally. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
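The B1/B2 behaviour described in this commit can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `MIGRATIONS` contents, the table layout, and the exact function bodies are assumptions; only the WAL mode, the 10 s busy timeout, and the `PRAGMA user_version` stamping come from the commit message.

```python
import sqlite3

# Hypothetical migration list: entry i upgrades the schema from version i to i + 1.
MIGRATIONS = [
    "CREATE TABLE exchanges (id TEXT PRIMARY KEY, body TEXT)",
    "ALTER TABLE exchanges ADD COLUMN distill_status TEXT DEFAULT 'pending'",
]


def get_connection(path: str) -> sqlite3.Connection:
    # isolation_level=None means autocommit, so BEGIN/COMMIT below are explicit.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")    # readers do not block the writer
    conn.execute("PRAGMA busy_timeout=10000")  # wait up to 10 s on a locked DB
    return conn


def run_migrations(conn: sqlite3.Connection) -> int:
    """Apply pending migrations one transaction at a time; return the new version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version in range(current, len(MIGRATIONS)):
        conn.execute("BEGIN")
        try:
            conn.execute(MIGRATIONS[version])
            # user_version lives in the DB header and rolls back with the transaction.
            conn.execute(f"PRAGMA user_version = {version + 1}")
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Because the version stamp is written inside the same transaction as the schema change, a failed migration leaves both the schema and `user_version` where they were, and re-running is a no-op once everything is applied.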
…fig errors H4: search connections close on exception paths. H6: claude --print TimeoutExpired still cleans up side-effect .jsonl files. P2: indexer reads incrementally and skips parsing rows at or before last_ply_end. Q5: config catches specific exception types and hoists the sys import. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
H3: loci server start verifies PID liveness + ping before launching, and cleans up stale socket/PID files. H5: client and server read until newline so responses larger than the recv buffer are not truncated. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
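The H5 fix (read until newline) might look like the sketch below. The helper name `recv_line` and the one-message-per-line framing are assumptions for illustration; the commit only states that both sides now read until a newline.

```python
import socket


def recv_line(sock: socket.socket, bufsize: int = 4096) -> bytes:
    """Read from sock until a newline arrives; return the bytes before it."""
    buf = bytearray()
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:
            # Peer closed the connection before the message was complete.
            raise ConnectionError("peer closed before sending a newline")
        buf.extend(chunk)
        if b"\n" in chunk:
            break
    line, _, _ = bytes(buf).partition(b"\n")
    return line
```

A single `recv(bufsize)` call returns at most `bufsize` bytes, so a response larger than the buffer is only received in full when the caller loops like this.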
… (Lane A) B3: save_palace_object writes atomically (rollback leaves status pending). H1: distill_status column (migration v2) with pending/skipped/distilled; status command distinguishes all three. B5: meta table (migration v3) records embedding_model and prompt_version (prompt sha256); check_drift warns in index/distill/search/status. P1: indexes on rooms/symbols/palace_objects FKs (migration v4). H8: memory.db chmod 0o600. P3: tree-sitter resolver/cache reuse across a distill batch. Q2: datetime.now(UTC). Q3: distill_all returns (count, errors). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
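A drift check along the lines of B5 could be sketched like this. The `meta(key, value)` layout, the `record_meta` helper, and the warning wording are assumptions; the commit only says that `embedding_model` and a sha256 of the prompt are recorded and that `check_drift` warns.

```python
import hashlib
import sqlite3


def _current_meta(embedding_model: str, prompt_text: str) -> dict:
    prompt_version = hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()
    return {"embedding_model": embedding_model, "prompt_version": prompt_version}


def record_meta(conn: sqlite3.Connection, embedding_model: str, prompt_text: str) -> None:
    """Store the model name and prompt hash the data was produced with."""
    for key, value in _current_meta(embedding_model, prompt_text).items():
        conn.execute("INSERT OR REPLACE INTO meta (key, value) VALUES (?, ?)", (key, value))


def check_drift(conn: sqlite3.Connection, embedding_model: str, prompt_text: str) -> list:
    """Return one warning string per recorded value that no longer matches."""
    stored = dict(conn.execute("SELECT key, value FROM meta"))
    warnings = []
    for key, value in _current_meta(embedding_model, prompt_text).items():
        if key in stored and stored[key] != value:
            warnings.append(f"{key} changed: stored {stored[key]!r}, current {value!r}")
    return warnings
```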
B4: _write_settings writes via a temp file + os.replace with a .json.bak backup, so a crash mid-write never corrupts settings.json. install_hooks now routes through it. H7: uninstall_hooks removes only codeatrium hooks (Stop index, SessionStart server/distill/prime, legacy SessionEnd distill), preserves user hooks, prunes emptied entries/sections, and is idempotent; exposed as loci hook uninstall. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
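The temp-file-plus-`os.replace` pattern from B4 can be sketched as below. The function name `write_settings` and its exact error handling are illustrative assumptions, not the project's `_write_settings`.

```python
import json
import os
import shutil
import tempfile
from pathlib import Path


def write_settings(path: Path, data: dict) -> None:
    """Write data as JSON to path atomically, keeping the previous file as .bak."""
    if path.exists():
        # settings.json -> settings.json.bak
        shutil.copy2(path, path.with_name(path.name + ".bak"))
    # The temp file must be in the same directory so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename: readers see the old or the new file
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

A crash before `os.replace` leaves the original `settings.json` untouched; a crash after it leaves the complete new file.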
…n (H2, Q8) H2: distill now holds an fcntl.flock(LOCK_EX|LOCK_NB) on distill.lock; a second invocation hits BlockingIOError and exits 0. The OS releases the lock on process death, so the stale-PID detection and re-acquire dance is gone. Q8: embedding bytes come from ndarray.astype(float32).tobytes() instead of struct.pack, dropping the struct import. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
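A non-blocking exclusive `flock`, as described for H2, might be wrapped like this on a Unix system. The helper name `try_acquire_lock` and the file mode are assumptions made for the sketch.

```python
import fcntl
import os


def try_acquire_lock(lock_path: str):
    """Return an open fd holding an exclusive lock, or None if another holder exists."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else holds the lock; the caller can simply exit 0.
        os.close(fd)
        return None
    # Keep fd open for the lifetime of the work. The kernel drops the lock
    # when the fd is closed or the process dies, so no stale-PID handling is needed.
    return fd
```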
Drop user_content/agent_content from the default `loci context --json`
output and add verbatim_ref ("{source_path}:ply={ply_start}"), making it
symmetric with `loci search`. Full conversation text is now opt-in via the
new --full flag or fetchable through `loci show <verbatim_ref>`.
- context SQL now JOINs conversations to fetch source_path + ply_start
- default JSON emits 9 fields: symbol_name, symbol_kind, file_path,
signature, line, exchange_id, exchange_core, specific_context, verbatim_ref
- --full restores user_content/agent_content
- human output shows verbatim_ref instead of full text
- update agent-facing docs (prime_cmd.py, CLAUDE.md) to point at
`loci show <verbatim_ref>` for full text
- add tests/test_search_cmd.py covering the output contract
openspec: repair-distill-and-revive-context (item C5)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
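The output contract above can be illustrated with a small sketch. The field names and the `verbatim_ref` format are taken from the commit message; the function names and the shape of the input row are assumptions.

```python
# The eight fields copied straight from the row; verbatim_ref is the ninth.
BASE_FIELDS = (
    "symbol_name", "symbol_kind", "file_path", "signature", "line",
    "exchange_id", "exchange_core", "specific_context",
)


def verbatim_ref(source_path: str, ply_start: int) -> str:
    return f"{source_path}:ply={ply_start}"


def context_row_to_json(row: dict, full: bool = False) -> dict:
    """Build the default 9-field payload; add conversation text only when full=True."""
    out = {name: row.get(name) for name in BASE_FIELDS}
    out["verbatim_ref"] = verbatim_ref(row["source_path"], row["ply_start"])
    if full:
        out["user_content"] = row.get("user_content")
        out["agent_content"] = row.get("agent_content")
    return out
```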
… (C1-C4,C6)
Implements the C-core lane of repair-distill-and-revive-context.
C1: indexer captures tool_use file paths. parse_exchanges now collects file_path/notebook_path from assistant Edit/Write/Read/MultiEdit/NotebookEdit tool_use blocks into the new exchange_files(exchange_id, file_path) table (migration v5 + new-DB executescript). External paths (site-packages, node_modules, venv) are excluded at capture time. distiller's files_touched uses exchange_files as the primary source and the regex extraction as fallback.
C2: symbols are now many-to-many. symbols.id becomes sha256(symbol_name:file_path:palace_object_id) with dedup_hash kept as sha256(symbol_name:file_path), so one symbol can link to every conversation that discusses it. Migration v6 rebuilds existing symbol ids.
C3: repair migration v7: rebuilds palace_objects to drop a residual legacy bm25_text column, resets distill_status to pending for exchanges marked distilled but lacking a palace_objects row (so they re-distill), and deletes orphan rooms/vec_palace/symbols rows.
C4: save_palace_object no longer relies on INSERT OR IGNORE. It uses explicit existence checks for dedup and verifies the palace_objects row exists after insert, raising on failure so distill_status stays pending instead of silently losing distilled (and billed) results.
C6: only symbols whose name appears in the exchange body (user_content + agent_content) are linked, cutting reverse-lookup noise.
Tests added across test_indexer/test_distiller/test_db; full suite green (typecheck 0 errors, ruff clean). The unrelated pre-existing test_status_hook::test_prime_outputs_instructions failure belongs to the C5 lane and is untouched here.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
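The C2 id scheme can be sketched as follows. The hashed strings follow the commit message; the helper names and the UTF-8 encoding choice are assumptions.

```python
import hashlib


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def symbol_id(symbol_name: str, file_path: str, palace_object_id: str) -> str:
    # Including palace_object_id gives one row per (symbol, conversation) pair.
    return sha256(f"{symbol_name}:{file_path}:{palace_object_id}")


def dedup_hash(symbol_name: str, file_path: str) -> str:
    # Stable across conversations, so rows for the same symbol can be grouped.
    return sha256(f"{symbol_name}:{file_path}")
```

With this split, the same symbol discussed in two conversations yields two distinct `symbols.id` values that share one `dedup_hash`.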
…laude (Q1,Q6,Q7) Q1: extract the duplicated _sha256 helper into codeatrium.utils.sha256; distiller and indexer import it instead of each defining their own. Q6: name the server-start polling magic number (_SERVER_STARTUP_POLL_ATTEMPTS) in server_cmd.py. Q7: add tests/test_llm.py covering call_claude command flags, JSON parsing, and side-effect .jsonl cleanup on both success and TimeoutExpired. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The test relied on an ambient .codeatrium/ in the cwd, so it passed locally (repo has one) but failed in CI's clean checkout where loci prime exits silently. Set up a tmp .codeatrium/ and chdir into it so the test no longer depends on the working directory. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Code review
Two bugs found and validated.
Bug 1: Ply-skip boundary mismatch after malformed JSON lines
File: Codeatrium/src/codeatrium/indexer.py, lines 195 to 205 in a4b7425
The skip loop increments …
Impact: For any existing DB whose already-indexed region contains malformed JSON lines, the skip boundary fires one position too early per bad line, pushing a previously-indexed entry back into the parse region. The newly-stored …
Suggested fix: Only advance …
Bug 2: WAL sidecar files not chmod 0o600; sensitive data potentially world-readable
File: Codeatrium/src/codeatrium/db.py, lines 196 to 208 in a4b7425
`init_db` calls `os.chmod(db_path, 0o600)` only on the main `memory.db` file, but `get_connection` enables WAL mode (`PRAGMA journal_mode=WAL`), which causes SQLite to create `memory.db-wal` and `memory.db-shm` sidecar files containing the same verbatim conversation and code data. These sidecar files inherit the process umask (commonly `0o644`), leaving them readable by other local users on a shared system.
Suggested fix: …
Bug 1 (indexer): the incremental skip loop counted every non-empty line including malformed JSON, but last_ply_end is in successfully-parsed-line coordinates. A malformed line in the already-indexed region shifted the skip boundary one position early, drifting ply_end on each run. Now the skip region validate-parses lines so malformed ones don't occupy a position, matching the stored coordinate system. Adds a regression test. Bug 2 (db): init_db chmod'd only memory.db, but WAL mode creates memory.db-wal / -shm sidecars holding the same verbatim data with umask perms (often 0o644). Now chmod the sidecars to 0o600 too. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
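The Bug 1 fix can be illustrated with a sketch in which only successfully parsed lines advance the ply counter. The function name, the 0-based ply numbering, and the generator shape are assumptions; the real indexer may differ.

```python
import json


def iter_new_entries(lines, last_ply_end: int):
    """Yield (ply, entry) for parsed lines whose ply is greater than last_ply_end."""
    ply = -1
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # Malformed lines occupy no ply position, in the skip region as well.
            continue
        ply += 1
        if ply <= last_ply_end:
            continue  # already indexed on a previous run
        yield ply, entry
```

Counting raw non-empty lines instead would shift the boundary by one for every malformed line in the already-indexed region, which is the drift the review describes.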
…te examples
Implement openspec change improve-prime-injection (P1-P3 + dual-source consolidation):
- P1: rewrite triggers from user-question-driven to agent-action-driven (before edit/refactor, before new implementation, on known errors)
- P2: promote loci context (reverse lookup) to an independent section, spelling out the symbol-to-memory recall design intent
- P3: embed concrete command examples for loci search and loci context
- Collapse CLAUDE_MD_SECTION to a minimal redirect; PRIME_TEXT becomes the single source of behavioral guidance
- Add tests/test_prime_cmd.py covering the PRIME_TEXT contract and inject_claude_md idempotency
- Re-inject the CLAUDE.md marker block with the new section
P4 (when-not-to-search guidance) deliberately omitted per review.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
# Conflicts:
# CLAUDE.md
# src/codeatrium/cli/prime_cmd.py
Implements three openspec changes: reliability/safety/performance hardening toward the 1.0.0 release, repair of an ongoing distillation data-loss incident, and a rewrite of the `loci prime` injected prompt to drive agent-initiated memory usage.

release-1.0-hardening: 24/24
- … `user_version`-based migration mechanism / B3 transactional distillation / B4 atomic settings.json writes + `.bak` backup / B5 `meta` table (model/prompt versions) + drift warnings
- … `distill_status` (pending/skipped/distilled) / H2 flock-based distill lock / H3 server double-start prevention / H4 search connection leak fix / H5 full socket reads / H6 .jsonl cleanup on claude timeout / H7 `loci hook uninstall` / H8 `memory.db` 0o600
- … `datetime.now(UTC)` / Q3 error reporting in `distill_all` / Q4 assert→raise / Q5 finer-grained config exceptions / Q6 constant naming / Q7 llm.py tests / Q8 `tobytes()`

repair-distill-and-revive-context: 6/6
- … `exchange_files`) / C2 many-to-many symbols / C3 repair migration (reset lost distillations to pending + orphan cleanup + drop `bm25_text`) / C4 drop INSERT OR IGNORE, verify persistence / C5 lighter `loci context` output + `--full` / C6 link only symbols appearing in conversation bodies

improve-prime-injection: P1-P3
Rewrites the prompt that `loci prime` injects at session start, so agents actually reach for `loci search` / `loci context` on their own (real-world logs showed near-zero spontaneous usage):
- `loci context` (reverse lookup) promoted to an independent section alongside `loci search`, with the design intent spelled out: touching a symbol = recalling memory about that symbol
- … `loci search "BM25 RRF fusion ranking"`, `loci context --symbol "SymbolResolver.extract"`)
- `CLAUDE_MD_SECTION` collapsed to a minimal redirect; `PRIME_TEXT` is now the single source of behavioral guidance, removing the duplicated-and-drifting wording between the two
- `tests/test_prime_cmd.py`: 8 tests covering the PRIME_TEXT contract and `inject_claude_md` idempotency

Verification
- `bm25_text` column dropped

Notes
🤖 Generated with Claude Code