Local-only retrieval, citation-precision, and synthesis software for high-stakes evidence work.
VoxCore is single-machine software that ingests heterogeneous evidence (email archives, PDFs, audio, scanned images, office docs), indexes it across keyword + vector + knowledge-graph channels, and answers questions with verbatim-quoted citations that are deterministically verifiable against the source. It was built solo on personally-owned hardware to address the working set of one operator's privileged legal case; the architecture is general.
The differentiated capability is forensically defensible citation: every quote in an answer is paired with its source path, and a substring verifier confirms the quote exists verbatim in that source. Quotes the model invents are deterministically flagged FABRICATED at scoring time and prevented from shipping by a verify-retry loop. An in-pipeline CONTRADICTS Auditor (Sonnet 4.6) additionally holds answers when the model's quote opposes its own claim. As of 2026-05-02, the system either delivers an answer (16.7% hallucination rate on shipped answers; n=35 held-out, v4 with auditor) or refuses to deliver and flags the query for human review: an 80% delivery rate, 0 silent CONTRADICTS, and 0 fabricated quotes shipped.
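A minimal sketch of that fail-closed loop, assuming the cached extraction is plain UTF-8 text (the real verifier lives in `tools/citation_scorer.py`; the names and the `generate_answer` callable here are illustrative, not the shipped API):

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Citation:
    quote: str    # verbatim quote the model emitted
    source: Path  # path to the cached source text it cites

def verify_citation(cite: Citation) -> str:
    """Deterministic substring check: the quote must appear verbatim in the source."""
    if not cite.source.exists():
        return "MISSING_SOURCE"
    text = cite.source.read_text(encoding="utf-8", errors="replace")
    return "VERIFIED" if cite.quote in text else "FABRICATED"

def verify_retry(generate_answer, max_retries: int = 3):
    """Fail-closed: re-synthesize until every quote verifies, else hold for review."""
    for _ in range(max_retries):
        answer, citations = generate_answer()
        if all(verify_citation(c) == "VERIFIED" for c in citations):
            return answer  # every quote is verbatim-grounded; safe to ship
    return None            # refuse to deliver; flag for human review
```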
A single-machine appliance with three stacked layers:

- **Ingest.** `tools/extract_cache.py`, `tools/bulk_extract.py`, and `tools/excluded_daemon/` watch a folder tree, extract text from PDFs / DOCX / EML / MSG / images / audio (Whisper-large-v3) / mbox archives, run a governance gate (PII + classification-marker + sealing-order detection), and write to a CAS-style cache.
- **Index.** `tools/excluded_fts_build.py` (SQLite FTS5), `tools/rag_build.py` (ChromaDB + nomic-embed-text), and `tools/excluded_daemon/kg/` (knowledge graph, 24K entities / 175K mentions / 743K relations) build three independent retrievers over the cached text. `tools/excluded_hybrid_search.py` fuses them via Reciprocal Rank Fusion with adaptive boosts (sketched below).
- **Answer.** `.claude/commands/ex-ask.md` is a multi-agent skill that fans out to typed retrieval agents, synthesizes an answer under one-quote-per-claim discipline, and emits citations. `tools/citation_scorer.py` measures hallucination rate by extracting claims, verifying citation paths exist in the corpus, substring-verifying inline quotes, and (optionally) running a Claude-Opus-4.7 LLM-as-judge over each (claim, quote) pair to score span correctness.
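A minimal sketch of the fusion step named in the Index layer, assuming each channel returns a ranked list of chunk IDs (the adaptive boosts in `tools/excluded_hybrid_search.py` are omitted; the chunk IDs are invented for illustration):

```python
from collections import defaultdict

def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over channels of 1 / (k + rank(d))."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative inputs: top-N chunk IDs from the three channels.
fused = rrf_fuse([["c12", "c07", "c33"],   # keyword channel (FTS5)
                  ["c07", "c12", "c91"],   # vector channel (ChromaDB)
                  ["c91", "c07"]])         # knowledge-graph channel
# A chunk ranked well by all three channels (c07) rises to the top.
```

Because RRF scores depend only on ranks, not raw scores, the three channels' incomparable scoring scales never need to be normalized; the k=60 constant damps the influence of any single channel's top ranks.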
The MCP fleet (`.mcp.json` — voxcore-db, voxcore-server, arcanum, docs-rag, local-llm) exposes corpus + database + LLM access to Claude Code over stdio. See `docs/architecture/MCP_TRANSPORT.md`.
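For orientation, a hedged sketch of what one server entry in `.mcp.json` looks like under Claude Code's standard stdio MCP configuration (the command, script path, and env var below are invented for illustration; the checked-in file is authoritative):

```json
{
  "mcpServers": {
    "voxcore-db": {
      "command": "python",
      "args": ["tools/mcp-voxcore-db/server.py"],
      "env": { "VOXCORE_DB": ".cache/voxcore.sqlite" }
    }
  }
}
```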
The intended user is the operator who owns the corpus. There is no multi-tenant story, no remote API, no hosted deployment. Production = development = operator's workstation by design — see docs/DEPLOYMENT_MODEL.md for the full rationale and the cost of going hosted.
If you are evaluating this for an acquihire or technical diligence: read docs/architecture/decisions/ (7 ADRs), docs/COST_AND_LATENCY_BENCHMARKS.md, and the Desktop verification artifacts (VoxCore_Verification_Master_Checklist.md, VoxCore_Decisions_Log.md, VoxCore_Benchmark_Results.md).
| Metric | Value | Test set | Judge | Evidence |
|---|---|---|---|---|
| Hybrid retrieval pass rate | 92% (46/50) | 50-query suite | n/a (deterministic) | quality_probe_20260430_191844.json |
| Citation precision (path-level) | 100% (302/302) | n=30 batch | n/a (deterministic FTS lookup) | citation_score_n30_20260502.json |
| Hallucination rate, v4 shipped-only | 16.7% | n=35 held-out (28 shipped, 7 held) | claude-opus-4-7 | citation_score_holdout_n35_v4_claudejudge_20260502_142347.json |
| Hallucination rate, v2 all-shipped | 24.7% | n=35 held-out (35 shipped) | claude-opus-4-7 | citation_score_holdout_n35_v2_claudejudge_20260502_113446.json |
| FABRICATED quotes shipped (with verify-retry) | 0 | n=35 held-out | substring verifier | same |
| Silent CONTRADICTS shipped (with v4 auditor) | 0 | n=35 held-out | substring + Sonnet 4.6 auditor | same |
| Coverage (deliverable, v4) | 80% (28/35) | n=35 held-out | auditor 0.70 threshold | same |
| FABRICATED detection rate (catch at scoring) | 100% | n=35 held-out (pre-v2 measurement) | substring verifier | citation_score_holdout_n35_claudejudge_20260502_074107.json |
| Multi-hop coverage | 33% (4/12) | n=12 held-out, mixed hop types | n/a | citation_score_multihop_n12_claudejudge_20260502_140536.json |
| Multi-hop on-coverage hallucination | 39.6% | same | claude-opus-4-7 | same |
| Audio cross-instance WER | 0.59% | 26 audio files | n/a | wer_cross-instance_20260502_031916.json |
| OCR character error rate (avg CER) | 24.26% | 10 random PDFs | n/a | ocr_accuracy_20260502_032335.json |
| LegalBench overall (PROVEN tier) | 66.4% (166/250) | n=50/task, 5 tasks | claude-opus-4-7 (free-text), string-match (binary) | legalbench_n50_claudejudge_20260502_135847.json |
| Throughput, PDF (cold-cache) | 12,033 files/hour | 10 random PDFs | n/a | throughput_pdf_image_20260502_115747.json |
| Cost per fully-judged query (v4) | ~$0.24 | n=35 (synthesis + auditor + judging) | claude-opus-4-7 | docs/COST_AND_LATENCY_BENCHMARKS.md |
| Latency, synthesis only (v4) | p50=6.1s, p95=12.3s | n=35 sequential | n/a | same |
Methodology discipline (encoded in `~/.claude/projects/C--Users-atayl-VoxCore/memory/feedback_calibration_overfit.md`):
- Test sets must be held out from pipeline development (a deterministic-split sketch follows this list). A 0% hallucination rate measured on the calibration batch became 30% on a held-out batch of fresh queries.
- Every published quality number must specify the judge model: the same answers scored 45.5% under a Gemma judge and 30.3% under a Claude Opus judge.
- Roadmap predictions calibrated against an inflated baseline are themselves inflated.
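One mechanical way to enforce the first rule (a sketch, not the project's actual tooling): derive the calibration/held-out split from a stable hash of the query ID, so no curation step can quietly leak held-out queries into pipeline development.

```python
import hashlib

def is_held_out(query_id: str, held_out_fraction: float = 0.2) -> bool:
    """Deterministic split: a given query ID always lands in the same bucket."""
    bucket = hashlib.sha256(query_id.encode("utf-8")).digest()[0]  # 0..255
    return bucket / 255.0 < held_out_fraction
```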
| Component | Choice | ADR |
|---|---|---|
| Orchestration | Triad (Gemini 3.1 Pro Architect → Claude Opus 4.7 Executor → Gemini 3.1 Pro Auditor, fail-closed, 3-retry) | docs/architecture/decisions/0001-triad-orchestration.md |
| Tool integration | MCP (Model Context Protocol), stdio transport | docs/architecture/decisions/0002-mcp-first-protocol.md, docs/architecture/MCP_TRANSPORT.md |
| Local compute | Ollama (gemma4:26b, qwen3.5:27b, nomic-embed-text), Whisper-large-v3 (audio), Tesseract 5.4 (OCR) | docs/architecture/decisions/0003-local-gpu-offload.md |
| Governance | Pre-ingest filename gate + post-extraction content scan (PII/credentials/classification markers/sealing orders) | docs/architecture/decisions/0004-governance-gate.md |
| Citation precision | Inline-grounded verbatim quotes + substring verifier + LLM-as-judge for span correctness | docs/architecture/decisions/0005-citation-precision-pipeline.md |
| PDF extraction | pdfplumber (MIT) + pypdfium2 (Apache 2.0) — chosen over PyMuPDF (AGPL) | docs/architecture/decisions/0006-pdfplumber-pypdfium2-over-pymupdf.md |
| Retrieval fusion | Reciprocal Rank Fusion across FTS5 + ChromaDB + KG, k=60, adaptive boosts | docs/architecture/decisions/0007-hybrid-retrieval-rrf.md |
| Chunking | Three independent fixed-size chunkers, NOT semantic — for determinism and citation stability (sketched below) | docs/architecture/CHUNKING_STRATEGY.md |
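To make the chunking rationale concrete, a sketch under assumed parameters (1,200 characters with 200 overlap are illustrative, not the project's actual sizes): because boundaries are a pure function of the cached text, a citation keyed to `(path, start, end)` re-derives identically on every index rebuild.

```python
def fixed_chunks(text: str, size: int = 1200, overlap: int = 200):
    """Deterministic chunker: boundaries depend only on text length, never on content."""
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        end = min(start + size, len(text))
        yield start, end, text[start:end]  # (start, end) doubles as a stable chunk ID
```

A semantic chunker would re-draw boundaries whenever its segmentation model changed, silently invalidating previously stored citations.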
Documented in docs/ENVIRONMENT.md. Ryzen 9 9950X3D / RTX 5090 32GB / 128GB DDR5 / NVMe. Windows 11 Pro, Python 3.14.3. All personally-owned. No government-furnished equipment, no .mil network, no employer-paid subscriptions — see docs/acquihire/03_IP_Chain_of_Title/02_Subscriptions/subscription_summary.md.
VoxCore-specific code is MIT-equivalent (single author, no employer claims, no government code). The TrinityCore subtree under src/server/ is GPL-2.0 inherited (legacy WoW-server work that lives in the same repo for historical reasons; not part of the AI/retrieval product). Third-party Python deps were audited 2026-05-02 — all AGPL/GPL dependencies replaced; see docs/acquihire/03_IP_Chain_of_Title/04_Open_Source_Inventory/license_remediation.md.
If you read these in order, you'll have the full picture in under an hour:

- `README.md` (this file) — what it is, who it's for, what's measured
- `docs/ENVIRONMENT.md` — hardware, OS, Python, GPU, models, databases
- `docs/DEPLOYMENT_MODEL.md` — explicit local-only decision, cost-of-reversal
- `docs/architecture/decisions/README.md` — index of 7 ADRs covering the non-obvious choices
- `docs/architecture/MCP_TRANSPORT.md` — MCP server fleet, transport, statefulness, error handling
- `docs/architecture/CHUNKING_STRATEGY.md` — three chunkers, why fixed not semantic
- `docs/COST_AND_LATENCY_BENCHMARKS.md` — measured per-query cost and latency by role
- `docs/architecture/TRIAD_ENTRY_POINT.md` — orchestrator entry, fail-closed enforcement, model selection
- Desktop: `Do NOT Delete These/VoxCore_Verification_Summary_3page.md` — diligence-grade 3-page leave-behind (PROVEN-tier numbers + methodology + IP/license posture)
- Desktop: `Do NOT Delete These/VoxCore_Economic_Impact_Analysis_v3.1.md` — full Economic Impact analysis with measured numbers replacing the v2 PDF
- Desktop: `VoxCore_Verification_Master_Checklist.md` — 106/171 verified, with evidence per item
- Desktop: `VoxCore_Benchmark_Results.md` — measured-numbers ledger
- Desktop: `VoxCore_Decisions_Log.md` — append-only decision audit trail
The retrieval and citation pipeline is pure Python — no compiled component to build. Pinned dependencies live alongside each subproject (`tools/ai_studio/requirements.pinned.txt`, `tools/voxcore-daemon/requirements.pinned.txt`, etc. — there is no top-level `tools/requirements.pinned.txt` because each subproject has its own dependency surface).
```bash
# Install per-subproject (the subprojects you actually use):
pip install -r tools/ai_studio/requirements.pinned.txt       # for Triad orchestration
pip install -r tools/voxcore-daemon/requirements.pinned.txt  # for the ingest daemon
pip install -r tools/mcp-voxcore-db/requirements.pinned.txt  # for the MCP DB server

# Most retrieval/citation tooling has no separate requirements file — the imports
# resolve from any of the above (anthropic, requests, sqlite3 stdlib, etc.).
# MySQL is required for the legacy server subtree but not for the AI/retrieval product.
# For retrieval-only use, the SQLite + ChromaDB indices in .cache/ are self-contained.
```
```bash
# Build the FTS index (one-shot, ~30 min for 24K-doc corpus)
python tools/excluded_fts_build.py

# Build the vector index (one-shot, ~2 hr for 24K-doc corpus)
python tools/rag_build.py

# Build the KG (one-shot, ~6 hr for 24K-doc corpus — runs locally via Ollama)
python tools/excluded_daemon/kg/build.py
```
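As a smoke test after the FTS build, a hedged example of querying the index directly (the database path, table name, and column layout below are guesses; check `tools/excluded_fts_build.py` for the actual schema):

```python
import sqlite3

# Hypothetical index location and FTS5 schema -- adjust to the real build output.
conn = sqlite3.connect(".cache/fts.sqlite")
rows = conn.execute(
    "SELECT path, snippet(docs, 1, '[', ']', '...', 12) "
    "FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT 5",
    ('"sealing order"',),  # FTS5 phrase query
).fetchall()
for path, snip in rows:
    print(path, "->", snip)
```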
Day-to-day usage is via Claude Code with the `.claude/commands/ex-ask.md` skill. The MCP fleet (`.mcp.json`) loads automatically.
- v4 shipped-only hallucination rate: 16.7% (28 of 35 held-out queries delivered; 7 held with `[AUDITOR_FAILED]` for human review).
- 0 silent CONTRADICTS, 0 fabricated quotes shipped, 100% fabrication detection rate.
- LegalBench n=50 + Claude judge: 66.4% overall (PROVEN tier, externally publishable).
- Multi-hop coverage 33% on n=12 held-out (the current bottleneck; the next major engineering target via per-claim re-retrieval).
- Documented path to 8-12% held-out shipped hallucination via two queued Tier 2 fixes: per-claim re-retrieval (sketched below) and full implementation of the CONTRADICTS Auditor per spec.
- The Tier 3 fine-tuned reranker targeting sub-2% is months out and requires a labeled training corpus.
- Acquihire diligence: 106/171 master-checklist items verified with evidence (62%); 17 of the remainder are Adam's strategic / ethics / financial decisions (`VoxCore_Open_Questions.md`).
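A hedged sketch of the queued per-claim re-retrieval fix (this is not shipped code; `hybrid_search` and `chunk.text` stand in for the real RRF entry point and chunk record):

```python
def per_claim_support(claims, hybrid_search, top_k: int = 8):
    """Queued Tier 2 fix, sketched: re-run retrieval on each claim's own text,
    then require the cited quote to appear in that claim-scoped context."""
    support = {}
    for claim_text, quote in claims:                     # (claim, its cited quote)
        chunks = hybrid_search(claim_text, top_k=top_k)  # fresh, claim-scoped retrieval
        grounded = any(quote in chunk.text for chunk in chunks)
        support[claim_text] = (chunks, grounded)         # ungrounded -> re-synthesize
    return support
```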
- TrinityCore — legacy WoW-server subtree under `src/server/` (separate from the AI/retrieval product)
- Anthropic — Claude API used as Triad Executor and citation judge
- Google — Gemini 3.1 Pro used as Triad Architect and Auditor
- Ollama — local model runtime for embeddings, NER, and free-tier LLM-as-judge