A local-first semantic search system for personal documents with Claude Code integration via MCP.
Install with pipx (preferred):

```shell
pipx install okb
```

Or with pip:

```shell
pip install okb
```

Quick start:

```shell
# 1. Start the database
okb db start

# 2. (Optional) Deploy Modal embedder for faster batch ingestion
okb modal deploy

# 3. Ingest your documents
okb ingest ~/notes ~/docs

# 4. Configure Claude Code MCP (see below)
```

| Command | Description |
|---|---|
| `okb db start` | Start pgvector database container |
| `okb db stop` | Stop database container |
| `okb db status` | Show database status |
| `okb db migrate [name]` | Apply pending migrations (optionally for a specific db) |
| `okb db list` | List configured databases |
| `okb db destroy` | Remove container and volume (destructive) |
| `okb db snapshot save [name]` | Create database snapshot (default: timestamp) |
| `okb db snapshot list` | List available snapshots |
| `okb db snapshot restore <name>` | Restore from snapshot (creates pre-restore backup) |
| `okb db snapshot restore <name> --no-backup` | Restore without pre-restore backup |
| `okb db snapshot delete <name>` | Delete a snapshot |
| `okb ingest <paths>` | Ingest documents into knowledge base |
| `okb ingest <paths> --local` | Ingest using local GPU/CPU embedding (no Modal) |
| `okb serve` | Start MCP server (stdio, for Claude Code) |
| `okb serve --http` | Start HTTP MCP server with token auth |
| `okb watch <paths>` | Watch directories for changes |
| `okb config init` | Create default config file |
| `okb config show` | Show current configuration |
| `okb config path` | Print config file path |
| `okb modal deploy` | Deploy GPU embedder to Modal |
| `okb token create` | Create API token for HTTP server |
| `okb token list` | List tokens for a database |
| `okb token revoke [TOKEN] --id <n>` | Revoke token by full value or ID |
| `okb sync list` | List available API sources (plugins) |
| `okb sync list-projects <source>` | List projects from source (for config) |
| `okb sync run <sources>` | Sync data from external APIs |
| `okb sync auth <source>` | Interactive OAuth setup (e.g., dropbox-paper) |
| `okb sync status` | Show last sync times |
| `okb rescan` | Check indexed files for changes, re-ingest stale |
| `okb rescan --dry-run` | Show what would change without executing |
| `okb rescan --delete` | Also remove documents for missing files |
| `okb llm status` | Show LLM config and connectivity |
| `okb llm deploy` | Deploy Modal LLM for open model inference |
| `okb llm clear-cache` | Clear LLM response cache |
| `okb synthesize run` | Generate knowledge synthesis proposals |
| `okb synthesize run --dry-run` | Preview what would be sampled |
| `okb synthesize pending` | List pending synthesis proposals |
| `okb synthesize approve <id>` | Approve a synthesis proposal |
| `okb synthesize reject <id>` | Reject a synthesis proposal |
| `okb synthesize review` | Interactive review loop (A/E/R/S/Q) |
| `okb synthesize analyze` | Analyze database and update description/topics |
| `okb synthesize analyze --stats-only` | Show stats without LLM call |
| `okb schedule add <source> <interval>` | Schedule periodic sync via systemd timer |
| `okb schedule remove <source>` | Remove a scheduled sync timer |
| `okb schedule list` | List all active sync timers |
| `okb service install` | Install systemd user services for background operation |
| `okb service uninstall` | Remove systemd user services |
| `okb service status` | Show service status |
| `okb service start` | Start okb services |
| `okb service stop` | Stop okb services |
| `okb service restart` | Restart services (use after upgrading okb) |
| `okb service logs [-f]` | Show service logs (optionally follow) |
Configuration is loaded from `~/.config/okb/config.yaml` (or `$XDG_CONFIG_HOME/okb/config.yaml`).
Create default config:
```shell
okb config init
```

Example config:

```yaml
databases:
  personal:
    url: postgresql://knowledge:localdev@localhost:5433/personal_kb
    default: true   # Used when --db not specified (only one can be default)
    managed: true   # okb manages via Docker
  work:
    url: postgresql://knowledge:localdev@localhost:5433/work_kb
    managed: true

docker:
  port: 5433
  container_name: okb-pgvector

chunking:
  chunk_size: 512
  chunk_overlap: 64
```

Use `--db <name>` to target a specific database with any command.
Environment variables override config file settings:
- `OKB_DATABASE_URL` - Database connection string
- `OKB_DOCKER_PORT` - Docker port mapping
- `OKB_CONTAINER_NAME` - Docker container name
- `OKB_SERVER_URL` - Remote server URL (overrides default server)
- `OKB_TOKEN` - Remote server token (overrides default server)
Config file permissions: config files must be mode `0600` (not readable by group or other), since they may contain secrets. OKB checks the mode on load and errors out if the permissions are too open.
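The load-time guard amounts to a few lines of `os.stat`; this is a sketch of such a check on POSIX systems, not OKB's actual implementation:

```python
import os
import stat
import tempfile

def check_config_perms(path: str) -> None:
    """Refuse to load a config file that group or other can read."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        raise PermissionError(
            f"{path} has mode {oct(mode)}; expected 0600 (fix with: chmod 600)"
        )

# Demonstrate on a temp file: 0600 passes, 0644 is rejected
with tempfile.NamedTemporaryFile(delete=False) as f:
    cfg = f.name
os.chmod(cfg, 0o600)
check_config_perms(cfg)   # no error for a private file
strict_ok = True
os.chmod(cfg, 0o644)
try:
    check_config_perms(cfg)
    rejected = False
except PermissionError:
    rejected = True        # group/other-readable config refused
os.unlink(cfg)
```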
Override global config per-project with .okbconf.yaml (searched from CWD upward):
```yaml
# .okbconf.yaml
default_database: work   # Use 'work' db in this project
extensions:
  skip_directories:      # Extends global list
    - test_fixtures
```

Merge semantics: scalars replace, lists extend, dicts deep-merge.
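The merge rule is small enough to sketch directly; `deep_merge` below is an illustrative helper, not OKB's internal function:

```python
def deep_merge(base, override):
    """Merge a local override into the global config:
    dicts merge recursively, lists extend, scalars replace."""
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for key, value in override.items():
            merged[key] = deep_merge(base[key], value) if key in base else value
        return merged
    if isinstance(base, list) and isinstance(override, list):
        return base + override
    return override  # scalar (or mismatched types): local value wins

global_cfg = {"default_database": "personal",
              "extensions": {"skip_directories": ["node_modules"]}}
local_cfg = {"default_database": "work",
             "extensions": {"skip_directories": ["test_fixtures"]}}

merged = deep_merge(global_cfg, local_cfg)
# default_database replaced; skip_directories extended with test_fixtures
```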
Connect to remote OKB HTTP servers:
```yaml
servers:
  personal:
    url: http://localhost:8080/mcp
    token: ${OKB_PERSONAL_TOKEN}
    default: true
  work:
    url: http://work-host:8080/mcp
    token: ${OKB_WORK_TOKEN}
```

Only one server can be `default: true`. If none is marked, the first is used.
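The selection rule ("one `default: true`, else the first entry") might look like the following sketch; `pick_default_server` is a hypothetical name, not OKB's API:

```python
def pick_default_server(servers: dict) -> str:
    """Return the server marked default: true, else the first entry."""
    defaults = [name for name, cfg in servers.items() if cfg.get("default")]
    if len(defaults) > 1:
        raise ValueError(f"multiple default servers: {defaults}")
    return defaults[0] if defaults else next(iter(servers))

servers = {
    "personal": {"url": "http://localhost:8080/mcp", "default": True},
    "work": {"url": "http://work-host:8080/mcp"},
}
default_name = pick_default_server(servers)   # "personal"
```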
Local config can override the default server per-project:
```yaml
# .okbconf.yaml
default_server: work   # Use 'work' server in this project
```

Databases can override global plugin source configs (full replacement per source, no merge):
```yaml
databases:
  work:
    url: postgresql://...
    managed: true
    sources:
      github:
        enabled: true
        token: ${WORK_GITHUB_TOKEN}
      todoist:
        enabled: false
```

Enable LLM-based document classification, filtering, and synthesis:
```yaml
llm:
  provider: claude   # "claude", "modal", or null (disabled)
  model: claude-haiku-4-5-20251001
  timeout: 30
  cache_responses: true
```

Providers:
| Provider | Setup | Cost |
|---|---|---|
| `claude` | `export ANTHROPIC_API_KEY=...` | ~$0.25/1M tokens |
| `modal` | `okb llm deploy` | ~$0.02/min GPU |
Modal LLM Setup (no API key needed, runs on Modal's GPUs):
```yaml
llm:
  provider: modal
  model: microsoft/Phi-3-mini-4k-instruct   # Recommended: no gating
```

Non-gated models (work immediately):

- `microsoft/Phi-3-mini-4k-instruct` - Good quality, 4K context
- `Qwen/Qwen2-1.5B-Instruct` - Smaller/faster
Gated models (require HuggingFace approval + token):
- `meta-llama/Llama-3.2-3B-Instruct` - Requires accepting the license on HuggingFace, then:

```shell
modal secret create huggingface HF_TOKEN=hf_...
```
Deploy after configuring:
```shell
okb llm deploy
```

Pre-ingest filtering - skip low-value content during sync:
```yaml
plugins:
  sources:
    dropbox-paper:
      llm_filter:
        enabled: true
        prompt: "Skip meeting notes and drafts"
        action_on_skip: discard   # or "archive"
```

LLM-based synthesis generates topic summaries, entity profiles, and cross-cutting insights from your knowledge base:
```shell
okb synthesize run                      # Generate synthesis proposals
okb synthesize run --project myproject  # Scope to specific project
okb synthesize run --max-proposals 5    # Limit proposals
okb synthesize run --dry-run            # Preview what would be sampled
okb synthesize pending                  # List pending proposals
okb synthesize approve <id>             # Approve → creates searchable document
okb synthesize reject <id>              # Reject proposal
okb synthesize review                   # Interactive review (A/E/R/S/Q)
```

Analyze the knowledge base to generate/update description and topics:
```shell
okb synthesize analyze                  # Analyze and update metadata
okb synthesize analyze --stats-only     # Show stats without LLM call
okb synthesize analyze --project myproj # Analyze specific project
```

CLI commands:
```shell
okb llm status       # Show config and connectivity
okb llm deploy       # Deploy Modal LLM (for provider: modal)
okb llm clear-cache  # Clear response cache
```

Add to your Claude Code MCP configuration:
```json
{
  "mcpServers": {
    "knowledge-base": {
      "command": "okb",
      "args": ["serve"]
    }
  }
}
```

First, start the HTTP server and create a token:
```shell
# Create a token
okb token create --db default -d "Claude Code"
# Output: okb_default_rw_a1b2c3d4e5f6g7h8

# Start HTTP server
okb serve --http --host 0.0.0.0 --port 8080
```

The server supports two transports:
Streamable HTTP (primary, RFC 9728 compliant):
- `POST /mcp` - Send JSON-RPC messages, receive SSE response
- `GET /mcp` - Establish SSE connection for server notifications
- `DELETE /mcp` - Terminate session
- `/sse` is an alias for `/mcp`
Legacy SSE (for older MCP clients):
- `GET /legacy/sse` - Establish SSE stream
- `POST /legacy/messages` - Send JSON-RPC messages
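As a sketch of what one raw exchange on the Streamable HTTP transport looks like, the following builds a single JSON-RPC message for `POST /mcp` using only Python's stdlib. The URL and token are the placeholders from above, and a real client performs the MCP `initialize` handshake before calling tools:

```python
import json
import urllib.request

def build_mcp_request(url: str, token: str, method: str,
                      params: dict, msg_id: int = 1) -> urllib.request.Request:
    """Build a POST /mcp request carrying one JSON-RPC message."""
    body = {"jsonrpc": "2.0", "id": msg_id, "method": method, "params": params}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            # Server may answer with plain JSON or an SSE stream
            "Accept": "application/json, text/event-stream",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

req = build_mcp_request(
    "http://localhost:8080/mcp",
    "okb_default_rw_a1b2c3d4e5f6g7h8",
    "tools/list",
    {},
)
# urllib.request.urlopen(req) would send it against a running okb serve --http
```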
Configure your MCP client to connect:
```json
{
  "mcpServers": {
    "knowledge-base": {
      "type": "sse",
      "url": "http://localhost:8080/mcp",
      "headers": {
        "Authorization": "Bearer okb_default_rw_a1b2c3d4e5f6g7h8"
      }
    }
  }
}
```

| Tool | Purpose |
|---|---|
| `search_knowledge` | Semantic search with natural language queries |
| `keyword_search` | Exact keyword/symbol matching |
| `hybrid_search` | Combined semantic + keyword (RRF fusion) |
| `get_document` | Retrieve full document by path |
| `list_sources` | Show indexed document stats |
| `list_projects` | List known projects |
| `list_documents_by_project` | List all documents for a specific project |
| `get_project_stats` | List projects with document counts |
| `rename_project` | Rename a project across all documents |
| `set_document_project` | Set/clear project for a single document |
| `recent_documents` | Show recently indexed files |
| `save_knowledge` | Save knowledge from Claude (source_type: claude-note or synthesis) |
| `update_knowledge` | Update an existing document in-place |
| `delete_knowledge` | Delete any document by source path |
| `get_actionable_items` | Query tasks/events with structured filters |
| `get_database_info` | Get database description, topics, and stats |
| `set_database_description` | Update database description/topics (LLM can self-document) |
| `add_todo` | Create a TODO item in the knowledge base |
| `trigger_sync` | Sync API sources (runs in background, use list_sync_sources to check) |
| `trigger_rescan` | Check indexed files for changes and re-ingest (background) |
| `list_sync_sources` | List API sync sources with status (idle/running/error) |
| `ingest_documents` | Ingest pre-parsed documents (chunking, embedding, storage) |
| `analyze_knowledge_base` | Analyze content and generate description/topics |
| `get_synthesis_samples` | Get document samples and stats for LLM-driven synthesis |
| `synthesize_knowledge` | Analyze DB and propose synthetic knowledge documents |
| `list_pending_synthesis` | List pending synthesis proposals |
| `approve_synthesis` | Approve a proposal, creating a searchable document |
| `reject_synthesis` | Reject a pending synthesis proposal |
| `edit_pending_synthesis` | Edit a proposal before approve/reject |
| `save_snapshot` | Create database snapshot for backup |
| `list_snapshots` | List available database snapshots |
| `restore_snapshot` | Restore database from snapshot |
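`hybrid_search` fuses the semantic and keyword rankings with Reciprocal Rank Fusion (RRF). The fusion step itself is small enough to sketch; `k = 60` is the conventional RRF constant, not necessarily OKB's exact parameter, and the paths are made-up examples:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked result lists: each document scores
    sum(1 / (k + rank)) over the lists it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["notes/django.md", "notes/perf.md", "notes/sql.md"]
keyword  = ["notes/sql.md", "notes/django.md"]
fused = rrf_fuse([semantic, keyword])
# "notes/django.md" wins: it ranks highly in both lists
```

Because RRF uses only ranks, it needs no score normalization between the two retrieval methods, which is why it is a common choice for hybrid search.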
Claude.ai requires OAuth 2.1 for MCP server connections. The oauth/ directory contains a
Cloudflare Worker that bridges OAuth 2.1 to OKB's bearer token auth:
```
Claude.ai ──OAuth 2.1──▶ Cloudflare Worker ──Bearer token──▶ OKB HTTP server
                               │
                        GitHub login → maps to pre-existing OKB token
```
See oauth/README.md for setup instructions.
Documents are chunked with context for better retrieval:
```
Document: Django Performance Notes
Project: student-app          ← inferred from path or frontmatter
Section: Query Optimization   ← extracted from markdown headers
Topics: django, performance   ← from frontmatter tags
Content: Use `select_related()` to avoid N+1 queries...
```
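The `chunk_size`/`chunk_overlap` settings from the config describe a sliding window over the document. A simplified word-based sketch (the real chunker presumably operates on tokens and attaches the contextual header shown above):

```python
def sliding_chunks(tokens: list[str], chunk_size: int = 512,
                   chunk_overlap: int = 64) -> list[list[str]]:
    """Cut the sequence into windows of chunk_size, stepping by
    chunk_size - chunk_overlap so neighbours share overlap tokens."""
    step = chunk_size - chunk_overlap
    # max(..., 1) guarantees at least one chunk for short documents
    return [tokens[i:i + chunk_size]
            for i in range(0, max(len(tokens) - chunk_overlap, 1), step)]

chunks = sliding_chunks([f"w{i}" for i in range(1000)])
# 3 chunks: [0:512], [448:960], [896:1000]; neighbours share 64 words
```

The overlap means a sentence falling on a chunk boundary is still embedded whole in at least one chunk, at the cost of some duplicated storage.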
```markdown
---
tags: [django, postgresql, performance]
project: student-app
category: backend
---

# Your Document Title

Content here...
```

OKB supports plugins for custom file parsers and API data sources (GitHub, Todoist, etc.).
```python
# File parser plugin
from okb.plugins import FileParser, Document

class EpubParser:
    extensions = ['.epub']
    source_type = 'epub'

    def can_parse(self, path): return path.suffix.lower() == '.epub'
    def parse(self, path, extra_metadata=None) -> Document: ...

# API source plugin
from okb.plugins import APISource, SyncState, Document

class GitHubSource:
    name = 'github'
    source_type = 'github-issue'

    def configure(self, config): ...
    def fetch(self, state: SyncState | None) -> tuple[list[Document], SyncState]: ...
```

In your plugin's `pyproject.toml`:
```toml
[project.entry-points."okb.parsers"]
epub = "okb_epub:EpubParser"

[project.entry-points."okb.sources"]
github = "okb_github:GitHubSource"
```

```yaml
# ~/.config/okb/config.yaml
plugins:
  sources:
    github:
      enabled: true
      token: ${GITHUB_TOKEN}   # Resolved from environment
      repos: [owner/repo1, owner/repo2]
    todoist:
      enabled: true
      token: ${TODOIST_TOKEN}
      include_completed: false   # Sync completed tasks
      completed_days: 30         # Days of completed history
      include_comments: false    # Include task comments (1 API call per task)
      project_filter: []         # List of project IDs (use sync list-projects to find)
    dropbox-paper:
      enabled: true
      # Option 1: Refresh token (recommended, auto-refreshes)
      app_key: ${DROPBOX_APP_KEY}
      app_secret: ${DROPBOX_APP_SECRET}
      refresh_token: ${DROPBOX_REFRESH_TOKEN}
      # Option 2: Access token (short-lived, expires after ~4 hours)
      # token: ${DROPBOX_TOKEN}
      folders: [/]   # Optional: filter to specific folders
```

Dropbox Paper OAuth Setup:
```shell
okb sync auth dropbox-paper
```

This interactive command will guide you through getting a refresh token from Dropbox.
MIT