Filesystem watcher that keeps a Qdrant vector store in sync with document changes.
jeeves-watcher monitors a configured set of directories for file changes, extracts text content, generates embeddings, and maintains a synchronized Qdrant vector store for semantic search. It automatically:
- Watches directories for file additions, modifications, and deletions
- Extracts text from various formats (Markdown, PDF, DOCX, HTML, JSON, plain text)
- Chunks large documents for optimal embedding
- Embeds content using configurable providers (Google Gemini, mock for testing)
- Syncs to Qdrant for fast semantic search
- Enriches metadata via rules and API endpoints
For detailed architecture documentation, see packages/service/guides/architecture.md.
npm install -g @karmaniverous/jeeves-watcherCreate a new configuration file in your project:
jeeves-watcher initThis generates a jeeves-watcher.config.json file with sensible defaults.
Edit jeeves-watcher.config.json to specify:
- Watch paths: Directories to monitor
- Embedding provider: Google Gemini or mock (for testing)
- Qdrant connection: URL and collection name
- Inference rules: Automatic metadata enrichment based on file patterns
Example minimal configuration:
{
"watch": {
"paths": ["./docs"],
"ignored": ["**/node_modules/**", "**/.git/**"]
},
"embedding": {
"provider": "gemini",
"model": "gemini-embedding-001",
"apiKey": "${GOOGLE_API_KEY}"
},
"vectorStore": {
"url": "http://localhost:6333",
"collectionName": "my_docs"
}
}jeeves-watcher startThe watcher will:
- Index all existing files in watched directories
- Monitor for changes
- Update Qdrant automatically
| Command | Description |
|---|---|
jeeves-watcher start |
Start the filesystem watcher (foreground) |
jeeves-watcher init |
Initialize a new configuration file |
jeeves-watcher status |
Show watcher status |
jeeves-watcher reindex |
Reindex all watched files |
jeeves-watcher rebuild-metadata |
Rebuild metadata files from Qdrant payloads |
jeeves-watcher search <query> |
Search the vector store |
jeeves-watcher enrich <path> |
Enrich document metadata with key-value pairs |
jeeves-watcher validate |
Validate the configuration |
jeeves-watcher service |
Manage the watcher as a system service |
jeeves-watcher scan |
Scan the vector store with filter-only queries |
jeeves-watcher config |
Query effective config via JSONPath |
jeeves-watcher issues |
Show indexing issues and errors |
jeeves-watcher helpers |
Show loaded map and template helpers |
jeeves-watcher config-apply |
Validate, write, and reload configuration from file |
Config strings support ${VAR_NAME} syntax for environment variable injection:
{
"embedding": {
"apiKey": "${GOOGLE_API_KEY}"
}
}If GOOGLE_API_KEY is set in the environment, the value is substituted at config load time. Set templates in inference rules use Handlebars {{...}} syntax (e.g. {{frontmatter.title}}), which is distinct from the ${...} environment variable syntax used in config values like embedding.apiKey.
{
"watch": {
"paths": ["./docs", "./notes"],
"ignored": ["**/node_modules/**", "**/*.tmp"]
}
}paths: Array of glob patterns or directories to watchignored: Array of patterns to excluderespectGitignore: (default:true) Skip processing files ignored by.gitignorein git repositories. Nested.gitignorefiles are respected within their subtree.moveDetection: (optional) Correlate unlink+add events as file moves to avoid re-embedding.enabled(default:true),bufferMs(default:2000) — how long to buffer unlink events before treating as deletes.
{
"embedding": {
"provider": "gemini",
"model": "gemini-embedding-001",
"apiKey": "${GOOGLE_API_KEY}"
}
}{
"vectorStore": {
"url": "http://localhost:6333",
"collectionName": "my_collection"
}
}Automatically enrich metadata based on file patterns using declarative JSON Schemas:
{
"schemas": {
"base": {
"type": "object",
"properties": {
"domain": {
"type": "string",
"description": "Content domain"
}
}
}
},
"inferenceRules": [
{
"name": "meeting-classifier",
"description": "Classify files under meetings directory",
"match": {
"properties": {
"file": {
"type": "object",
"properties": {
"path": { "type": "string", "glob": "**/meetings/**" }
}
}
}
},
"schema": [
"base",
{
"properties": {
"domain": { "set": "meetings" },
"category": { "type": "string", "set": "notes" }
}
}
]
}
]
}New in v0.5.0: Inference rules now use schema arrays that reference global named schemas. Type coercion automatically converts string interpolation results to declared types (integer, number, boolean, array, object). See Inference Rules Guide for details.
Chunking settings are configured under embedding:
{
"embedding": {
"chunkSize": 1000,
"chunkOverlap": 200
}
}Enrichment metadata (from POST /metadata or watcher_enrich) is stored in a SQLite database at <stateDir>/enrichments.sqlite. Enrichments survive full reindexes. Composable merge: scalar fields overwrite, array fields union+deduplicate with inference rule output.
{
"stateDir": ".jeeves-metadata"
}The watcher provides a REST API (default port: 1936):
| Endpoint | Method | Description |
|---|---|---|
/status |
GET | Health check, uptime, and collection stats |
/search |
POST | Semantic search ({ query: string, limit?: number, filter?: object }) |
/render |
POST | Render a file through inference rules ({ path: string }) (v0.8.0+) |
/search/facets |
GET | Schema-derived search facet definitions with live values (v0.8.0+) |
/metadata |
POST | Update document metadata with schema validation ({ path: string, metadata: object }) |
/reindex |
POST | Scoped reindex with blast area plan (issues, rules, full, path, prune + dryRun). path accepts string | string[]. |
/rebuild-metadata |
POST | Rebuild metadata files from Qdrant |
/config |
GET | Full resolved effective config; optional ?path=<jsonpath> filter. Rules include source attribution. |
/config/schema |
GET | JSON Schema of merged virtual document (v0.5.0+) |
/walk |
POST | Filesystem walk with glob intersection ({ globs: string[] }). Returns { paths, matchedCount, scannedRoots }. |
/config/match |
POST | Test paths against inference rules ({ paths: string[] }) (v0.5.0+) |
/issues |
GET | Current embedding failures and processing errors (v0.5.0+) |
/rules/register |
POST | Register virtual inference rules from an external source |
/rules/unregister |
DELETE | Remove all virtual rules from a source ({ source }) |
/rules/unregister/:source |
DELETE | Remove all virtual rules from a named source |
/scan |
POST | Filter-only point query with cursor pagination ({ filter, limit?, cursor?, fields?, countOnly? }) |
/config/validate |
POST | Validate a configuration without applying ({ config?, testPaths? }) |
/config/apply |
POST | Validate, write, and reload configuration ({ config }) |
/rules/reapply |
POST | Re-apply inference rules to files matching globs ({ globs }) |
/points/delete |
POST | Delete points matching a Qdrant filter ({ filter }) |
curl -X POST http://localhost:1936/search \
-H "Content-Type: application/json" \
-d '{"query": "machine learning algorithms", "limit": 5}'curl -X POST http://localhost:1936/search \
-H "Content-Type: application/json" \
-d '{
"query": "error handling",
"limit": 10,
"filter": {
"must": [{ "key": "domain", "match": { "value": "backend" } }]
}
}'curl -X POST http://localhost:1936/metadata \
-H "Content-Type: application/json" \
-d '{
"path": "/path/to/document.md",
"metadata": {
"priority": "high",
"category": "research"
}
}'This repo includes an OpenClaw plugin (packages/openclaw) that exposes the jeeves-watcher API as native agent tools:
| Tool | Description |
|---|---|
watcher_status |
Service health, uptime, and collection stats |
watcher_search |
Semantic search across indexed documents |
watcher_enrich |
Set or update document metadata |
watcher_config |
Query the effective runtime config via JSONPath |
watcher_walk |
Walk watched filesystem paths with glob intersection |
watcher_validate |
Validate a watcher configuration |
watcher_config_apply |
Apply a new configuration |
watcher_reindex |
Trigger a scoped reindex with blast area plan |
watcher_scan |
Filter-only point query with cursor pagination |
watcher_issues |
List indexing issues and errors |
The plugin integrates with @karmaniverous/jeeves core to manage workspace content (TOOLS.md, SOUL.md, AGENTS.md) via a ComponentWriter that refreshes every 71 seconds. See the OpenClaw Integration Guide for details.
Plugin configuration supports apiUrl (defaults to http://127.0.0.1:1936) and configRoot (defaults to j:/config).
- Markdown (
.md,.markdown) — with YAML frontmatter support - PDF (
.pdf) — text extraction - DOCX (
.docx) — Microsoft Word documents - HTML (
.html,.htm) — content extraction (scripts/styles removed) - JSON (
.json) — with smart text field detection - Plain Text (
.txt,.text)
BSD-3-Clause
Built for you with ❤️ on Bali by Jason Williscroft & Jeeves.
