Jeeves Watcher 🎩

Filesystem watcher that keeps a Qdrant vector store in sync with document changes.

Overview

jeeves-watcher monitors a configured set of directories for file changes, extracts text content, generates embeddings, and maintains a synchronized Qdrant vector store for semantic search. It automatically:

Watches directories for file additions, modifications, and deletions
Extracts text from various formats (Markdown, PDF, DOCX, HTML, JSON, plain text)
Chunks large documents for optimal embedding
Embeds content using configurable providers (Google Gemini, mock for testing)
Syncs to Qdrant for fast semantic search
Enriches metadata via rules and API endpoints

Architecture

For detailed architecture documentation, see packages/service/guides/architecture.md.

Quick Start

Installation

npm install -g @karmaniverous/jeeves-watcher

Initialize Configuration

Create a new configuration file in your project:

jeeves-watcher init

This generates a jeeves-watcher.config.json file with sensible defaults.

Configure

Edit jeeves-watcher.config.json to specify:

Watch paths: Directories to monitor
Embedding provider: Google Gemini or mock (for testing)
Qdrant connection: URL and collection name
Inference rules: Automatic metadata enrichment based on file patterns

Example minimal configuration:

{
  "watch": {
    "paths": ["./docs"],
    "ignored": ["**/node_modules/**", "**/.git/**"]
  },
  "embedding": {
    "provider": "gemini",
    "model": "gemini-embedding-001",
    "apiKey": "${GOOGLE_API_KEY}"
  },
  "vectorStore": {
    "url": "http://localhost:6333",
    "collectionName": "my_docs"
  }
}

Start Watching

jeeves-watcher start

The watcher will:

Index all existing files in watched directories
Monitor for changes
Update Qdrant automatically

CLI Commands

Command	Description
`jeeves-watcher start`	Start the filesystem watcher (foreground)
`jeeves-watcher init`	Initialize a new configuration file
`jeeves-watcher status`	Show watcher status
`jeeves-watcher reindex`	Reindex all watched files
`jeeves-watcher rebuild-metadata`	Rebuild metadata files from Qdrant payloads
`jeeves-watcher search <query>`	Search the vector store
`jeeves-watcher enrich <path>`	Enrich document metadata with key-value pairs
`jeeves-watcher validate`	Validate the configuration
`jeeves-watcher service`	Manage the watcher as a system service
`jeeves-watcher scan`	Scan the vector store with filter-only queries
`jeeves-watcher config`	Query effective config via JSONPath
`jeeves-watcher issues`	Show indexing issues and errors
`jeeves-watcher helpers`	Show loaded map and template helpers
`jeeves-watcher config-apply`	Validate, write, and reload configuration from file

Configuration

Environment Variable Substitution

Config strings support ${VAR_NAME} syntax for environment variable injection:

{
  "embedding": {
    "apiKey": "${GOOGLE_API_KEY}"
  }
}

If GOOGLE_API_KEY is set in the environment, its value is substituted at config load time. This ${...} syntax is distinct from the Handlebars {{...}} syntax (e.g. {{frontmatter.title}}) used by set templates in inference rules.
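As a rough illustration of the substitution described above, here is a minimal sketch. It is not the watcher's actual code: the substituteEnv helper and the JsonValue type are hypothetical, and leaving a placeholder untouched when the variable is unset is an assumption.

```typescript
// Hypothetical helper illustrating ${VAR_NAME} substitution; not the watcher's real implementation.
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };

// Matches ${VAR_NAME} only; Handlebars {{...}} expressions never match this pattern.
const ENV_PATTERN = /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g;

function substituteEnv(value: JsonValue, env: Record<string, string | undefined>): JsonValue {
  if (typeof value === "string") {
    // Assumption: an unset variable leaves the placeholder text as-is.
    return value.replace(ENV_PATTERN, (match, name: string) => env[name] ?? match);
  }
  if (Array.isArray(value)) {
    return value.map((item) => substituteEnv(item, env));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: JsonValue } = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = substituteEnv(child, env);
    }
    return out;
  }
  // Numbers, booleans, and null pass through unchanged.
  return value;
}

const resolvedConfig = substituteEnv(
  { embedding: { apiKey: "${GOOGLE_API_KEY}" }, title: "{{frontmatter.title}}" },
  { GOOGLE_API_KEY: "example-key" },
);
```

In this sketch the Handlebars expression in title survives untouched, which mirrors the distinction between the two syntaxes.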

Watch Paths

{
  "watch": {
    "paths": ["./docs", "./notes"],
    "ignored": ["**/node_modules/**", "**/*.tmp"]
  }
}

paths: Array of glob patterns or directories to watch
ignored: Array of patterns to exclude
respectGitignore: (default: true) Skip processing files ignored by .gitignore in git repositories. Nested .gitignore files are respected within their subtree.
moveDetection: (optional) Correlates unlink+add events as file moves to avoid re-embedding. Options: enabled (default: true) and bufferMs (default: 2000), which sets how long to buffer unlink events before treating them as deletes.
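To make the moveDetection behavior more concrete, the sketch below buffers unlink events and pairs them with later add events. Everything here is hypothetical: the MoveCorrelator class, the Outcome type, and in particular the use of a content hash as the correlation key are assumptions, since this README does not say how the watcher matches the two events. Timestamps are passed in explicitly so the logic stays deterministic.

```typescript
// Hypothetical sketch of unlink+add correlation; the content-hash key is an assumption.
type Outcome =
  | { kind: "move"; from: string; to: string }
  | { kind: "add"; path: string }
  | { kind: "delete"; path: string };

class MoveCorrelator {
  // Buffered unlinks, keyed by content hash.
  private readonly pending = new Map<string, { path: string; at: number }>();
  private readonly bufferMs: number;

  constructor(bufferMs = 2000) {
    this.bufferMs = bufferMs;
  }

  // Buffer the unlink instead of deleting right away.
  unlink(path: string, hash: string, now: number): void {
    this.pending.set(hash, { path, at: now });
  }

  // An add whose hash matches a buffered unlink inside the window is reported as a move.
  add(path: string, hash: string, now: number): Outcome {
    const prior = this.pending.get(hash);
    if (prior && now - prior.at <= this.bufferMs) {
      this.pending.delete(hash);
      return { kind: "move", from: prior.path, to: path };
    }
    return { kind: "add", path };
  }

  // Buffered unlinks older than the window become real deletes.
  flush(now: number): Outcome[] {
    const deletes: Outcome[] = [];
    this.pending.forEach((entry, hash) => {
      if (now - entry.at > this.bufferMs) {
        this.pending.delete(hash);
        deletes.push({ kind: "delete", path: entry.path });
      }
    });
    return deletes;
  }
}
```

A move outcome would let the caller update the stored path without re-embedding, while a delete outcome removes the points for that file.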

Embedding Provider

Google Gemini

{
  "embedding": {
    "provider": "gemini",
    "model": "gemini-embedding-001",
    "apiKey": "${GOOGLE_API_KEY}"
  }
}

Vector Store

{
  "vectorStore": {
    "url": "http://localhost:6333",
    "collectionName": "my_collection"
  }
}

Inference Rules

Automatically enrich metadata based on file patterns using declarative JSON Schemas:

{
  "schemas": {
    "base": {
      "type": "object",
      "properties": {
        "domain": {
          "type": "string",
          "description": "Content domain"
        }
      }
    }
  },
  "inferenceRules": [
    {
      "name": "meeting-classifier",
      "description": "Classify files under meetings directory",
      "match": {
        "properties": {
          "file": {
            "type": "object",
            "properties": {
              "path": { "type": "string", "glob": "**/meetings/**" }
            }
          }
        }
      },
      "schema": [
        "base",
        {
          "properties": {
            "domain": { "set": "meetings" },
            "category": { "type": "string", "set": "notes" }
          }
        }
      ]
    }
  ]
}

New in v0.5.0: Inference rules now use schema arrays that reference global named schemas. Type coercion automatically converts string interpolation results to declared types (integer, number, boolean, array, object). See the Inference Rules Guide for details.
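The type coercion mentioned above can be pictured with a small sketch. This is an illustration only: the coerce function and DeclaredType type are hypothetical, and both the parsing rules (JSON for arrays and objects, the literals true and false for booleans) and the choice to throw on a failed conversion are assumptions rather than documented behavior.

```typescript
// Hypothetical sketch of coercing an interpolated string to a declared schema type.
type DeclaredType = "string" | "integer" | "number" | "boolean" | "array" | "object";

function coerce(raw: string, type: DeclaredType): unknown {
  switch (type) {
    case "integer": {
      const n = Number(raw);
      if (raw.trim() === "" || !Number.isInteger(n)) throw new Error(`not an integer: ${raw}`);
      return n;
    }
    case "number": {
      const n = Number(raw);
      if (raw.trim() === "" || Number.isNaN(n)) throw new Error(`not a number: ${raw}`);
      return n;
    }
    case "boolean": {
      // Assumption: only the literals "true" and "false" are accepted.
      if (raw === "true") return true;
      if (raw === "false") return false;
      throw new Error(`not a boolean: ${raw}`);
    }
    case "array":
    case "object": {
      // Assumption: structured values arrive as JSON text.
      const parsed: unknown = JSON.parse(raw);
      const isArray = Array.isArray(parsed);
      const matches =
        type === "array" ? isArray : parsed !== null && typeof parsed === "object" && !isArray;
      if (!matches) throw new Error(`not a JSON ${type}: ${raw}`);
      return parsed;
    }
    default:
      // Strings need no conversion.
      return raw;
  }
}
```

For example, a set template that renders the text 42 for a property declared as integer would end up stored as the number 42 rather than a string.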

Chunking

Chunking settings are configured under embedding:

{
  "embedding": {
    "chunkSize": 1000,
    "chunkOverlap": 200
  }
}
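To show how chunkSize and chunkOverlap interact, here is a minimal sliding-window sketch. It assumes both values are measured in characters, which this README does not state, and the chunkText function is hypothetical; the watcher's real chunker may split on tokens or document structure instead.

```typescript
// Hypothetical character-based chunker; the unit (characters) is an assumption.
function chunkText(text: string, chunkSize = 1000, chunkOverlap = 200): string[] {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkSize must be positive and chunkOverlap must be smaller than chunkSize");
  }
  // Each chunk starts this many characters after the previous one.
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// With the values above, a 2500-character document yields windows
// starting at offsets 0, 800, and 1600.
const exampleChunks = chunkText("a".repeat(2500));
```

Consecutive chunks share chunkOverlap characters, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.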

Enrichment Store

Enrichment metadata (from POST /metadata or watcher_enrich) is stored in a SQLite database at <stateDir>/enrichments.sqlite. Enrichments survive full reindexes. They merge composably with inference rule output: scalar fields overwrite, and array fields are unioned and deduplicated.

{
  "stateDir": ".jeeves-metadata"
}
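The merge rule for enrichments (scalars overwrite, arrays union and deduplicate) might look like the sketch below. The mergeMetadata function and Metadata type are hypothetical, the sketch assumes the enrichment value wins for scalar fields, and deduplication via Set only works as shown for primitive array items.

```typescript
// Hypothetical sketch of merging stored enrichments over inference rule output.
type Metadata = Record<string, unknown>;

function mergeMetadata(inferred: Metadata, enrichment: Metadata): Metadata {
  const merged: Metadata = { ...inferred };
  for (const [key, value] of Object.entries(enrichment)) {
    const existing = merged[key];
    if (Array.isArray(value) && Array.isArray(existing)) {
      // Arrays on both sides: union, keeping first-seen order and dropping duplicates.
      merged[key] = Array.from(new Set([...existing, ...value]));
    } else {
      // Scalars and mismatched shapes: the enrichment value overwrites (assumption).
      merged[key] = value;
    }
  }
  return merged;
}

const mergedExample = mergeMetadata(
  { domain: "meetings", tags: ["a", "b"] },
  { domain: "research", tags: ["b", "c"] },
);
```

Here mergedExample has domain set to research and tags containing a, b, and c.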

API Endpoints

The watcher provides a REST API (default port: 1936):

Endpoint	Method	Description
`/status`	GET	Health check, uptime, and collection stats
`/search`	POST	Semantic search (`{ query: string, limit?: number, filter?: object }`)
`/render`	POST	Render a file through inference rules (`{ path: string }`) (v0.8.0+)
`/search/facets`	GET	Schema-derived search facet definitions with live values (v0.8.0+)
`/metadata`	POST	Update document metadata with schema validation (`{ path: string, metadata: object }`)
`/reindex`	POST	Scoped reindex with blast area plan (`issues`, `rules`, `full`, `path`, `prune` + `dryRun`). `path` accepts `string | string[]`.
`/rebuild-metadata`	POST	Rebuild metadata files from Qdrant
`/config`	GET	Full resolved effective config; optional `?path=<jsonpath>` filter. Rules include `source` attribution.
`/config/schema`	GET	JSON Schema of merged virtual document (v0.5.0+)
`/walk`	POST	Filesystem walk with glob intersection (`{ globs: string[] }`). Returns `{ paths, matchedCount, scannedRoots }`.
`/config/match`	POST	Test paths against inference rules (`{ paths: string[] }`) (v0.5.0+)
`/issues`	GET	Current embedding failures and processing errors (v0.5.0+)
`/rules/register`	POST	Register virtual inference rules from an external source
`/rules/unregister`	DELETE	Remove all virtual rules from a source (`{ source }`)
`/rules/unregister/:source`	DELETE	Remove all virtual rules from a named source
`/scan`	POST	Filter-only point query with cursor pagination (`{ filter, limit?, cursor?, fields?, countOnly? }`)
`/config/validate`	POST	Validate a configuration without applying (`{ config?, testPaths? }`)
`/config/apply`	POST	Validate, write, and reload configuration (`{ config }`)
`/rules/reapply`	POST	Re-apply inference rules to files matching globs (`{ globs }`)
`/points/delete`	POST	Delete points matching a Qdrant filter (`{ filter }`)

Example: Search

curl -X POST http://localhost:1936/search \
  -H "Content-Type: application/json" \
  -d '{"query": "machine learning algorithms", "limit": 5}'

Example: Search With Filter

curl -X POST http://localhost:1936/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "error handling",
    "limit": 10,
    "filter": {
      "must": [{ "key": "domain", "match": { "value": "backend" } }]
    }
  }'
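The same search requests can be issued from code. The sketch below mirrors the curl examples using the standard fetch API; buildSearchRequest and search are hypothetical helper names, and because the response shape is not documented here, the result is left as unknown.

```typescript
// Hypothetical client helpers for POST /search; only the request shape comes from the API table.
interface SearchParams {
  query: string;
  limit?: number;
  filter?: object;
}

// Build the URL and fetch options without touching the network, so they can be inspected.
function buildSearchRequest(params: SearchParams, baseUrl = "http://localhost:1936") {
  return {
    url: `${baseUrl}/search`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(params),
    },
  };
}

async function search(params: SearchParams, baseUrl?: string): Promise<unknown> {
  const { url, init } = buildSearchRequest(params, baseUrl);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`search failed with HTTP ${response.status}`);
  }
  // The response schema is not described in this README, so it is returned untyped.
  return response.json();
}
```

A filtered call equivalent to the second curl example would pass query "error handling", limit 10, and the same must filter object to search.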

Example: Update Metadata

curl -X POST http://localhost:1936/metadata \
  -H "Content-Type: application/json" \
  -d '{
    "path": "/path/to/document.md",
    "metadata": {
      "priority": "high",
      "category": "research"
    }
  }'

OpenClaw Plugin

This repo includes an OpenClaw plugin (packages/openclaw) that exposes the jeeves-watcher API as native agent tools:

Tool	Description
`watcher_status`	Service health, uptime, and collection stats
`watcher_search`	Semantic search across indexed documents
`watcher_enrich`	Set or update document metadata
`watcher_config`	Query the effective runtime config via JSONPath
`watcher_walk`	Walk watched filesystem paths with glob intersection
`watcher_validate`	Validate a watcher configuration
`watcher_config_apply`	Apply a new configuration
`watcher_reindex`	Trigger a scoped reindex with blast area plan
`watcher_scan`	Filter-only point query with cursor pagination
`watcher_issues`	List indexing issues and errors

The plugin integrates with @karmaniverous/jeeves core to manage workspace content (TOOLS.md, SOUL.md, AGENTS.md) via a ComponentWriter that refreshes every 71 seconds. See the OpenClaw Integration Guide for details.

Plugin configuration supports apiUrl (defaults to http://127.0.0.1:1936) and configRoot (defaults to j:/config).

Supported File Formats

Markdown (.md, .markdown) — with YAML frontmatter support
PDF (.pdf) — text extraction
DOCX (.docx) — Microsoft Word documents
HTML (.html, .htm) — content extraction (scripts/styles removed)
JSON (.json) — with smart text field detection
Plain Text (.txt, .text)

License

BSD-3-Clause

Built for you with ❤️ on Bali by Jason Williscroft & Jeeves.
