diff --git a/README.md b/README.md index 3834054..1399bf3 100644 --- a/README.md +++ b/README.md @@ -2,8 +2,9 @@ AI Workbench is a self-hosted product surface for building, inspecting, and operating retrieval-backed AI applications on **DataStax Astra**. -It gives teams one place to manage workspaces, catalogs, vector stores, -document ingest, saved queries, API keys, and retrieval experiments. +It gives teams one place to manage workspaces, knowledge bases, +chunking / embedding / reranking services, document ingest, API keys, +and retrieval experiments. Under the product UI is a stable HTTP runtime. The default TypeScript runtime ships in the same Docker image as the UI; alternative @@ -12,11 +13,15 @@ language-native runtimes ("green boxes") live under ## At a glance -- **Workspace command center.** Workspaces isolate catalogs, vector - stores, documents, saved queries, jobs, credentials, and API keys. -- **Knowledge operations.** Ingest raw text or files into catalogs, - track sync/async job state, and bind content to the vector store that - powers retrieval. +- **Workspace command center.** Workspaces isolate knowledge bases, + execution services, documents, jobs, credentials, and API keys. +- **Knowledge bases as first-class.** A KB owns its Astra collection + end-to-end and binds the chunking + embedding + (optional) + reranking services that produce its content. The collection is + auto-provisioned on create. +- **Knowledge operations.** Ingest raw text or files into a KB, + track sync/async job state, and let the KB's bound services drive + chunking and embedding. - **Retrieval playground.** Run text, vector, hybrid, and rerank searches in the browser against real workspace data. - **Production-friendly controls.** Start in memory, switch to file @@ -51,17 +56,21 @@ language-native runtimes ("green boxes") live under └──────────────── same HTTP contract ───────────────────┘ │ ▼ (per-runtime Astra SDK) - ┌─────────────────────────────┐ - │ Astra Data API │ - │ Tables (control plane): │ - │ wb_workspaces │ - │ wb_catalog_by_ws │ - │ wb_vector_store_by_ws │ - │ wb_documents_by_cat │ - │ Collections (data │ - │ plane): one per │ - │ vector store │ - └─────────────────────────────┘ + ┌──────────────────────────────────┐ + │ Astra Data API │ + │ Tables (control plane): │ + │ wb_workspaces │ + │ wb_config_knowledge_ │ + │ bases_by_workspace │ + │ wb_config_chunking/ │ + │ embedding/reranking │ + │ _service_by_workspace │ + │ wb_rag_documents_ │ + │ by_knowledge_base │ + │ Collections (data plane): │ + │ wb_vectors_ │ + │ (one per knowledge base) │ + └──────────────────────────────────┘ ``` See [`docs/architecture.md`](docs/architecture.md) for the full model. 
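To make the table/collection split concrete, here is a minimal TypeScript sketch of the relationship the diagram implies. Field names mirror the JSON shapes in `docs/api-spec.md`; the helper is illustrative — the real naming and provisioning logic lives in the runtime, not here.

```ts
// Sketch only — shapes mirror docs/api-spec.md, not runtime source.
// A KB row (control plane) binds the services that produce its content
// and names exactly one data-plane collection.
interface KnowledgeBase {
  workspaceId: string;
  knowledgeBaseId: string;
  name: string;
  chunkingServiceId: string;          // required binding
  embeddingServiceId: string;         // required binding (sizes the collection)
  rerankingServiceId: string | null;  // optional binding
  vectorCollection: string;           // the one Astra collection this KB owns
}

// Hypothetical helper for the "wb_vectors_" naming in the diagram; per the
// API spec the generated name strips hyphens from the id. Creating a KB
// auto-provisions this collection; deleting the KB drops it.
function defaultCollectionName(knowledgeBaseId: string): string {
  return `wb_vectors_${knowledgeBaseId.replace(/-/g, "")}`;
}
```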
@@ -109,26 +118,21 @@ All routes documented at `/docs` (Scalar UI) and
| `GET / POST` | `/api/v1/workspaces` | List / create workspaces |
| `GET / PUT / DELETE` | `/api/v1/workspaces/{w}` | Workspace CRUD (DELETE cascades) |
| `POST` | `/api/v1/workspaces/{w}/test-connection` | Resolve configured workspace credential refs |
-| `GET / POST` | `/api/v1/workspaces/{w}/catalogs` | List / create catalogs |
-| `GET / PUT / DELETE` | `/api/v1/workspaces/{w}/catalogs/{c}` | Catalog CRUD (DELETE cascades to documents + saved queries) |
-| `GET / POST` | `/api/v1/workspaces/{w}/catalogs/{c}/documents` | List / create document metadata |
-| `GET / PUT / DELETE` | `/api/v1/workspaces/{w}/catalogs/{c}/documents/{d}` | Document metadata CRUD (DELETE cascades chunks via the bound vector store) |
-| `GET` | `/api/v1/workspaces/{w}/catalogs/{c}/documents/{d}/chunks` | List the chunks under a document (id, chunkIndex, text, payload) |
-| `POST` | `/api/v1/workspaces/{w}/catalogs/{c}/documents/search` | Catalog-scoped search (vector / text, optional hybrid + rerank) |
-| `POST` | `/api/v1/workspaces/{w}/catalogs/{c}/ingest` | Sync ingest (chunk → embed → upsert → register Document) |
-| `POST` | `/api/v1/workspaces/{w}/catalogs/{c}/ingest?async=true` | Same pipeline, returns 202 + job pointer |
-| `GET / POST` | `/api/v1/workspaces/{w}/catalogs/{c}/queries` | List / create saved queries |
-| `GET / PUT / DELETE` | `/api/v1/workspaces/{w}/catalogs/{c}/queries/{q}` | Saved-query CRUD |
-| `POST` | `/api/v1/workspaces/{w}/catalogs/{c}/queries/{q}/run` | Replay a saved query through catalog-scoped search |
+| `GET / POST` | `/api/v1/workspaces/{w}/knowledge-bases` | List / create knowledge bases (POST auto-provisions the underlying vector collection) |
+| `GET / PUT / DELETE` | `/api/v1/workspaces/{w}/knowledge-bases/{kb}` | KB CRUD (DELETE drops the collection + cascades RAG documents) |
+| `GET / POST` | `/api/v1/workspaces/{w}/knowledge-bases/{kb}/documents` | List / register a document in a KB |
+| `GET / PUT / DELETE` | `/api/v1/workspaces/{w}/knowledge-bases/{kb}/documents/{d}` | Document metadata CRUD (DELETE cascades chunks in the KB's collection) |
+| `GET` | `/api/v1/workspaces/{w}/knowledge-bases/{kb}/documents/{d}/chunks` | List the chunks under a document |
+| `POST` | `/api/v1/workspaces/{w}/knowledge-bases/{kb}/ingest` | Sync ingest (chunk → embed → upsert → register Document) |
+| `POST` | `/api/v1/workspaces/{w}/knowledge-bases/{kb}/ingest?async=true` | Same pipeline, returns 202 + job pointer |
+| `POST` | `/api/v1/workspaces/{w}/knowledge-bases/{kb}/records` | Upsert vector or text records (text → server-side `$vectorize` when supported, otherwise client-side embed) |
+| `DELETE` | `/api/v1/workspaces/{w}/knowledge-bases/{kb}/records/{rid}` | Delete one record |
+| `POST` | `/api/v1/workspaces/{w}/knowledge-bases/{kb}/search` | KB-scoped search (vector / text, optional hybrid + rerank) |
+| `GET / POST / PUT / DELETE` | `/api/v1/workspaces/{w}/chunking-services` | Chunking-service CRUD (item ops at `/{serviceId}`) |
+| `GET / POST / PUT / DELETE` | `/api/v1/workspaces/{w}/embedding-services` | Embedding-service CRUD (item ops at `/{serviceId}`) |
+| `GET / POST / PUT / DELETE` | `/api/v1/workspaces/{w}/reranking-services` | Reranking-service CRUD (item ops at `/{serviceId}`) |
| `GET` | `/api/v1/workspaces/{w}/jobs/{jobId}` | Poll an async-ingest job |
| `GET` | `/api/v1/workspaces/{w}/jobs/{jobId}/events` | SSE stream of job updates until terminal state |
-| `GET / POST` | `/api/v1/workspaces/{w}/vector-stores` | List / create vector-store descriptors (POST provisions the collection too) |
-| `GET` | 
`/api/v1/workspaces/{w}/vector-stores/discoverable` | List data-plane collections not yet wrapped in a descriptor | -| `POST` | `/api/v1/workspaces/{w}/vector-stores/adopt` | Wrap an existing collection in a descriptor without re-provisioning | -| `GET / PUT / DELETE` | `/api/v1/workspaces/{w}/vector-stores/{v}` | Descriptor CRUD (DELETE drops the collection) | -| `POST` | `/api/v1/workspaces/{w}/vector-stores/{v}/records` | Upsert vector or text records (text → server-side `$vectorize` when supported, otherwise client-side embed) | -| `DELETE` | `/api/v1/workspaces/{w}/vector-stores/{v}/records/{rid}` | Delete one | -| `POST` | `/api/v1/workspaces/{w}/vector-stores/{v}/search` | Vector or text search; supports `hybrid`, `lexicalWeight`, `rerank` | | `GET / POST` | `/api/v1/workspaces/{w}/api-keys` | List / issue workspace API keys | | `DELETE` | `/api/v1/workspaces/{w}/api-keys/{keyId}` | Revoke a workspace API key | diff --git a/apps/web/README.md b/apps/web/README.md index c7d4c29..9ffde93 100644 --- a/apps/web/README.md +++ b/apps/web/README.md @@ -6,13 +6,13 @@ consumes `/api/v1/workspaces` on the default TypeScript runtime. ## Status **Shipped.** First-run onboarding wizard, workspace list / detail / -edit / destructive delete, full CRUD over catalogs, vector-store -descriptors, and workspace-scoped API keys. Async ingest from the -browser (file upload → chunk → embed → upsert) with live progress -streamed via SSE. Saved-query CRUD per catalog, runnable from the UI. -Playground for ad-hoc text / vector / hybrid / rerank queries against -any vector store. OIDC login + silent refresh and paste-a-token -fallback are both wired through the same auth layer. +edit / destructive delete, full CRUD over knowledge bases, +chunking / embedding / reranking services, and workspace-scoped API +keys. Async ingest from the browser (file upload → chunk → embed → +upsert) with live progress streamed via SSE. Playground for ad-hoc +text / vector / hybrid / rerank queries against any knowledge base. +OIDC login + silent refresh and paste-a-token fallback are both wired +through the same auth layer. HCD and OpenRAG kinds are visible in the onboarding picker but intentionally non-selectable ("Coming soon" badge) — the runtime @@ -95,38 +95,27 @@ navigation shows the shared loader while the chunk streams. |---|---| | `/` | Workspaces list. Redirects to `/onboarding` when empty. | | `/onboarding` | Two-step wizard — pick a backend kind, then fill details. HCD / OpenRAG tiles render but are non-selectable. | -| `/workspaces/:uid` | Detail + edit + destructive delete (type-to-confirm). Hosts the catalogs, vector-stores, and API-keys panels for this workspace. | -| `/workspaces/:uid/catalogs/:catalogId` | Catalog explorer — sortable / filterable document table with file-type badges, sizes, statuses, and a click-through detail dialog. Multi-file / folder ingest queue lives here. | -| `/playground` | Ad-hoc text / vector / hybrid / rerank queries against a workspace's vector stores. See [`docs/playground.md`](../../docs/playground.md). | +| `/workspaces/:uid` | Detail + edit + destructive delete (type-to-confirm). Hosts the knowledge-bases, services, and API-keys panels for this workspace. | +| `/workspaces/:uid/knowledge-bases/:kbId` | Knowledge-base explorer — sortable / filterable document table with file-type badges, sizes, statuses, and a click-through detail dialog. Multi-file / folder ingest queue lives here. 
| +| `/playground` | Ad-hoc text / vector / hybrid / rerank queries against a workspace's knowledge bases. See [`docs/playground.md`](../../docs/playground.md). | The workspace detail page composes four panels (collapsible cards): | Panel | What it does | |---|---| -| Catalogs | List + create + delete catalogs. Each row expands to a quick-look document preview and houses the saved-queries section. The "Open" button on every row jumps to the catalog explorer for the full table; "Ingest" pops the multi-file / folder upload queue. | -| Vector stores | List + create + delete vector-store descriptors. Create flow provisions the underlying collection on the bound driver. | +| Knowledge bases | List + create + delete knowledge bases. Create flow auto-provisions the underlying vector collection sized to the bound embedding service. The "Open" button on every row jumps to the KB explorer for the full document table; "Ingest" pops the multi-file / folder upload queue. | +| Services | List + create + delete chunking, embedding, and reranking service definitions. Services are reusable across knowledge bases in the same workspace. | | API keys | List + issue + revoke workspace-scoped `wb_live_*` keys. Fresh keys are shown once, then masked. | | Detail / edit | The kind-aware edit form (kind is read-only after create) and the destructive delete dialog. | -The catalog explorer adds: +The KB explorer adds: - A document table with sortable columns (name, size, chunks, status, ingestedAt) and an inline filename/source-id filter. - Color-coded `FileTypeBadge` (Markdown violet, structured-data emerald, tabular amber, code blue, etc.) and pill-shaped `DocumentStatusBadge` (animated glyph for in-flight states). -- Per-row trash button that pops a confirm dialog and runs the cascade-delete: the bound vector store's chunks are wiped before the document row is dropped, so deleted documents don't surface in playground searches. +- Per-row trash button that pops a confirm dialog and runs the cascade-delete: the KB's chunks are wiped before the document row is dropped, so deleted documents don't surface in playground searches. - Click-through metadata dialog showing the full Document record, the failure message verbatim when status is `failed`, **and** the chunks the runtime extracted (chunk index, id, and snippet text — text comes from the reserved `chunkText` payload key the ingest pipeline stamps). - An ingest queue dialog accepting drag-drop, multi-file picker, or a folder picker (`webkitdirectory`). Files run sequentially through async ingest with a per-row live progress bar — sequential rather than parallel so embedding-provider rate limits stay predictable and a misbehaving file doesn't tank the others. -The vector-stores panel on the workspace detail also exposes an -**Adopt existing** button. It opens a dialog listing collections -that already live in the workspace's data plane but aren't yet -wrapped in a workbench descriptor (created by another tool, by -hand, by an older workbench install whose state was wiped). One -click adopts the collection — the runtime reads the live vector / -lexical / rerank options off the data plane and stamps a matching -descriptor without re-provisioning. Mock workspaces always see -the empty state since the mock driver has no notion of "external" -collections. - ## Stack - **Vite + React 19 + TypeScript** — standard modern baseline. @@ -137,7 +126,7 @@ collections. 
- **React Hook Form + Zod** for forms; the same Zod schemas that describe API shapes drive form validation, so the UI and backend can't disagree about request shape. -- **React Router** for the five routes (`/`, `/onboarding`, `/workspaces/:uid`, `/workspaces/:uid/catalogs/:catalogId`, `/playground`). +- **React Router** for the five routes (`/`, `/onboarding`, `/workspaces/:uid`, `/workspaces/:uid/knowledge-bases/:kbId`, `/playground`). - **Sonner** for toasts. - **Lucide React** for icons. @@ -162,11 +151,10 @@ apps/web/ │ │ └── utils.ts ← cn() + formatDate() │ ├── hooks/ │ │ ├── useWorkspaces.ts ← list/get/create/update/delete -│ │ ├── useCatalogs.ts ← catalog CRUD -│ │ ├── useDocuments.ts ← per-catalog document list -│ │ ├── useVectorStores.ts ← vector-store descriptor CRUD +│ │ ├── useKnowledgeBases.ts ← knowledge-base CRUD +│ │ ├── useServices.ts ← chunking/embedding/reranking service CRUD +│ │ ├── useDocuments.ts ← per-KB document list │ │ ├── useIngest.ts ← async ingest + SSE progress -│ │ ├── useSavedQueries.ts ← saved-query CRUD + /run │ │ ├── usePlaygroundSearch.ts ← /search dispatch + result hits │ │ ├── useApiKeys.ts ← workspace API-key mutations │ │ ├── useAuthToken.ts ← reactive bearer-token hook @@ -190,22 +178,19 @@ apps/web/ │ │ ├── TestConnectionPanel.tsx │ │ ├── ApiKeysPanel.tsx │ │ ├── CreateApiKeyDialog.tsx -│ │ ├── CatalogsPanel.tsx ← catalog list + per-row docs preview -│ │ ├── CreateCatalogDialog.tsx +│ │ ├── KnowledgeBasesPanel.tsx ← KB list + per-row docs preview +│ │ ├── CreateKnowledgeBaseDialog.tsx +│ │ ├── ServicesPanel.tsx ← chunking/embedding/reranking services │ │ ├── DocumentTable.tsx ← sortable doc table for the explorer │ │ ├── DocumentDetailDialog.tsx │ │ ├── DocumentStatusBadge.tsx │ │ ├── FileTypeBadge.tsx -│ │ ├── IngestQueueDialog.tsx ← multi-file / folder ingest queue -│ │ ├── SavedQueriesSection.tsx -│ │ ├── VectorStoresPanel.tsx -│ │ ├── CreateVectorStoreDialog.tsx -│ │ └── AdoptCollectionDialog.tsx ← discover + adopt existing collections +│ │ └── IngestQueueDialog.tsx ← multi-file / folder ingest queue │ └── pages/ │ ├── WorkspacesPage.tsx │ ├── OnboardingPage.tsx │ ├── WorkspaceDetailPage.tsx -│ ├── CatalogExplorerPage.tsx +│ ├── KnowledgeBaseExplorerPage.tsx │ └── PlaygroundPage.tsx ``` @@ -218,7 +203,8 @@ apps/web/ `provider:path` shape inline and drops empty rows before submit. The runtime rejects raw secrets with `400` anyway. - **Destructive delete requires typing the workspace name.** Cascade - is real — catalogs, vector-store collections, and documents all go. + is real — knowledge bases, their underlying vector collections, + service definitions, and documents all go. - **Empty state → onboarding redirect.** First-run users never see a bare "no workspaces" screen; they land directly in the wizard. - **List order is deterministic.** The runtime sorts by `createdAt` @@ -246,11 +232,11 @@ apps/web/ | `npm test` | Unit + component tests under `src/**/*.{test,spec}.{ts,tsx}` (vitest + jsdom + RTL). Fast — no browser. | | `npm run test:watch` | Same in watch mode. | | `npm run test:coverage` | Same as `npm test` but with v8 coverage. **Gates `src/lib/**` at lines: 50, statements: 50, branches: 80, functions: 20.** Components are exercised end-to-end through Playwright; locking thresholds on them prematurely pushes toward shallow tests. | -| `npm run test:e2e` | Playwright golden-path spec. 
Builds the runtime + SPA, boots the runtime against the bundled `examples/workbench.yaml` (memory backend, auth disabled), drives Chromium through the onboarding → vector-store → upsert → playground flow. Reuses an existing `:8080` server in dev; CI starts a fresh one. |
+| `npm run test:e2e` | Playwright golden-path spec. Builds the runtime + SPA, boots the runtime against the bundled `examples/workbench.yaml` (memory backend, auth disabled), drives Chromium through the onboarding → services → knowledge-base → upsert → playground flow. Reuses an existing `:8080` server in dev; CI starts a fresh one. |
| `npm run test:e2e:ui` | Same in Playwright's UI mode for debugging. |
| `npm run e2e:install` | One-time: `playwright install chromium --with-deps`. |

-E2E specs deliberately stay on the **vector** lane. The route's `resolveQuery()` always builds an `Embedder` for any text query (so hybrid search has a vector handle); with a mock embedding descriptor the production embedder factory throws `embedding_unavailable`. Vector input bypasses that path entirely. Adding text-search coverage to the E2E suite needs either a real provider key in CI or a runtime override that lets a fake embedder run alongside production code — both deferred.
+E2E specs deliberately stay on the **vector** lane. The route's `resolveQuery()` always builds an `Embedder` for any text query (so hybrid search has a vector handle); with a mock embedding-service config the production embedder factory throws `embedding_unavailable`. Vector input bypasses that path entirely. Adding text-search coverage to the E2E suite needs either a real provider key in CI or a runtime override that lets a fake embedder run alongside production code — both deferred.

## House rules

diff --git a/docs/api-spec.md b/docs/api-spec.md
index 0c0f1c8..e5265fb 100644
--- a/docs/api-spec.md
+++ b/docs/api-spec.md
@@ -42,8 +42,9 @@ Every nested resource carries its parent UIDs in the path:

```
/api/v1/workspaces/{workspaceUid}
-/api/v1/workspaces/{workspaceUid}/catalogs/{catalogUid}
-/api/v1/workspaces/{workspaceUid}/vector-stores/{vectorStoreUid}
+/api/v1/workspaces/{workspaceUid}/knowledge-bases/{knowledgeBaseUid}
+/api/v1/workspaces/{workspaceUid}/knowledge-bases/{knowledgeBaseUid}/documents/{documentUid}
+/api/v1/workspaces/{workspaceUid}/{chunking,embedding,reranking}-services/{serviceUid}
```

A request whose path references a non-existent workspace returns
@@ -94,19 +95,16 @@ human-readable and may change. Currently emitted:

| 413 | `payload_too_large` | `/api/v1/workspaces/*` request body exceeded the runtime's 1 MB JSON body limit. 
| | 404 | `not_found` | Unknown route | | 404 | `workspace_not_found` | Workspace UID doesn't exist | -| 404 | `catalog_not_found` | Catalog UID doesn't exist in workspace | -| 404 | `vector_store_not_found` | Vector-store UID doesn't exist in workspace | -| 404 | `document_not_found` | Document UID doesn't exist in the catalog | +| 404 | `knowledge_base_not_found` | Knowledge-base UID doesn't exist in workspace | +| 404 | `document_not_found` | Document UID doesn't exist in the knowledge base | +| 404 | `chunking_service_not_found` / `embedding_service_not_found` / `reranking_service_not_found` | Service UID doesn't exist in workspace | | 404 | `job_not_found` | Job ID doesn't exist in the workspace | -| 404 | `saved_query_not_found` | Saved query UID doesn't exist in the catalog | -| 409 | `conflict` | Create with an already-taken UID | -| 409 | `catalog_not_bound_to_vector_store` | Catalog-scoped search against a catalog whose `vectorStore` is `null` | +| 409 | `conflict` | Create with an already-taken UID, or service deletion refused while a KB still references it | | 501 | `hybrid_not_supported` | Caller asked for hybrid search on a workspace kind whose driver doesn't implement `searchHybrid` | | 501 | `rerank_not_supported` | Caller asked for rerank on a workspace kind whose driver doesn't implement `rerank` | -| 409 | `catalog_not_bound_to_vector_store` | Catalog-scoped search, ingest, or saved-query run against a catalog whose `vectorStore` is `null` | -| 400 | `dimension_mismatch` | Supplied vector length doesn't match the vector-store descriptor | -| 400 | `embedding_unavailable` | Text search/upsert fallback could not build an embedder for the descriptor | -| 400 | `embedding_dimension_mismatch` | Embedder output dimension doesn't match the descriptor | +| 400 | `dimension_mismatch` | Supplied vector length doesn't match the KB's bound embedding service | +| 400 | `embedding_unavailable` | Text search/upsert fallback could not build an embedder for the KB's bound embedding service | +| 400 | `embedding_dimension_mismatch` | Embedder output dimension doesn't match the bound embedding service | | 422 | `workspace_misconfigured` | Workspace is missing endpoint, token, keyspace, or similar driver-required config | | 500 | `internal_error` | Unhandled exception | | 503 | `control_plane_unavailable` | Backing store is unreachable | @@ -264,7 +262,7 @@ omitted. `kind` is one of `astra | hcd | openrag | mock`. (`mock` stays a first-class option for CI and offline work.) Once set, `kind` is immutable — changing it would orphan any already-provisioned -vector-store collections. +KB collections. `endpoint` is the workspace's data-plane URL (for `astra` / `hcd`, the Astra Data API endpoint). Accepts either a literal URL or a @@ -299,14 +297,15 @@ Patch one or more of `name`, `endpoint`, `credentialsRef`, ### `DELETE /api/v1/workspaces/{workspaceUid}` -Cascades to the workspace's catalogs, vector-store descriptors, and -documents. Before removing the control-plane rows, the runtime drops -each underlying vector-store collection through the workspace's driver. +Cascades to the workspace's knowledge bases, execution services, +RAG documents, and API keys. Before removing the control-plane +rows, the runtime drops each KB's underlying Astra collection +through the workspace's driver. 
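As a sketch of that ordering — data plane first, then control-plane rows — with purely illustrative store/driver handles (the real ones live in the runtime):

```ts
// Illustrative only: minimal stand-ins for the runtime's store and driver.
interface KbRow { vectorCollection: string }
interface VectorDriver { dropCollection(name: string): Promise<void> }
interface ControlPlane {
  listKnowledgeBases(workspaceUid: string): Promise<KbRow[]>;
  deleteWorkspaceCascade(workspaceUid: string): Promise<void>; // KBs, services, docs, keys
}

async function deleteWorkspace(
  store: ControlPlane,
  driver: VectorDriver,
  workspaceUid: string,
): Promise<void> {
  // Collections are dropped first, as documented above; one plausible
  // reason for the order is that a crash mid-cascade still leaves rows
  // behind that a retry can discover and finish.
  for (const kb of await store.listKnowledgeBases(workspaceUid)) {
    await driver.dropCollection(kb.vectorCollection);
  }
  await store.deleteWorkspaceCascade(workspaceUid);
}
```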
- **204** — deleted - **404** `workspace_not_found` -- **503** `driver_unavailable` — workspace has vector stores but no - registered driver to drop their collections +- **503** `driver_unavailable` — workspace has knowledge bases but + no registered driver to drop their collections ### `POST /api/v1/workspaces/{workspaceUid}/test-connection` @@ -404,79 +403,97 @@ no-op that still returns `204`. --- -## `/api/v1/workspaces/{workspaceUid}/catalogs` +## `/api/v1/workspaces/{workspaceUid}/{chunking,embedding,reranking}-services` + +Workspace-scoped execution services. Knowledge bases compose one +chunking + one embedding + (optionally) one reranking service at +create time. The three surfaces share an identical CRUD shape; only +the body fields differ. ### `GET` -List catalogs in the workspace. +List services in the workspace. -- **200** — paginated `Catalog` records +- **200** — paginated `ChunkingService` / `EmbeddingService` / + `RerankingService` records (sorted by `createdAt` ascending, + `*ServiceId` as tie-breaker) - **404** `workspace_not_found` -A `Catalog`: - -```json -{ - "workspace": "…", - "uid": "…", - "name": "support", - "description": null, - "vectorStore": "…", - "createdAt": "…", - "updatedAt": "…" -} -``` - ### `POST` -Create a catalog. `vectorStore` is optional and refers to a vector -store in the same workspace (N:1 — multiple catalogs may share a -single vector store). +Create a service. The runtime generates a UID if `uid` is omitted. +Required fields by kind: -**Request** +| Kind | Required | +|---|---| +| chunking | `name`, `engine` | +| embedding | `name`, `provider`, `modelName`, `embeddingDimension` | +| reranking | `name`, `provider`, `modelName` | + +Optional fields cover endpoint config (`endpointBaseUrl`, +`endpointPath`, `requestTimeoutMs`, `authType`, `credentialRef`), +provider/engine tuning, and supported language/content tags. See +the OpenAPI spec for the full per-kind shape. ```json -{ "name": "support", "vectorStore": "" } +{ + "name": "openai-3-small", + "provider": "openai", + "modelName": "text-embedding-3-small", + "embeddingDimension": 1536, + "distanceMetric": "cosine", + "endpointBaseUrl": "https://api.openai.com/v1", + "credentialRef": "env:OPENAI_API_KEY", + "supportedLanguages": ["en", "fr"], + "supportedContent": ["text"] +} ``` -- **201** — the created `Catalog` +`supportedLanguages` and `supportedContent` arrive as arrays and are +returned deduplicated + sorted on the wire. (Astra-row layer keeps +them as `SET`; the converter normalises at the boundary.) + +- **201** — the created record (with the generated `*ServiceId`) +- **400** `validation_error` — schema failure - **404** `workspace_not_found` -- **404** `vector_store_not_found` — `vectorStore` points at a missing descriptor - **409** `conflict` — `uid` collision -### `GET /{catalogUid}` / `PUT /{catalogUid}` / `DELETE /{catalogUid}` +### `GET /{serviceId}` / `PUT /{serviceId}` / `DELETE /{serviceId}` -Fetch / patch / delete. `DELETE` cascades to the catalog's documents. +Fetch / patch / delete. `PUT` accepts every field from create +(all optional). Strict bodies — unknown keys return `400`. + +`DELETE` is **refused with `409 conflict` while any KB still +references the service**. Drop or rebind the dependent KBs first. +The error message names the offending KB so operators can navigate +straight to it. --- -## `/api/v1/workspaces/{workspaceUid}/vector-stores` +## `/api/v1/workspaces/{workspaceUid}/knowledge-bases` ### `GET` -List vector-store descriptors in the workspace. 
+List knowledge bases in the workspace. -- **200** — paginated `VectorStore` records +- **200** — paginated `KnowledgeBase` records - **404** `workspace_not_found` -A `VectorStore` descriptor: +A `KnowledgeBase`: ```json { - "workspace": "…", - "uid": "…", - "name": "support-vectors", - "vectorDimension": 1536, - "vectorSimilarity": "cosine", - "embedding": { - "provider": "openai", - "model": "text-embedding-3-small", - "endpoint": null, - "dimension": 1536, - "secretRef": "env:OPENAI_API_KEY" - }, - "lexical": { "enabled": false, "analyzer": null, "options": {} }, - "reranking": { "enabled": false, "provider": null, "model": null, "endpoint": null, "secretRef": null }, + "workspaceId": "…", + "knowledgeBaseId": "…", + "name": "support-docs", + "description": "customer support knowledge base", + "status": "active", + "embeddingServiceId": "…", + "chunkingServiceId": "…", + "rerankingServiceId": null, + "language": "en", + "vectorCollection": "wb_vectors_", + "lexical": { "enabled": false, "analyzer": null, "options": {} }, "createdAt": "…", "updatedAt": "…" } @@ -484,93 +501,50 @@ A `VectorStore` descriptor: ### `POST` -Create a descriptor **and** provision the underlying Data API -Collection via the workspace's driver. Transactional — if collection -provisioning fails, the descriptor row is rolled back so the control -plane and data plane never drift. - -`vectorSimilarity` defaults to `cosine`; `lexical` and `reranking` -default to `{ enabled: false, ... }` if omitted. +Create a KB **and** auto-provision its underlying Astra collection. +Transactional — if collection provisioning fails, the KB row is +rolled back so the control plane and data plane never drift. -**Required fields:** `name`, `vectorDimension`, `embedding`. +`vectorCollection` is generated as `wb_vectors_` (hyphen- +stripped) by default; supply your own to adopt a pre-existing +collection. -- **201** — the created `VectorStore` (collection now exists) -- **404** `workspace_not_found` -- **409** `conflict` -- **422** `workspace_misconfigured` — workspace is missing `endpoint` or `credentialsRef.token` required by its driver -- **503** `driver_unavailable` — no driver registered for the workspace's `kind` - -### `GET /discoverable` - -List collections that exist in the workspace's data plane but aren't -yet wrapped in a workbench descriptor — useful for adopting -collections created by another tool, by hand, or by an older -workbench install whose control-plane state was lost. Returns `[]` -for drivers that don't expose external collections (the mock -driver). +**Request** ```json -[ - { - "name": "legacy_openai_coll", - "vectorDimension": 1536, - "vectorSimilarity": "cosine", - "embedding": { "provider": "openai", "model": "text-embedding-3-small" }, - "lexicalEnabled": true, - "rerankEnabled": false, - "rerankProvider": null, - "rerankModel": null - } -] +{ + "name": "support-docs", + "description": "customer support", + "embeddingServiceId": "…", + "chunkingServiceId": "…", + "rerankingServiceId": null, + "language": "en" +} ``` -- **200** — list of `AdoptableCollection`s (already-adopted - collections are filtered out) -- **404** `workspace_not_found` +`embeddingServiceId` and `chunkingServiceId` are required. Both +must reference services that exist in the same workspace. 
+ +- **201** — the created `KnowledgeBase` (collection now exists) +- **404** `workspace_not_found` / `embedding_service_not_found` / + `chunking_service_not_found` / `reranking_service_not_found` +- **409** `conflict` — `uid` collision +- **422** `workspace_misconfigured` — workspace is missing + `endpoint` or `credentialsRef.token` required by its driver - **503** `driver_unavailable` — no driver registered for the workspace's `kind` -### `POST /adopt` - -Wrap an existing data-plane collection in a workbench descriptor -without re-provisioning it. The route reads the live collection's -vector / lexical / rerank options off the data plane and stamps a -descriptor matching them; the descriptor's `name` equals the -collection's name (which is already a valid Astra identifier by -construction). - -**Request:** +### `GET /{knowledgeBaseUid}` / `PUT /{knowledgeBaseUid}` / `DELETE /{knowledgeBaseUid}` -```json -{ "collectionName": "legacy_openai_coll" } -``` +`GET` reads the record. `PUT` accepts a partial — `name`, +`description`, `status`, `rerankingServiceId`, `language`, `lexical` +are mutable; **`embeddingServiceId` and `chunkingServiceId` are +immutable post-create** and the schema is `.strict()`, so accidentally +including them in a body returns `400`. `DELETE` drops the underlying +Astra collection first, then the KB row, then cascades RAG document +rows. -- **201** — the created `VectorStore` descriptor -- **404** `collection_not_found` — the named collection isn't on - the data plane (or the driver no longer reports it) -- **409** `collection_already_adopted` — a descriptor with that name - already exists in this workspace -- **503** `adopt_not_supported` — driver doesn't expose - `listAdoptable` - -Vectorless / vector-only collections (no `$vectorize` service -configured) get a placeholder `embedding: { provider: "external", -model: "external", … }` — clients still need to supply vectors at -upsert / search time. Create a new vector store when you need a -different provider or dimension; descriptors intentionally mirror the -underlying collection and are immutable after creation. - -### `GET /{vectorStoreUid}` / `PUT /{vectorStoreUid}` / `DELETE /{vectorStoreUid}` - -`GET` reads the descriptor. `PUT` accepts an empty patch and returns -the existing descriptor, but rejects any field changes with -`409 conflict` because dimensions, similarity, embedding, lexical, -rerank, and collection naming are physical collection properties. -`DELETE` drops the underlying Data API Collection **and** removes the descriptor. -If any catalog still references the vector store, `DELETE` returns -`409 conflict`; clear or move those catalog bindings first. - -### `POST /{vectorStoreUid}/records` — upsert records +### `POST /{knowledgeBaseUid}/records` — upsert records **Request** — each record carries exactly one of `vector` or `text`: @@ -587,15 +561,15 @@ If any catalog still references the vector store, `DELETE` returns - `records` — 1..500 items per request. - `id` is the application's identifier; re-upsert replaces the prior value. -- `vector.length` must equal the descriptor's `vectorDimension`. +- `vector.length` must equal the bound embedding service's + `embeddingDimension`. - **Text dispatch** mirrors search: the route tries `driver.upsertByText()` for all-text batches (Astra `$vectorize` inserts for collections with a service block). On `NotSupportedError` the runtime embeds each text record via the - vector store's `embedding` config and retries through plain - `upsert`. 
Mixed batches always embed client-side so the whole
-  batch stays in one transactional call. See
-  [`docs/playground.md`](playground.md).
+  KB's bound embedding service and retries through plain `upsert`.
+  Mixed batches always embed client-side so the whole batch stays
+  in one transactional call.

**Response 200**

@@ -603,62 +577,48 @@
{ "upserted": 2 }
```

-- **400** `validation_error` — a record has neither or both of `vector`/`text`
-- **400** `dimension_mismatch` — at least one vector has the wrong length
-- **400** `embedding_unavailable` — text records + descriptor's embedding config can't be resolved
-- **400** `embedding_dimension_mismatch` — provider returned a vector whose length doesn't match the descriptor
-- **404** `workspace_not_found` / `vector_store_not_found`
-
-### `DELETE /{vectorStoreUid}/records/{recordId}`
+- **400** `validation_error` — record has neither/both of `vector`/`text`
+- **400** `dimension_mismatch` — vector length doesn't match the
+  bound embedding service's `embeddingDimension`
+- **400** `embedding_unavailable` / `embedding_dimension_mismatch`
+- **404** `workspace_not_found` / `knowledge_base_not_found`

-Delete a single record. `recordId` is the application's `id` (not a
-UUID — any non-empty string).
+### `DELETE /{knowledgeBaseUid}/records/{recordId}`

-**Response 200**
+Delete a single record. `recordId` is the application's `id` (any
+non-empty string).

```json
-{ "deleted": true }  // or false, if the record wasn't present
+{ "deleted": true }
```
+
+`deleted` is `false` when no record with that `id` was present.

-### `POST /{vectorStoreUid}/search` — vector or text search
-
-**Request** — exactly one of `vector` or `text`:
+### `POST /{knowledgeBaseUid}/search` — vector or text search

-```json
-{
-  "vector": [0.01, -0.02, ...],
-  "topK": 10,
-  "filter": { "tag": "keep" },
-  "includeEmbeddings": false
-}
-```
+**Request** — exactly one of `vector` or `text`, plus optional
+`hybrid` / `lexicalWeight` / `rerank`:

```json
{
-  "text": "winter sweater in blue",
-  "topK": 10
+  "text": "how do refunds work?",
+  "topK": 5,
+  "filter": { "section": "billing" },
+  "hybrid": true,
+  "lexicalWeight": 0.3,
+  "rerank": true
}
```

- `topK` defaults to 10, clamped to `[1, 1000]`.
-- `filter` is shallow-equal on payload keys. Backends with richer
-  filter languages may accept more; the portable subset is
-  shallow-equal.
-- `includeEmbeddings: true` returns the stored vector on each hit.
-
-**Text dispatch**: the route tries the driver's `searchByText()`
-first — for Astra collections whose descriptor names a supported
-vectorize provider (`openai`, `azureOpenAI`, `cohere`, `jinaAI`,
-`mistral`, `nvidia`, `voyageAI`) and carries a `secretRef`, the
-driver opens a collection handle with the resolved API key as
-`embeddingApiKey` and issues `find(sort: { $vectorize: text })`.
-The runtime never sees or transmits the vector. Legacy
-collections (no `service` block) return a "vectorize not
-configured" error; the driver catches it and rethrows as
-`NotSupportedError`, after which the runtime falls back to a
-client-side embedding (built from the vector store's `embedding`
-config via the Vercel AI SDK) and runs a normal vector search.
-See [`docs/playground.md`](playground.md) for the mental model.
+- `filter` is shallow-equal on payload keys.
+- `hybrid: true` runs the driver's vector + lexical lane (defaults
+  to the KB's `lexical.enabled`). Requires `text`.
+- `rerank: true` reorders hits through the KB's bound reranking
+  service. 
Defaults to `true` when `rerankingServiceId` is non-null. + Requires `text`. + +The route synthesises a driver-facing descriptor from the KB plus +its bound services (see `kb-descriptor.ts`) so the dispatch layer +stays unchanged. **Response 200** — array of hits, sorted by `score` descending: @@ -669,7 +629,8 @@ See [`docs/playground.md`](playground.md) for the mental model. ] ``` -Score semantics match the descriptor's `vectorSimilarity`: +Score semantics match the bound embedding service's +`distanceMetric`: | Metric | Score | |---|---| @@ -677,37 +638,32 @@ Score semantics match the descriptor's `vectorSimilarity`: | `dot` | Raw dot product; unbounded | | `euclidean` | `1 / (1 + distance)` so higher = closer | -- **400** `validation_error` — neither or both of `vector`/`text` -- **400** `dimension_mismatch` — supplied vector length mismatched -- **400** `embedding_unavailable` — text search but the vector - store's `embedding` config can't be resolved (missing secret, - unknown provider, ...) -- **400** `embedding_dimension_mismatch` — provider returned a - vector whose length doesn't match the store's declared dim -- **404** `workspace_not_found` / `vector_store_not_found` +- **400** `validation_error` — neither/both of `vector`/`text`, + or `hybrid`/`rerank` without `text` +- **400** `dimension_mismatch` / `embedding_unavailable` / + `embedding_dimension_mismatch` +- **404** `workspace_not_found` / `knowledge_base_not_found` +- **501** `hybrid_not_supported` / `rerank_not_supported` ---- +### `GET /{knowledgeBaseUid}/documents` -## `/api/v1/workspaces/{workspaceUid}/catalogs/{catalogUid}/documents` +List RAG documents in the KB. -Document **metadata** CRUD. A `Document` is a named entry in a -catalog — the metadata row the in-process ingest pipeline attaches -vectors to. `PUT` updates metadata only; content changes go through -`POST /ingest` (sync) or `POST /ingest?async=true` (returns 202 with -a job pointer), both documented further down. +- **200** — paginated `RagDocument` records +- **404** `workspace_not_found` / `knowledge_base_not_found` -A `Document`: +A `RagDocument`: ```json { - "workspace": "…", - "catalogUid": "…", - "documentUid": "…", + "workspaceId": "…", + "knowledgeBaseId": "…", + "documentId": "…", "sourceDocId": null, "sourceFilename": "readme.md", "fileType": "text/markdown", "fileSize": 1024, - "md5Hash": null, + "contentHash": "sha256:…", "chunkTotal": null, "ingestedAt": null, "updatedAt": "…", @@ -717,61 +673,45 @@ A `Document`: } ``` -`status` is one of `pending | chunking | embedding | writing | ready | -failed`. The in-process ingest pipeline (sync + async) is the -canonical writer of `status` / `errorMessage` / `chunkTotal` / -`ingestedAt`. Clients can also set these directly via `PUT` so an -external ingest driver can own the lifecycle if it prefers. +`status` is one of `pending | chunking | embedding | writing | ready +| failed`. The KB ingest pipeline is the canonical writer of +`status` / `errorMessage` / `chunkTotal` / `ingestedAt`. Clients +can also set these directly via `PUT` if they own the lifecycle +externally. -### `GET` +### `POST /{knowledgeBaseUid}/documents` -List documents in the catalog. - -- **200** — paginated `Document` records -- **404** `workspace_not_found` / `catalog_not_found` - -### `POST` - -Register a document in the catalog. - -**Request** — all fields optional except uniqueness of `uid` within -the catalog: +Register a document in the KB without running the ingest pipeline. 
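For instance, an external ingest driver that owns the lifecycle might register the row, run its own pipeline elsewhere, and report status via `PUT`. A minimal sketch — the base URL and ids below are placeholders, the routes and fields are the documented ones:

```ts
// Hypothetical client calls against the documented routes; ids are made up.
const base =
  "http://localhost:8080/api/v1/workspaces/ws-123/knowledge-bases/kb-456";
const headers = { "content-type": "application/json" };

const doc = await fetch(`${base}/documents`, {
  method: "POST",
  headers,
  body: JSON.stringify({ sourceFilename: "readme.md", fileType: "text/markdown" }),
}).then((r) => r.json()); // 201 — status defaults to "pending"

// ...external chunk / embed / upsert runs somewhere else...

await fetch(`${base}/documents/${doc.documentId}`, {
  method: "PUT",
  headers,
  body: JSON.stringify({ status: "ready", chunkTotal: 12 }),
});
```

**Request**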
```json { "sourceFilename": "readme.md", "fileType": "text/markdown", "fileSize": 1024, + "contentHash": "sha256:…", "metadata": { "source": "upload" } } ``` -- **201** — the created `Document` (`status` defaults to `pending`, - `metadata` defaults to `{}`) -- **404** `workspace_not_found` / `catalog_not_found` -- **409** `conflict` — `uid` collision within the same catalog - -### `GET /{documentUid}` / `PUT /{documentUid}` / `DELETE /{documentUid}` +- **201** — the created `RagDocument` (`status` defaults to + `pending`, `metadata` defaults to `{}`) +- **404** `workspace_not_found` / `knowledge_base_not_found` +- **409** `conflict` — `uid` collision within the same KB -Fetch / patch / delete. `PUT` accepts every field from the create body -(all optional) and updates only the fields present. Cross-catalog -access — requesting a document from a catalog it does not belong to — -returns `404 document_not_found`. +### `GET /{knowledgeBaseUid}/documents/{documentUid}` / `PUT /{documentUid}` / `DELETE /{documentUid}` -`DELETE` cascades into the bound vector store: the document's chunks -(matched by `payload.documentUid`) are removed before the document -row is dropped, so a successful delete leaves no traces in -catalog-scoped search. Drivers exposing `deleteRecords` use a single -bulk call; older drivers fall back to a `listRecords` + per-row -delete loop. Catalogs with no `vectorStore` binding skip the cascade -and only drop the row. +Fetch / patch / delete. `PUT` accepts every field from create (all +optional). `DELETE` cascades into the KB's collection: chunks +matched by `payload.documentUid` are removed before the row is +dropped, so a successful delete leaves no traces in KB-scoped +search. Drivers exposing `deleteRecords` use a single bulk call; +older drivers fall back to a `listRecords` + per-row delete loop. -### `GET /{documentUid}/chunks` +### `GET /{knowledgeBaseUid}/documents/{documentUid}/chunks` Lists the chunks the ingest pipeline extracted from this document. -Reads raw records out of the catalog's bound vector store filtered -on `documentUid`, sorts by the `chunkIndex` payload key, and -returns: +Reads raw records out of the KB's collection filtered on +`documentUid`, sorts by the `chunkIndex` payload key, and returns: ```json [ @@ -780,7 +720,7 @@ returns: "chunkIndex": 0, "text": "First paragraph about apples.", "payload": { - "catalogUid": "…", + "knowledgeBaseUid": "…", "documentUid": "…", "chunkIndex": 0, "chunkText": "First paragraph about apples.", @@ -795,104 +735,19 @@ Query params: - `limit` (1–1000, default 1000) — caps the number of chunks returned. -The ingest pipeline stamps the chunk's text into the reserved -`chunkText` payload key, so the response always carries the source -text — even on collections with no `$vectorize` round-trip. -Records ingested before the `chunkText` key landed return -`text: null`. - - **200** — array of chunks, sorted by `chunkIndex` ascending -- **404** `workspace_not_found` / `catalog_not_found` / - `document_not_found` / `vector store_not_found` -- **409** `catalog_not_bound_to_vector_store` +- **404** `workspace_not_found` / `knowledge_base_not_found` / + `document_not_found` - **501** `list_records_not_supported` — driver doesn't expose `listRecords` -### `POST /search` - -Catalog-scoped vector / text search. Delegates to the vector store -bound at `catalog.vectorStore`, merging `catalogUid = catalog.uid` -into the effective filter so records outside the catalog are -invisible. 
- -**Request** — identical envelope to -`POST /vector-stores/{vectorStoreUid}/search`. Either `vector` OR `text` is -required; never both. - -```json -{ - "text": "how do refunds work?", - "topK": 5, - "filter": { "section": "billing" }, - "hybrid": true, - "lexicalWeight": 0.3, - "rerank": true -} -``` - -**Response** — `200` array of `SearchHit`, highest score first. - -**Scope merging.** The server sets `filter.catalogUid` to the path's -catalog UID unconditionally. Any caller-supplied `catalogUid` is -overridden — a search can never escape its catalog. Other filter -keys merge normally. - -**Hybrid + rerank lanes.** - -- `hybrid: true` runs the driver's combined vector + lexical lane. - Defaults to the bound store's `lexical.enabled`. Requires `text` — - the lexical signal has nothing to score against without it. - `lexicalWeight` (0..1, default 0.5) tunes how much the lexical - score contributes vs. the vector score. -- `rerank: true` post-processes the retrieval hits through the - driver's reranker. Defaults to the bound store's - `reranking.enabled`. Also requires `text`. - -Drivers can support either, both, or neither. - -- `mock` — supports both when the descriptor's `embedding.provider` - is `"mock"`. Hybrid and rerank are two separate phases in the - dispatcher. -- `astra` — supports hybrid natively via `findAndRerank` (astra- - db-ts's built-in API). Requires the descriptor to opt into both - `lexical.enabled: true` **and** `reranking.enabled: true` — the - collection is provisioned with a lexical index and reranker - service at create time. Standalone `rerank` is **not** exposed on - Astra because the Data API combines retrieval + reranking in one - call; callers that want rerank set `hybrid: true`. `lexicalWeight` - is ignored on Astra — the reranker owns the blend. A - `rerank: true` request against an Astra workspace therefore - returns 501 unless paired with `hybrid: true`. - -**Errors** - -- **400** `validation_error` — `vector` / `text` presence rules, - including "hybrid: true requires text" and "rerank: true requires - text" -- **400** `embedding_unavailable` — the fallback embedder could not be - built (text path only) -- **400** `embedding_dimension_mismatch` — provider returned a vector - whose length doesn't match the bound store's declared dim -- **404** `workspace_not_found` / `catalog_not_found` -- **404** `vector_store_not_found` — the binding exists but the - referenced store no longer does (stale binding) -- **409** `catalog_not_bound_to_vector_store` — `catalog.vectorStore` - is `null` -- **501** `hybrid_not_supported` / `rerank_not_supported` — the - workspace kind's driver doesn't implement the requested lane - -Text records written through `POST /ingest` carry a `catalogUid` -stamp on every chunk payload — that's what lets this route scope -correctly. The route also works against any records that carry a -matching `catalogUid` regardless of how they arrived. - -### `POST /ingest` +### `POST /{knowledgeBaseUid}/ingest` Synchronous end-to-end ingest. Chunks the input text, embeds every -chunk (server-side via `$vectorize` where the bound store supports -it, otherwise client-side via the descriptor's `embedding` config), -upserts the chunks into the bound vector store, and creates a -`Document` metadata row with `status: ready` + `chunkTotal`. 
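The stage order and the failure semantics documented below reduce to a short sketch; the handles and the chunk-id scheme here are hypothetical, not the runtime's ingest code:

```ts
// Illustrative pipeline skeleton, not runtime source.
interface IngestDeps {
  chunk(text: string): string[];
  embed(chunks: string[]): Promise<number[][]>; // client-side fallback lane
  upsert(records: { id: string; vector: number[]; payload: object }[]): Promise<void>;
  patchDocument(patch: { status: string; chunkTotal?: number; errorMessage?: string }): Promise<void>;
}

async function ingestSync(
  deps: IngestDeps,
  knowledgeBaseUid: string,
  documentUid: string,
  text: string,
) {
  try {
    const chunks = deps.chunk(text);
    const vectors = await deps.embed(chunks);
    await deps.upsert(
      chunks.map((chunkText, chunkIndex) => ({
        id: `${documentUid}:${chunkIndex}`, // id scheme is made up here
        vector: vectors[chunkIndex],
        // Reserved payload keys the runtime always stamps:
        payload: { knowledgeBaseUid, documentUid, chunkIndex, chunkText },
      })),
    );
    await deps.patchDocument({ status: "ready", chunkTotal: chunks.length });
    return { documentUid, chunkTotal: chunks.length };
  } catch (err) {
    // Mark the row failed before re-raising, per the failure semantics below.
    await deps.patchDocument({ status: "failed", errorMessage: String(err) });
    throw err;
  }
}
```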
+chunk through the KB's bound embedding service (server-side via +`$vectorize` where the driver supports it, otherwise client-side), +upserts the chunks into the KB's collection, and creates a +`RagDocument` row with `status: ready` + `chunkTotal`. **Request** @@ -905,11 +760,11 @@ upserts the chunks into the bound vector store, and creates a } ``` -All fields except `text` are optional. `chunker` overrides the -runtime defaults for this call only. `metadata` is merged onto every -chunk's payload; the reserved keys `catalogUid`, `documentUid`, and -`chunkIndex` are always set by the runtime and will override any -caller-supplied values. `text` is capped at 200,000 characters. +`chunker` overrides the runtime defaults for this call only. +`metadata` is merged onto every chunk's payload; the reserved keys +`knowledgeBaseUid`, `documentUid`, `chunkIndex`, and `chunkText` are +always set by the runtime and override any caller-supplied values. +`text` is capped at 200,000 characters. **Response 201** @@ -920,39 +775,22 @@ caller-supplied values. `text` is capped at 200,000 characters. } ``` -**Chunk payloads.** Every chunk upserted to the vector store carries: +**Chunk payloads.** Every chunk upserted carries: -- `catalogUid` — the catalog's UID (used by `/documents/search`) -- `documentUid` — the UID of the `Document` row this ingest created +- `knowledgeBaseUid` — the KB's UID (used by `/search`) +- `documentUid` — the UID of the `RagDocument` row this ingest created - `chunkIndex` — 0-based position within the source document -- `chunker.id` — the chunker impl that produced the slice - (`recursive-char:1` today) +- `chunkText` — the chunk's raw text (read back through `/chunks`) - Plus every caller-supplied `metadata` key -**Errors** - -- **400** `validation_error` — missing/empty `text`, bad chunker - config, or the Zod schema otherwise fails -- **400** `embedding_unavailable` — client-side embedding fallback - could not build an embedder (missing secret, etc.) -- **400** `embedding_dimension_mismatch` — embedder dimension - disagrees with the bound store -- **404** `workspace_not_found` / `catalog_not_found` -- **404** `vector_store_not_found` — stale binding (catalog points at - a deleted store) -- **409** `catalog_not_bound_to_vector_store` — `catalog.vectorStore` - is `null` - **Failure semantics.** When chunking or upsert throws, the -`Document` row is marked `status: failed` with `errorMessage` before -the error is re-raised. Operators can inspect the row via -`GET /documents/{documentUid}`. +`RagDocument` row is marked `status: failed` with `errorMessage` +before the error is re-raised. -### `POST /ingest?async=true` +### `POST /{knowledgeBaseUid}/ingest?async=true` -Same request body as the sync variant. The pipeline runs in the -background; the response returns immediately with a job pointer so -the UI doesn't block on long uploads. +Same body. The pipeline runs in the background; the response +returns immediately with a job pointer. **Response 202** @@ -962,7 +800,7 @@ the UI doesn't block on long uploads. "workspace": "…", "jobId": "…", "kind": "ingest", - "catalogUid": "…", + "knowledgeBaseUid": "…", "documentUid": "…", "status": "pending", "processed": 0, @@ -976,19 +814,13 @@ the UI doesn't block on long uploads. } ``` -Errors are the same set as the sync path — validation / -embedding / not-found / 409. A 4xx means the request was rejected -outright; nothing was enqueued and no job row exists. 
- -Once a job is running, failures are captured into the job record -(`status: failed`, `errorMessage` populated) and the document row -(also `status: failed`). The HTTP response has already been sent by -then. +Errors are the same set as the sync path. A 4xx means the request +was rejected outright; nothing was enqueued and no job row exists. -**Progress callbacks.** The background worker reports -`{processed, total}` via `JobStore.update`. Today it fires once -before upsert (`processed: 0`) and once after (`processed: total`); -later slices can emit per-batch updates without a contract change. +Once the job is running, failures are captured into the job record +(`status: failed`, `errorMessage` populated) and the document row. +The `runKbIngestJob` worker resolves the KB descriptor on every +call so renames or service swaps mid-flight don't drift. --- @@ -1028,13 +860,15 @@ persistent job backends. | `workspace` | uuid | Owning workspace | | `jobId` | uuid | | | `kind` | `"ingest"` | Discriminator — more kinds arrive with more async ops | -| `catalogUid` | uuid or null | Set for ingest jobs | +| `knowledgeBaseUid` | uuid or null | Set for ingest jobs | | `documentUid` | uuid or null | Set for ingest jobs | | `status` | `"pending"` \| `"running"` \| `"succeeded"` \| `"failed"` | Terminal: succeeded, failed | | `processed` | int | Units completed | | `total` | int or null | Units expected (null if unknown) | | `result` | object or null | Kind-specific summary on success (ingest: `{ chunks: N }`) | | `errorMessage` | string or null | Populated on `failed` | +| `leasedBy` | string or null | Replica id holding the lease on a `running` job (cross-replica resume) | +| `leasedAt` | iso-8601 or null | Last heartbeat from the lease holder | | `createdAt` | iso-8601 | | | `updatedAt` | iso-8601 | | @@ -1057,106 +891,26 @@ resume-worker promotes it to `failed`. Callers that need restart-resume today should treat any `running` job older than a heartbeat threshold as failed and resubmit. ---- - -## `/api/v1/workspaces/{workspaceUid}/catalogs/{catalogUid}/queries` - -Saved search recipes scoped to a catalog. Each `SavedQuery` carries a -`text` plus optional `topK` and `filter`, and is replayed through the -catalog-scoped search path by `POST /{queryUid}/run`. - -Deleting a workspace or catalog cascades to its saved queries (every -backend — memory, file, astra). - -A `SavedQuery`: - -```json -{ - "workspace": "…", - "catalogUid": "…", - "queryUid": "…", - "name": "refunds", - "description": "billing questions", - "text": "how do refunds work?", - "topK": 5, - "filter": { "section": "billing" }, - "createdAt": "…", - "updatedAt": "…" -} -``` - -Text-only by design — saved vectors are rarely the right abstraction -and serialize heavily. Callers wanting vector-form queries write the -search body directly against `POST /documents/search`. - -### `GET` - -List saved queries in the catalog. - -- **200** — paginated `SavedQuery` records -- **404** `workspace_not_found` / `catalog_not_found` - -### `POST` - -Create a saved query. `uid` is optional. - -```json -{ - "name": "refunds", - "description": "billing questions", - "text": "how do refunds work?", - "topK": 5, - "filter": { "section": "billing" } -} -``` - -- **201** — the created `SavedQuery` -- **404** `workspace_not_found` / `catalog_not_found` -- **409** `conflict` — `uid` collision within the same catalog - -### `GET /{queryUid}` / `PUT /{queryUid}` / `DELETE /{queryUid}` - -Fetch / patch / delete. 
`PUT` accepts every field from create (all -optional). Deleting a non-existent query returns -`404 saved_query_not_found`. - -### `POST /{queryUid}/run` - -Execute a saved query and return the hits. The catalog's UID is -merged into the effective filter — a saved filter carrying a -different `catalogUid` is silently overridden, so a saved query can -never escape its catalog. - -**Response 200** — array of `SearchHit` (same shape as -`/documents/search`). - -**Errors** - -- **400** `embedding_unavailable` / `embedding_dimension_mismatch` - (client-side embedding fallback path) -- **404** `workspace_not_found` / `catalog_not_found` / - `saved_query_not_found` / `vector_store_not_found` -- **409** `catalog_not_bound_to_vector_store` - ---- ## Planned routes These do not exist yet. Shapes may shift before they land. -The Phase 2 routes (saved queries CRUD + `/run`, async ingest, jobs -poll + SSE) and the Phase 3 playground dispatch (text/vector via the -existing `POST .../search` route) shipped in #53–#60 and are -documented above. - -### Phase 4+ — Chats, MCP +### Stage 2 — agents, conversations, messages -Reserved: +The schema for `wb_agentic_*` and `wb_config_llm_service_*` / +`wb_config_mcp_tools_*` is provisioned at boot but not yet wired +through the runtime. The route shapes are reserved: -- `/api/v1/workspaces/{w}/chats/…` -- `/api/v1/workspaces/{w}/mcp/…` +- `/api/v1/workspaces/{w}/llm-services` — CRUD +- `/api/v1/workspaces/{w}/mcp-tools` — CRUD +- `/api/v1/workspaces/{w}/agents` — CRUD; an agent composes one LLM + + a list of MCP tools + a list of knowledge bases +- `/api/v1/workspaces/{w}/agents/{a}/conversations` — CRUD; nested + `messages` resource +- `/api/v1/workspaces/{w}/agents/{a}/run` — execution loop -Contracts finalized as those phases approach. +See [`roadmap.md`](roadmap.md) for the phase plan. --- diff --git a/docs/architecture.md b/docs/architecture.md index e58fe74..44ad4a1 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -1,43 +1,49 @@ # Architecture AI Workbench is a polyglot HTTP runtime sitting in front of Astra DB. -It exposes a stable `/api/v1/*` contract for workspaces, document -catalogs, vector-store descriptors, and (in later phases) documents, -ingestion, and search. Each **language-native implementation of the -runtime** is a "green box"; the default TypeScript green box is -embedded with the UI, and alternatives live under -[`runtimes/`](../runtimes/README.md). +It exposes a stable `/api/v1/*` contract for workspaces, knowledge +bases, execution services (chunking / embedding / reranking), +documents, ingestion, and search. Each **language-native +implementation of the runtime** is a "green box"; the default +TypeScript green box is embedded with the UI, and alternatives live +under [`runtimes/`](../runtimes/README.md). ## Design principles -1. **One HTTP contract, N runtimes.** Workspaces, catalogs, and - vector-store descriptors are defined by the HTTP API — not by any - one runtime's internals. Every language green box honors the same - contract, enforced by +1. **One HTTP contract, N runtimes.** Workspaces, knowledge bases, + execution services, and RAG documents are defined by the HTTP API + — not by any one runtime's internals. Every language green box + honors the same contract, enforced by [fixture-based conformance tests](./conformance.md). 2. **Thin, boring runtime core.** The runtime is an HTTP server + a pluggable control-plane store. Complexity lives in pluggable - services (chunking, embedding, reranking in later phases). 
+ services bound to a knowledge base (chunking, embedding, + reranking). 3. **Workspaces are runtime data, not config.** `workbench.yaml` picks which control-plane backend to use; workspaces themselves are mutable records managed via the HTTP API. -4. **Driver-based control plane.** `memory` for CI and demos, `file` +4. **A KB owns its collection end-to-end.** Creating a knowledge + base auto-provisions the underlying Astra collection + (`wb_vectors_`), sized to the bound embedding service's + dimension; deleting the KB drops the collection. The control + plane and data plane never diverge. +5. **Driver-based control plane.** `memory` for CI and demos, `file` for single-node self-hosted, `astra` for production. Same contract. -5. **Astra-native where real.** The `astra` backend uses +6. **Astra-native where real.** The `astra` backend uses [`@datastax/astra-db-ts`](https://github.com/datastax/astra-db-ts) directly. The Python runtime uses [`astrapy`](https://github.com/datastax/astrapy). No wrapper libraries in between. -6. **Secrets by reference.** Credentials live behind +7. **Secrets by reference.** Credentials live behind `SecretRef` pointers (`env:FOO` / `file:/path`) resolved at use time by a pluggable provider. No raw secrets in config, records, or logs. -7. **Immutable records.** Every update returns a new object. The +8. **Immutable records.** Every update returns a new object. The in-memory backend holds `Map`; the file backend rewrites atomically; the astra backend does `$set` updates through the Data API. -8. **Contract-first for new surfaces.** The HTTP API is versioned +9. **Contract-first for new surfaces.** The HTTP API is versioned (`/api/v1/…`) and documented in [`api-spec.md`](api-spec.md) and the generated OpenAPI at `/api/v1/openapi.json`. @@ -93,21 +99,30 @@ All three pass the same shared contract suite in ### Vector-store drivers (`runtimes/typescript/src/drivers/`) Data-plane counterparts to the control-plane store. Where -`ControlPlaneStore` owns **descriptors**, the `VectorStoreDriver` -owns **actual vectors** on a per-workspace backend. +`ControlPlaneStore` owns **records** (workspaces, KBs, services, +RAG documents), the `VectorStoreDriver` owns **actual vectors** in +the per-KB Astra collection. 
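+
+A rough sketch of that split, with simplified and partly invented
+signatures (the real interface, with the exact types, is the first
+row of the table below):
+
+```ts
+// Illustrative shapes only; see vector-store.ts for the real ones.
+interface VectorRecord {
+  id: string;
+  vector: number[];
+  payload?: Record<string, unknown>;
+}
+
+interface SearchHit {
+  id: string;
+  score: number;
+  payload?: Record<string, unknown>;
+}
+
+interface VectorStoreDriver {
+  // Required surface: every driver implements these.
+  createCollection(name: string): Promise<void>;
+  dropCollection(name: string): Promise<void>;
+  upsert(records: readonly VectorRecord[]): Promise<void>;
+  deleteRecord(id: string): Promise<void>;
+  search(vector: readonly number[], topK: number): Promise<SearchHit[]>;
+
+  // Optional capabilities: a driver that lacks one surfaces it to
+  // callers as 501 (hybrid_not_supported / rerank_not_supported, …).
+  searchByText?(text: string, topK: number): Promise<SearchHit[]>;
+  upsertByText?(records: readonly { id: string; text: string }[]): Promise<void>;
+  searchHybrid?(text: string, topK: number): Promise<SearchHit[]>;
+  rerank?(text: string, hits: readonly SearchHit[]): Promise<SearchHit[]>;
+  listRecords?(filter: Record<string, unknown>): Promise<VectorRecord[]>;
+  deleteRecords?(filter: Record<string, unknown>): Promise<number>;
+}
+```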
| File | Purpose | |---|---| -| [`vector-store.ts`](../runtimes/typescript/src/drivers/vector-store.ts) | Driver interface — `createCollection`, `dropCollection`, `upsert`, `deleteRecord`, `search`, plus optional `searchByText`, `upsertByText`, `searchHybrid`, `rerank`, `listAdoptable` (adopt-existing), `listRecords` (chunks under a document), `deleteRecords` (delete-document cascade) | +| [`vector-store.ts`](../runtimes/typescript/src/drivers/vector-store.ts) | Driver interface — `createCollection`, `dropCollection`, `upsert`, `deleteRecord`, `search`, plus optional `searchByText`, `upsertByText`, `searchHybrid`, `rerank`, `listRecords` (chunks under a document), `deleteRecords` (delete-document cascade) | | [`mock/store.ts`](../runtimes/typescript/src/drivers/mock/store.ts) | In-memory driver; used by workspaces with `kind: "mock"` and by the conformance suite | | [`astra/store.ts`](../runtimes/typescript/src/drivers/astra/store.ts) | Data API Collections via `astra-db-ts`; per-workspace `DataAPIClient` cache, lazy init | | [`registry.ts`](../runtimes/typescript/src/drivers/registry.ts) | Dispatches based on `workspace.kind`; unknown kinds surface as `503 driver_unavailable` | | [`factory.ts`](../runtimes/typescript/src/drivers/factory.ts) | Wires the registry at startup from the `SecretResolver` | -`POST /api/v1/workspaces/{w}/vector-stores` is the transactional -entry point: it writes the descriptor, calls the driver to create -the collection, and rolls back the descriptor on failure so the -control plane and data plane never diverge. +The route layer in +[`api-v1/kb-descriptor.ts`](../runtimes/typescript/src/routes/api-v1/kb-descriptor.ts) +materialises a driver-facing descriptor on the fly from a KB plus +its bound embedding/reranking services. Drivers and the search / +upsert dispatch surfaces consume this synthesised shape unchanged — +they don't need to know KBs exist. + +`POST /api/v1/workspaces/{w}/knowledge-bases` is the transactional +entry point: it writes the KB row, calls the driver to create the +collection, and rolls back the row on failure so the control plane +and data plane never diverge. `DELETE` reverses this — drop the +collection first, then the row. Both drivers pass the same 8-assertion [driver contract suite](../runtimes/typescript/tests/drivers/contract.ts). The Astra @@ -117,7 +132,7 @@ gated on `ASTRA_DB_*` env vars and lives in a follow-up. ### Astra client (`runtimes/typescript/src/astra-client/`) -Thin layer over `astra-db-ts` scoped to the four `wb_*` tables: +Thin layer over `astra-db-ts` scoped to the `wb_*` tables: - [`table-definitions.ts`](../runtimes/typescript/src/astra-client/table-definitions.ts) — Data API Table DDL. @@ -129,7 +144,7 @@ Thin layer over `astra-db-ts` scoped to the four `wb_*` tables: narrow structural interface used by the astra store (lets tests inject fakes). - [`client.ts`](../runtimes/typescript/src/astra-client/client.ts) — `openAstraClient()`: - creates the four tables idempotently at init and returns a + creates the tables idempotently at init and returns a `TablesBundle`. The Python runtime has a symmetric internal layer that wraps @@ -154,8 +169,13 @@ talking to workspace-scoped backends. 
|---|---|---| | [`operational.ts`](../runtimes/typescript/src/routes/operational.ts) | (unversioned) | `/`, `/healthz`, `/readyz`, `/version` | | [`api-v1/workspaces.ts`](../runtimes/typescript/src/routes/api-v1/workspaces.ts) | `/api/v1/workspaces` | Workspace CRUD | -| [`api-v1/catalogs.ts`](../runtimes/typescript/src/routes/api-v1/catalogs.ts) | `/api/v1/workspaces/{w}/catalogs` | Catalog CRUD | -| [`api-v1/vector-stores.ts`](../runtimes/typescript/src/routes/api-v1/vector-stores.ts) | `/api/v1/workspaces/{w}/vector-stores` | Descriptor CRUD | +| [`api-v1/knowledge-bases.ts`](../runtimes/typescript/src/routes/api-v1/knowledge-bases.ts) | `/api/v1/workspaces/{w}/knowledge-bases` | KB CRUD (POST auto-provisions collection) | +| [`api-v1/kb-data-plane.ts`](../runtimes/typescript/src/routes/api-v1/kb-data-plane.ts) | `…/knowledge-bases/{kb}/{records,search}` | Upsert / delete record / search | +| [`api-v1/kb-documents.ts`](../runtimes/typescript/src/routes/api-v1/kb-documents.ts) | `…/knowledge-bases/{kb}/{documents,ingest}` | Document metadata, sync + async ingest, chunk listing | +| [`api-v1/kb-descriptor.ts`](../runtimes/typescript/src/routes/api-v1/kb-descriptor.ts) | — | `resolveKb()` — synthesises a driver-facing descriptor from a KB + bound services | +| [`api-v1/{chunking,embedding,reranking}-services.ts`](../runtimes/typescript/src/routes/api-v1/) | `…/{chunking,embedding,reranking}-services` | Service CRUD | +| [`api-v1/jobs.ts`](../runtimes/typescript/src/routes/api-v1/jobs.ts) | `/api/v1/workspaces/{w}/jobs` | Job poll + SSE stream | +| [`api-v1/api-keys.ts`](../runtimes/typescript/src/routes/api-v1/api-keys.ts) | `/api/v1/workspaces/{w}/api-keys` | Per-workspace API-key management | | [`api-v1/helpers.ts`](../runtimes/typescript/src/routes/api-v1/helpers.ts) | — | Error mapping (invoked from app-level `onError`) | Route handlers validate with Zod (via `@hono/zod-openapi`) and @@ -166,28 +186,64 @@ envelope. ## Data model -Four `wb_*` Data API tables backed by CQL-style schemas. The exact -DDL lives in +Data API tables backed by CQL-style schemas. 
The exact DDL lives in [`runtimes/typescript/src/astra-client/table-definitions.ts`](../runtimes/typescript/src/astra-client/table-definitions.ts); here's the logical shape: ``` -wb_workspaces PK (uid) - uid, name, url, kind, credentials_ref, keyspace, created_at, updated_at - -wb_catalog_by_workspace PK ((workspace), uid) - name, description, vector_store, created_at, updated_at +wb_workspaces PK (uid) + uid, name, endpoint, kind, credentials_ref, keyspace, + created_at, updated_at -wb_vector_store_by_workspace PK ((workspace), uid) - name, vector_dimension, vector_similarity, - embedding_{provider,model,endpoint,dimension,secret_ref}, +wb_config_knowledge_bases_by_workspace PK ((workspace_id), knowledge_base_id) + name, description, status, + embedding_service_id, chunking_service_id, reranking_service_id, + language, vector_collection, lexical_{enabled,analyzer,options}, - reranking_{enabled,provider,model,endpoint,secret_ref}, created_at, updated_at -wb_documents_by_catalog PK ((workspace, catalog_uid), document_uid) - source_*, file_*, md5_hash, chunk_total, ingested_at, updated_at, +wb_config_chunking_service_by_workspace PK ((workspace_id), chunking_service_id) + name, description, status, + engine, engine_version, strategy, + {min,max}_chunk_size, chunk_unit, + overlap_size, overlap_unit, preserve_structure, + language, max_payload_size_kb, + enable_ocr, extract_tables, extract_figures, reading_order, + endpoint_*, request_timeout_ms, auth_type, credential_ref, + created_at, updated_at + +wb_config_embedding_service_by_workspace PK ((workspace_id), embedding_service_id) + name, description, status, + provider, model_name, embedding_dimension, distance_metric, + max_batch_size, max_input_tokens, + supported_languages SET, supported_content SET, + endpoint_*, request_timeout_ms, auth_type, credential_ref, + created_at, updated_at + +wb_config_reranking_service_by_workspace PK ((workspace_id), reranking_service_id) + name, description, status, + provider, engine, model_name, model_version, + max_candidates, scoring_strategy, + score_normalized, return_scores, max_batch_size, + supported_languages SET, supported_content SET, + endpoint_*, request_timeout_ms, auth_type, credential_ref, + created_at, updated_at + +wb_rag_documents_by_knowledge_base PK ((workspace_id, knowledge_base_id), document_id) + source_*, file_*, content_hash, chunk_total, + ingested_at, updated_at, status, error_message, metadata + +wb_rag_documents_by_knowledge_base_and_status (secondary index, by status) +wb_rag_documents_by_content_hash (dedup lookup) + +wb_jobs_by_workspace PK ((workspace), job_id) + kind, knowledge_base_uid, document_uid, status, + processed, total, result_json, error_message, + leased_by, leased_at, ingest_input_json, + created_at, updated_at + +wb_api_key_by_workspace, wb_api_key_lookup (per-workspace tokens) ``` **`kind`** on workspaces is one of `astra | hcd | openrag | mock`. It @@ -196,11 +252,26 @@ later, when a single runtime routes requests to different data-plane backends per workspace). The runtime's own control plane is separate — chosen via `workbench.yaml`. -**`wb_vector_store_by_workspace` is a DESCRIPTOR row**, not the -vector data. The actual Data API Collection holding vectors is a -separate object, provisioned transactionally by the workspace's -vector-store driver (see the *Vector-store drivers* section above) -when the descriptor is created. 
+
+**Knowledge bases own their collection.** `vector_collection` on
+the KB row is the auto-provisioned Astra collection name
+(`wb_vectors_`, hyphen-stripped). The actual vector data
+lives in that Data API Collection, provisioned transactionally
+when the KB is created and dropped when it's deleted.
+
+**Reserved chunk-payload keys.** The KB-scoped ingest pipeline
+stamps `knowledgeBaseUid`, `documentUid`, `chunkIndex`, and
+`chunkText` onto every chunk's payload so KB-scoped search and the
+chunk listing endpoint can filter / display them without a
+secondary lookup.
+
+**Stage 2 schema.** Five additional tables —
+`wb_config_llm_service_by_workspace`,
+`wb_config_mcp_tools_by_workspace`,
+`wb_agentic_agents_by_workspace`,
+`wb_agentic_conversations_by_agent`,
+`wb_agentic_messages_by_conversation` — are provisioned at boot but
+are not yet wired through the runtime. They land with the agent
+execution loop (roadmap Stage 2).

 ## Isolation and scoping

@@ -210,14 +281,23 @@ when the descriptor is created.
   returning nested resources. Requests against a non-existent
   workspace return `404 workspace_not_found`.
 - Cascade delete:
-  - `DELETE /api/v1/workspaces/{w}` → drops the workspace, its
-    catalogs, its vector-store descriptors, its documents, and the
-    underlying vector-store collections.
-  - `DELETE /api/v1/workspaces/{w}/catalogs/{c}` → drops the
-    catalog and its documents.
-- **Catalog → vector-store binding is N:1** (multiple catalogs may
-  share one underlying collection). This was a deliberate relaxation
-  from an earlier draft's strict 1:1 constraint.
+  - `DELETE /api/v1/workspaces/{w}` → drops the workspace, all
+    knowledge bases (and their underlying collections), all
+    execution services, all RAG documents, all API keys.
+  - `DELETE /api/v1/workspaces/{w}/knowledge-bases/{kb}` → drops
+    the underlying Astra collection first, then the KB row, then
+    cascades RAG document rows.
+- **Service → KB binding is N:1.** A KB binds exactly one
+  embedding service, one chunking service, and (optionally) one
+  reranking service. Multiple KBs can share the same service. A
+  service deletion is refused (409) while any KB still references
+  it.
+- **Service references are immutable post-create.** The
+  `embeddingServiceId` and `chunkingServiceId` on a KB are pinned
+  at creation time — vectors and chunks on disk are bound to the
+  models that produced them. Re-embedding requires a new KB; the
+  PUT schema is `.strict()` so accidentally including those keys
+  in an update body returns 400.

 ## Request flow (reference)

@@ -248,12 +328,14 @@ Client ──► POST /api/v1/workspaces body={name, kind}
          c.json(record, 201)
 ```

-The catalog ingest pipeline (Phase 2b — shipped) extends the same
-shape with calls to a `Chunker`, an `Embedder`, and the catalog's
-bound vector store, plus a `Document` row that tracks ingest
-status. Synchronous and async (`?async=true`) variants live at
-`POST /catalogs/{c}/ingest`; the async path returns 202 with a job
-pointer and updates progress through the `JobStore` until terminal.
+The KB ingest pipeline extends the same shape with calls to a
+`Chunker`, an `Embedder`, and the KB's auto-provisioned vector
+collection (resolved through `resolveKb`), plus a `RagDocument`
+row that tracks ingest status. Synchronous and async
+(`?async=true`) variants live at
+`POST /knowledge-bases/{kb}/ingest`; the async path returns 202
+with a job pointer and updates progress through the `JobStore`
+until terminal.
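+
+A minimal caller-side sketch of the async variant. The raw-text
+ingest body and the `jobId` field on the 202 pointer are assumptions
+here; check the generated OpenAPI doc at `/api/v1/openapi.json` for
+the exact wire shapes:
+
+```ts
+const base = "http://localhost:8080/api/v1";
+
+// Submit an async ingest, then poll the job to a terminal state.
+async function ingestAndWait(w: string, kb: string, text: string) {
+  const res = await fetch(
+    `${base}/workspaces/${w}/knowledge-bases/${kb}/ingest?async=true`,
+    {
+      method: "POST",
+      headers: { "content-type": "application/json" },
+      body: JSON.stringify({ text }), // body shape assumed for illustration
+    },
+  );
+  if (res.status !== 202) throw new Error(`ingest rejected: ${res.status}`);
+  const { jobId } = (await res.json()) as { jobId: string };
+
+  for (;;) {
+    const job = await fetch(`${base}/workspaces/${w}/jobs/${jobId}`)
+      .then((r) => r.json());
+    if (job.status === "succeeded") return job.result; // ingest: { chunks: N }
+    if (job.status === "failed") throw new Error(job.errorMessage ?? "failed");
+    await new Promise((r) => setTimeout(r, 500)); // or use the SSE events route
+  }
+}
+```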
## Conformance diff --git a/docs/configuration.md b/docs/configuration.md index 9df2429..810f0b0 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -4,9 +4,9 @@ Runtime behavior is driven by a single YAML file, conventionally named `workbench.yaml`. The runtime loads it at startup and validates it against a strict schema. -**Workspaces, catalogs, and vector stores are not in config.** They're -runtime data, mutable via the HTTP API. `workbench.yaml` decides two -things: +**Workspaces, knowledge bases, and execution services are not in +config.** They're runtime data, mutable via the HTTP API. +`workbench.yaml` decides two things: 1. Where that data is persisted (the **control-plane backend**). 2. Optionally, which **seed workspaces** to load into the memory @@ -83,7 +83,7 @@ Production deployments should start from ### `controlPlane` -Picks where workspaces, catalogs, vector-store descriptors, and +Picks where workspaces, knowledge bases, execution services, and RAG documents are persisted. Discriminated on `driver`. #### `memory` (default) @@ -132,7 +132,7 @@ multi-writer-safe. |-------|------|----------|-------| | `endpoint` | URL | yes | Astra Data API endpoint | | `tokenRef` | SecretRef | yes | Pointer to the application token (`env:…` / `file:…`) | -| `keyspace` | string | no (default `workbench`) | Keyspace hosting the four `wb_*` tables | +| `keyspace` | string | no (default `workbench`) | Keyspace hosting the `wb_*` control-plane tables | | `jobPollIntervalMs` | int (50–60000) | `500` | Cross-replica job-subscriber poll interval in ms. Each subscribed `(workspace, jobId)` pair is re-read at this cadence so SSE clients on a different replica from the worker still see updates. Same-replica updates fan out instantly; the poller is a no-op when no one is subscribed. Raise for cost-sensitive deployments where second-scale staleness is fine; lower for hot SSE paths. Astra-only — `memory` and `file` are single-replica by definition. | | `jobsResume` | object | off | Cross-replica orphan-sweeper config. See below. 
| diff --git a/docs/conformance.md b/docs/conformance.md index 9a2f176..aa73749 100644 --- a/docs/conformance.md +++ b/docs/conformance.md @@ -24,9 +24,10 @@ conformance/ ├── scenarios.md ← narrative counterpart ├── fixtures/ ← expected normalized responses │ ├── workspace-crud-basic.json -│ ├── catalog-under-workspace.json -│ ├── vector-store-definition.json -│ └── vector-store-upsert-and-search.json +│ ├── workspace-kind-is-immutable.json +│ ├── workspace-credentials-must-be-secret-ref.json +│ ├── workspace-test-connection-mock.json +│ └── workspace-api-key-lifecycle.json ├── mock-astra/ │ └── server.ts ← stand-in Astra endpoint (Node) ├── normalize.mjs ← shape-agnostic placeholder scrubber @@ -71,38 +72,33 @@ Current scenarios: | Slug | Covers | |---|---| | `workspace-crud-basic` | Workspace POST / GET / PUT / DELETE lifecycle | -| `catalog-under-workspace` | Catalogs scoped per workspace | -| `vector-store-definition` | Vector-store descriptor create + read | -| `vector-store-upsert-and-search` | Phase 1b data plane — upsert, search with payload filter, single-record delete (and re-delete noop) | -| `catalog-vector-store-reference-integrity` | Catalog bindings must reference existing same-workspace vector stores; referenced vector stores delete with `409 conflict` | -| `document-crud-basic` | Document metadata CRUD + cross-catalog isolation | | `workspace-kind-is-immutable` | Workspace `kind` cannot be changed after creation | | `workspace-credentials-must-be-secret-ref` | Raw credential values are rejected before reaching the SecretResolver | | `workspace-test-connection-mock` | Mock workspace connection probe response shape | | `workspace-api-key-lifecycle` | API-key issue, list, revoke, list lifecycle | -| `catalog-ingest-basic` | Sync ingest — chunk + embed + upsert + Document row, plus `409 catalog_not_bound_to_vector_store` on an unbound catalog | -| `catalog-scoped-document-search` | Search merges `catalogUid` into the filter; foreign-catalog records stay invisible; unbound catalogs return 409 | -| `catalog-saved-queries` | Saved-query CRUD + post-delete 404 | -| `vector-store-text-dispatch-mock` | Driver-native `searchByText` on a `mock` workspace with `embedding.provider: mock` | -| `vector-store-hybrid-and-rerank-mock` | `hybrid: true` + `rerank: true` + `lexicalWeight` lanes; `400 validation_error` for hybrid with a vector body | -| `catalog-async-ingest-202` | 202 wire shape for `?async=true` (job snapshot at creation time is deterministic; eventual completion stays in runtime tests) | - -The runtime additionally tests the following routes through its -own Vitest suite (timing- or driver-method-dependent, so they -don't fit the cross-runtime fixture model): - -- `GET /catalogs/{c}/documents/{d}/chunks` — driver-side + +The corpus shrank during the catalog → knowledge-base refactor: every +prior catalog / vector-store fixture was retired, and the +knowledge-base equivalents have not yet been authored. They will land +back as the new fixture set bakes in. Until then, the runtime +exercises every KB / services / ingest / search route through its +Vitest suite (`tests/knowledge-bases.test.ts`, `tests/ingest/`, +plus the route-level tests under `tests/`). 
+ +Routes that stay runtime-only by design (timing- or +driver-method-dependent): + +- `GET /knowledge-bases/{kb}/documents/{d}/chunks` — driver-side `listRecords` filtered by `documentUid` -- `GET /vector-stores/discoverable` + `POST /vector-stores/adopt` - — driver-side `listAdoptable`, mocked -- `DELETE /catalogs/{c}/documents/{d}` — chunk-cascade via +- `DELETE /knowledge-bases/{kb}/documents/{d}` — chunk-cascade via driver `deleteRecords` These move into conformance once a second runtime starts implementing them and the fixture format proves stable across drivers. -More land as chat and MCP routes ship. +More land as KB scenarios are reauthored and as chat / MCP routes +ship. ## Fixtures @@ -246,7 +242,7 @@ The conformance harness above runs against the deterministic harness lives at [`runtimes/typescript/scripts/smoke-astra.ts`](../runtimes/typescript/scripts/smoke-astra.ts) that boots the runtime in-process against a **real** Astra Data API -and exercises the full workspace → vector-store → catalog → +and exercises the full workspace → services → knowledge-base → sync ingest → async ingest → search → cleanup pipeline. Run locally with: diff --git a/docs/cross-replica-jobs.md b/docs/cross-replica-jobs.md index f75bae8..c19ef47 100644 --- a/docs/cross-replica-jobs.md +++ b/docs/cross-replica-jobs.md @@ -32,11 +32,11 @@ terminal state. ## Today's behavior The async-ingest path lives in -[`runtimes/typescript/src/routes/api-v1/documents.ts`](../runtimes/typescript/src/routes/api-v1/documents.ts): +[`runtimes/typescript/src/routes/api-v1/kb-documents.ts`](../runtimes/typescript/src/routes/api-v1/kb-documents.ts): -1. `POST /catalogs/{c}/ingest?async=true` calls `jobs.create(...)`, - spawns `void runAsyncIngest({...})`, and returns 202 to the - caller with the job pointer. +1. `POST /knowledge-bases/{kb}/ingest?async=true` calls + `jobs.create(...)`, spawns `void runAsyncIngest({...})`, and + returns 202 to the caller with the job pointer. 2. The detached worker drives chunking → embedding → upsert, updating the job record via `jobsStore.update(...)` along the way. Failure modes flip the record to `failed` with a sanitized diff --git a/docs/green-boxes.md b/docs/green-boxes.md index 63f87ff..ac07d06 100644 --- a/docs/green-boxes.md +++ b/docs/green-boxes.md @@ -21,7 +21,7 @@ picks which one to target at deploy time via `BACKEND_URL`. 
| Runtime | Location | Status | Astra SDK | |---|---|---|---| -| **TypeScript** (default) | [`runtimes/typescript/`](../runtimes/typescript/) | Operational through Phase 3 + auth (UI, playground, API keys, OIDC login + silent refresh, vector/text search, hybrid + rerank, sync/async ingest with pipeline resume after orphan reclaim, durable JobStore with cross-replica subscription polling + lease/heartbeat + orphan sweeper, saved queries, chunks listing, document delete cascade, adopt-existing-collection flow) | `@datastax/astra-db-ts` | +| **TypeScript** (default) | [`runtimes/typescript/`](../runtimes/typescript/) | Operational through Phase 3 + auth (UI, playground, API keys, OIDC login + silent refresh, knowledge bases with auto-provisioned collections, chunking / embedding / reranking services, vector/text search, hybrid + rerank, sync/async ingest with pipeline resume after orphan reclaim, durable JobStore with cross-replica subscription polling + lease/heartbeat + orphan sweeper, chunks listing, document delete cascade) | `@datastax/astra-db-ts` | | **Python** | [`runtimes/python/`](../runtimes/python/) | FastAPI scaffold — routes return 501 until implemented | `astrapy` (pending) | | **Java** | [`runtimes/java/`](../runtimes/java/) | Spring Boot scaffold — routes return 501 until implemented | `astra-db-java` (pending) | @@ -44,9 +44,11 @@ Every green box serves: | `GET /docs` | OpenAPI reference UI | | `GET /api/v1/openapi.json` | Machine-readable OpenAPI 3.1 doc | | `(CRUD)` `/api/v1/workspaces[/{uid}]` | Workspace lifecycle | -| `(CRUD)` `/api/v1/workspaces/{w}/catalogs[/{uid}]` | Catalog lifecycle | -| `(CRUD)` `/api/v1/workspaces/{w}/vector-stores[/{uid}]` | Descriptor lifecycle (POST also provisions the collection) | -| `POST / DELETE / POST` | `/api/v1/workspaces/{w}/vector-stores/{v}/records`, `.../records/{rid}`, `.../search` | Data plane — upsert, delete, vector search | +| `(CRUD)` `/api/v1/workspaces/{w}/{chunking,embedding,reranking}-services[/{uid}]` | Service-definition lifecycle | +| `(CRUD)` `/api/v1/workspaces/{w}/knowledge-bases[/{uid}]` | KB lifecycle (POST auto-provisions the underlying vector collection; DELETE drops it) | +| `POST / DELETE / POST` | `/api/v1/workspaces/{w}/knowledge-bases/{kb}/records`, `.../records/{rid}`, `.../search` | Data plane — upsert, delete, vector / hybrid search | +| `(CRUD)` `/api/v1/workspaces/{w}/knowledge-bases/{kb}/documents[/{uid}]` | Document metadata + chunks listing under a KB | +| `POST` | `/api/v1/workspaces/{w}/knowledge-bases/{kb}/ingest[?async=true]` | Sync / async ingest pipeline | Full contract details: [`api-spec.md`](api-spec.md). diff --git a/docs/overview.md b/docs/overview.md index cf80d6b..e637d5a 100644 --- a/docs/overview.md +++ b/docs/overview.md @@ -2,9 +2,10 @@ AI Workbench is a self-hosted control center for building and operating retrieval-backed AI applications on DataStax Astra. It gives teams one -place to connect workspaces, organize source material, create vector -stores, ingest documents, test search behavior, and keep the same -workflow portable across runtime implementations. +place to connect workspaces, register chunking / embedding / reranking +services, compose them into knowledge bases, ingest documents, test +search behavior, and keep the same workflow portable across runtime +implementations. The goal is not to make operators think about runtimes first. 
The goal is to help a team get from "we have documents and embeddings" to "we can @@ -17,13 +18,14 @@ together a one-off admin app for every project. to manage. - **Connect Astra-backed stores** while keeping credentials outside records and config. -- **Model catalogs** around the content domains your application queries. +- **Define execution services** (chunking, embedding, reranking) once + per workspace and bind them into knowledge bases. +- **Spin up knowledge bases** that auto-provision an Astra collection + sized to the bound embedding service. - **Ingest documents** through sync or async flows with job status and server-sent progress updates. - **Test retrieval quality** in the browser with text, vector, hybrid, - and rerank search paths. -- **Save repeatable queries** so useful checks do not live only in a - developer's scratch file. + and rerank search paths against a chosen knowledge base. - **Run the same HTTP contract** from the default TypeScript runtime or another language-native runtime as the project evolves. @@ -40,7 +42,7 @@ runtime and UI: | Need | Workbench surface | |---|---| | Bring up a retrieval environment quickly | One Docker image with the UI and default runtime | -| Keep project data isolated | Workspace-scoped catalogs, vector stores, documents, jobs, and API keys | +| Keep project data isolated | Workspace-scoped knowledge bases, services, documents, jobs, and API keys | | Avoid storing secrets in records | `SecretRef` pointers such as `env:OPENAI_API_KEY` and `file:/path` | | Inspect search behavior | Playground for text, vector, hybrid, and rerank queries | | Move from demo to production | Memory, file, and Astra-backed control-plane stores | @@ -51,9 +53,10 @@ runtime and UI: AI Workbench has three connected surfaces: 1. **Workspace management.** Create and configure the spaces that own - catalogs, vector stores, documents, saved queries, jobs, and API keys. -2. **Knowledge operations.** Ingest content, track status, bind catalogs - to vector stores, and keep the operational state visible. + knowledge bases, execution services, documents, jobs, and API keys. +2. **Knowledge operations.** Compose chunking + embedding + reranking + services into a knowledge base, ingest content into it, track job + status, and keep the operational state visible. 3. **Retrieval playground.** Try real searches against real workspace data before wiring the same API into an application. @@ -70,8 +73,9 @@ npm run dev ``` Then open the bundled UI at `http://localhost:8080`, create a workspace, -add a vector store, ingest content from the workspace detail page, and -use the playground to inspect the results. +register at least one chunking + embedding service, create a knowledge +base that binds them, ingest content from the workspace detail page, +and use the playground to inspect the results. The generated API reference is available from the running runtime at `http://localhost:8080/docs`, and the machine-readable contract is diff --git a/docs/playground.md b/docs/playground.md index f870a64..9d1d60e 100644 --- a/docs/playground.md +++ b/docs/playground.md @@ -1,10 +1,11 @@ # Playground The playground is a browser scratchpad for running ad-hoc vector -and text queries against a workspace's vector stores. It's the -"aha moment" path for the product — after onboarding a workspace -and upserting data (via API or an external ingester), open -[`/playground`](../apps/web/README.md) to see what the store +and text queries against a workspace's knowledge bases. 
It's the +"aha moment" path for the product — after onboarding a workspace, +registering a chunking + embedding service, creating a knowledge +base that binds them, and ingesting some content, open +[`/playground`](../apps/web/README.md) to see what the KB actually returns. No persistence. Nothing is saved between queries. If you want a @@ -13,15 +14,15 @@ repeatable run, script it against the same HTTP API the UI uses. ## UI flow 1. Pick a workspace. -2. Pick one of its vector stores. The form unlocks. +2. Pick one of its knowledge bases. The form unlocks. 3. **Text tab** — type a query. The runtime embeds it (see [Dispatch](#dispatch) below) and runs an ANN search. Useful - when the store's `embedding` block points at a provider the + when the KB's bound embedding service points at a provider the runtime can reach (OpenAI today). 4. **Vector tab** — paste a raw vector. The runtime sends it straight through to the driver. Useful for debugging, for - stores with no `embedding` config, or when you want to sanity- - check a specific coordinate. + KBs whose embedding service the runtime can't currently reach, + or when you want to sanity-check a specific coordinate. 5. **Top-K** (1–25) and an **optional filter** (JSON object, shallow-equal over payload) round out the knobs. 6. Hit Run. Results land in a table; each row expands to show the @@ -29,7 +30,7 @@ repeatable run, script it against the same HTTP API the UI uses. ## Dispatch -`POST /api/v1/workspaces/{w}/vector-stores/{vs}/search` accepts +`POST /api/v1/workspaces/{w}/knowledge-bases/{kb}/search` accepts either `{ vector }` or `{ text }` (exactly one). When the request carries a vector it goes straight to `driver.search()`. Text queries pick one of two paths: @@ -40,7 +41,7 @@ queries pick one of two paths: (e.g. Astra's `$vectorize`). Nothing about the vector reaches the runtime. 2. **Client-side embedding** — otherwise, the runtime builds an - `Embedder` from the vector store's `embedding` config, embeds + `Embedder` from the KB's bound embedding-service config, embeds the text locally via the Vercel AI SDK, then does a normal vector search. @@ -64,14 +65,15 @@ Vercel AI SDK: ```ts interface Embedder { readonly id: string; // e.g. "openai:text-embedding-3-small" - readonly dimension: number; // matched against the vector store's declared dim + readonly dimension: number; // matched against the KB's declared dim embed(text: string): Promise; embedMany(texts: readonly string[]): Promise; } ``` -The factory (`EmbedderFactory.forConfig(config)`) takes a vector -store's `EmbeddingConfig` and returns an `Embedder`. It resolves +The factory (`EmbedderFactory.forConfig(config)`) takes an +embedding-service `EmbeddingConfig` (resolved from the KB's +`embeddingServiceId`) and returns an `Embedder`. It resolves the `secretRef` through the existing `SecretResolver`, then dispatches on `provider`. Today: OpenAI. Adding another provider (Cohere, Voyage, Bedrock, …) is one `npm install @ai-sdk/` @@ -81,22 +83,22 @@ Errors surface as `EmbedderUnavailableError` (`400 embedding_unavailable`) when the config is missing a secret or names an unsupported provider, and `embedding_dimension_mismatch` (`400`) when the provider returns a vector whose length doesn't -match the vector store's declared dimension. +match the KB's declared dimension. ## Astra vectorize Astra's Data API can do the embedding itself when a collection is created with a `vector.service` block. 
The driver detects this -path from the descriptor's `embedding` config: when the provider +path from the KB's embedding-service config: when the provider is one of `openai`, `azureOpenAI`, `cohere`, `jinaAI`, `mistral`, `nvidia`, `voyageAI` (allowlist in [`drivers/astra/vectorize.ts`](../runtimes/typescript/src/drivers/astra/vectorize.ts)) **and** a `secretRef` is configured, the driver: -1. At `createCollection` time, attaches +1. At KB-create / `createCollection` time, attaches `{ provider, modelName }` to the collection's `vector.service`. - New collections under this runtime get server-side embedding by - default. + New KB collections under this runtime get server-side embedding + by default. 2. At `searchByText` time, resolves the embedding secret, opens the collection handle with `embeddingApiKey: `, and runs `find(sort: { $vectorize: text })`. The runtime never @@ -125,9 +127,9 @@ Upsert uses the same dispatch: - `{id, vector, payload}` → `driver.upsert` (unchanged) - `{id, text, payload}` → `driver.upsertByText` first (Astra `$vectorize` on insertMany, mock driver's pseudo-embed when - the descriptor opts in). On `NotSupportedError` — unsupported - provider or legacy collection — the route embeds client-side - via the Vercel AI SDK and retries through plain `upsert`. + the KB opts in). On `NotSupportedError` — unsupported provider + or legacy collection — the route embeds client-side via the + Vercel AI SDK and retries through plain `upsert`. - Mixed batches → client-embed the text records, combine with the vector records, one transactional `upsert` call. (Splitting across `upsertByText` + `upsert` would break transactional @@ -135,8 +137,9 @@ Upsert uses the same dispatch: ## Hybrid + rerank toggles -The query form exposes two optional toggles when the bound vector -store has the relevant capabilities enabled on its descriptor: +The query form exposes two optional toggles when the bound knowledge +base has the relevant capabilities enabled (lexical configured on +the KB, reranking service bound): - **Hybrid** — flips `hybrid: true` on the search request. The driver runs a combined vector + lexical lane. On `astra` this @@ -151,20 +154,21 @@ store has the relevant capabilities enabled on its descriptor: request body. Step is `0.05`. Honored on `mock`; ignored on `astra` (the reranker owns the blend, so any value the slider sends is dropped server-side). -- **Rerank** — flips `rerank: true`. On `mock` this is a - standalone post-processing phase over the retrieval hits. On - `astra` standalone rerank is **not** exposed — pair `rerank` - with `hybrid: true` to get the combined Astra path; otherwise - the API returns 501. - -Both toggles default to the bound store's descriptor-level -`lexical.enabled` / `reranking.enabled`. Drivers that lack the -relevant method return 501 (`hybrid_not_supported` / -`rerank_not_supported`); the UI surfaces these as a toast. +- **Rerank** — flips `rerank: true`. Requires the KB to have a + `rerankingServiceId` bound. On `mock` this is a standalone + post-processing phase over the retrieval hits. On `astra` + standalone rerank is **not** exposed — pair `rerank` with + `hybrid: true` to get the combined Astra path; otherwise the + API returns 501. + +Both toggles default to the bound KB's `lexical.enabled` / +`rerankingServiceId != null`. Drivers that lack the relevant +method return 501 (`hybrid_not_supported` / `rerank_not_supported`); +the UI surfaces these as a toast. ## Hits are chunks, not documents -The vector store indexes at the chunk level. 
A document ingested +The KB indexes at the chunk level. A document ingested with three paragraphs becomes three chunks; a search query can return all three as separate hits. The results table reflects that shape directly: each row shows the chunk's `chunkIndex` (its @@ -173,49 +177,39 @@ shape directly: each row shows the chunk's `chunkIndex` (its row to expand the full payload and score. To browse chunks **under** a specific document — for inspection, -not search — open the catalog explorer's document detail dialog -(click any row in the documents table). The detail dialog lists -the chunks under that document directly, sorted by `chunkIndex`, -sourced from `GET /catalogs/{c}/documents/{d}/chunks`. +not search — open the KB documents view and click any row in the +documents table. The detail dialog lists the chunks under that +document directly, sorted by `chunkIndex`, sourced from +`GET /knowledge-bases/{kb}/documents/{d}/chunks`. -## Catalog ingest from the workspace UI +## Knowledge base ingest from the workspace UI Ingest now has a dedicated UI surface, complementing the data-plane `POST .../records` upsert path: -- **Workspace detail → Catalogs → Ingest** (or **Open** → catalog - explorer → **Ingest**) opens a multi-file / folder queue. Drop +- **Workspace detail → Knowledge Bases → Ingest** (or **Open** → KB + detail → **Ingest**) opens a multi-file / folder queue. Drop files (or pick a folder via the directory picker) and they - ingest sequentially through the bound vector store. The queue - accepts plain-text documents, data, config, and source files such - as Markdown, YAML, TOML, JSON, CSV, logs, SQL, and TypeScript. - Each row shows live progress for the active file and terminal - status for everything before it. + ingest sequentially through the KB's bound chunking + embedding + services. The queue accepts plain-text documents, data, config, + and source files such as Markdown, YAML, TOML, JSON, CSV, logs, + SQL, and TypeScript. Each row shows live progress for the active + file and terminal status for everything before it. - Async ingest jobs stream progress via the SSE `GET .../jobs/{jobId}/events` endpoint until a terminal state. The dialog renders the live `processed/total` counter and surfaces the final `status` + `errorMessage`. The playground stays a scratchpad — no ingest in the playground -itself. Use the workspace UI to populate a catalog, then come back +itself. Use the workspace UI to populate a KB, then come back to the playground to query it. ## Document delete cascade -The catalog explorer's per-row trash button removes a document -**and** its chunks. The runtime resolves the catalog → bound -vector store and runs `deleteRecords` on the driver before -dropping the document row, so deleted documents stop surfacing in -catalog-scoped search hits immediately. - -## Saved queries - -Saved queries live under a **catalog**, not the playground. CRUD -+ `POST /{q}/run` ship under -`/api/v1/workspaces/{w}/catalogs/{c}/queries`; the workspace UI -exposes a panel to create/edit/run them. The playground itself -intentionally stays stateless — it's the scratchpad, saved -queries are the "I want to keep this around" bucket. +The KB documents view's per-row trash button removes a document +**and** its chunks. The runtime runs `deleteRecords` on the KB's +driver before dropping the document row, so deleted documents stop +surfacing in KB-scoped search hits immediately. 
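+
+The same flow in sketch form (hypothetical handler shape; the real
+route lives in `kb-documents.ts`):
+
+```ts
+// Order matters: cascade the chunks first, then drop the metadata
+// row, so search stops returning the document's chunks immediately.
+async function deleteDocument(
+  driver: { deleteRecords(filter: Record<string, unknown>): Promise<number> },
+  documents: { delete(documentUid: string): Promise<void> }, // hypothetical row store
+  documentUid: string,
+): Promise<void> {
+  await driver.deleteRecords({ documentUid }); // chunks in the KB's collection
+  await documents.delete(documentUid); // then the document row
+}
+```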
## Future extensions diff --git a/docs/roadmap.md b/docs/roadmap.md index f0872bc..0cadc1d 100644 --- a/docs/roadmap.md +++ b/docs/roadmap.md @@ -8,13 +8,14 @@ runnable artifact and a stable slice of the HTTP contract. | Phase | Scope | Status | |---|---|---| | 0 | Runtime bootstrap + docs | ✅ Shipped | -| 1a | Control-plane CRUD (`/api/v1/workspaces`, `/catalogs`, `/vector-stores`) | ✅ Shipped | -| 1b | Vector-store data plane (provisioning, upsert, search) | ✅ Shipped | -| 2a | Document metadata CRUD (`/catalogs/{c}/documents`) | ✅ Shipped | -| 2b | Ingest + catalog-scoped search + saved queries + cross-replica jobs + adopt + document chunks/delete cascade | ✅ Shipped | +| 1a | Control-plane CRUD (`/api/v1/workspaces`, `/catalogs`, `/vector-stores`) | ✅ Shipped (later refactored — see Phase KB) | +| 1b | Vector-store data plane (provisioning, upsert, search) | ✅ Shipped (later refactored — see Phase KB) | +| 2a | Document metadata CRUD (`/catalogs/{c}/documents`) | ✅ Shipped (later refactored — see Phase KB) | +| 2b | Ingest + catalog-scoped search + saved queries + cross-replica jobs + adopt + document chunks/delete cascade | ✅ Shipped (saved queries / adopt retired in Phase KB) | | 2c | Server-side embedding (Astra `$vectorize`) for search + upsert | ✅ Shipped | | 3 | Playground + UI | ✅ Shipped | | Auth | Middleware, API keys, OIDC verifier, browser login, silent refresh | ✅ Shipped (1–3c); 4 (RBAC) planned | +| KB | Catalogs + vector-store descriptors → knowledge bases + chunking/embedding/reranking services | ✅ Shipped | | 4+ | Chats, MCP | Reserved | ## Phase 0 — Bootstrap ✅ @@ -269,11 +270,53 @@ workspace UI rather than the playground itself): the workspace UI. The playground itself remains a stateless scratchpad by design. +## Phase KB — Knowledge bases & execution services ✅ + +Refactored the catalog / vector-store / saved-query model into a +single first-class concept: the **knowledge base**. A KB owns its +Astra collection end-to-end and binds the chunking + embedding + +(optional) reranking services that produce its content. + +Shipped: + +- **Knowledge bases.** New `wb_config_knowledge_bases_by_workspace` + table. KB create transactionally provisions the underlying + `wb_vectors_` collection through the workspace's driver, + using the bound embedding service to determine vector dimensions + and similarity. KB delete drops the collection and cascades RAG + documents. +- **Execution services.** Three new tables — + `wb_config_chunking_service_by_workspace`, + `wb_config_embedding_service_by_workspace`, + `wb_config_reranking_service_by_workspace`. Multiple KBs may + share a service definition; deleting an in-use embedding / + chunking service is blocked with `409 conflict`. +- **Service immutability for vector-determining bindings.** A KB's + `embeddingServiceId` and `chunkingServiceId` are pinned at create + time (the collection's dimensions follow the embedding service); + `rerankingServiceId` stays mutable. +- **`resolveKb` synthesis layer.** Existing driver / dispatch / + ingest code keeps a vector-store-shaped descriptor view by + resolving a KB + its bound services on demand, so the data-plane + surface stayed stable across the refactor. 
+- **Routes.** All catalog / vector-store / saved-query routes + retired in favor of: + - `/api/v1/workspaces/{w}/{chunking,embedding,reranking}-services` + - `/api/v1/workspaces/{w}/knowledge-bases[/{kb}]` + - `.../knowledge-bases/{kb}/{records,search,documents,ingest}` +- **UI.** Catalogs panel + vector-stores panel removed; replaced + with `KnowledgeBasesPanel` and `ServicesPanel`. Playground now + picks a KB rather than a vector-store descriptor. + +Saved queries and the adopt-existing-collection flow were retired +in this phase — the new shape doesn't need them, and re-adding +either would land cleaner under the new model than as a port. + ## Phase 4+ — Chats, MCP Reserved for integrating: -- A chat harness that runs against a workspace's catalogs. +- A chat harness that runs against a workspace's knowledge bases. - An MCP server view of the workspace for external agents. Contracts will be defined as those phases approach. diff --git a/docs/workspaces.md b/docs/workspaces.md index 731b99e..c0a57ff 100644 --- a/docs/workspaces.md +++ b/docs/workspaces.md @@ -1,8 +1,9 @@ # Workspaces A **workspace** is the unit of isolation in AI Workbench — a named -tenant that owns its own catalogs, vector-store descriptors, -documents, saved queries, and async-ingest jobs. +tenant that owns its own knowledge bases, execution services +(chunking / embedding / reranking), RAG documents, async-ingest jobs, +and API keys. Workspaces are **runtime records**, not config. They're created via `POST /api/v1/workspaces`, fetched via `GET /api/v1/workspaces/{uid}`, @@ -39,11 +40,11 @@ DELETE /api/v1/workspaces/{uid} → cascade delete `DELETE` cascades to: -- Every catalog under the workspace. -- Every vector-store descriptor under the workspace, after dropping its - underlying collection through the workspace's driver. -- Every document under any of those catalogs. -- Every saved query under any of those catalogs. +- Every knowledge base under the workspace, after dropping each KB's + underlying vector collection through the workspace's driver. +- Every RAG document registered against any of those knowledge bases. +- Every chunking, embedding, and reranking service definition under the + workspace. - Every async-ingest job record scoped to the workspace. - Every workspace API key issued from the workspace. @@ -51,11 +52,11 @@ DELETE /api/v1/workspaces/{uid} → cascade delete - A request carrying workspace UID `A` can never read or mutate resources in workspace `B`. Nested routes call - `ControlPlaneStore.listCatalogs(workspace)` / `…getCatalog(workspace, - uid)` etc. and the store asserts the workspace exists before - returning anything. + `ControlPlaneStore.listKnowledgeBases(workspace)` / + `…getKnowledgeBase(workspace, uid)` etc. and the store asserts the + workspace exists before returning anything. - Logs carry `requestId`. Structured OTel attributes (workspaceUid, - catalogUid, jobId) are on the cross-cutting observability + knowledgeBaseUid, jobId) are on the cross-cutting observability workstream — see [`roadmap.md`](roadmap.md). ### `kind` @@ -77,10 +78,10 @@ service. **`kind` is immutable after creation.** `PUT /api/v1/workspaces/{uid}` rejects a `kind` field with `400`. Changing a workspace's kind would -orphan any vector-store collections already provisioned on the -original backend — there's no safe way to transparently migrate them, -so the runtime doesn't try. Delete and recreate the workspace if the -backend needs to change. 
+orphan any KB collections already provisioned on the original backend +— there's no safe way to transparently migrate them, so the runtime +doesn't try. Delete and recreate the workspace if the backend needs to +change. ### `name` and `endpoint` @@ -89,7 +90,7 @@ backend needs to change. display the name but disambiguate by uid when needed. - `endpoint` is the **data-plane URL** for this workspace's backend. For `astra` / `hcd` workspaces it's the Astra Data API endpoint - the vector-store driver dials (`https://-.apps.astra.datastax.com`). + the KB driver dials (`https://-.apps.astra.datastax.com`). Each Astra DB has its own endpoint — put one workspace per DB to route correctly. - `endpoint` accepts either a **literal URL** or a **SecretRef** @@ -123,37 +124,48 @@ Every value in the map must match the `:` shape — `400`. The runtime resolves refs through its `SecretResolver` at the moment the workspace's backend needs to be contacted. -## Catalogs and vector stores +## Knowledge bases and execution services A workspace owns: -- **Vector-store descriptors** — the `wb_vector_store_by_workspace` - rows. Each declares dimensions, similarity, embedding config, - lexical config, reranking config. These are *descriptors*, not the - vector data itself — the underlying Data API Collection is - provisioned transactionally by the workspace's vector-store driver - when the descriptor is created. -- **Catalogs** — named document collections, each optionally - `vectorStore`-bound to one of the workspace's descriptors. +- **Knowledge bases** — the `wb_config_knowledge_bases_by_workspace` + rows. Each KB pins an embedding service (which determines the + dimensions and similarity metric of its vector collection) and a + chunking service, and may optionally bind a reranking service. A + KB's underlying Astra collection (`wb_vectors_`) is + provisioned transactionally when the KB is created and dropped when + it is deleted. +- **Execution services** — three families of `wb_config_*_service_by_workspace` + rows describing the chunking, embedding, and reranking + implementations available to KBs in this workspace. + +### Knowledge base ↔ service binding (N:1) + +**Multiple knowledge bases may share one service definition.** A KB +holds: + +- `embeddingServiceId` (required, **immutable** after KB create — the + vector collection's dimensions are pinned at provisioning time) +- `chunkingServiceId` (required, immutable) +- `rerankingServiceId` (optional, mutable — reranking is applied at + query time and can be added/removed without affecting stored + vectors) -### Catalog ↔ vector-store binding (N:1) - -**Multiple catalogs may share one vector store.** This was a -deliberate relaxation from an earlier draft's strict 1:1 constraint. The store enforces: -- A catalog's `vectorStore` field (if non-null) must reference a - vector store in the same workspace. -- `DELETE` a vector store is blocked with `409 conflict` while any - catalog references it. Clear or move the catalog binding first, then - delete the vector store. +- A KB's `embeddingServiceId` and `chunkingServiceId` must reference + services in the same workspace. +- `DELETE` on an embedding or chunking service is blocked with + `409 conflict` while any KB references it. Reassign or delete the + KBs first, then delete the service. 
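+
+Both guard rails are easy to check from a scratch script
+(illustrative `fetch` calls against a local runtime; the uids are
+placeholders):
+
+```ts
+const base = "http://localhost:8080/api/v1";
+
+async function demoGuardRails(w: string, kb: string, embed: string) {
+  // 1. Pinned bindings: a KB update that carries embeddingServiceId
+  //    is rejected outright (the PUT schema is strict).
+  const put = await fetch(`${base}/workspaces/${w}/knowledge-bases/${kb}`, {
+    method: "PUT",
+    headers: { "content-type": "application/json" },
+    body: JSON.stringify({ name: "renamed", embeddingServiceId: embed }),
+  });
+  console.log(put.status); // 400
+
+  // 2. In-use services are protected: deleting an embedding service
+  //    still referenced by a KB is refused.
+  const del = await fetch(
+    `${base}/workspaces/${w}/embedding-services/${embed}`,
+    { method: "DELETE" },
+  );
+  console.log(del.status); // 409 while any KB references it
+}
+```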
The relationship: ``` -workspace ──► catalog ──► vector-store descriptor (N:1) - │ - └──► documents +workspace ──► knowledge base ──► chunking service (N:1) + │ ──► embedding service (N:1) + │ ──► reranking service (N:1, optional) + └──► RAG documents ``` ## Seeding workspaces for local dev @@ -178,21 +190,35 @@ current count of workspaces, not a list. Listing is at `GET ## Example session -Create a mock workspace, add a catalog, list: +Create a mock workspace, register a chunking + embedding service, +create a KB binding them, list: ```bash WS_BODY='{"name":"demo","kind":"mock"}' WS_UID=$(curl -s -X POST http://localhost:8080/api/v1/workspaces \ -H "content-type: application/json" -d "$WS_BODY" | jq -r .uid) -CAT_BODY='{"name":"support"}' -curl -s -X POST http://localhost:8080/api/v1/workspaces/$WS_UID/catalogs \ - -H "content-type: application/json" -d "$CAT_BODY" +CHUNK_BODY='{"name":"default-chunker","provider":"mock"}' +CHUNK_UID=$(curl -s -X POST \ + http://localhost:8080/api/v1/workspaces/$WS_UID/chunking-services \ + -H "content-type: application/json" -d "$CHUNK_BODY" | jq -r .uid) + +EMBED_BODY='{"name":"default-embedder","provider":"mock","dimensions":1536,"similarity":"cosine"}' +EMBED_UID=$(curl -s -X POST \ + http://localhost:8080/api/v1/workspaces/$WS_UID/embedding-services \ + -H "content-type: application/json" -d "$EMBED_BODY" | jq -r .uid) + +KB_BODY=$(jq -n --arg c "$CHUNK_UID" --arg e "$EMBED_UID" \ + '{name:"support",chunkingServiceId:$c,embeddingServiceId:$e}') +curl -s -X POST \ + http://localhost:8080/api/v1/workspaces/$WS_UID/knowledge-bases \ + -H "content-type: application/json" -d "$KB_BODY" -curl -s http://localhost:8080/api/v1/workspaces/$WS_UID/catalogs +curl -s http://localhost:8080/api/v1/workspaces/$WS_UID/knowledge-bases ``` -Delete the workspace — the catalog goes with it: +Delete the workspace — the KB, its collection, the services, and any +documents go with it: ```bash curl -X DELETE http://localhost:8080/api/v1/workspaces/$WS_UID diff --git a/runtimes/README.md b/runtimes/README.md index 71e5f75..c8bef99 100644 --- a/runtimes/README.md +++ b/runtimes/README.md @@ -22,7 +22,7 @@ Astra Data API (via language-native SDK: astrapy, astra-db-ts, …) | Runtime | Path | Status | Astra SDK | |---|---|---|---| -| TypeScript | [`typescript/`](./typescript/) | Operational through Phase 3 + auth (UI, playground, API keys, OIDC login + silent refresh, vector/text search, hybrid + rerank, sync/async ingest with pipeline resume after orphan reclaim, durable JobStore with cross-replica subscription polling + lease/heartbeat + orphan sweeper, saved queries, chunks listing, document delete cascade, adopt-existing-collection flow) | `@datastax/astra-db-ts` | +| TypeScript | [`typescript/`](./typescript/) | Operational through Phase 3 + auth (UI, playground, API keys, OIDC login + silent refresh, knowledge bases with auto-provisioned collections, chunking / embedding / reranking services, vector/text search, hybrid + rerank, sync/async ingest with pipeline resume after orphan reclaim, durable JobStore with cross-replica subscription polling + lease/heartbeat + orphan sweeper, chunks listing, document delete cascade) | `@datastax/astra-db-ts` | | Python | [`python/`](./python/) | Scaffold — routes return 501 until implemented | `astrapy` (pending) | | Java | [`java/`](./java/) | Scaffold (Spring Boot) — routes return 501 until implemented | `astra-db-java` (pending) | diff --git a/runtimes/java/README.md b/runtimes/java/README.md index bb6201b..6227b83 100644 --- 
a/runtimes/java/README.md +++ b/runtimes/java/README.md @@ -60,13 +60,14 @@ corresponds to one step in [`../../conformance/scenarios.md`](../../conformance/scenarios.md). Suggested order: -1. `POST /api/v1/workspaces` — scenario 1 step 1. Add an `astra` package - that wraps `astra-db-java` for the `wb_*` tables, and wire it into - `WorkspaceController`. -2. `GET` / `PUT` / `DELETE` for workspaces — completes scenario 1. -3. Catalog routes — completes scenario 2. -4. Vector-store descriptor routes — completes scenario 3. -5. Vector-store data plane + documents — scenarios 4, 5. +1. `POST /api/v1/workspaces` — scenario `workspace-crud-basic` step 1. + Add an `astra` package that wraps `astra-db-java` for the `wb_*` + tables, and wire it into `WorkspaceController`. +2. `GET` / `PUT` / `DELETE` for workspaces — completes the workspace + scenarios. +3. Chunking / embedding / reranking service CRUD. +4. Knowledge-base CRUD with auto-provisioned vector collections. +5. KB data plane + documents + ingest. Every time you flip a conformance test green, remove its `@Disabled` annotation in @@ -151,11 +152,11 @@ runtimes/java/ │ │ │ ├── web/ │ │ │ │ └── RequestIdFilter.java ← X-Request-Id │ │ │ ├── model/ ← records mirroring TS types -│ │ │ └── routes/ +│ │ │ └── routes/ ← scaffold; align with TS routes when implemented │ │ │ ├── OperationalController.java ← working: /healthz, /readyz, /version, / │ │ │ ├── WorkspaceController.java ← 501 stubs -│ │ │ ├── CatalogController.java ← 501 stubs -│ │ │ ├── VectorStoreController.java ← 501 stubs +│ │ │ ├── ServicesController.java ← chunking/embedding/reranking — 501 stubs +│ │ │ ├── KnowledgeBaseController.java ← 501 stubs │ │ │ └── DocumentController.java ← 501 stubs │ │ └── resources/ │ │ └── application.yml diff --git a/runtimes/python/README.md b/runtimes/python/README.md index 8e6160f..127f449 100644 --- a/runtimes/python/README.md +++ b/runtimes/python/README.md @@ -69,12 +69,15 @@ step in [`../../conformance/scenarios.md`](../../conformance/scenarios.md). Suggested order: -1. `POST /api/v1/workspaces` — scenario 1 step 1. Plumb astrapy into - [`workbench/astra.py`](./src/workbench/) (new file) and wire the - route in [`workbench/routes/workspaces.py`](./src/workbench/routes/workspaces.py). -2. `GET` / `PUT` / `DELETE` for workspaces — completes scenario 1. -3. Catalog routes — completes scenario 2. -4. Vector-store descriptor routes — completes scenario 3. +1. `POST /api/v1/workspaces` — scenario `workspace-crud-basic` step 1. + Plumb astrapy into [`workbench/astra.py`](./src/workbench/) (new + file) and wire the route in + [`workbench/routes/workspaces.py`](./src/workbench/routes/workspaces.py). +2. `GET` / `PUT` / `DELETE` for workspaces — completes the workspace + scenarios. +3. Chunking / embedding / reranking service CRUD. +4. Knowledge-base CRUD with auto-provisioned vector collections. +5. KB data plane — records, search, documents, ingest. Each time you flip a conformance test green, remove its `@pytest.mark.xfail` decorator in @@ -157,10 +160,10 @@ runtimes/python/ │ ├── config.py ← env-var resolution (ASTRA_*, WORKBENCH_*) │ ├── errors.py ← ApiError + subclasses + HTTP mapping │ ├── models.py ← Pydantic models mirroring TS types -│ └── routes/ +│ └── routes/ ← scaffold; align with TS routes when implemented │ ├── workspaces.py -│ ├── catalogs.py -│ ├── vector_stores.py +│ ├── services.py ← chunking / embedding / reranking +│ ├── knowledge_bases.py │ └── documents.py └── tests/ ├── conftest.py ← FastAPI + mock-astra wiring