diff --git a/docs/design/llama-stack-config-merge/llama-stack-config-merge-spike.md b/docs/design/llama-stack-config-merge/llama-stack-config-merge-spike.md index 070a2a24c..27d02a26a 100644 --- a/docs/design/llama-stack-config-merge/llama-stack-config-merge-spike.md +++ b/docs/design/llama-stack-config-merge/llama-stack-config-merge-spike.md @@ -200,7 +200,7 @@ backend-specific synthesizer translates the canonical LCORE vocabulary to its target shape; we do not adopt either backend's surface verbatim. **Pydantic AI research findings** (full report: -[`poc-results/pydantic-ai-research.md`](poc-results/pydantic-ai-research.md), +[`pydantic-ai-research.md`](https://github.com/lightspeed-core/lightspeed-stack/blob/42844d068b488cc7d72928068b5606a7941f8c15/docs/design/llama-stack-config-merge/poc-results/pydantic-ai-research.md), pass dated 2026-05-20 against `pydantic-ai 1.98.0`): - Pydantic AI's per-Agent `<provider>:<model>` string + `Provider(...)` @@ -863,15 +863,18 @@ rebuild time was impractical. ### Results -Full evidence bundle for the library-mode PoC (paths relative to this doc): +Full evidence bundle for the library-mode PoC. The `poc-results/` dir was +removed from the tree after merge (PoC validation results aren't kept on +`main`); the links below are permalinks to the files at the merge commit, +where they remain in history: -- [`poc-results/lightspeed-stack-unified-library.yaml`](poc-results/lightspeed-stack-unified-library.yaml) +- [`lightspeed-stack-unified-library.yaml`](https://github.com/lightspeed-core/lightspeed-stack/blob/42844d068b488cc7d72928068b5606a7941f8c15/docs/design/llama-stack-config-merge/poc-results/lightspeed-stack-unified-library.yaml) — the unified-mode config used. 
-- [`poc-results/library-mode/synthesized-run.yaml`](poc-results/library-mode/synthesized-run.yaml) +- [`library-mode/synthesized-run.yaml`](https://github.com/lightspeed-core/lightspeed-stack/blob/42844d068b488cc7d72928068b5606a7941f8c15/docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml) — what LCORE produced (3.7 KB). -- [`poc-results/library-mode/query-response.json`](poc-results/library-mode/query-response.json) +- [`library-mode/query-response.json`](https://github.com/lightspeed-core/lightspeed-stack/blob/42844d068b488cc7d72928068b5606a7941f8c15/docs/design/llama-stack-config-merge/poc-results/library-mode/query-response.json) — a real `/v1/query` round-trip. -- [`poc-results/library-mode/README.md`](poc-results/library-mode/README.md) +- [`library-mode/README.md`](https://github.com/lightspeed-core/lightspeed-stack/blob/42844d068b488cc7d72928068b5606a7941f8c15/docs/design/llama-stack-config-merge/poc-results/library-mode/README.md) — walkthrough. Summary of validation: @@ -924,7 +927,7 @@ Summary of validation: query. The implementation JIRAs' e2e coverage must exercise a real Llama Guard model (e.g. `meta-llama/Llama-Guard-3-8B`) end-to-end. Caught by CodeRabbit on the PoC artifact at - `poc-results/library-mode/synthesized-run.yaml:110`. + [`synthesized-run.yaml` L110](https://github.com/lightspeed-core/lightspeed-stack/blob/42844d068b488cc7d72928068b5606a7941f8c15/docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml#L110). --- @@ -1147,13 +1150,22 @@ Relative to `upstream/main`: ## Appendix B — Commands to reproduce the library-mode PoC +The PoC config was removed from the tree after merge but is preserved at +the PR-1580 merge commit; step 0 fetches it from there so the commands +stay runnable. (That config also carried a machine-local `profile:` path — +adjust it for your environment before running.) + ```bash -# 1. Start LCORE in library mode with a unified config +# 0. 
Fetch the PoC config from the merge commit (removed from the tree post-merge) +mkdir -p /tmp/lcore-836-poc +curl -sSL -o /tmp/lcore-836-poc/lightspeed-stack-unified-library.yaml \ + https://raw.githubusercontent.com/lightspeed-core/lightspeed-stack/42844d068b488cc7d72928068b5606a7941f8c15/docs/design/llama-stack-config-merge/poc-results/lightspeed-stack-unified-library.yaml + +# 1. Start LCORE in library mode with the unified config export OPENAI_API_KEY= export E2E_OPENAI_MODEL=gpt-4o-mini -mkdir -p /tmp/lcore-836-poc uv run lightspeed-stack \ - -c docs/design/llama-stack-config-merge/poc-results/lightspeed-stack-unified-library.yaml + -c /tmp/lcore-836-poc/lightspeed-stack-unified-library.yaml # 2. In another shell — query curl -s http://localhost:8080/liveness diff --git a/docs/design/llama-stack-config-merge/llama-stack-config-merge.md b/docs/design/llama-stack-config-merge/llama-stack-config-merge.md index ba624cbeb..59d9a5480 100644 --- a/docs/design/llama-stack-config-merge/llama-stack-config-merge.md +++ b/docs/design/llama-stack-config-merge/llama-stack-config-merge.md @@ -510,7 +510,8 @@ reference. `Literal` value on `UnifiedInferenceProvider.type` has a `PROVIDER_TYPE_MAP` entry. - e2e behave tests: migrate `tests/e2e/configuration/**` configs to - unified form as part of LCORE-???? (test migration JIRA). + unified form as part of LCORE-2342 (Migrate in-repo e2e / integration + test configurations). 
## Open Questions for Future Work diff --git a/docs/design/llama-stack-config-merge/poc-results/library-mode/README.md b/docs/design/llama-stack-config-merge/poc-results/library-mode/README.md deleted file mode 100644 index c099c2041..000000000 --- a/docs/design/llama-stack-config-merge/poc-results/library-mode/README.md +++ /dev/null @@ -1,26 +0,0 @@ -# Library-mode PoC evidence - -Command: -```bash -export OPENAI_API_KEY= -export E2E_OPENAI_MODEL=gpt-4o-mini -uv run lightspeed-stack -c docs/design/llama-stack-config-merge/poc-results/lightspeed-stack-unified-library.yaml -``` - -## What the unified config does - -- `llama_stack.config.profile: /abs/path/to/tests/e2e/configs/run-ci.yaml` — baseline loaded from the CI profile -- `llama_stack.config.native_override.safety.default_shield_id: llama-guard` — override proves merge works - -## Evidence - -- `synthesized-run.yaml` — the full run.yaml LCORE produced from the unified config -- `query-response.json` — a successful `/v1/query` round-trip - -## Proves - -- `llama_stack.library_client_config_path` was NOT used (no external run.yaml needed) -- `llama_stack.config.profile` was used as the synthesis baseline (path resolution works with absolute paths) -- `llama_stack.config.native_override` was merged onto the baseline -- `AsyncLlamaStackAsLibraryClient` accepts the synthesized file path (answered item #24: file-only, not dict) -- `/v1/query` succeeded end-to-end through the synthesized stack diff --git a/docs/design/llama-stack-config-merge/poc-results/library-mode/query-response.json b/docs/design/llama-stack-config-merge/poc-results/library-mode/query-response.json deleted file mode 100644 index 5664cbd00..000000000 --- a/docs/design/llama-stack-config-merge/poc-results/library-mode/query-response.json +++ /dev/null @@ -1 +0,0 @@ -{"conversation_id":"976ef32527283085ba2f1d0cfb4c16d97071bf64391a8200","response":"The three primary colors are red, blue, and 
yellow.","rag_chunks":[],"referenced_documents":[],"truncated":false,"input_tokens":24,"output_tokens":12,"available_quotas":{},"tool_calls":[],"tool_results":[]} \ No newline at end of file diff --git a/docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml b/docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml deleted file mode 100644 index 34e3e1fc9..000000000 --- a/docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml +++ /dev/null @@ -1,148 +0,0 @@ -apis: - - agents - - batches - - datasetio - - eval - - files - - inference - - safety - - scoring - - tool_runtime - - vector_io -benchmarks: [] -datasets: [] -image_name: starter -providers: - agents: - - config: - persistence: - agent_state: - backend: kv_default - namespace: agents_state - responses: - backend: sql_default - table_name: agents_responses - provider_id: meta-reference - provider_type: inline::meta-reference - batches: - - config: - kvstore: - backend: kv_default - namespace: batches_store - provider_id: reference - provider_type: inline::reference - datasetio: - - config: - kvstore: - backend: kv_default - namespace: huggingface_datasetio - provider_id: huggingface - provider_type: remote::huggingface - - config: - kvstore: - backend: kv_default - namespace: localfs_datasetio - provider_id: localfs - provider_type: inline::localfs - eval: - - config: - kvstore: - backend: kv_default - namespace: eval_store - provider_id: meta-reference - provider_type: inline::meta-reference - files: - - config: - metadata_store: - backend: sql_default - table_name: files_metadata - storage_dir: ~/.llama/storage/files - provider_id: meta-reference-files - provider_type: inline::localfs - inference: - - config: - allowed_models: - - ${env.E2E_OPENAI_MODEL:=gpt-4o-mini} - api_key: ${env.OPENAI_API_KEY} - provider_id: openai - provider_type: remote::openai - - config: {} - provider_id: sentence-transformers - provider_type: 
inline::sentence-transformers - safety: - - config: - excluded_categories: [] - provider_id: llama-guard - provider_type: inline::llama-guard - scoring: - - config: {} - provider_id: basic - provider_type: inline::basic - - config: {} - provider_id: llm-as-judge - provider_type: inline::llm-as-judge - - config: - openai_api_key: '********' - provider_id: braintrust - provider_type: inline::braintrust - tool_runtime: - - config: {} - provider_id: rag-runtime - provider_type: inline::rag-runtime - - config: {} - provider_id: model-context-protocol - provider_type: remote::model-context-protocol - vector_io: [] -registered_resources: - benchmarks: [] - datasets: [] - models: - - metadata: - embedding_dimension: 768 - model_id: all-mpnet-base-v2 - model_type: embedding - provider_id: sentence-transformers - provider_model_id: all-mpnet-base-v2 - scoring_fns: [] - shields: - - provider_id: llama-guard - provider_shield_id: openai/gpt-4o-mini - shield_id: llama-guard - tool_groups: - - provider_id: rag-runtime - toolgroup_id: builtin::rag - vector_stores: [] -safety: - default_shield_id: llama-guard -scoring_fns: [] -server: - port: 8321 -storage: - backends: - kv_default: - db_path: ${env.KV_STORE_PATH:=~/.llama/storage/kv_store.db} - type: kv_sqlite - sql_default: - db_path: ${env.SQL_STORE_PATH:=~/.llama/storage/sql_store.db} - type: sql_sqlite - stores: - conversations: - backend: sql_default - table_name: openai_conversations - inference: - backend: sql_default - max_write_queue_size: 10000 - num_writers: 4 - table_name: inference_store - metadata: - backend: kv_default - namespace: registry - prompts: - backend: kv_default - namespace: prompts -vector_stores: - default_embedding_model: - model_id: all-mpnet-base-v2 - provider_id: sentence-transformers - default_provider_id: faiss -version: 2 diff --git a/docs/design/llama-stack-config-merge/poc-results/lightspeed-stack-unified-library.yaml 
b/docs/design/llama-stack-config-merge/poc-results/lightspeed-stack-unified-library.yaml deleted file mode 100644 index a75ad5bf6..000000000 --- a/docs/design/llama-stack-config-merge/poc-results/lightspeed-stack-unified-library.yaml +++ /dev/null @@ -1,33 +0,0 @@ -name: Lightspeed Core Service (LCS) - Unified PoC -service: - host: 0.0.0.0 - port: 8080 - base_url: http://localhost:8080 - auth_enabled: false - workers: 1 - color_log: true - access_log: true -# Unified mode: no `library_client_config_path`. Operational LS config is -# synthesized by LCORE from `llama_stack.config` below. -llama_stack: - use_as_library_client: true - config: - # Use the CI-friendly baseline via `profile` (no EXTERNAL_PROVIDERS_DIR - # env var required). Equivalent to what tests/e2e/configs/run-ci.yaml - # provides; this exercises the `profile:` path of the synthesizer. - profile: /home/msvistun/repos/lightspeed/stack/tests/e2e/configs/run-ci.yaml - # Small native_override: prove overrides take effect end-to-end. - native_override: - safety: - default_shield_id: llama-guard -user_data_collection: - feedback_enabled: false - feedback_storage: "/tmp/lcore-836-poc/feedback" - transcripts_enabled: false - transcripts_storage: "/tmp/lcore-836-poc/transcripts" -conversation_cache: - type: "sqlite" - sqlite: - db_path: "/tmp/lcore-836-poc/conversation-cache.db" -authentication: - module: "noop" diff --git a/docs/design/llama-stack-config-merge/poc-results/pydantic-ai-research.md b/docs/design/llama-stack-config-merge/poc-results/pydantic-ai-research.md deleted file mode 100644 index a9a5fa3cb..000000000 --- a/docs/design/llama-stack-config-merge/poc-results/pydantic-ai-research.md +++ /dev/null @@ -1,245 +0,0 @@ -# Pydantic AI ↔ Llama Stack: Concept Mapping for a Backend-Agnostic YAML Schema - -## 1. 
Pydantic AI core concepts and configuration surface - -**What is an Agent.** In Pydantic AI an `Agent` is a generic, type-parameterised container that owns: a default model, instructions / system prompts, tools (and toolsets), capabilities (composable behavior units), an output type / output validators, retry budgets, model settings, dependency type, and instrumentation settings. From the official Agents API reference (ai.pydantic.dev/api/agent/): `Agent` is generic in `(AgentDepsT, OutputDataT)` and "by default, if neither generic parameter is customised, agents have type `Agent[None, str]`" (https://ai.pydantic.dev/api/agent/). - -Canonical construction (from the project README / overview at ai.pydantic.dev): -```python -from pydantic_ai import Agent -agent = Agent( - 'anthropic:claude-sonnet-4-6', - instructions='Be concise, reply with one sentence.', -) -result = agent.run_sync('Where does "hello world" come from?') -``` -(https://ai.pydantic.dev/) - -The Agent also owns tools registered via `@agent.tool` / `@agent.tool_plain` or via `tools=[...]`, dependencies via `deps_type=...`, structured outputs via `output_type=...`, and capabilities via `capabilities=[...]` (https://ai.pydantic.dev/api/agent/, https://ai.pydantic.dev/capabilities/). - -**How the model/provider is declared.** Pydantic AI is **model-string + client-object based**, not Llama-Stack-style named provider entries. The simplest form is a string `'<provider>:<model>'`: - -> "When you instantiate an Agent with just a name formatted as `<provider>:<model>`, e.g. `openai:gpt-5.2` or `openrouter:google/gemini-3-pro-preview`, Pydantic AI will automatically select the appropriate model class, provider, and profile." 
-> — https://ai.pydantic.dev/models/overview/ - -For non-default endpoints, auth, or AI-gateway use, you instantiate a `Model` class and pass a `Provider`: -```python -from pydantic_ai.models.openai import OpenAIChatModel -from pydantic_ai.providers.azure import AzureProvider -agent = Agent(OpenAIChatModel('gpt-5.2', provider=AzureProvider(...))) -``` -(https://ai.pydantic.dev/models/overview/, https://ai.pydantic.dev/api/providers/) - -There is **no native concept of named provider entries with type/id/config that get looked up by name at runtime**, the way Llama Stack does. The `Provider` in Pydantic AI is a Python class with constructor args (`api_key`, `base_url`, `openai_client`, `http_client`), not a registry entry referenced from a YAML file by id (https://pydantic.dev/docs/ai/api/pydantic-ai/providers/). - -**Multiple models / providers in one app.** Per-agent: each `Agent` instance owns its own model. Globally there is no "default provider list"; the closest thing is the `gateway/...` prefix (Pydantic AI Gateway) or the `FallbackModel` wrapper that takes multiple models and falls back on failure (https://ai.pydantic.dev/models/overview/). Multi-model applications usually create multiple `Agent` instances and pass them around, optionally via dependency injection (`deps_type` carries a `RunContext` with whatever shared clients/configs you want). - -**File-based config — yes, natively.** Since the introduction of `AgentSpec` and `Agent.from_file` / `Agent.from_spec`, Pydantic AI supports declarative YAML/JSON agent definitions: - -```python -from pydantic_ai import Agent -agent = Agent.from_file('agent.yaml') -``` - -```yaml -# agent.yaml -model: anthropic:claude-opus-4-6 -instructions: "You are a helpful assistant." 
-capabilities: - - WebSearch: {local: duckduckgo} - - Thinking: {effort: high} -``` -(https://ai.pydantic.dev/core-concepts/agent-spec/, https://ai.pydantic.dev/api/agent/) - -`AgentSpec.to_file('agent.yaml')` can also emit a companion `agent_schema.json` for editor autocompletion. The spec is **per-agent**, not a server-wide config — there is no equivalent to Llama Stack's single `run.yaml` describing the whole runtime, APIs, and provider registry. (See section 5 for what this implies for a single-file operator config.) - -**Credentials / API keys at runtime.** Three mechanisms, in order of expressiveness: -1. **Environment variables** — each `Provider` class reads a conventional env var if no explicit `api_key=` is passed (e.g., `OLLAMA_API_KEY`, `OPENAI_API_KEY`, documented in https://pydantic.dev/docs/ai/api/pydantic-ai/providers/). -2. **Explicit provider client construction** — pass `api_key=`, `base_url=`, or a fully-constructed vendor SDK client (e.g., `openai_client=AsyncOpenAI(...)`) to the `Provider`. -3. **AgentSpec/from_file** — the YAML spec does **not** declare credentials; it declares model name and capabilities, and credentials still come from env vars or from Python code that wires the `Provider`. - -There is no built-in `${env.VAR}` interpolation in `AgentSpec` YAML. Template strings (`{{user_name}}`) exist but resolve against `deps`, not environment variables (https://ai.pydantic.dev/core-concepts/agent-spec/). - -## 2. RAG / retrieval / vector stores - -**No built-in RAG abstraction. No vector-store abstraction.** Pydantic AI's official docs are explicit: - -> "The main semantic difference between Pydantic AI Tools and RAG is RAG is synonymous with vector search, while Pydantic AI tools are more general-purpose. For vector search, you can use our embeddings support to generate embeddings across multiple providers." -> — https://ai.pydantic.dev/tools/ - -RAG is implemented as a user-written tool that calls whatever vector DB the user has chosen. 
The official "RAG" example (https://ai.pydantic.dev/examples/rag/, https://github.com/pydantic/pydantic-ai/blob/main/docs/examples/rag.md) uses **PostgreSQL + pgvector** directly via `asyncpg` and the **OpenAI SDK for embeddings** — Pydantic AI itself isn't involved in indexing: - -```python -@rag_agent.tool -async def retrieve(context: RunContext[Deps], search_query: str) -> str: - embedding = await context.deps.openai.embeddings.create( - input=search_query, model='text-embedding-3-small', - ) - rows = await context.deps.pool.fetch( - 'SELECT chunk FROM text_chunks ORDER BY embedding <-> $1 LIMIT 5', - pydantic_core.to_json(embedding.data[0].embedding).decode(), - ) - return '\n\n'.join(f'# Chunk:\n{row["chunk"]}\n' for row in rows) -``` -(verbatim from the official example) - -The docs even note: *"Note building the database doesn't use Pydantic AI right now, instead it uses the OpenAI SDK directly."* (https://github.com/pydantic/pydantic-ai/blob/main/docs/examples/rag.md) - -**Canonical community patterns.** From observed community projects and Pydantic AI's own example: -- **pgvector + OpenAI / Voyage embeddings via raw SDK calls** — the pattern in the official example, and in projects such as github.com/serkanyasr/agentic_rag_project (Pydantic AI + FastAPI + pgvector) and github.com/cskwork/pydantic-rag-ollama (Ollama embeddings + pgvector). -- **MCP server for retrieval** — exposing a vector DB through an MCP server and consuming it via Pydantic AI's `MCP(url=...)` capability (https://ai.pydantic.dev/capabilities/). -- **LlamaIndex / LangChain as a retrieval backend** — used purely as a library inside a Pydantic AI tool; no first-class integration is documented in ai.pydantic.dev. - -There is no `pydantic_ai.vector_store` module, no `VectorStore` protocol, and no roadmap entry that surfaced in my search for adding one (caveat: I did not find a public roadmap document, only the version policy and release notes — see §7). - -## 3. 
Safety / guardrails - -**No built-in safety / guardrails / shield API in Pydantic AI core.** GitHub Issue #1197 ("Guardrails") is the open feature request, with a working-design proposal that mirrors the OpenAI Agents SDK's `@input_guardrail` / `@output_guardrail` decorators (https://github.com/pydantic/pydantic-ai/issues/1197). As of `pydantic-ai-slim 1.97.0` (May 15, 2026, https://pypi.org/project/pydantic-ai-slim/) it has not been merged into core. - -**What Pydantic AI *does* provide is validation-as-correctness, not safety:** -- `output_type=` enforces the response shape via Pydantic validation (https://ai.pydantic.dev/output/). -- `@agent.output_validator` lets you raise `ModelRetry` and force the model to try again — but it operates only on the structured/typed output and is enforced via a per-run retry budget (https://ai.pydantic.dev/output/, https://ai.pydantic.dev/api/agent/). -- Tools can raise `ModelRetry` for argument-level checks (https://ai.pydantic.dev/tools-advanced/). -- Tool-call approval (`requires_approval=`) is a deterministic human-in-the-loop gate on a per-tool basis (https://ai.pydantic.dev/api/tools/, https://ai.pydantic.dev/). - -These are **schema enforcement and retry control**, not content-safety in the Llama Guard / Prompt Guard sense. They don't see the user prompt before the model does, and they don't classify content categories. - -**Typical user patterns for actual safety:** -1. **Custom output validator** that calls an external moderation API (OpenAI moderations, Lakera, LLM Guard) — minimal but only catches output, runs after the model call. -2. **Tool gating** — a `prepare` function on `Tool` that filters tool availability based on `RunContext` (https://ai.pydantic.dev/api/tools/). -3. 
**Third-party capability packages built on the Capabilities API:** - - `pydantic-ai-guardrails` (https://pypi.org/project/pydantic-ai-guardrails/, https://github.com/jagreehal/pydantic-ai-guardrails) — `GuardedAgent` wrapper with input/output guardrails, llm-guard + autoevals + Guardrails Hub integrations, OpenAI Guardrails-UI config loading, parallel execution. - - `pydantic-ai-shields` (https://github.com/vstorm-co/pydantic-ai-shields) — `PromptInjection`, `PiiDetector`, `SecretRedaction`, `BlockedKeywords`, `NoRefusals`, `OutputGuard`, `AsyncGuardrail` capabilities passed via `capabilities=[...]`. -4. **NeMo Guardrails / Guardrails AI / Llama Guard via tool call** — wrap the entire model call in your own pipeline outside Pydantic AI. - -None of these are first-party. None are mentioned in the official Pydantic AI docs as the canonical answer. *I am unsure* whether any will be brought into core before V2. - -## 4. Tools / function calling - -**Declaration / registration** (https://ai.pydantic.dev/tools/, https://ai.pydantic.dev/toolsets/): -- `@agent.tool` — decorator, function receives `RunContext` as first arg. -- `@agent.tool_plain` — decorator, no `RunContext`. -- `tools=[fn1, fn2, Tool(fn3, name=..., description=...)]` on the `Agent` constructor. -- `FunctionToolset(tools=[...])` + `toolsets=[...]` on the constructor — first-class collections of tools, can be combined dynamically with `@agent.toolset`. -- Dynamic registration inside a running tool: `toolset.add_function(...)` / `toolset.add_tool(...)`. - -Schema is **auto-generated** from function signatures and docstrings via griffe (https://ai.pydantic.dev/tools/). Args validated by Pydantic; `ModelRetry` triggers a retry with feedback to the model. 
- -**Comparison to Llama Stack's `tool_runtime` / `registered_resources.tool_groups`.** Llama Stack treats tool *implementations* as plug-in providers under `providers.tool_runtime` (e.g., `remote::model-context-protocol`, `remote::tavily-search`, `inline::rag-runtime`), and the *named tool groups* the agent can invoke as a separate registered-resources list (`tool_groups:` or `registered_resources.tool_groups:`), each referencing a runtime by `provider_id` (e.g., MCP endpoint URI). The split is intentional: it lets the operator add or remove tool backends without touching application code (https://llamastack.github.io/docs). - -Pydantic AI has no equivalent split. Tools are **Python objects/callables**, not configuration. There is one exception that brings configuration-driven extensibility: **capability packages** (`AbstractCapability` subclasses) can be referenced from `AgentSpec` YAML by class name and registered via `custom_capability_types=` on `Agent.from_spec` / `Agent.from_file` (https://ai.pydantic.dev/capabilities/). MCP is exposed this way: in `agent.yaml` you can write `capabilities: [{MCP: {url: https://mcp.example.com/api}}]` — that's the closest analogue to Llama Stack's `tool_groups` entry for an MCP endpoint, and it's the surface you would use if you want a YAML-only tool declaration. - -There is no general plugin discovery via entry-points documented in ai.pydantic.dev; you must `pip install` the capability package and pass its class to `custom_capability_types`. - -## 5. Mapping table (Llama Stack ↔ Pydantic AI) - -| Llama Stack concept | Pydantic AI equivalent / pattern | -|---|---| -| `providers.inference` (named entry with id+type+config) | No direct equivalent. Pattern: `Agent('openai:gpt-5.2')` model string, or explicit `Provider(api_key=…, base_url=…)` + `Model` instance per agent. | -| `providers.safety` / `shields` | No direct equivalent in core. 
Pattern: `@agent.output_validator` for shape, third-party capability packages (`pydantic-ai-shields`, `pydantic-ai-guardrails`) for content safety. | -| `providers.vector_io` / vector stores | No direct equivalent. Pattern: tool that calls pgvector/Milvus/Qdrant via the user's chosen client; embeddings via Pydantic AI's `embeddings` support or a raw SDK. | -| `providers.tool_runtime` | No direct equivalent as a provider registry. Pattern: tools registered as Python callables; MCP endpoints declared via the `MCP` capability in code or `AgentSpec` YAML. | -| `providers.agents` (Agents API as a provider) | The `Agent` class itself; not a server provider — it's a Python object. No equivalent of swapping the agent runtime via config. | -| `apis: [agents, inference, safety, …]` | No direct equivalent. Pydantic AI has no notion of selectively enabling capability *APIs* — every Agent always supports tools, output validation, etc., as Python APIs. | -| `registered_resources` (models, shields, vector stores) | Partially: model is declared per-Agent in `AgentSpec` (`model:`). Shields/vector stores have no equivalent registry — they're code. | -| `storage` (sqlstore / kvstore) | No equivalent. Pydantic AI itself is stateless per run; durable execution is delegated to **Temporal / DBOS / Prefect** integrations (https://ai.pydantic.dev/api/agent/). Conversation state is the caller's responsibility. | -| `${env.VAR}` env-ref resolution in config | No equivalent in `AgentSpec` YAML. Env vars are read by `Provider` classes at construction time. For YAML-side interpolation you must layer your own loader (e.g., `pydantic-settings` or a manual `os.path.expandvars` pre-pass). | - -## 6. 
Implications for a backend-agnostic operator-facing schema - -Replaying the user's example schema: -```yaml -inference: - providers: - - type: openai - api_key_env: OPENAI_API_KEY - allowed_models: [gpt-4o-mini] -rag: - providers: - - type: faiss - embedding_model: sentence-transformers -safety: - default_shield: llama-guard -``` - -### `inference` block — **STABLE-ish across both backends. ~75% confidence.** - -This concept maps cleanly to Llama Stack today (`providers.inference` with `provider_type: remote::openai`, `api_key: ${env.OPENAI_API_KEY}`, optionally a `registered_resources.models` allow-list) and to Pydantic AI tomorrow (synthesize a Python `OpenAIProvider(api_key=os.environ["OPENAI_API_KEY"])` and use model strings constrained to `allowed_models`). The single-item-list shape (`providers: [...]`) on the Llama Stack side preserves the model that Llama Stack uses today; on the Pydantic AI side a synthesizer picks the first matching entry per agent. Operators describe "what credentials, what endpoints, what models are allowed" — a vocabulary stable in both worlds. - -**Where it breaks:** Llama Stack allows multiple named provider entries of the same API serving different models (e.g., `vllm-inference` and `vllm-safety`); Pydantic AI has no global registry, so naming providers is meaningless until your synthesizer assigns them to specific Agents. Decision: keep the list but treat the entries as available-clients, not as a global registry. Don't expose `provider_id` in the abstract schema unless you also expose agent-to-provider binding. - -### `rag` block — **NOT STABLE. ~25% confidence it survives.** - -Llama Stack has `providers.vector_io` (with `inline::faiss`, `remote::milvus`, `remote::pgvector`, etc.) plus `registered_resources.vector_stores` and a first-class `/v1/vector_stores` API. Pydantic AI has **none of this**. 
The official RAG example wires pgvector directly via `asyncpg` (https://github.com/pydantic/pydantic-ai/blob/main/docs/examples/rag.md). A backend-agnostic synthesizer can take your `rag.providers[].type=faiss` and produce a Llama Stack provider entry, but for Pydantic AI the synthesizer would have to *generate code or instantiate a vector-DB client and a tool that uses it* — a much larger gap. - -Worse, vocabulary diverges: `embedding_model: sentence-transformers` is a model identifier in Llama Stack (registered under `registered_resources.models` with `model_type: embedding`); in Pydantic AI it would be a parameter to `pydantic_ai.embeddings.SentenceTransformersEmbedder` or similar (the `sentence-transformers` package extra is in `pydantic-ai-slim[sentence-transformers]` — https://pypi.org/project/pydantic-ai-slim/). - -**Recommendation:** Until Pydantic AI ships a built-in vector-store abstraction (no public signal it's coming in the next 6–12 months — see §7), keep RAG configuration **under a Llama-Stack-specific subtree**, and on the Pydantic AI side require operators to declare which tool implements retrieval. A minimal portable surface might be `rag.embedding_model:` only — both backends understand "which model to embed with." - -### `safety` block — **NOT STABLE. ~20% confidence it survives.** - -`default_shield: llama-guard` translates 1:1 to Llama Stack — `providers.safety: - provider_type: inline::llama-guard` + `registered_resources.shields: - shield_id: llama-guard, provider_id: llama-guard`, plus per-agent `input_shields` / `output_shields`. On Pydantic AI, there is no shield concept; the closest path is a third-party capability (`pydantic-ai-shields` provides a `PromptInjection` capability you'd add to `agent.yaml`'s `capabilities:` list), but the actual *model used* is hard-coded inside each capability, and Llama Guard specifically is not a first-class Pydantic AI option. 
- -The vocabulary `default_shield: ` makes sense in Llama Stack (where shields are registered resources you can name); it makes no sense in Pydantic AI without inventing a registry layer. - -**Recommendation:** Keep `safety.default_shield` under a Llama-Stack-specific subtree. The portable surface is essentially nothing today — at best `safety.enabled: bool` and `safety.fail_closed: bool`. - -### Minimum YAML surface stable across both backends - -```yaml -# This much survives a Llama Stack → Pydantic AI migration, with caveats: -inference: - providers: - - type: openai # vendor identifier (stable) - api_key_env: OPENAI_API_KEY # env-var name only (synthesizer reads value) - base_url: https://... # optional override - allowed_models: [gpt-4o-mini] -# everything else (rag, safety, storage, tool_runtime, shields, vector_stores) -# goes under a backend-specific block: -backend_specific: - llama_stack: - rag: {...} - safety: {...} - storage: {...} - pydantic_ai: - capabilities: [...] - spec_overrides: {...} -``` - -## 7. Pydantic AI's own roadmap / stability signals - -**Stability today.** Pydantic AI reached **V1.0.0 on September 4, 2025** with an explicit API-stability commitment: *"V1 means we're committing to API stability: we will not break your code for at least 6 months."* (https://pydantic.dev/articles/pydantic-ai-v1). The version policy adds: *"We will not intentionally make breaking changes in minor releases of V1. V2 will be released in April 2026 at the earliest, 6 months after the release of V1 in September 2025."* (https://ai.pydantic.dev/version-policy/). Current PyPI version is **`pydantic-ai 1.98.0` (May 19, 2026)** and **`pydantic-ai-slim 1.97.0` (May 15, 2026)** (https://pypi.org/project/pydantic-ai/, https://pypi.org/project/pydantic-ai-slim/) — production/stable classification. 
-
-As of this report (May 20, 2026), **V2 has not shipped** — the upgrade guide at https://ai.pydantic.dev/changelog/ lists only V1.x breaking changes (which the team explicitly notes were *accidental* leftovers from pre-V1 work, e.g., the Python evaluator removal in #2808 left out of v1.0.0).
-
-**Release cadence.** Weekly to bi-weekly minor releases since V1 (e.g., v1.90.0 on May 4, 2026; v1.91.0; v1.93.0 on May 9; v1.94.0 on May 12; v1.95.0; v1.97.0 on May 15; v1.98.0 on May 19) — https://github.com/pydantic/pydantic-ai/releases. No breaking changes to agent construction, model strings, or `@agent.tool` since 1.0; recent breaking-change entries in the changelog are pre-V1 (the upgrade guide is filtered to historical pre-1.0 churn).
-
-**Recent changes touching agent / provider / tool surface in the last 12 months** (https://ai.pydantic.dev/changelog/, https://github.com/pydantic/pydantic-ai/releases):
-- `AgentStreamEvent` expanded to a union — backward compatible (#2689).
-- `format_as_xml` import path moved (#2446/#1484) — minor.
-- Removal of deprecated `Agent.result_validator`, `AgentRunResult.data`, `Agent.last_run_messages` (#2451).
-- `TenacityTransport` now requires `RetryConfig` TypedDict (#2670, #2717).
-- v1.94.0 (May 12, 2026): "Drop mistralai as dependency from pydantic-ai by @Kludex in #5384"; OpenAI profile flag for multi-system messages (https://github.com/pydantic/pydantic-ai/releases).
-- v1.95.0 / v1.97.0 / v1.98.0: incremental fixes; deprecation of `OutlinesModel` / `OutlinesProvider`, `AGUIApp` / `Agent.to_ag_ui()` in favor of `AGUIAdapter`.
-
-None touched the public Agent constructor signature, the `:` string convention, the `@agent.tool` decorator, or `AgentSpec`/`from_file`.
-
-**Roadmap signals for built-in safety / RAG / registered-resources.**
-- **Guardrails (Issue #1197)** is open with an OpenAI-Agents-SDK-style proposal; not on a stated milestone. *I am unsure* whether it will land before V2.
-- No public roadmap document at ai.pydantic.dev mentions a built-in vector-store / RAG abstraction or a Llama-Stack-style provider registry. Thoughtworks Technology Radar **Volume 33** (published November 5, 2025, per PRNewswire) confirms the framework is intentionally narrow: *"Rather than trying to be a Swiss Army knife, PydanticAI offers a lightweight yet powerful approach."* (https://www.thoughtworks.com/radar/languages-and-frameworks/pydantic-ai) — and the same page notes "This blip is not on the current edition of the Radar," indicating the entry was Volume 33 and not Volume 34.
-- The big roadmap themes per the v1 launch article (https://pydantic.dev/articles/pydantic-ai-v1) are: durable execution (Temporal/DBOS/Prefect), human-in-the-loop tool approval, MCP/A2A/AG-UI interop, and the Pydantic AI Gateway. **Not** safety shields, not vector stores, not server-side configuration.
-
-**Conclusion (§7):** API stability is high (V1, ~9 months in production, weekly minors, no agent-surface breakage). The library is **intentionally not growing into Llama Stack's territory** on a 6–12 month horizon.
-
-## 8. Recommendation
-
-**Abstract only the inference vocabulary today. Keep RAG, safety, storage, and tool-runtime under backend-specific subtrees. Confidence: ~80%.** Build the single-file schema around an `inference:` block of vendor + endpoint + env-var-name + allow-listed models — that vocabulary maps cleanly to Llama Stack today and to Pydantic AI's Provider + model-string surface tomorrow, and a thin synthesizer per backend covers the gap. Do **not** abstract `rag.*` or `safety.*` into a portable vocabulary right now: Pydantic AI has no built-in vector-store or shield concept, no public roadmap signal that either is coming before V2, and the third-party capability packages that fill the gap (pydantic-ai-shields, pydantic-ai-guardrails) have incompatible vocabularies with each other and with Llama Guard.
Park them under `backend_specific.llama_stack.{rag,safety,storage}` and a parallel `backend_specific.pydantic_ai.{capabilities,spec_overrides}`. Treat MCP endpoints as the one tool-runtime concept worth abstracting (~60% confidence) — both backends support MCP natively and the URI + auth-token + allowed-tools fields are stable on both sides. Re-evaluate at every Pydantic AI minor release for: (a) a built-in guardrails API merging Issue #1197, (b) any vector-store abstraction, (c) a server-side / multi-agent config concept. If any of those land, the safety or RAG vocabulary becomes worth abstracting; until then, premature abstraction will cost you more than the duplication.
-
-## Caveats
-
-- **Llama Stack rebrand.** On **April 28, 2026**, the Llama Stack project rebranded to **OGX**, per the official announcement blog post (https://ogx-ai.github.io/blog/from-llama-stack-to-ogx): *"Llama Stack is now OGX. The name changed, but more importantly, so did the mission."* The repo `github.com/llamastack/llama-stack` and the mirror `github.com/meta-llama/llama-stack` redirect there. Latest release tag observed: **v1.0.2 (May 13, 2026)**. The OGX rebrand post also states that **"The project supports 23 inference providers. You can run GPT-4, Claude, Gemini, Mistral, or any model you want behind OGX."** Some templates still ship `image_name:` (legacy) while newer ones use `distro_name:` (PR #4396). The `registered_resources:` block introduced in PR #4600 is the new canonical home for `models / shields / vector_stores / tool_groups / datasets / benchmarks`; older templates with these as bare top-level keys still load.
-- **Llama Stack env-var syntax.** Confirmed forms: `${env.VAR}` (required), `${env.VAR:=default}` (with default), `${env.VAR:+value}` (conditional, e.g., enable a provider only when a key is set). The single-colon form `${env.VAR:default}` seen in some older blog posts is **not** the current canonical syntax.
-- **Llama Stack source-file paths in the schema reconstruction could not be directly verified** because raw GitHub fetches were blocked during research. The schema sketch is reconstructed from third-party verbatim quotes (Cerebras, Red Hat, Medium) and PR titles. Before locking your schema, fetch `src/llama_stack/core/datatypes.py` (look for `StackRunConfig`) and a current `src/llama_stack/distributions/starter/run.yaml` to confirm.
-- **`pydantic-ai-shields` and `pydantic-ai-guardrails` are third-party**, maintained by independent authors (vstorm-co and jagreehal respectively). They are not part of pydantic/pydantic-ai. Treat them as community packages whose APIs may diverge from anything the Pydantic team eventually ships.
-- **No Pydantic AI public roadmap doc** was found; conclusions about "not on the roadmap" are inferred from the version-policy doc, the v1 launch post, recent release notes, and the absence of relevant milestones — not from a positive statement that these features are out of scope. Flagged as inference, not fact.
-- **Pydantic AI V2 timing.** The version policy says *"V2 will be released in April 2026 at the earliest"*; as of May 20, 2026, V2 has not shipped and the changelog page lists only V1.x entries. The user's stated migration window (2026 or Q1 2027) likely overlaps the V2 release; plan to re-validate this report when V2 ships.
\ No newline at end of file
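
The three env-var forms listed in the caveats above (`${env.VAR}`, `${env.VAR:=default}`, `${env.VAR:+value}`) could be exercised with a small resolver like the one below. This is only an illustrative sketch: `resolve_env_refs` is a hypothetical helper, not Llama Stack's own implementation, and it assumes shell-like semantics in which an empty variable counts as unset. The non-canonical single-colon form is deliberately left unexpanded so it stays visible instead of being silently accepted.

```python
import re

# Matches ${env.VAR}, ${env.VAR:=default}, and ${env.VAR:+value}.
# The legacy single-colon form ${env.VAR:default} intentionally does not match.
_ENV_REF = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)(?::([=+])([^}]*))?\}")


def resolve_env_refs(text: str, env: dict[str, str]) -> str:
    """Expand env references in `text`, looking values up in `env`.

    Assumption: an empty value is treated the same as an unset variable.
    """

    def expand(match: re.Match) -> str:
        name, op, arg = match.group(1), match.group(2), match.group(3)
        value = env.get(name, "")
        if op is None:
            # ${env.VAR}: required, so a missing value is an error.
            if not value:
                raise KeyError(f"required environment variable {name} is not set")
            return value
        if op == "=":
            # ${env.VAR:=default}: fall back to the default when unset.
            return value or arg
        # ${env.VAR:+value}: emit `value` only when the variable is set.
        return arg if value else ""

    return _ENV_REF.sub(expand, text)
```

In practice the `env` argument would typically be `dict(os.environ)`; passing it explicitly keeps the helper easy to test without touching the process environment.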