diff --git a/examples/proposals/kubernetes_deployment.md b/examples/proposals/kubernetes_deployment.md new file mode 100644 index 00000000..548fc734 --- /dev/null +++ b/examples/proposals/kubernetes_deployment.md @@ -0,0 +1,360 @@ +# Kubernetes Deployment RFC + +## Summary + +Issue [#324](https://github.com/vxcontrol/pentagi/issues/324) asks +whether PentAGI can run on Kubernetes. Today PentAGI is built and +documented around Docker Compose and the installer, and there is no +supported Kubernetes path. This RFC sketches what a future, +incremental Kubernetes-compatibility effort could look like and names +the parts of the current design that make it non-trivial. + +This document does not implement runtime behavior. It does not add +Helm charts, Kubernetes manifests, Kustomize bases, an operator, or +CRDs. It does not change `docker-compose.yml`, the installer, the +backend, the database schema, or any environment variable. It does +not claim that PentAGI runs on Kubernetes today, because it does not. +It is a design surface for maintainers to push back on before any +deployment code lands. + +The RFC is intentionally staged and docs-first. PentAGI's flow +executor currently talks to a Docker daemon over a bind-mounted +socket, and that single fact drives most of the difficulty below. A +naive "wrap the containers in a Deployment" approach would either +break flow execution or smuggle in implicit, hard-to-inspect +lifecycle behavior -- close to the patterns pushed back on during PR +[#268](https://github.com/vxcontrol/pentagi/pull/268) review. The +proposed path below stays explicit and reviewable: every Kubernetes +resource a future implementation would create should be something an +operator can see with `kubectl`, not a hidden background mechanism. + +## Goals + +- Capture, in one place, the concrete reasons PentAGI does not run on + Kubernetes today, grounded in the current Compose and installer + design rather than in guesswork. +- Map each Compose-era assumption (secrets, volumes, service + discovery, TLS, health, networking, the container executor, + observability, image selection, migrations) to its candidate + Kubernetes equivalent. +- Identify the one genuinely hard problem -- the Docker-socket flow + executor -- and lay out candidate approaches with their trade-offs, + without choosing one. +- Propose an incremental, docs-first path so that any later + implementation can be reviewed in small, self-contained slices. +- Keep operators in control of secrets, persistence, network reach, + and the privilege level of flow execution at every step. +- Give maintainers a single artifact to accept, reject, or reshape + before any chart, manifest, or operator code is written. + +## Non-Goals + +- This RFC does not add Helm charts, raw manifests, Kustomize + overlays, an operator, or CRDs. No deployment artifact ships with + this document. +- This RFC does not modify `docker-compose.yml`, the installer, or the + current supported deployment path. Compose remains the only + supported deployment model until maintainers decide otherwise. +- This RFC does not add, rename, or change any environment variable, + and does not change any default in `.env.example` or the backend + configuration. +- This RFC does not change the backend, the database schema, the + generated code, the GraphQL or REST surface, or the flow executor. +- This RFC does not propose hidden background orchestration, an + implicit queue, or out-of-band lifecycle state to make Kubernetes + work. Carrying forward the explicit lesson from PR + [#268](https://github.com/vxcontrol/pentagi/pull/268) review: any + future Kubernetes resource (Pod, Job, PVC, Secret) must be visible + and manageable through the standard Kubernetes API, not buried in + process memory. +- This RFC does not claim parity with the Compose deployment. Some + capabilities (notably the privileged Docker-socket executor) may + never map cleanly, and this document does not promise that they + will. +- This RFC does not pick a single executor strategy, a single + ingress controller, a single storage class, or a single secret + backend. Those are deferred to a later implementation RFC. + +## Current Deployment Assumptions + +This section describes how PentAGI is deployed today, because the +Kubernetes considerations only make sense against the current shape. +Everything here is drawn from `docker-compose.yml`, the installer +docs, and the backend, not from a hypothetical setup. + +- **Compose-oriented topology.** The supported deployment is Docker + Compose (directly or via the installer). The core stack is the + `pentagi` backend, a `pgvector` PostgreSQL instance, a `pgexporter` + metrics sidecar, and a `scraper` service. Optional stacks add + Graphiti / Neo4j, Langfuse, and an observability bundle + (OpenTelemetry collector, Grafana, VictoriaMetrics, and friends). +- **The flow executor uses the Docker socket.** This is the central + fact for Kubernetes. The `pentagi` service bind-mounts the host + Docker socket (`${PENTAGI_DOCKER_SOCKET:-/var/run/docker.sock}` to + `/var/run/docker.sock`) and the backend's Docker client connects + via `client.FromEnv`, honoring + `DOCKER_HOST` (default `unix:///var/run/docker.sock`). During a + flow, PentAGI creates and destroys terminal/worker containers + against that daemon. The executor is effectively "talk to a Docker + daemon and spawn sibling containers," not "run one long-lived + process." +- **Elevated privileges by design.** Because the backend drives the + Docker socket, the `pentagi` service runs `user: root:root` + (commented in-file as "while using docker.sock") and carries + Docker-related toggles (`DOCKER_INSIDE`, `DOCKER_NET_ADMIN`, + `DOCKER_GID=998`, `DOCKER_WORK_DIR`). This privilege level is + intrinsic to the current executor, not incidental. +- **Local-named volumes for state.** Persistent state uses Docker + local volumes: `pentagi-data` mounted at `/opt/pentagi/data`, + Postgres data in `pentagi-postgres-data`, plus `pentagi-ssl`, + `scraper-ssl`, and `pentagi-ollama`. These assume a single host + with local volume drivers. +- **Configuration and secrets via `.env`.** Provider keys, the + database DSN, embedding settings, TLS material, and feature toggles + are passed as environment variables sourced from `.env`. There is no + externalized secret store in the default path; the env file is the + source of truth. +- **Service discovery by Compose DNS.** Services find each other by + Compose service name on user-defined bridge networks + (`pentagi-network`, and the optional `observability-network` and + `langfuse-network`). The backend reaches Postgres, the scraper, and + optional services by name. +- **TLS terminates at the backend.** The backend listens on `8443` + and is published to `${PENTAGI_LISTEN_IP:-127.0.0.1}:8443`, + defaulting to loopback. There is no separate ingress or reverse + proxy in the core stack; TLS is handled inside the container. +- **Health via Compose healthchecks.** Ordering uses Compose + `depends_on` with `condition: service_healthy` (for example the + backend waits on `pgvector`). Health is expressed as container + healthchecks, not as orchestrator probes. +- **Database migrations on startup.** The backend embeds its SQL + migrations and runs them with goose at process start + (`goose.Up`). There is no separate migration step; the backend + migrates itself when it boots. +- **Image selection via env override.** The backend image is + `${PENTAGI_IMAGE:-vxcontrol/pentagi:latest}`, and worker/tool images + are similarly overridable. Air-gapped and mirror setups already rely + on these overrides (see the README's note on restricted networks, + Docker mirrors, and proxies). + +## Kubernetes Compatibility Considerations + +For each Compose-era assumption above, this section names the +candidate Kubernetes equivalent and the friction. Nothing here is a +committed design; it is a map of the problem space. + +- **Secrets and configuration.** The `.env` model maps to Kubernetes + `Secret` objects (provider keys, DB credentials, TLS material) and + `ConfigMap` objects (non-secret toggles). This is mostly mechanical. + The open part is whether to keep a flat env-injection model + (`envFrom` a Secret/ConfigMap) or move toward referenced secrets, + and whether to integrate external secret managers. No change to the + variable names themselves is needed. +- **Persistent volumes.** The local-named volumes map to + `PersistentVolumeClaim`s backed by a cluster `StorageClass`. Postgres + state in particular wants a `StatefulSet` with a stable claim, or an + external managed Postgres. The friction is that several volumes today + assume single-host locality and `ReadWriteOnce` semantics; a future + design has to be explicit about access modes and about whether + Postgres is in-cluster or external. +- **Service discovery.** Compose service-name DNS maps cleanly to + Kubernetes `Service` objects and in-cluster DNS. This is among the + lowest-friction items; the backend would address Postgres and the + scraper by Service name instead of Compose name. +- **Ingress and TLS.** Today TLS terminates in the backend on `8443` + bound to loopback. On Kubernetes the candidate is an `Ingress` (or + Gateway API) with TLS via cert-manager, or preserving in-pod TLS and + exposing it through a passthrough Service. The open question is + whether to keep TLS in the backend or move termination to the edge; + both are viable and have different operational profiles. +- **Health checks.** Compose healthchecks and `depends_on` map to + `readinessProbe` and `livenessProbe`. Startup ordering that Compose + expresses with `service_healthy` becomes readiness-gated rollout + plus application-level retry, since Kubernetes does not block one + workload's start on another's health the way Compose does. +- **Network policies.** The implicit isolation of Compose user-defined + networks maps to Kubernetes `NetworkPolicy`. This is an opportunity + to make the currently-implicit segmentation explicit, but it is also + net-new surface that has to be designed rather than translated. +- **Flow / container execution model (the hard problem).** This is the + item that does not translate mechanically. The backend expects a + Docker daemon and spawns sibling containers over the socket. + Kubernetes does not hand workloads a Docker socket, and modern + clusters do not run Docker as the node runtime. Candidate + approaches, each with real trade-offs and none free of cost: + - **Kubernetes-native execution.** Teach the executor to create + ephemeral `Pod`s or `Job`s through the Kubernetes API instead of + Docker containers. Most idiomatic and the most inspectable + (`kubectl get pods/jobs` shows exactly what a flow is running), + but the largest backend change, and it requires an in-cluster + `ServiceAccount` with pod-create RBAC, which is its own risk. + - **Docker-in-Docker sidecar.** Run a DinD daemon next to the + backend and keep the existing socket-based executor. Smallest + backend change, but DinD typically needs a privileged container, + has known stability and storage caveats, and concentrates risk in + one privileged pod. + - **Sandboxed runtimes.** Pair Kubernetes-native execution with a + stronger isolation runtime (gVisor, Kata, sysbox, or similar) for + the worker pods, since flow workers run untrusted, agent-driven + commands. This is a hardening layer on top of native execution, + not an alternative to it. + Whatever is chosen, the PR + [#268](https://github.com/vxcontrol/pentagi/pull/268) lesson + applies: the running work must be visible and manageable through + standard Kubernetes objects, not tracked only inside the backend + process. +- **Observability.** The optional OpenTelemetry / Grafana / + VictoriaMetrics stack maps to in-cluster deployments or, more + likely, to whatever the operator's cluster already runs. The + candidate direction is to make PentAGI emit to existing cluster + observability rather than bundling its own, with the bundled stack + as an opt-in for clusters that have none. +- **Image overrides.** The existing `PENTAGI_IMAGE` and related + per-image overrides map directly to image fields in pod specs, which + is helpful for air-gapped and mirror deployments. This is + low-friction and reuses an existing mechanism rather than inventing + one. +- **Upgrade and migration path.** Because the backend runs goose + migrations on startup, a rolling update could run migrations from + whichever replica starts first. On Compose with a single backend + this is fine; on Kubernetes with multiple replicas it is not. A + future design needs an explicit decision: a one-shot migration + `Job` (or init container) gated ahead of the rollout, or an + enforced single-writer constraint. This must be settled before any + multi-replica backend deployment is suggested. + +## Proposed Incremental Path + +The path is deliberately docs-first so each step is small enough to +review and reject in isolation. No step below is started by this RFC; +this is the proposed sequence, not a commitment. + +1. **This RFC.** Land the design surface, confirm the boundaries + (docs-only, no charts, no executor change yet), and let maintainers + accept, reshape, or decline the direction. +2. **Executor strategy decision.** Before any manifest exists, settle + the single hardest question in a follow-up RFC: how flow workers + run on Kubernetes (native Pods/Jobs vs DinD vs sandboxed runtime), + and what privilege and RBAC that implies. Everything else depends + on this. +3. **Stateless-core reference manifests.** Once the executor decision + exists, a minimal, clearly-labeled reference for the parts that do + translate cleanly -- backend Deployment/Service, Postgres via + StatefulSet or external, Secrets/ConfigMaps, probes, a migration + Job -- explicitly marked experimental and excluding flow execution. +4. **Flow execution on the chosen model.** Implement the executor + decision from step 2 behind the existing Docker path, so Compose + keeps working unchanged and Kubernetes execution is additive and + opt-in. +5. **Packaging and operator guide.** Only after the above is proven, + consider a Helm chart or operator and a Kubernetes operator guide + (in the spirit of the existing `examples/` material), so packaging + lands on top of a working deployment rather than ahead of it. + +Each step is self-contained: maintainers can stop after any step +without leaving PentAGI in a half-migrated state, and Compose remains +the supported path throughout. + +## Open Questions + +- Which executor model should PentAGI target first -- Kubernetes-native + Pods/Jobs, a DinD sidecar, or a sandboxed runtime -- and is it worth + supporting more than one? +- Should Postgres (and pgvector) run in-cluster as a StatefulSet, or + should the Kubernetes path assume an external managed database? +- Should TLS continue to terminate in the backend, or move to an + Ingress / Gateway with cert-manager? +- How should the startup goose migration be handled under multiple + backend replicas -- a gating migration Job, an init container, or an + enforced single-writer? +- What RBAC is acceptable for the backend's ServiceAccount if it + creates worker Pods/Jobs, and how is that least-privileged? +- Should the observability stack be bundled, or should PentAGI default + to emitting into the operator's existing cluster observability? +- Is Helm, an operator, or plain manifests the right packaging once a + working deployment exists, and which should ship first? +- How should air-gapped and mirror deployments be expressed on + Kubernetes, reusing the existing image-override mechanism? +- What is the minimum Kubernetes version and feature set + (StorageClass, Ingress/Gateway, NetworkPolicy support) a future + reference deployment should assume? + +## Security and Operational Considerations + +Moving PentAGI onto Kubernetes changes its security posture, and the +changes should be designed in rather than discovered later. + +- **Privilege of the executor.** The current model effectively grants + the backend host-level container control via the Docker socket. Any + Kubernetes equivalent (pod-create RBAC, a privileged DinD sidecar, + or a sandboxed runtime) carries comparable or different risk. The + privilege level must be explicit, least-privilege, and visible to + operators -- not an unstated side effect of "making it work." +- **RBAC and namespacing.** If the backend creates worker Pods/Jobs, + its ServiceAccount needs scoped permissions in a dedicated + namespace, never cluster-admin. Flow workers should be confined to + that namespace with their own constrained ServiceAccount. +- **Untrusted workloads.** Flow workers run agent-driven, untrusted + commands. On Kubernetes that argues for pod security standards, + seccomp/AppArmor profiles, dropped capabilities, and a sandboxed + runtime for worker pods, rather than running them as ordinary + privileged pods. +- **Secret handling.** Kubernetes `Secret`s are base64, not encrypted, + at rest by default. A future design should call out + encryption-at-rest, optional external secret managers, and the fact + that provider keys and the DB DSN are sensitive. No secret should be + baked into an image or committed to a manifest. +- **Network segmentation.** The implicit Compose-network isolation + should be reproduced with explicit `NetworkPolicy`, defaulting to + deny and opening only the required backend-to-Postgres, + backend-to-scraper, and worker egress paths. +- **No unsafe defaults.** Any future reference deployment must not + default to a privileged or host-network pod, must not expose the + backend publicly without TLS, and must not widen RBAC for + convenience. The Compose default already binds the backend to + loopback; the Kubernetes default should be equally conservative. +- **Inspectable lifecycle.** Per the PR + [#268](https://github.com/vxcontrol/pentagi/pull/268) lesson, flow + execution state on Kubernetes should be representable as real + objects an operator can list and delete, so a stuck or runaway flow + is visible and stoppable through the cluster API rather than only + through backend internals. + +## Test and Validation Strategy + +A future implementation should be validated against the points below +before being described as anything more than experimental. This RFC +itself is validated only as documentation. + +- **Local clusters.** Bring-up and teardown on kind and minikube as + the baseline developer-facing validation, since they need no cloud + account. +- **Manifest and chart linting.** If/when manifests or a chart exist, + `kubectl apply --dry-run=server`, `kubeconform` (or equivalent), and + `helm lint` / `helm template` in CI before anything is published. +- **Migration validation.** Verify the chosen migration approach is + safe under a rolling update with more than one backend replica, so + goose does not run concurrently from multiple pods. +- **End-to-end flow test.** Run at least one real flow on the chosen + executor model and confirm worker Pods/Jobs are created, complete, + are cleaned up, and are visible via `kubectl` for their lifetime. +- **Security review.** Run pod security and RBAC checks (for example + with a policy linter) to confirm least-privilege, deny-by-default + network policies, and no privileged or host-network defaults. +- **Compose parity guard.** Confirm the existing Docker Compose path + is unchanged and still the supported default, so the Kubernetes work + remains additive and opt-in throughout. + +## References + +- Issue [#324](https://github.com/vxcontrol/pentagi/issues/324): + Kubernetes deployment request. +- PR [#268](https://github.com/vxcontrol/pentagi/pull/268): source of + the explicit-lifecycle / no-hidden-state lesson carried forward + here. +- `docker-compose.yml`: current service topology, the Docker-socket + mount, the `root:root` executor, named volumes, and networks + described in "Current Deployment Assumptions." +- The README sections on Docker image configuration and on restricted + networks, Docker mirrors, and proxies: the existing image-override + mechanism reused under "Image overrides."