docs: add Kubernetes deployment compatibility RFC #326
Open: mason5052 wants to merge 2 commits into vxcontrol:main from mason5052:codex/issue-324-kubernetes-rfc
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,360 @@ | ||
| # Kubernetes Deployment RFC | ||
|
|
||
| ## Summary | ||
|
|
||
| Issue [#324](https://github.com/vxcontrol/pentagi/issues/324) asks | ||
| whether PentAGI can run on Kubernetes. Today PentAGI is built and | ||
| documented around Docker Compose and the installer, and there is no | ||
| supported Kubernetes path. This RFC sketches what a future, | ||
| incremental Kubernetes-compatibility effort could look like and names | ||
| the parts of the current design that make it non-trivial. | ||
|
|
||
| This document does not implement runtime behavior. It does not add | ||
| Helm charts, Kubernetes manifests, Kustomize bases, an operator, or | ||
| CRDs. It does not change `docker-compose.yml`, the installer, the | ||
| backend, the database schema, or any environment variable. It does | ||
| not claim that PentAGI runs on Kubernetes today, because it does not. | ||
| It is a design surface for maintainers to push back on before any | ||
| deployment code lands. | ||
|
|
||
| The RFC is intentionally staged and docs-first. PentAGI's flow | ||
| executor currently talks to a Docker daemon over a bind-mounted | ||
| socket, and that single fact drives most of the difficulty below. A | ||
| naive "wrap the containers in a Deployment" approach would either | ||
| break flow execution or smuggle in implicit, hard-to-inspect | ||
| lifecycle behavior -- close to the patterns pushed back on during PR | ||
| [#268](https://github.com/vxcontrol/pentagi/pull/268) review. The | ||
| proposed path below stays explicit and reviewable: every Kubernetes | ||
| resource a future implementation would create should be something an | ||
| operator can see with `kubectl`, not a hidden background mechanism. | ||
|
|
||
| ## Goals | ||
|
|
||
| - Capture, in one place, the concrete reasons PentAGI does not run on | ||
| Kubernetes today, grounded in the current Compose and installer | ||
| design rather than in guesswork. | ||
| - Map each Compose-era assumption (secrets, volumes, service | ||
| discovery, TLS, health, networking, the container executor, | ||
| observability, image selection, migrations) to its candidate | ||
| Kubernetes equivalent. | ||
| - Identify the one genuinely hard problem -- the Docker-socket flow | ||
| executor -- and lay out candidate approaches with their trade-offs, | ||
| without choosing one. | ||
| - Propose an incremental, docs-first path so that any later | ||
| implementation can be reviewed in small, self-contained slices. | ||
| - Keep operators in control of secrets, persistence, network reach, | ||
| and the privilege level of flow execution at every step. | ||
| - Give maintainers a single artifact to accept, reject, or reshape | ||
| before any chart, manifest, or operator code is written. | ||
|
|
||
| ## Non-Goals | ||
|
|
||
| - This RFC does not add Helm charts, raw manifests, Kustomize | ||
| overlays, an operator, or CRDs. No deployment artifact ships with | ||
| this document. | ||
| - This RFC does not modify `docker-compose.yml`, the installer, or the | ||
| current supported deployment path. Compose remains the only | ||
| supported deployment model until maintainers decide otherwise. | ||
| - This RFC does not add, rename, or change any environment variable, | ||
| and does not change any default in `.env.example` or the backend | ||
| configuration. | ||
| - This RFC does not change the backend, the database schema, the | ||
| generated code, the GraphQL or REST surface, or the flow executor. | ||
| - This RFC does not propose hidden background orchestration, an | ||
| implicit queue, or out-of-band lifecycle state to make Kubernetes | ||
| work. Carrying forward the explicit lesson from PR | ||
| [#268](https://github.com/vxcontrol/pentagi/pull/268) review: any | ||
| future Kubernetes resource (Pod, Job, PVC, Secret) must be visible | ||
| and manageable through the standard Kubernetes API, not buried in | ||
| process memory. | ||
| - This RFC does not claim parity with the Compose deployment. Some | ||
| capabilities (notably the privileged Docker-socket executor) may | ||
| never map cleanly, and this document does not promise that they | ||
| will. | ||
| - This RFC does not pick a single executor strategy, a single | ||
| ingress controller, a single storage class, or a single secret | ||
| backend. Those are deferred to a later implementation RFC. | ||
|
|
||
| ## Current Deployment Assumptions | ||
|
|
||
| This section describes how PentAGI is deployed today, because the | ||
| Kubernetes considerations only make sense against the current shape. | ||
| Everything here is drawn from `docker-compose.yml`, the installer | ||
| docs, and the backend, not from a hypothetical setup. | ||
|
|
||
| - **Compose-oriented topology.** The supported deployment is Docker | ||
| Compose (directly or via the installer). The core stack is the | ||
| `pentagi` backend, a `pgvector` PostgreSQL instance, a `pgexporter` | ||
| metrics sidecar, and a `scraper` service. Optional stacks add | ||
| Graphiti / Neo4j, Langfuse, and an observability bundle | ||
| (OpenTelemetry collector, Grafana, VictoriaMetrics, and friends). | ||
| - **The flow executor uses the Docker socket.** This is the central | ||
| fact for Kubernetes. The `pentagi` service bind-mounts the host | ||
| Docker socket (`${PENTAGI_DOCKER_SOCKET:-/var/run/docker.sock}` to | ||
| `/var/run/docker.sock`) and the backend's Docker client connects | ||
| via `client.FromEnv`, honoring | ||
| `DOCKER_HOST` (default `unix:///var/run/docker.sock`). During a | ||
| flow, PentAGI creates and destroys terminal/worker containers | ||
| against that daemon. The executor is effectively "talk to a Docker | ||
| daemon and spawn sibling containers," not "run one long-lived | ||
| process." | ||
| - **Elevated privileges by design.** Because the backend drives the | ||
| Docker socket, the `pentagi` service runs `user: root:root` | ||
| (commented in-file as "while using docker.sock") and carries | ||
| Docker-related toggles (`DOCKER_INSIDE`, `DOCKER_NET_ADMIN`, | ||
| `DOCKER_GID=998`, `DOCKER_WORK_DIR`). This privilege level is | ||
| intrinsic to the current executor, not incidental. | ||
| - **Named local volumes for state.** Persistent state uses Docker | ||
| local volumes: `pentagi-data` mounted at `/opt/pentagi/data`, | ||
| Postgres data in `pentagi-postgres-data`, plus `pentagi-ssl`, | ||
| `scraper-ssl`, and `pentagi-ollama`. These assume a single host | ||
| with local volume drivers. | ||
| - **Configuration and secrets via `.env`.** Provider keys, the | ||
| database DSN, embedding settings, TLS material, and feature toggles | ||
| are passed as environment variables sourced from `.env`. There is no | ||
| externalized secret store in the default path; the env file is the | ||
| source of truth. | ||
| - **Service discovery by Compose DNS.** Services find each other by | ||
| Compose service name on user-defined bridge networks | ||
| (`pentagi-network`, and the optional `observability-network` and | ||
| `langfuse-network`). The backend reaches Postgres, the scraper, and | ||
| optional services by name. | ||
| - **TLS terminates at the backend.** The backend listens on `8443` | ||
| and is published to `${PENTAGI_LISTEN_IP:-127.0.0.1}:8443`, | ||
| defaulting to loopback. There is no separate ingress or reverse | ||
| proxy in the core stack; TLS is handled inside the container. | ||
| - **Health via Compose healthchecks.** Ordering uses Compose | ||
| `depends_on` with `condition: service_healthy` (for example the | ||
| backend waits on `pgvector`). Health is expressed as container | ||
| healthchecks, not as orchestrator probes. | ||
| - **Database migrations on startup.** The backend embeds its SQL | ||
| migrations and runs them with goose at process start | ||
| (`goose.Up`). There is no separate migration step; the backend | ||
| migrates itself when it boots. | ||
| - **Image selection via env override.** The backend image is | ||
| `${PENTAGI_IMAGE:-vxcontrol/pentagi:latest}`, and worker/tool images | ||
| are similarly overridable. Air-gapped and mirror setups already rely | ||
| on these overrides (see the README's note on restricted networks, | ||
| Docker mirrors, and proxies). | ||
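
The executor's dependence on a daemon endpoint, as described in the list above, can be illustrated with a small sketch. This is a Python approximation of the `DOCKER_HOST` handling attributed to `client.FromEnv`, not the backend's Go code; the helper names `resolve_docker_host` and `is_local_socket` are hypothetical.

```python
import os

# Default endpoint quoted in the RFC for an unset DOCKER_HOST.
DEFAULT_DOCKER_HOST = "unix:///var/run/docker.sock"


def resolve_docker_host(env=None):
    """Return the daemon endpoint, preferring a non-empty DOCKER_HOST."""
    env = os.environ if env is None else env
    return env.get("DOCKER_HOST") or DEFAULT_DOCKER_HOST


def is_local_socket(endpoint):
    """True when the endpoint is a Unix socket that must be mounted into the container."""
    return endpoint.startswith("unix://")


if __name__ == "__main__":
    endpoint = resolve_docker_host()
    print(endpoint, "local socket" if is_local_socket(endpoint) else "remote daemon")
```

A Unix-socket endpoint only works when the daemon's socket is mounted into the backend's container, which is the assumption a Kubernetes pod would not satisfy by default.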
|
|
||
| ## Kubernetes Compatibility Considerations | ||
|
|
||
| For each Compose-era assumption above, this section names the | ||
| candidate Kubernetes equivalent and the friction. Nothing here is a | ||
| committed design; it is a map of the problem space. | ||
|
|
||
| - **Secrets and configuration.** The `.env` model maps to Kubernetes | ||
| `Secret` objects (provider keys, DB credentials, TLS material) and | ||
| `ConfigMap` objects (non-secret toggles). This is mostly mechanical. | ||
| The open part is whether to keep a flat env-injection model | ||
| (`envFrom` a Secret/ConfigMap) or move toward referenced secrets, | ||
| and whether to integrate external secret managers. No change to the | ||
| variable names themselves is needed. | ||
| - **Persistent volumes.** The named local volumes map to | ||
| `PersistentVolumeClaim`s backed by a cluster `StorageClass`. Postgres | ||
| state in particular wants a `StatefulSet` with a stable claim, or an | ||
| external managed Postgres. The friction is that several volumes today | ||
| assume single-host locality and `ReadWriteOnce` semantics; a future | ||
| design has to be explicit about access modes and about whether | ||
| Postgres is in-cluster or external. | ||
| - **Service discovery.** Compose service-name DNS maps cleanly to | ||
| Kubernetes `Service` objects and in-cluster DNS. This is among the | ||
| lowest-friction items; the backend would address Postgres and the | ||
| scraper by Service name instead of Compose name. | ||
| - **Ingress and TLS.** Today TLS terminates in the backend on `8443` | ||
| bound to loopback. On Kubernetes the candidates are an `Ingress` (or | ||
| Gateway API route) with TLS via cert-manager, or keeping in-pod TLS | ||
| and exposing it through a TLS-passthrough route or a plain L4 | ||
| `Service`. The open question is whether to keep TLS in the backend | ||
| or move termination to the edge; both are viable and have different | ||
| operational profiles. | ||
| - **Health checks.** Compose healthchecks map to `readinessProbe` and | ||
| `livenessProbe` (plus `startupProbe` for slow starts); `depends_on` | ||
| has no direct equivalent. Startup ordering that Compose expresses | ||
| with `service_healthy` becomes readiness-gated rollout plus | ||
| application-level retry, since Kubernetes does not block one | ||
| workload's start on another's health the way Compose does. | ||
| - **Network policies.** The implicit isolation of Compose user-defined | ||
| networks maps to Kubernetes `NetworkPolicy`. This is an opportunity | ||
| to make the currently-implicit segmentation explicit, but it is also | ||
| net-new surface that has to be designed rather than translated. | ||
| - **Flow / container execution model (the hard problem).** This is the | ||
| item that does not translate mechanically. The backend expects a | ||
| Docker daemon and spawns sibling containers over the socket. | ||
| Kubernetes does not hand workloads a Docker socket, and since | ||
| dockershim was removed in Kubernetes 1.24, nodes typically run | ||
| containerd or CRI-O rather than Docker as the runtime. Candidate | ||
| approaches, each with real trade-offs and none free of cost: | ||
| - **Kubernetes-native execution.** Teach the executor to create | ||
| ephemeral `Pod`s or `Job`s through the Kubernetes API instead of | ||
| Docker containers. Most idiomatic and the most inspectable | ||
| (`kubectl get pods/jobs` shows exactly what a flow is running), | ||
| but the largest backend change, and it requires an in-cluster | ||
| `ServiceAccount` with pod-create RBAC, which is its own risk. | ||
| - **Docker-in-Docker sidecar.** Run a DinD daemon next to the | ||
| backend and keep the existing socket-based executor. Smallest | ||
| backend change, but DinD typically needs a privileged container, | ||
| has known stability and storage caveats, and concentrates risk in | ||
| one privileged pod. | ||
| - **Sandboxed runtimes.** Pair Kubernetes-native execution with a | ||
| stronger isolation runtime (gVisor, Kata, sysbox, or similar) for | ||
| the worker pods, since flow workers run untrusted, agent-driven | ||
| commands. This is a hardening layer on top of native execution, | ||
| not an alternative to it. | ||
| Whatever is chosen, the PR | ||
| [#268](https://github.com/vxcontrol/pentagi/pull/268) lesson | ||
| applies: the running work must be visible and manageable through | ||
| standard Kubernetes objects, not tracked only inside the backend | ||
| process. | ||
| - **Observability.** The optional OpenTelemetry / Grafana / | ||
| VictoriaMetrics stack maps to in-cluster deployments or, more | ||
| likely, to whatever the operator's cluster already runs. The | ||
| candidate direction is to make PentAGI emit to existing cluster | ||
| observability rather than bundling its own, with the bundled stack | ||
| as an opt-in for clusters that have none. | ||
| - **Image overrides.** The existing `PENTAGI_IMAGE` and related | ||
| per-image overrides map directly to image fields in pod specs, which | ||
| is helpful for air-gapped and mirror deployments. This is | ||
| low-friction and reuses an existing mechanism rather than inventing | ||
| one. | ||
| - **Upgrade and migration path.** Because the backend runs goose | ||
| migrations on startup, a rolling update or scale-out could start | ||
| migrations from several pods at once. On Compose with a single | ||
| backend this is fine; on Kubernetes with multiple replicas it is not. A | ||
| future design needs an explicit decision: a one-shot migration | ||
| `Job` (or init container) gated ahead of the rollout, or an | ||
| enforced single-writer constraint. This must be settled before any | ||
| multi-replica backend deployment is suggested. | ||
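
To make the Kubernetes-native execution option from the list above more concrete, here is a minimal sketch of the kind of object a flow worker could become, assuming one `batch/v1` Job per flow. Nothing here exists in PentAGI: the function name, the `pentagi-flows` namespace, the `pentagi.example/flow-id` label key, and the TTL value are all hypothetical.

```python
import json


def worker_job(flow_id, image, namespace="pentagi-flows"):
    """Build a batch/v1 Job manifest for one flow worker (hypothetical naming)."""
    labels = {
        "app.kubernetes.io/part-of": "pentagi",
        "pentagi.example/flow-id": str(flow_id),  # hypothetical label key
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"flow-{flow_id}", "namespace": namespace, "labels": labels},
        "spec": {
            "backoffLimit": 0,               # do not silently retry a failed flow step
            "ttlSecondsAfterFinished": 600,  # let the cluster garbage-collect finished Jobs
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "restartPolicy": "Never",
                    "automountServiceAccountToken": False,
                    "containers": [{
                        "name": "worker",
                        "image": image,
                        "securityContext": {
                            "privileged": False,
                            "allowPrivilegeEscalation": False,
                            "capabilities": {"drop": ["ALL"]},
                        },
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    # JSON is accepted by `kubectl apply -f -`, so no YAML library is needed.
    print(json.dumps(worker_job(42, "example/worker:latest"), indent=2))
```

Because each flow would map to a labeled Job, an operator could list or delete running work with a label selector instead of relying on backend internals. Whether worker tooling can function with all capabilities dropped is an open question this sketch does not answer.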
|
|
||
| ## Proposed Incremental Path | ||
|
|
||
| The path is deliberately docs-first so each step is small enough to | ||
| review and reject in isolation. No step below is started by this RFC; | ||
| this is the proposed sequence, not a commitment. | ||
|
|
||
| 1. **This RFC.** Land the design surface, confirm the boundaries | ||
| (docs-only, no charts, no executor change yet), and let maintainers | ||
| accept, reshape, or decline the direction. | ||
| 2. **Executor strategy decision.** Before any manifest exists, settle | ||
| the single hardest question in a follow-up RFC: how flow workers | ||
| run on Kubernetes (native Pods/Jobs vs DinD vs sandboxed runtime), | ||
| and what privilege and RBAC that implies. Everything else depends | ||
| on this. | ||
| 3. **Stateless-core reference manifests.** Once the executor decision | ||
| exists, a minimal, clearly-labeled reference for the parts that do | ||
| translate cleanly -- backend Deployment/Service, Postgres via | ||
| StatefulSet or external, Secrets/ConfigMaps, probes, a migration | ||
| Job -- explicitly marked experimental and excluding flow execution. | ||
| 4. **Flow execution on the chosen model.** Implement the executor | ||
| decision from step 2 behind the existing Docker path, so Compose | ||
| keeps working unchanged and Kubernetes execution is additive and | ||
| opt-in. | ||
| 5. **Packaging and operator guide.** Only after the above is proven, | ||
| consider a Helm chart or operator and a Kubernetes operator guide | ||
| (in the spirit of the existing `examples/` material), so packaging | ||
| lands on top of a working deployment rather than ahead of it. | ||
|
|
||
| Each step is self-contained: maintainers can stop after any step | ||
| without leaving PentAGI in a half-migrated state, and Compose remains | ||
| the supported path throughout. | ||
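
Step 3 lists Secrets and ConfigMaps among the parts that translate cleanly. A rough sketch of that flat mapping follows, assuming a simple `KEY=VALUE` env file and a name-based heuristic for what counts as sensitive. The marker list, the object names, and `EXAMPLE_API_KEY` in the usage lines are hypothetical; a real split would need a reviewed allowlist rather than substring matching.

```python
# Substrings that mark a variable as sensitive (heuristic, hypothetical).
SENSITIVE_MARKERS = ("KEY", "TOKEN", "PASSWORD", "SECRET", "DSN")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and comments."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        pairs[key.strip()] = value.strip()
    return pairs


def split_env(pairs, name="pentagi"):
    """Return (Secret, ConfigMap) manifests without renaming any variable."""
    secret, config = {}, {}
    for key, value in pairs.items():
        sensitive = any(marker in key.upper() for marker in SENSITIVE_MARKERS)
        (secret if sensitive else config)[key] = value
    secret_manifest = {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": f"{name}-env"},
        "type": "Opaque",
        "stringData": secret,  # the API server encodes stringData on write
    }
    configmap_manifest = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": f"{name}-env"},
        "data": config,
    }
    return secret_manifest, configmap_manifest


if __name__ == "__main__":
    sample = "# example only\nEXAMPLE_API_KEY=abc123\nDOCKER_INSIDE=true\n"
    secret, configmap = split_env(parse_env(sample))
    print(sorted(secret["stringData"]), sorted(configmap["data"]))
```

Both objects could then be attached to the backend with `envFrom`, which keeps the existing variable names intact.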
|
|
||
| ## Open Questions | ||
|
|
||
| - Which executor model should PentAGI target first -- Kubernetes-native | ||
| Pods/Jobs, a DinD sidecar, or a sandboxed runtime -- and is more than | ||
| one worth supporting? | ||
| - Should Postgres (and pgvector) run in-cluster as a StatefulSet, or | ||
| should the Kubernetes path assume an external managed database? | ||
| - Should TLS continue to terminate in the backend, or move to an | ||
| Ingress / Gateway with cert-manager? | ||
| - How should the startup goose migration be handled under multiple | ||
| backend replicas -- a gating migration Job, an init container, or an | ||
| enforced single-writer? | ||
| - What RBAC is acceptable for the backend's ServiceAccount if it | ||
| creates worker Pods/Jobs, and how is that least-privileged? | ||
| - Should the observability stack be bundled, or should PentAGI default | ||
| to emitting into the operator's existing cluster observability? | ||
| - Is Helm, an operator, or plain manifests the right packaging once a | ||
| working deployment exists, and which should ship first? | ||
| - How should air-gapped and mirror deployments be expressed on | ||
| Kubernetes, reusing the existing image-override mechanism? | ||
| - What is the minimum Kubernetes version and feature set | ||
| (StorageClass, Ingress/Gateway, NetworkPolicy support) a future | ||
| reference deployment should assume? | ||
|
|
||
| ## Security and Operational Considerations | ||
|
|
||
| Moving PentAGI onto Kubernetes changes its security posture, and the | ||
| changes should be designed in rather than discovered later. | ||
|
|
||
| - **Privilege of the executor.** The current model effectively grants | ||
| the backend host-level container control via the Docker socket. Any | ||
| Kubernetes equivalent (pod-create RBAC, a privileged DinD sidecar, | ||
| or a sandboxed runtime) carries its own risk, which may differ in | ||
| kind but not necessarily in severity. The privilege level must be | ||
| explicit, minimal, and visible to operators -- not an unstated side | ||
| effect of "making it work." | ||
| - **RBAC and namespacing.** If the backend creates worker Pods/Jobs, | ||
| its ServiceAccount needs scoped permissions in a dedicated | ||
| namespace, never cluster-admin. Flow workers should be confined to | ||
| that namespace with their own constrained ServiceAccount. | ||
| - **Untrusted workloads.** Flow workers run agent-driven, untrusted | ||
| commands. On Kubernetes that argues for pod security standards, | ||
| seccomp/AppArmor profiles, dropped capabilities, and a sandboxed | ||
| runtime for worker pods, rather than running them as ordinary | ||
| privileged pods. | ||
| - **Secret handling.** By default, Kubernetes `Secret`s are only | ||
| base64-encoded and stored unencrypted in etcd. A future design should call out | ||
| encryption-at-rest, optional external secret managers, and the fact | ||
| that provider keys and the DB DSN are sensitive. No secret should be | ||
| baked into an image or committed to a manifest. | ||
| - **Network segmentation.** The implicit Compose-network isolation | ||
| should be reproduced with explicit `NetworkPolicy`, defaulting to | ||
| deny and opening only the required backend-to-Postgres, | ||
| backend-to-scraper, and worker egress paths. | ||
| - **No unsafe defaults.** Any future reference deployment must not | ||
| default to a privileged or host-network pod, must not expose the | ||
| backend publicly without TLS, and must not widen RBAC for | ||
| convenience. The Compose default already binds the backend to | ||
| loopback; the Kubernetes default should be equally conservative. | ||
| - **Inspectable lifecycle.** Per the PR | ||
| [#268](https://github.com/vxcontrol/pentagi/pull/268) lesson, flow | ||
| execution state on Kubernetes should be representable as real | ||
| objects an operator can list and delete, so a stuck or runaway flow | ||
| is visible and stoppable through the cluster API rather than only | ||
| through backend internals. | ||
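
The segmentation and RBAC points above can be sketched as plain manifests. This is an illustration under stated assumptions, not a proposed policy set: the pod labels `app: pentagi` and `app: pgvector`, the object names, and Postgres listening on port 5432 are assumptions, and the Role presumes the Job-based executor option.

```python
def default_deny(namespace):
    """NetworkPolicy that selects every pod and allows no ingress or egress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def allow_backend_to_postgres(namespace, port=5432):
    """Open ingress on the Postgres pods to backend pods only (assumed labels)."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-backend-to-postgres", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": "pgvector"}},
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [{"podSelector": {"matchLabels": {"app": "pentagi"}}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


def worker_manager_role(namespace):
    """Namespaced Role limited to Jobs; deliberately not a ClusterRole."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "flow-worker-manager", "namespace": namespace},
        "rules": [{
            "apiGroups": ["batch"],
            "resources": ["jobs"],
            "verbs": ["create", "get", "list", "watch", "delete"],
        }],
    }
```

With egress denied by default, the backend would also need matching egress rules toward Postgres, the scraper, and cluster DNS; those are omitted here for brevity.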
|
|
||
| ## Test and Validation Strategy | ||
|
|
||
| A future implementation should be validated against the points below | ||
| before being described as anything more than experimental. This RFC | ||
| itself is validated only as documentation. | ||
|
|
||
| - **Local clusters.** Bring-up and teardown on kind and minikube as | ||
| the baseline developer-facing validation, since they need no cloud | ||
| account. | ||
| - **Manifest and chart linting.** If/when manifests or a chart exist, | ||
| `kubectl apply --dry-run=server`, `kubeconform` (or equivalent), and | ||
| `helm lint` / `helm template` in CI before anything is published. | ||
| - **Migration validation.** Verify the chosen migration approach is | ||
| safe under a rolling update with more than one backend replica, so | ||
| goose does not run concurrently from multiple pods. | ||
| - **End-to-end flow test.** Run at least one real flow on the chosen | ||
| executor model and confirm worker Pods/Jobs are created, complete, | ||
| are cleaned up, and are visible via `kubectl` for their lifetime. | ||
| - **Security review.** Run pod security and RBAC checks (for example | ||
| with a policy linter) to confirm least-privilege, deny-by-default | ||
| network policies, and no privileged or host-network defaults. | ||
| - **Compose parity guard.** Confirm the existing Docker Compose path | ||
| is unchanged and still the supported default, so the Kubernetes work | ||
| remains additive and opt-in throughout. | ||
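
As an illustration of the security review item above, a small check over a pod spec can flag the defaults this RFC rules out. It is a sketch only and no substitute for a real policy linter or admission controller; the function name and the wording of the findings are hypothetical.

```python
def unsafe_findings(pod_spec):
    """Return human-readable findings for unsafe settings in a pod spec dict."""
    findings = []
    if pod_spec.get("hostNetwork"):
        findings.append("pod uses hostNetwork")
    for volume in pod_spec.get("volumes", []):
        path = (volume.get("hostPath") or {}).get("path", "")
        if path.endswith("docker.sock"):
            findings.append(f"volume {volume.get('name')} mounts the Docker socket")
    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    for container in containers:
        ctx = container.get("securityContext") or {}
        if ctx.get("privileged"):
            findings.append(f"container {container.get('name')} is privileged")
        # Treated as unsafe unless explicitly disabled.
        if ctx.get("allowPrivilegeEscalation") is not False:
            findings.append(f"container {container.get('name')} may escalate privileges")
    return findings


if __name__ == "__main__":
    spec = {"hostNetwork": True, "containers": [{"name": "backend"}]}
    for finding in unsafe_findings(spec):
        print("FAIL:", finding)
```

A CI job could run such a check over rendered manifests and fail the build on any finding, alongside the `kubeconform` and `helm lint` steps listed above.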
|
|
||
| ## References | ||
|
|
||
| - Issue [#324](https://github.com/vxcontrol/pentagi/issues/324): | ||
| Kubernetes deployment request. | ||
| - PR [#268](https://github.com/vxcontrol/pentagi/pull/268): source of | ||
| the explicit-lifecycle / no-hidden-state lesson carried forward | ||
| here. | ||
| - `docker-compose.yml`: current service topology, the Docker-socket | ||
| mount, the `root:root` executor, named volumes, and networks | ||
| described in "Current Deployment Assumptions." | ||
| - The README sections on Docker image configuration and on restricted | ||
| networks, Docker mirrors, and proxies: the existing image-override | ||
| mechanism reused under "Image overrides." | ||