Problem Statement
OpenShell now has VM-specific lifecycle extension hooks via #1583, but that extension model is local to the VM driver. Other core compute drivers such as Kubernetes, Docker, and Podman have their own lifecycle flows and driver-specific configuration, but they do not expose a common way for operator-installed integrations to participate in validation, plan construction, create/start failure handling, cleanup, or reconciliation.
A common extension model would also let external drivers wrap an existing driver instead of reimplementing its full lifecycle.
Example: a managed Kubernetes driver.
OpenShell gateway
  -> external driver: acme-kubernetes
       -> delegates base lifecycle to core Kubernetes driver
       -> injects custom lifecycle extensions
Proposed Design
OpenShell should define a common lifecycle extension interface for core compute drivers. Each driver keeps ownership of its native implementation, but exposes a small set of lifecycle phases where operator-installed extensions can validate requests, inspect or mutate a driver-owned plan, allocate side resources, handle create failure, clean up on delete, and reconcile after restart.
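As a rough illustration of the phases listed above, the sketch below shows one possible shape for a shared hook interface and a registry that runs hooks in order and unwinds them in reverse when create fails. It is a minimal sketch, not the VM driver's actual definition: the class name `LifecycleExtension` is borrowed from the issue text, but every method name, `ExtensionRegistry`, `IpamExtension`, and the fixed address are assumptions made for illustration.

```python
from dataclasses import dataclass, field


class LifecycleExtension:
    """Hypothetical base class: every hook is a no-op unless overridden."""

    def validate(self, request: dict) -> None:
        """Raise ValueError to reject a request before any work is done."""

    def mutate_plan(self, plan: dict) -> None:
        """Inspect or edit the driver-owned plan in place."""

    def on_create_failure(self, plan: dict) -> None:
        """Release side resources allocated for a create that failed."""

    def on_delete(self, plan: dict) -> None:
        """Clean up after delete; should be safe to call more than once."""

    def reconcile(self, known_ids: set) -> None:
        """Repair extension state after a gateway or driver restart."""


@dataclass
class ExtensionRegistry:
    """Hypothetical registry that fixes the order in which hooks run."""

    extensions: list = field(default_factory=list)

    def run_create(self, request: dict, plan: dict, create) -> None:
        # Validation runs first, so a rejected request allocates nothing.
        for ext in self.extensions:
            ext.validate(request)
        applied = []
        try:
            for ext in self.extensions:
                # Record the extension before its hook runs, so a hook that
                # fails halfway still gets a chance to clean up.
                applied.append(ext)
                ext.mutate_plan(plan)
            create(plan)  # the driver's native create step
        except Exception:
            # Undo in reverse order, then surface the original error.
            for ext in reversed(applied):
                ext.on_create_failure(plan)
            raise


class IpamExtension(LifecycleExtension):
    """Illustrative extension that leases an address as a side resource."""

    def __init__(self):
        self.leases = {}

    def mutate_plan(self, plan: dict) -> None:
        # Placeholder address; a real extension would ask an IPAM service.
        plan["ip"] = self.leases.setdefault(plan["id"], "10.0.0.7")

    def on_create_failure(self, plan: dict) -> None:
        # pop() with a default keeps the cleanup idempotent.
        self.leases.pop(plan["id"], None)
```

With this shape, a driver would call `ExtensionRegistry.run_create(request, plan, native_create)`; if `native_create` raises, the lease taken by `IpamExtension` is released before the error propagates.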
The VM driver's existing lifecycle hooks should become the reference implementation. Kubernetes could be the first non-VM target, with a mutable plan covering the Sandbox CR, pod template, namespace, labels/annotations, Services, NetworkPolicies, and resource claims.
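To make the idea of a mutable Kubernetes plan concrete, here is a minimal sketch under stated assumptions. `KubernetesPlan`, its field names, `apply_tenant_isolation`, and the `example.com/tenant` label key are hypothetical; they only mirror the resources named in the paragraph above and do not describe the current Kubernetes driver. The NetworkPolicy body follows the standard `networking.k8s.io/v1` default-deny-ingress shape.

```python
from dataclasses import dataclass, field


@dataclass
class KubernetesPlan:
    """Hypothetical driver-owned plan that extensions may edit before create."""

    namespace: str
    sandbox_cr: dict = field(default_factory=dict)
    pod_template: dict = field(default_factory=dict)
    labels: dict = field(default_factory=dict)
    annotations: dict = field(default_factory=dict)
    services: list = field(default_factory=list)
    network_policies: list = field(default_factory=list)
    resource_claims: list = field(default_factory=list)


def apply_tenant_isolation(plan: KubernetesPlan, tenant: str) -> None:
    """Example hook: move the sandbox into a per-tenant namespace."""
    plan.namespace = f"tenant-{tenant}"
    plan.labels["example.com/tenant"] = tenant
    # An empty podSelector selects every pod in the namespace, so this
    # policy denies all ingress unless another policy allows it.
    plan.network_policies.append({
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": "default-deny-ingress",
            "namespace": plan.namespace,
        },
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    })
```

The driver would still own serialization and the actual API calls; the extension only edits the plan object it is handed.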
External compute drivers can either implement the full compute-driver protocol or supply only their own extensions on top of an existing core driver.
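The wrapping option could look roughly like the sketch below, in which an external driver adds its own validation and plan input and delegates everything else to a base driver. Both classes are stand-ins invented for illustration: `CoreKubernetesDriver` models only three calls and is not the real core driver, and the tenant check in `AcmeKubernetesDriver` is an assumed site-specific rule.

```python
class CoreKubernetesDriver:
    """Stand-in for a core driver; only the calls used below are modeled."""

    def __init__(self):
        self.sandboxes = {}

    def validate(self, request: dict) -> None:
        if "name" not in request:
            raise ValueError("sandbox name is required")

    def create(self, request: dict) -> dict:
        self.sandboxes[request["name"]] = dict(request)
        return self.sandboxes[request["name"]]

    def delete(self, name: str) -> None:
        self.sandboxes.pop(name, None)


class AcmeKubernetesDriver:
    """Hypothetical external driver that wraps a base driver."""

    def __init__(self, base, allowed_tenants):
        self.base = base
        self.allowed_tenants = set(allowed_tenants)

    def validate(self, request: dict) -> None:
        # Base validation first, then the site-specific rule.
        self.base.validate(request)
        if request.get("tenant") not in self.allowed_tenants:
            raise ValueError("unknown tenant")

    def create(self, request: dict) -> dict:
        self.validate(request)
        enriched = dict(request, namespace=f"tenant-{request['tenant']}")
        return self.base.create(enriched)

    def __getattr__(self, name):
        # Called only for attributes not defined on the wrapper, so every
        # lifecycle call that is not overridden falls through to the base.
        return getattr(self.base, name)
```

Delegating through `__getattr__` keeps the wrapper small, at the cost of making the forwarded surface implicit; an explicit method per RPC would be easier to audit.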
Alternatives Considered
Fully external compute drivers were also considered. That works for completely new backends, but it is too heavy when the desired behavior is mostly an existing driver plus site-specific customization, such as Kubernetes with tenant namespaces, DRA claims, IPAM, or NetworkPolicy.
Agent Investigation
The investigation found that this is feasible, but should be treated as an
RFC/design feature rather than a small implementation change.
Key findings:
- OpenShell already has a common internal compute-driver protocol in
proto/compute_driver.proto, with lifecycle RPCs for validate, create, stop,
delete, get, list, and watch.
- The gateway wraps drivers through ComputeRuntime, but driver selection is
still tied to the fixed ComputeDriverKind enum: kubernetes, vm, docker, and
podman.
- VM has the strongest precedent: openshell-driver-vm defines
LifecycleExtension, LifecycleExtensionRegistry, LaunchPlan, activation
labels, failure cleanup, delete cleanup, and restore hooks.
- Kubernetes, Docker, and Podman each have clear lifecycle and plan-building
points, but no shared hook abstraction. Kubernetes is the best first non-VM
target because it already builds a mutable Sandbox CR / pod template before
create.
- driver_config is already selected and forwarded per active driver, but it is
request data. It does not provide lifecycle ordering, rollback, cleanup,
reconciliation, or audit semantics.
- External/wrapper drivers need a config identity model so wrapper config,
base-driver config, and extension config do not get conflated.
- Main risks are API shape, security boundaries, rollback/idempotency,
capability discovery, and how extension activation is authorized.
- Expected tests include hook ordering, unsupported capability behavior,
rollback on create failure, idempotent cleanup, Kubernetes plan mutation, and
regression coverage for the existing VM hooks.
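Two of the findings above, config identity and unsupported-capability behavior, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the idea of keying config by driver or extension name, the function names `split_config` and `check_capabilities`, and the capability strings are not part of OpenShell today.

```python
def split_config(raw: dict, wrapper: str, base: str, extensions: list) -> dict:
    """Split config keyed by identity into wrapper, base, and extension parts."""
    known = {wrapper, base, *extensions}
    unknown = sorted(set(raw) - known)
    if unknown:
        # Failing loudly avoids silently applying config to the wrong layer.
        raise ValueError(f"config for unknown identities: {unknown}")
    return {
        "wrapper": raw.get(wrapper, {}),
        "base": raw.get(base, {}),
        "extensions": {name: raw.get(name, {}) for name in extensions},
    }


def check_capabilities(required: set, supported: set) -> None:
    """Reject activation when the driver lacks a phase the extension needs."""
    missing = sorted(required - supported)
    if missing:
        raise ValueError(f"driver does not support: {missing}")
```

A test for unsupported capability behavior could then assert that `check_capabilities({"mutate_plan", "reconcile"}, {"mutate_plan"})` raises instead of activating the extension with a hook that never runs.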
Checklist