Spike: Composable Lifecycle Extensions Across Compute Drivers
Problem Statement
OpenShell now has VM-specific lifecycle extension hooks via
#1583, but the extension model is local to the VM driver. The
same integration class exists for Kubernetes, Docker, Podman, BlueField/DPU, and
future external drivers: operators want to reuse the built-in driver lifecycle
while injecting deployment-specific validation, plan mutation, resource
allocation, network attribution, and cleanup.
This should be generalized as a compute-driver lifecycle extension model, with
VM as the reference implementation rather than the only implementation.
Technical Context
The gateway already has a common internal compute-driver protocol. The common
surface is compute_driver.proto, and ComputeRuntime translates public
Sandbox resources into DriverSandbox messages before calling a selected
driver. That protocol currently has lifecycle RPCs for validate, create, stop,
delete, get, list, and watch, but it does not expose composable hook phases or
driver plan mutation points.
The VM driver has a more expressive in-process extension framework. It defines a
LifecycleExtension trait, a LifecycleExtensionRegistry, a mutable
LaunchPlan, activation by policy-stamped labels, and failure/delete/restore
hooks. Kubernetes, Docker, and Podman instead have driver-local plan builders and
driver-specific driver_config parsing. Those are useful, but they do not let an
operator-installed extension compose with the existing driver implementation.
The design target is not "external drivers replace everything." It is "external
or specialized drivers can wrap or reuse existing drivers and customize only the
parts that differ."
Affected Components
| Component | Key Files | Role |
| --- | --- | --- |
| Internal compute-driver protocol | proto/compute_driver.proto | Defines the gateway-to-driver lifecycle RPCs and DriverSandboxTemplate fields. |
| Gateway compute runtime | crates/openshell-server/src/compute/mod.rs | Selects the active driver, translates public sandbox templates, forwards the selected driver_config, and calls the compute-driver RPC surface. |
| Gateway driver selection | crates/openshell-server/src/lib.rs, crates/openshell-core/src/config.rs, crates/openshell-server/src/config_file.rs | Still switches over fixed ComputeDriverKind values and per-driver config tables. |
| VM lifecycle extensions | crates/openshell-driver-vm/src/lifecycle.rs, crates/openshell-driver-vm/src/driver.rs | Current reference implementation for lifecycle extension hooks, mutable launch plans, activation, rollback, delete, and restore. |
| Kubernetes driver | crates/openshell-driver-kubernetes/src/driver.rs, crates/openshell-driver-kubernetes/src/grpc.rs | Builds Kubernetes Sandbox CR and pod template JSON, parses Kubernetes driver_config, and is the strongest first non-VM target. |
| Docker driver | crates/openshell-driver-docker/src/lib.rs | Builds Docker container create requests, validates mount/CDI/resource config, and already has a narrow driver_config surface. |
| Podman driver | crates/openshell-driver-podman/src/driver.rs, crates/openshell-driver-podman/src/container.rs | Builds Podman libpod specs, manages volumes/token files, and already has a narrow driver_config surface. |
| Driver utility contracts | crates/openshell-core/src/driver_utils.rs | Shared driver labels, supervisor mount paths, token paths, and a capability response helper. |
| Architecture/docs | architecture/compute-runtimes.md, architecture/README.md, docs/reference/sandbox-compute-drivers.mdx, docs/reference/gateway-config.mdx | Documents runtime boundaries and user/operator-facing compute driver config. |
Technical Investigation
Architecture Overview
OpenShell has a gateway-owned compute orchestration layer over pluggable compute
drivers. The internal protocol is intentionally driver-native: proto/compute_driver.proto:10
states the file owns driver-native request/response/observation types, and
proto/compute_driver.proto:18 defines the ComputeDriver service.
The protocol's lifecycle RPCs include:
- GetCapabilities at proto/compute_driver.proto:20
- ValidateSandboxCreate at proto/compute_driver.proto:23
- CreateSandbox at proto/compute_driver.proto:33
- StopSandbox at proto/compute_driver.proto:36
- DeleteSandbox at proto/compute_driver.proto:39
- WatchSandboxes at proto/compute_driver.proto:42
The protocol does not currently model lifecycle extension phases, extension
capabilities, or a way to compose one driver with another.
The gateway wraps concrete drivers behind SharedComputeDriver at
crates/openshell-server/src/compute/mod.rs:50. ComputeRuntime::from_driver
stores the selected ComputeDriverKind and reads driver capabilities at
crates/openshell-server/src/compute/mod.rs:245. In-tree constructors wire
Docker, Kubernetes, VM, and Podman independently at
crates/openshell-server/src/compute/mod.rs:295,
crates/openshell-server/src/compute/mod.rs:330,
crates/openshell-server/src/compute/mod.rs:359, and
crates/openshell-server/src/compute/mod.rs:386.
Driver selection is still enum-bound. ComputeDriverKind supports only
kubernetes, vm, docker, and podman at
crates/openshell-core/src/config.rs:50. build_compute_runtime switches over
that enum at crates/openshell-server/src/lib.rs:701 and has a
TODO(driver-abstraction) at crates/openshell-server/src/lib.rs:12 saying the
per-driver wiring should eventually collapse to a driver-agnostic path. Config
inheritance is also enum-bound through driver_table and inheritable_keys at
crates/openshell-server/src/config_file.rs:225 and
crates/openshell-server/src/config_file.rs:258.
The gateway translates public sandbox templates into driver templates at
crates/openshell-server/src/compute/mod.rs:1374. It forwards only the matching
nested driver_config block for the selected driver at
crates/openshell-server/src/compute/mod.rs:1423. Tests at
crates/openshell-server/src/compute/mod.rs:1981 verify that only the matching
driver block is passed through and non-object blocks are rejected.
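The following minimal Rust sketch makes the forwarding rule concrete by selecting the nested block for the active driver. It is not the code at compute/mod.rs:1423. The ConfigValue enum and the select_driver_config function are hypothetical stand-ins for the real JSON/protobuf handling. The sketch assumes that only the selected driver's block must be an object, and that blocks for other drivers are simply dropped.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a JSON-like config value; the real gateway
/// works on protobuf/JSON structures, this enum exists only for the sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum ConfigValue {
    Str(String),
    Num(i64),
    Object(BTreeMap<String, ConfigValue>),
}

/// Returns the nested driver_config block for `driver`, if one was supplied.
/// Blocks keyed by other driver names are never returned (dropped), and a
/// non-object block for the selected driver is rejected.
pub fn select_driver_config(
    driver_config: &BTreeMap<String, ConfigValue>,
    driver: &str,
) -> Result<Option<BTreeMap<String, ConfigValue>>, String> {
    match driver_config.get(driver) {
        None => Ok(None),
        Some(ConfigValue::Object(block)) => Ok(Some(block.clone())),
        Some(other) => Err(format!(
            "driver_config.{driver} must be an object, got {other:?}"
        )),
    }
}
```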
The existing VM lifecycle framework is VM-local. The extension activation label
prefix is defined at crates/openshell-driver-vm/src/lifecycle.rs:13.
LaunchPlan is defined at crates/openshell-driver-vm/src/lifecycle.rs:218.
LifecycleExtension is defined at crates/openshell-driver-vm/src/lifecycle.rs:308.
The registry validates names/descriptors at
crates/openshell-driver-vm/src/lifecycle.rs:505, runs configure_launch at
crates/openshell-driver-vm/src/lifecycle.rs:553, runs before_launch at
crates/openshell-driver-vm/src/lifecycle.rs:591, and runs cleanup/restore
hooks at crates/openshell-driver-vm/src/lifecycle.rs:603,
crates/openshell-driver-vm/src/lifecycle.rs:624,
crates/openshell-driver-vm/src/lifecycle.rs:637, and
crates/openshell-driver-vm/src/lifecycle.rs:644.
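A cross-driver registry would need the same fail-closed activation property that the VM driver gets from policy-stamped labels. The sketch below is illustrative only. ACTIVATION_PREFIX, the "enabled" value convention, and resolve_activations are all hypothetical, and the real prefix defined at lifecycle.rs:13 is not reproduced here. The sketch also assumes the gateway has already stamped these labels from policy, so users cannot set them directly.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical label prefix used only in this sketch.
const ACTIVATION_PREFIX: &str = "extensions.example/";

/// Resolves which registered extensions a sandbox activates. Fails closed:
/// an activation label naming an unregistered extension, or carrying any
/// value other than "enabled", rejects the whole request.
pub fn resolve_activations(
    labels: &BTreeMap<String, String>,
    registered: &BTreeSet<&str>,
) -> Result<Vec<String>, String> {
    let mut active = Vec::new();
    for (key, value) in labels {
        // Labels without the activation prefix are ordinary metadata.
        let name = match key.strip_prefix(ACTIVATION_PREFIX) {
            Some(name) => name,
            None => continue,
        };
        if !registered.contains(name) {
            return Err(format!("unknown lifecycle extension `{name}`"));
        }
        if value != "enabled" {
            return Err(format!("invalid activation value `{value}` for `{name}`"));
        }
        active.push(name.to_string());
    }
    Ok(active)
}
```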
The VM driver wires those hooks into provisioning and cleanup. It calls
configure_launch around crates/openshell-driver-vm/src/driver.rs:727, calls
before_launch around crates/openshell-driver-vm/src/driver.rs:785, invokes
after_restore around crates/openshell-driver-vm/src/driver.rs:963, invokes
after_delete around crates/openshell-driver-vm/src/driver.rs:1043, and
invokes before_restore around crates/openshell-driver-vm/src/driver.rs:1204.
Kubernetes has a natural plan-building boundary but no extension hook around it.
KubernetesComputeDriver::create_sandbox builds a DynamicObject and assigns
obj.data = sandbox_to_k8s_spec(...) at
crates/openshell-driver-kubernetes/src/driver.rs:376 and
crates/openshell-driver-kubernetes/src/driver.rs:420. The pod template and
Sandbox CR spec are built in sandbox_to_k8s_spec at
crates/openshell-driver-kubernetes/src/driver.rs:1182 and
sandbox_template_to_k8s at
crates/openshell-driver-kubernetes/src/driver.rs:1249. Kubernetes-specific
driver_config is parsed at crates/openshell-driver-kubernetes/src/driver.rs:94
and applied to pod scheduling/resources at
crates/openshell-driver-kubernetes/src/driver.rs:1525 and
crates/openshell-driver-kubernetes/src/driver.rs:1553.
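The hook point this paragraph describes sits between building the native plan (sandbox_to_k8s_spec) and submitting the DynamicObject. The following is a minimal sketch under several assumptions:
- A hypothetical K8sPlan struct stands in for the real Sandbox CR/pod template JSON.
- K8sPlanExtension is a hypothetical trait.
- TenantPlacement and its "pool" node-selector key are invented for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for the plan the Kubernetes driver
/// builds today (the real driver builds Sandbox CR and pod template JSON).
#[derive(Debug, Default, Clone)]
pub struct K8sPlan {
    pub namespace: String,
    pub labels: BTreeMap<String, String>,
    pub annotations: BTreeMap<String, String>,
    pub node_selector: BTreeMap<String, String>,
}

/// Hypothetical driver-typed extension trait for a prepare_plan phase.
pub trait K8sPlanExtension {
    fn name(&self) -> &str;
    fn prepare_plan(&self, plan: &mut K8sPlan) -> Result<(), String>;
}

/// Example extension: pins sandboxes to a tenant namespace and node pool.
pub struct TenantPlacement {
    pub namespace: String,
    pub pool: String,
}

impl K8sPlanExtension for TenantPlacement {
    fn name(&self) -> &str {
        "tenant-placement"
    }
    fn prepare_plan(&self, plan: &mut K8sPlan) -> Result<(), String> {
        plan.namespace = self.namespace.clone();
        plan.node_selector.insert("pool".into(), self.pool.clone());
        Ok(())
    }
}

/// Runs each extension over the driver-built plan, in order, before the
/// plan would be serialized and submitted. The first failure aborts create.
pub fn build_and_prepare(
    base: K8sPlan,
    extensions: &[&dyn K8sPlanExtension],
) -> Result<K8sPlan, String> {
    let mut plan = base;
    for ext in extensions {
        ext.prepare_plan(&mut plan)
            .map_err(|e| format!("{} failed in prepare_plan: {e}", ext.name()))?;
    }
    Ok(plan)
}
```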
Docker and Podman show the same pattern in container-driver form. Docker parses
DockerSandboxDriverConfig at crates/openshell-driver-docker/src/lib.rs:273,
validates templates at crates/openshell-driver-docker/src/lib.rs:447, and
builds driver-owned mounts from config at
crates/openshell-driver-docker/src/lib.rs:1656. Podman parses
PodmanSandboxDriverConfig at
crates/openshell-driver-podman/src/container.rs:64, validates create requests
at crates/openshell-driver-podman/src/driver.rs:290, creates volumes/token
files/container state at crates/openshell-driver-podman/src/driver.rs:343, and
builds the libpod container spec in try_build_container_spec_with_token around
crates/openshell-driver-podman/src/driver.rs:426.
Current Behavior
Today, a deployment-specific integration has three imperfect options:
- Use driver_config if the built-in driver already exposes the exact knob.
This works for simple typed driver options but does not support side effects,
external allocation, rollback, or composition.
- Implement a full external compute driver. This is appropriate for a new
backend, but too heavy for "Kubernetes plus platform-specific mutation" or
"Podman plus custom mounts/IPAM."
- Add bespoke hooks inside one driver. VM has already done this, but repeating
VM-specific patterns independently across Kubernetes/Docker/Podman would make
extension behavior inconsistent.
There is also a config identity problem for wrapper drivers. Public docs state
that compute_drivers currently accepts one of docker, podman,
kubernetes, or vm at docs/reference/sandbox-compute-drivers.mdx:17 and
docs/reference/sandbox-compute-drivers.mdx:24. driver_config is keyed by
driver name and the gateway forwards only the active driver's block. A wrapper
or external driver that delegates to Kubernetes must decide whether it consumes
driver_config.kubernetes, driver_config.<wrapper-name>, or both.
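One way to avoid conflating these sources is to resolve them into separate fields before the wrapper runs. This is a sketch under stated assumptions. ComposedDriverConfig, resolve_composed_config, and the wrapper name acme-k8s are hypothetical, and blocks are kept as opaque strings for brevity. Blocks for unrelated drivers are dropped, mirroring the gateway's current forward-only-the-active-block behavior. Extension config comes only from policy, never from the request.

```rust
use std::collections::BTreeMap;

/// Hypothetical resolved configuration for a composed (wrapper) driver.
/// Each source lives in its own field so they are never merged.
#[derive(Debug, PartialEq)]
pub struct ComposedDriverConfig {
    /// driver_config.<wrapper-name>: options the wrapper itself owns.
    pub wrapper: Option<String>,
    /// driver_config.<base-name>: options forwarded to the delegate driver.
    pub base: Option<String>,
    /// Policy/operator-supplied extension config, never read from the request.
    pub extensions: BTreeMap<String, String>,
}

pub fn resolve_composed_config(
    request_blocks: &BTreeMap<String, String>,
    wrapper_name: &str,
    base_name: &str,
    policy_extension_config: BTreeMap<String, String>,
) -> ComposedDriverConfig {
    ComposedDriverConfig {
        wrapper: request_blocks.get(wrapper_name).cloned(),
        base: request_blocks.get(base_name).cloned(),
        extensions: policy_extension_config,
    }
}
```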
What Would Need to Change
The first issue should be an RFC/tracking issue, not an immediate broad
implementation. It should define a common lifecycle extension model that can be
implemented incrementally by individual drivers.
The likely shape:
- Add a shared lifecycle vocabulary in a common crate, probably
openshell-core or a new compute-extension module.
- Keep VM as the reference implementation and map current VM phases:
configure_launch, before_launch, after_launch_failed, after_delete,
before_restore, and after_restore.
- Define common phases such as validate_request, prepare_plan, before_create,
after_create, on_create_failed, before_start, after_ready, before_stop,
after_stop, before_delete, after_delete, and reconcile.
- Define a common hook context with sandbox ID/name, selected driver, template,
typed resources, driver config, policy/tenant/user metadata when available,
and audit/span context. Do not pass raw provider secrets by default.
- Define a driver-owned mutable plan abstraction. It can be typed per driver
rather than forcing one universal plan type.
- Add capability discovery so a driver can declare which hook phases and plan
mutation points it supports.
- Add composition guidance for external/specialized drivers: they should be able
to fully implement compute_driver.proto, or wrap/delegate to an existing
driver and apply extensions around that driver's plan.
- Define config identity for composed drivers: the wrapper's own config,
delegated base-driver config, and any policy-controlled extension config must
not be conflated.
- Choose Kubernetes as the first non-VM proof of concept because it already has
a clear native plan boundary and important use cases: namespace selection,
NetworkPolicy, Services, DRA/resource claims, pod mutation, labels,
annotations, placement, warm pools, and IP attribution.
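The phase list and capability-discovery items above can be sketched as a small shared vocabulary. Everything here is hypothetical: Phase, DriverHookCapabilities, HookContext, and check. The sketch picks one of the open options. An extension that needs a phase the driver does not support is rejected at registration rather than silently skipped.

```rust
use std::collections::BTreeSet;

/// Driver-neutral lifecycle phases, named after the candidate list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Phase {
    ValidateRequest,
    PreparePlan,
    BeforeCreate,
    AfterCreate,
    OnCreateFailed,
    BeforeStart,
    AfterReady,
    BeforeStop,
    AfterStop,
    BeforeDelete,
    AfterDelete,
    Reconcile,
}

/// Hypothetical capability record a driver could publish.
pub struct DriverHookCapabilities {
    pub driver: &'static str,
    pub phases: BTreeSet<Phase>,
}

/// Hypothetical hook context: identity and metadata only, no raw secrets.
pub struct HookContext<'a> {
    pub sandbox_id: &'a str,
    pub sandbox_name: &'a str,
    pub driver: &'a str,
    pub tenant: Option<&'a str>,
}

impl DriverHookCapabilities {
    /// Fails closed: reports the first required phase the driver lacks.
    pub fn check(&self, extension: &str, required: &[Phase]) -> Result<(), String> {
        let driver = self.driver;
        match required.iter().find(|p| !self.phases.contains(*p)) {
            Some(missing) => Err(format!(
                "{extension} needs {missing:?}, which {driver} does not support"
            )),
            None => Ok(()),
        }
    }
}
```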
Alternative Approaches Considered
Only extend driver_config. This is insufficient. driver_config is useful
for selected-driver schema knobs, but it is request data. It does not model
operator-installed code, external resource allocation, rollback, delete hooks,
capability discovery, or composition.
Only use external compute drivers. This works for complete replacement but
creates unnecessary forks when an integration wants 90 percent of the in-tree
Kubernetes/Docker/Podman/VM behavior.
Copy the VM trait into each driver. This would be fast but would ossify
VM-specific names like LaunchPlan and before_launch. A common contract should
be driver-neutral, with VM-specific adapters.
Change compute_driver.proto immediately. This may eventually be needed for
external driver capability discovery, but the first RFC can define in-process
driver composition and capability metadata before committing to protocol fields.
Patterns to Follow
- Keep the gateway/driver boundary clean. DriverSandboxTemplate.driver_config
is already selected by the gateway and validated by the driver.
- Keep platform-specific schema inside the selected driver, following the
Kubernetes comment at crates/openshell-driver-kubernetes/src/driver.rs:88.
- Treat extensions as operator-installed, not user-supplied.
- Preserve fail-closed behavior for unsupported capabilities, invalid
activation keys, and unsafe config.
- Follow VM cleanup ordering: setup in registration order, cleanup in reverse
order, and idempotent cleanup hooks.
- Keep internal tracking labels and auth/security fields driver-owned, following
the Docker/Podman/Kubernetes patterns where managed labels and required env
vars override user input.
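The VM ordering rules above generalize directly. Here is a minimal sketch, assuming a hypothetical driver-neutral Extension trait with a single setup step and an idempotent cleanup:
- Setup runs in registration order.
- A failure rolls back the extensions that already succeeded, in reverse order.
- Normal teardown also runs in reverse order.

The sketch cleans up only extensions whose setup completed. Whether the failing extension's own cleanup should also run is left open. Recorder is a test helper, not part of any existing crate.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical driver-neutral extension for this sketch.
pub trait Extension {
    fn name(&self) -> &str;
    /// Setup work for one phase (e.g. allocate an IP or create a Service).
    fn setup(&mut self) -> Result<(), String>;
    /// Must be idempotent: calling it more than once is harmless.
    fn cleanup(&mut self);
}

/// Runs setup in registration order. On failure, cleans up the extensions
/// that already succeeded in reverse order, then returns the error.
pub fn run_setup(exts: &mut [Box<dyn Extension>]) -> Result<(), String> {
    for i in 0..exts.len() {
        if let Err(e) = exts[i].setup() {
            for done in exts[..i].iter_mut().rev() {
                done.cleanup();
            }
            return Err(format!("{} failed: {e}", exts[i].name()));
        }
    }
    Ok(())
}

/// Normal teardown (e.g. after_delete): reverse registration order.
pub fn run_cleanup(exts: &mut [Box<dyn Extension>]) {
    for ext in exts.iter_mut().rev() {
        ext.cleanup();
    }
}

/// Test helper that records calls; `allocated` makes cleanup idempotent.
pub struct Recorder {
    pub name: String,
    pub fail: bool,
    pub allocated: bool,
    pub log: Rc<RefCell<Vec<String>>>,
}

impl Extension for Recorder {
    fn name(&self) -> &str {
        &self.name
    }
    fn setup(&mut self) -> Result<(), String> {
        if self.fail {
            return Err("allocation refused".into());
        }
        self.allocated = true;
        self.log.borrow_mut().push(format!("setup {}", self.name));
        Ok(())
    }
    fn cleanup(&mut self) {
        if self.allocated {
            self.allocated = false;
            self.log.borrow_mut().push(format!("cleanup {}", self.name));
        }
    }
}
```

Because Recorder tracks whether it allocated anything, calling run_cleanup after a failed setup adds no further log entries. That is the idempotence property the cleanup hooks need.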
Proposed Approach
Create an RFC issue under the OpenShell extensibility umbrella that defines
composable lifecycle extensions across compute drivers. The RFC should use the
merged VM lifecycle implementation as prior art, but deliberately generalize its
vocabulary to driver-neutral lifecycle phases and driver-owned plans.
The first implementation target should be Kubernetes because the current driver
already builds a rich native plan before create, and the highest-value use cases
are Kubernetes-native: tenant namespace selection, DRA/resource claims,
NetworkPolicy/Service creation, warm pool/SandboxClaim integration, placement,
labels, annotations, and IP attribution.
The RFC should explicitly support external/specialized drivers reusing existing
drivers. A driver should be able to wrap the in-tree Kubernetes or VM driver and
contribute validation, plan mutation, allocation, and cleanup without
reimplementing the full compute lifecycle.
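A wrapper driver can reuse a base driver by delegating plan building and submission while applying extensions in between. The sketch below is deliberately simplified and entirely hypothetical (PlanningDriver, PlanExtension, WrappedDriver, ToyDriver, AddLabel). It keeps the plan typed per driver through an associated type, matching the driver-owned plan idea rather than a universal plan type.

```rust
/// Hypothetical, heavily simplified driver interface: build a native plan,
/// then submit it.
pub trait PlanningDriver {
    type Plan;
    fn build_plan(&self, sandbox: &str) -> Result<Self::Plan, String>;
    fn submit(&self, plan: Self::Plan) -> Result<String, String>;
}

/// Extension typed over the delegate driver's own plan type.
pub trait PlanExtension<P> {
    /// Optional request validation; accepts everything by default.
    fn validate(&self, _sandbox: &str) -> Result<(), String> {
        Ok(())
    }
    fn prepare_plan(&self, plan: &mut P) -> Result<(), String>;
}

/// Reuses the base driver's lifecycle; customizes only validation and
/// plan mutation.
pub struct WrappedDriver<D: PlanningDriver> {
    pub base: D,
    pub extensions: Vec<Box<dyn PlanExtension<D::Plan>>>,
}

impl<D: PlanningDriver> WrappedDriver<D> {
    pub fn create(&self, sandbox: &str) -> Result<String, String> {
        for ext in &self.extensions {
            ext.validate(sandbox)?;
        }
        let mut plan = self.base.build_plan(sandbox)?;
        for ext in &self.extensions {
            ext.prepare_plan(&mut plan)?;
        }
        self.base.submit(plan)
    }
}

/// Toy base driver whose plan is a list of labels.
pub struct ToyDriver;

impl PlanningDriver for ToyDriver {
    type Plan = Vec<String>;
    fn build_plan(&self, sandbox: &str) -> Result<Vec<String>, String> {
        Ok(vec![format!("sandbox={sandbox}")])
    }
    fn submit(&self, plan: Vec<String>) -> Result<String, String> {
        Ok(plan.join(","))
    }
}

/// Toy extension that appends one label to the plan.
pub struct AddLabel(pub &'static str);

impl PlanExtension<Vec<String>> for AddLabel {
    fn prepare_plan(&self, plan: &mut Vec<String>) -> Result<(), String> {
        plan.push(self.0.to_string());
        Ok(())
    }
}
```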
Scope Assessment
- Complexity: High
- Confidence: Medium-high for an RFC and Kubernetes-first proof of concept;
medium for a stable external-driver protocol shape.
- Estimated files to change for first implementation: 8-15 files, depending
on whether the first cut adds only in-process hooks or also extends
compute_driver.proto.
- Issue type: feat
Risks & Open Questions
- Where should the common extension API live: openshell-core, a new crate, or
per-driver adapter traits over a shared context?
- Should external compute drivers advertise hook capabilities through
GetCapabilities, a new RPC, or only through docs/config in the first cut?
- How much should extensions be allowed to mutate? Some fields must be immutable
after validation, especially identity labels, auth material, and gateway-owned
metadata.
- How should policy authorize extension activation? VM currently uses
policy-stamped labels. A cross-driver contract needs the same fail-closed
property.
- How should failures roll back multi-resource Kubernetes extensions, such as
creating a Service, NetworkPolicy, ResourceClaim, or SandboxClaim around the
Sandbox CR?
- How should tenant/user metadata be represented before the OIDC/tenant model is
fully settled?
- DRA-style resource requests need a clear boundary between user request,
policy-approved resource class, and driver/extension-created Kubernetes
objects.
- LSM impact is low for the RFC itself, but non-VM container hooks that add bind
mounts, device mounts, or /proc-visible behavior must account for SELinux and
AppArmor. Podman already has SELinux-aware mount handling in
crates/openshell-driver-podman/src/container.rs:17.
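For the question of how much extensions may mutate, one option is to give extensions a guarded view of the plan instead of the raw map. Here is a minimal sketch, assuming a hypothetical reserved prefix openshell.example/ and a hypothetical GuardedLabels type; the real managed label keys in driver_utils.rs are not reproduced. Writes to reserved keys are rejected rather than silently overwritten, which is the behavior the Kubernetes item under Test Considerations asks tests to check.

```rust
use std::collections::BTreeMap;

/// Hypothetical reserved prefixes for driver-owned labels.
const RESERVED_PREFIXES: &[&str] = &["openshell.example/"];

/// A label map view that refuses writes to driver-owned keys, so identity
/// labels set during validation cannot be overwritten by extensions.
pub struct GuardedLabels<'a> {
    inner: &'a mut BTreeMap<String, String>,
}

impl<'a> GuardedLabels<'a> {
    pub fn new(inner: &'a mut BTreeMap<String, String>) -> Self {
        Self { inner }
    }

    pub fn insert(&mut self, key: &str, value: &str) -> Result<(), String> {
        if RESERVED_PREFIXES.iter().any(|p| key.starts_with(*p)) {
            return Err(format!("label `{key}` is reserved for the driver"));
        }
        self.inner.insert(key.to_string(), value.to_string());
        Ok(())
    }
}
```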
Test Considerations
- Shared unit tests for lifecycle ordering: validate/prepare/create/failure/delete
ordering, reverse cleanup order, no-op unsupported phases, and duplicate
extension names.
- VM regression tests should continue covering existing hook ordering, cleanup,
backend validation, and restore behavior.
- Kubernetes unit tests should exercise pod/Sandbox CR plan mutation before
create, including rejecting attempts to overwrite reserved labels/annotations.
- Failure tests should cover rollback when an extension fails after allocating a
resource but before the main driver create succeeds.
- Capability tests should verify unsupported hooks fail predictably or no-op
according to the RFC.
- External-driver tests may need a stub driver if compute_driver.proto gains
capability discovery fields.
Documentation & Config Impact
- architecture/compute-runtimes.md should document lifecycle extension phases
and the distinction between driver replacement and driver composition.
- docs/reference/sandbox-compute-drivers.mdx should explain which drivers
support which lifecycle phases.
- docs/reference/gateway-config.mdx must be updated if operator config is
added for extension registration, extension enablement, or external driver
composition.
- The compute-driver docs currently state that only one gateway driver is
selected and supported values are fixed. Those sections need updates if
wrapper/composed drivers become selectable.
- Kubernetes setup docs may need RBAC updates if extensions create resources
beyond Sandbox CRs, such as Service, NetworkPolicy, ResourceClaim, or
SandboxClaim.
Related Work
Created by spike investigation. Use build-from-issue to plan and implement
after human review.