diff --git a/architecture/compute-runtimes.md b/architecture/compute-runtimes.md
index a193ed622..e598c4904 100644
--- a/architecture/compute-runtimes.md
+++ b/architecture/compute-runtimes.md
@@ -33,8 +33,8 @@ when a sandbox create request asks for GPU resources.
 | Docker | Local development with Docker available. | Container plus nested sandbox namespace. | Uses host networking so loopback gateway endpoints work from the supervisor. |
 | Podman | Rootless or single-machine deployments. | Container plus nested sandbox namespace. | Uses the Podman REST API, OCI image volumes, and CDI GPU devices when available. |
 | Kubernetes | Cluster deployment through Helm. | Pod plus nested sandbox namespace. | Uses Kubernetes API objects, service accounts, secrets, PVC-backed workspace storage, and GPU resources. |
-| VM | Experimental microVM isolation. | Per-sandbox libkrun VM. | Gateway spawns `openshell-driver-vm` as a subprocess over a private, state-local Unix socket. The VM driver boots a cached bootstrap `rootfs.ext4`, prepares requested OCI images inside a bootstrap VM with `umoci`, attaches the prepared image disk read-only, and gives each sandbox a writable `overlay.ext4` for merged-root changes and runtime material. The driver persists each accepted launch request beside the overlay and restarts those VMs on driver startup without recreating the overlay. |
-| External | Out-of-tree drivers operated alongside the gateway. | Whatever boundary the driver implements. | Activated by `--compute-driver-socket=` (env `OPENSHELL_COMPUTE_DRIVER_SOCKET`). The gateway connects to a UDS the operator already provisioned, runs `GetCapabilities`, logs the advertised `driver_name`, and dispatches all sandbox lifecycle calls through the same `compute_driver.proto` surface as the in-tree drivers. The driver process and socket lifecycle are operator-owned; the gateway does not spawn, supervise, or remove the driver.
The trust boundary is the socket's filesystem permissions — the operator must ensure only the gateway uid can read/write it. |
+| VM | Experimental microVM isolation. | Per-sandbox libkrun VM. | Managed endpoint-backed driver. The gateway spawns `openshell-driver-vm`, waits for its Unix socket, and then consumes it through the same remote `compute_driver.proto` path used by unmanaged endpoint drivers. The VM driver boots a cached bootstrap `rootfs.ext4`, prepares requested OCI images inside a bootstrap VM with `umoci`, attaches the prepared image disk read-only, and gives each sandbox a writable `overlay.ext4` for merged-root changes and runtime material. The driver persists each accepted launch request beside the overlay and restarts those VMs on driver startup without recreating the overlay. |
+| Extension | Out-of-tree drivers operated alongside the gateway. | Whatever boundary the driver implements. | Selected by a non-reserved custom `compute_drivers = [""]` entry with `[openshell.drivers.].socket_path`, or by `--compute-driver-socket=` as launch-time shorthand for the `extension` driver ID. Reserved built-in names such as `vm`, `docker`, `podman`, and `kubernetes` cannot be used as unmanaged socket endpoints. The gateway connects to a UDS the operator already provisioned, runs `GetCapabilities`, logs the advertised `driver_name`, and dispatches all sandbox lifecycle calls through `compute_driver.proto`. The driver process and socket lifecycle are operator-owned; the gateway does not spawn, supervise, or remove unmanaged extension drivers. The trust boundary is the socket's filesystem permissions: the operator must ensure only the gateway uid can read/write it. |
 
 Per-sandbox CPU and memory values currently enter the driver layer through
 template resource limits. Docker and Podman apply them as runtime limits.
@@ -79,7 +79,7 @@ The supervisor must be available inside each sandbox workload:
 | Podman | Read-only OCI image volume containing the supervisor binary.
|
 | Kubernetes | Sandbox pod image or pod template configuration. |
 | VM | Embedded in the guest rootfs bundle. |
-| External | Defined by the out-of-tree driver. |
+| Extension | Defined by the out-of-tree driver. |
 
 Driver-controlled environment variables must override sandbox image or template
 values for sandbox ID, sandbox name, gateway endpoint, relay socket path, TLS
diff --git a/crates/openshell-core/src/config.rs b/crates/openshell-core/src/config.rs
index 593621515..c66d32610 100644
--- a/crates/openshell-core/src/config.rs
+++ b/crates/openshell-core/src/config.rs
@@ -4,6 +4,7 @@
 //! Configuration management for `OpenShell` components.
 
 use serde::{Deserialize, Serialize};
+use std::collections::BTreeMap;
 use std::fmt;
 #[cfg(unix)]
 use std::io::{Read, Write};
@@ -69,6 +70,27 @@ impl ComputeDriverKind {
     }
 }
 
+/// Normalize a configured compute driver name.
+///
+/// Built-in driver names and custom remote driver names share the same
+/// selection namespace. The normalized value is lowercase ASCII and may contain
+/// letters, digits, `-`, and `_`.
+pub fn normalize_compute_driver_name(value: &str) -> Result {
+    let value = value.trim();
+    if value.is_empty() {
+        return Err("compute driver name cannot be empty".to_string());
+    }
+    if !value
+        .bytes()
+        .all(|b| b.is_ascii_alphanumeric() || matches!(b, b'-' | b'_'))
+    {
+        return Err(format!(
+            "invalid compute driver name '{value}'. use ASCII letters, digits, '-' or '_'"
+        ));
+    }
+    Ok(value.to_ascii_lowercase())
+}
+
 impl fmt::Display for ComputeDriverKind {
     fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
         f.write_str(self.as_str())
@@ -358,13 +380,14 @@ pub struct Config {
     /// The config shape allows multiple drivers so the gateway can evolve
     /// toward multi-backend routing. Current releases require exactly one
     /// configured driver.
-    pub compute_drivers: Vec,
+    pub compute_drivers: Vec,
 
-    /// When set, the gateway dispatches sandbox lifecycle to an out-of-tree
-    /// compute driver process listening on this Unix domain socket and
-    /// speaking `compute_driver.proto`. Takes precedence over
-    /// `compute_drivers` and the auto-detection probe.
-    pub external_compute_driver_socket: Option,
+    /// Operator-provided endpoints for named remote compute drivers.
+    ///
+    /// This is populated by CLI/env inputs such as `--compute-driver-socket`.
+    /// TOML-authored endpoints live under `[openshell.drivers.]` and are
+    /// resolved by the gateway config loader.
+    pub compute_driver_endpoints: BTreeMap,
 
     /// TTL for SSH session tokens, in seconds. 0 disables expiry.
     pub ssh_session_ttl_secs: u64,
@@ -565,7 +588,7 @@ impl Config {
             gateway_jwt: None,
             database_url: String::new(),
             compute_drivers: vec![],
-            external_compute_driver_socket: None,
+            compute_driver_endpoints: BTreeMap::new(),
             ssh_session_ttl_secs: default_ssh_session_ttl_secs(),
             grpc_rate_limit_requests: None,
             grpc_rate_limit_window_secs: None,
@@ -621,18 +644,27 @@ impl Config {
 
     /// Create a new configuration with the configured compute drivers.
     #[must_use]
-    pub fn with_compute_drivers(mut self, drivers: I) -> Self
+    pub fn with_compute_drivers(mut self, drivers: I) -> Self
     where
-        I: IntoIterator,
+        I: IntoIterator,
+        D: ToString,
     {
-        self.compute_drivers = drivers.into_iter().collect();
+        self.compute_drivers = drivers
+            .into_iter()
+            .map(|driver| driver.to_string())
+            .collect();
         self
     }
 
-    /// Pin an external compute driver by Unix domain socket path.
+    /// Register a Unix domain socket endpoint for a named remote driver.
     #[must_use]
-    pub fn with_external_compute_driver_socket(mut self, socket: Option) -> Self {
-        self.external_compute_driver_socket = socket;
+    pub fn with_compute_driver_endpoint(
+        mut self,
+        name: impl Into,
+        socket: impl Into,
+    ) -> Self {
+        self.compute_driver_endpoints
+            .insert(name.into(), socket.into());
         self
     }
 
@@ -780,8 +812,8 @@ mod tests {
     use super::is_reachable_unix_socket;
     use super::{
         ComputeDriverKind, Config, DEFAULT_SERVICE_ROUTING_DOMAIN, GatewayJwtConfig, detect_driver,
-        docker_host_unix_socket_path, is_unix_socket, podman_socket_candidates_from_env,
-        podman_socket_responds,
+        docker_host_unix_socket_path, is_unix_socket, normalize_compute_driver_name,
+        podman_socket_candidates_from_env, podman_socket_responds,
     };
     #[cfg(unix)]
     use std::io::{Read as _, Write as _};
@@ -817,6 +849,18 @@ mod tests {
         assert!(err.contains("unsupported compute driver 'firecracker'"));
     }
 
+    #[test]
+    fn compute_driver_name_normalization_accepts_builtin_and_custom_names() {
+        assert_eq!(normalize_compute_driver_name(" VM ").unwrap(), "vm");
+        assert_eq!(
+            normalize_compute_driver_name("Kyma_GPU-1").unwrap(),
+            "kyma_gpu-1"
+        );
+
+        let err = normalize_compute_driver_name("kyma/gpu").unwrap_err();
+        assert!(err.contains("invalid compute driver name"));
+    }
+
     #[test]
     fn config_defaults_to_loopback_bind_address() {
         let expected: SocketAddr = "127.0.0.1:17670".parse().expect("valid address");
diff --git a/crates/openshell-server/src/cli.rs b/crates/openshell-server/src/cli.rs
index e78fec2a3..19949c514 100644
--- a/crates/openshell-server/src/cli.rs
+++ b/crates/openshell-server/src/cli.rs
@@ -109,15 +109,15 @@ struct RunArgs {
         value_delimiter = ',',
         value_parser = parse_compute_driver
     )]
-    drivers: Vec,
+    drivers: Vec,
 
-    /// Path to a Unix domain socket served by an out-of-tree compute driver
+    /// Path to a Unix domain socket served by a remote compute driver
     /// implementing `compute_driver.proto`.
/// - /// When set, the gateway dispatches sandbox lifecycle to that driver - /// instead of one of the in-tree backends, skipping both the `--drivers` - /// list and the auto-detection probe. The driver name advertised in - /// `GetCapabilities` is logged for diagnostics. + /// When set, the socket is associated with the single configured driver + /// name. If no driver name is configured, the gateway uses `extension`. + /// Reserved built-in driver names such as Docker, Podman, Kubernetes, and + /// VM do not accept socket endpoints. #[arg(long, env = "OPENSHELL_COMPUTE_DRIVER_SOCKET")] compute_driver_socket: Option, @@ -245,6 +245,7 @@ async fn run_from_args(mut args: RunArgs, matches: ArgMatches) -> Result<()> { if let Some(file) = file.as_ref() { merge_file_into_args(&mut args, &file.openshell.gateway, &matches); } + normalize_compute_driver_socket_args(&mut args)?; let local_tls = apply_runtime_defaults(&mut args)?; let local_jwt = defaults::complete_local_jwt_config()?; @@ -375,13 +376,19 @@ async fn run_from_args(mut args: RunArgs, matches: ArgMatches) -> Result<()> { args.grpc_rate_limit_requests, args.grpc_rate_limit_window_seconds, ) - .with_external_compute_driver_socket(args.compute_driver_socket.clone()) .with_server_sans(args.server_sans.clone()) .with_loopback_service_http(args.enable_loopback_service_http); validate_grpc_rate_limit_args( args.grpc_rate_limit_requests, args.grpc_rate_limit_window_seconds, )?; + if let Some(socket) = args.compute_driver_socket.clone() { + let driver = args + .drivers + .first() + .expect("normalize_compute_driver_socket_args sets a driver for socket endpoints"); + config = config.with_compute_driver_endpoint(driver.clone(), socket); + } if let Some(ttl) = file .as_ref() @@ -468,8 +475,8 @@ async fn run_from_args(mut args: RunArgs, matches: ArgMatches) -> Result<()> { .into_diagnostic() } -fn parse_compute_driver(value: &str) -> std::result::Result { - value.parse() +fn parse_compute_driver(value: &str) -> 
std::result::Result { + openshell_core::config::normalize_compute_driver_name(value) } fn resolve_config_path(args: &RunArgs) -> Result> { @@ -668,16 +675,51 @@ fn validate_grpc_rate_limit_args(requests: Option, window_seconds: Option Option { - // An external-driver socket pins dispatch to the out-of-tree path and - // bypasses both the `--drivers` list and auto-detection probe; callers - // that key off the in-tree `ComputeDriverKind` get `None` here. - if args.compute_driver_socket.is_some() { - return None; +fn normalize_compute_driver_socket_args(args: &mut RunArgs) -> Result<()> { + let Some(socket) = args.compute_driver_socket.as_ref() else { + return Ok(()); + }; + if socket.as_os_str().is_empty() { + return Err(miette::miette!( + "--compute-driver-socket must not be an empty path" + )); + } + + match args.drivers.as_slice() { + [] => { + args.drivers.push("extension".to_string()); + Ok(()) + } + [driver] => { + let driver = openshell_core::config::normalize_compute_driver_name(driver) + .map_err(|err| miette::miette!("{err}"))?; + if matches!( + driver.parse::().ok(), + Some( + ComputeDriverKind::Docker + | ComputeDriverKind::Podman + | ComputeDriverKind::Kubernetes + | ComputeDriverKind::Vm + ) + ) { + return Err(miette::miette!( + "--compute-driver-socket cannot be combined with reserved built-in compute driver '{driver}'" + )); + } + args.drivers[0] = driver; + Ok(()) + } + drivers => Err(miette::miette!( + "--compute-driver-socket requires exactly one compute driver name, got: {}", + drivers.join(",") + )), } +} + +fn effective_single_driver(args: &RunArgs) -> Option { match args.drivers.as_slice() { [] => openshell_core::config::detect_driver(), - [driver] => Some(*driver), + [driver] => driver.parse().ok(), _ => None, } } @@ -1585,41 +1627,67 @@ ssh_session_ttl_secs = 1234 .unwrap_or_else(std::sync::PoisonError::into_inner); let _g = EnvVarGuard::remove("OPENSHELL_COMPUTE_DRIVER_SOCKET"); - let (args, _) = parse_with_args(&[ + let (mut args, _) = 
parse_with_args(&[ "openshell-gateway", "--db-url", "sqlite::memory:", "--compute-driver-socket", - "/run/openshell/external.sock", + "/run/openshell/extension.sock", ]); + super::normalize_compute_driver_socket_args(&mut args).unwrap(); assert_eq!( args.compute_driver_socket.as_deref(), - Some(std::path::Path::new("/run/openshell/external.sock")) + Some(std::path::Path::new("/run/openshell/extension.sock")) ); - // External socket pins dispatch off the in-tree enum, so the - // single-driver helper must return None even when no --drivers given. + assert_eq!(args.drivers, ["extension"]); assert!(super::effective_single_driver(&args).is_none()); } #[test] - fn compute_driver_socket_overrides_drivers_flag() { + fn compute_driver_socket_rejects_reserved_builtin_drivers() { let _lock = ENV_LOCK .lock() .unwrap_or_else(std::sync::PoisonError::into_inner); let _g = EnvVarGuard::remove("OPENSHELL_COMPUTE_DRIVER_SOCKET"); - let (args, _) = parse_with_args(&[ + let (mut args, _) = parse_with_args(&[ "openshell-gateway", "--db-url", "sqlite::memory:", "--drivers", "docker", "--compute-driver-socket", - "/run/openshell/external.sock", + "/run/openshell/extension.sock", + ]); + let err = super::normalize_compute_driver_socket_args(&mut args).unwrap_err(); + assert!( + err.to_string() + .contains("cannot be combined with reserved built-in compute driver 'docker'"), + "unexpected error: {err}" + ); + } + + #[test] + fn compute_driver_socket_rejects_vm_endpoint() { + let _lock = ENV_LOCK + .lock() + .unwrap_or_else(std::sync::PoisonError::into_inner); + let _g = EnvVarGuard::remove("OPENSHELL_COMPUTE_DRIVER_SOCKET"); + + let (mut args, _) = parse_with_args(&[ + "openshell-gateway", + "--db-url", + "sqlite::memory:", + "--drivers", + "vm", + "--compute-driver-socket", + "/run/openshell/vm.sock", ]); + let err = super::normalize_compute_driver_socket_args(&mut args).unwrap_err(); assert!( - super::effective_single_driver(&args).is_none(), - "external socket must short-circuit 
--drivers" + err.to_string() + .contains("cannot be combined with reserved built-in compute driver 'vm'"), + "unexpected error: {err}" ); } @@ -1630,14 +1698,16 @@ ssh_session_ttl_secs = 1234 .unwrap_or_else(std::sync::PoisonError::into_inner); let _g = EnvVarGuard::set( "OPENSHELL_COMPUTE_DRIVER_SOCKET", - "/var/run/openshell/external.sock", + "/var/run/openshell/extension.sock", ); - let (args, _) = parse_with_args(&["openshell-gateway", "--db-url", "sqlite::memory:"]); + let (mut args, _) = parse_with_args(&["openshell-gateway", "--db-url", "sqlite::memory:"]); + super::normalize_compute_driver_socket_args(&mut args).unwrap(); assert_eq!( args.compute_driver_socket.as_deref(), - Some(std::path::Path::new("/var/run/openshell/external.sock")) + Some(std::path::Path::new("/var/run/openshell/extension.sock")) ); + assert_eq!(args.drivers, ["extension"]); } #[test] diff --git a/crates/openshell-server/src/compute/mod.rs b/crates/openshell-server/src/compute/mod.rs index 7ef654926..933f29a59 100644 --- a/crates/openshell-server/src/compute/mod.rs +++ b/crates/openshell-server/src/compute/mod.rs @@ -128,6 +128,35 @@ impl Drop for ManagedDriverProcess { } } +#[derive(Debug)] +pub struct AcquiredRemoteDriverEndpoint { + pub(crate) name: String, + pub(crate) channel: Channel, + pub(crate) driver_process: Option>, +} + +impl AcquiredRemoteDriverEndpoint { + pub(crate) fn managed_builtin( + driver_kind: ComputeDriverKind, + channel: Channel, + driver_process: Arc, + ) -> Self { + Self { + name: driver_kind.as_str().to_string(), + channel, + driver_process: Some(driver_process), + } + } + + pub(crate) fn unmanaged(name: impl Into, channel: Channel) -> Self { + Self { + name: name.into(), + channel, + driver_process: None, + } + } +} + #[derive(Debug, Clone)] struct RemoteComputeDriver { channel: Channel, @@ -226,7 +255,7 @@ impl ComputeDriver for RemoteComputeDriver { #[derive(Clone)] pub struct ComputeRuntime { driver: SharedComputeDriver, - driver_kind: Option, + 
driver_name: String, shutdown_cleanup: Option>, startup_resume: Option>, _driver_process: Option>, @@ -250,7 +279,7 @@ impl fmt::Debug for ComputeRuntime { impl ComputeRuntime { #[allow(clippy::too_many_arguments)] async fn from_driver( - driver_kind: Option, + driver_name: String, driver: SharedComputeDriver, shutdown_cleanup: Option>, startup_resume: Option>, @@ -268,19 +297,17 @@ impl ComputeRuntime { .await .map_err(compute_error_from_status)? .into_inner(); - // For out-of-tree drivers (driver_kind = None), log the name the - // driver advertises in GetCapabilities so operators can confirm - // the gateway is talking to the driver they expect. - if driver_kind.is_none() { - info!( - driver_name = %capabilities.driver_name, - "External compute driver connected" - ); - } + let driver_kind = driver_name.parse::().ok(); + info!( + configured_driver = %driver_name, + advertised_driver = %capabilities.driver_name, + remote = driver_kind.is_none(), + "Compute driver connected" + ); let default_image = capabilities.default_image; Ok(Self { driver, - driver_kind, + driver_name, shutdown_cleanup, startup_resume, _driver_process: driver_process, @@ -325,7 +352,7 @@ impl ComputeRuntime { let startup_resume: Arc = driver.clone(); let driver: SharedComputeDriver = driver; Self::from_driver( - Some(ComputeDriverKind::Docker), + ComputeDriverKind::Docker.as_str().to_string(), driver, Some(shutdown_cleanup), Some(startup_resume), @@ -354,7 +381,7 @@ impl ComputeRuntime { .map_err(|err| ComputeError::Message(err.to_string()))?; let driver: SharedComputeDriver = Arc::new(ComputeDriverService::new(driver)); Self::from_driver( - Some(ComputeDriverKind::Kubernetes), + ComputeDriverKind::Kubernetes.as_str().to_string(), driver, None, None, @@ -370,55 +397,21 @@ impl ComputeRuntime { .await } - pub(crate) async fn new_remote_vm( - channel: Channel, - driver_process: Option>, - store: Arc, - sandbox_index: SandboxIndex, - sandbox_watch_bus: SandboxWatchBus, - tracing_log_bus: 
TracingLogBus, - supervisor_sessions: Arc, - ) -> Result { - let driver: SharedComputeDriver = Arc::new(RemoteComputeDriver::new(channel)); - Self::from_driver( - Some(ComputeDriverKind::Vm), - driver, - None, - None, - driver_process, - store, - sandbox_index, - sandbox_watch_bus, - tracing_log_bus, - supervisor_sessions, - true, - Vec::new(), - ) - .await - } - - /// Construct a runtime that proxies all sandbox lifecycle to an - /// out-of-tree compute driver listening on a pre-existing UDS endpoint. - /// - /// The driver process is operator-managed (not spawned by the gateway), - /// so no [`ManagedDriverProcess`] handle is attached. The advertised - /// `driver_name` from `GetCapabilities` is logged for diagnostics by - /// [`Self::from_driver`]. - pub(crate) async fn new_remote_external( - channel: Channel, + pub(crate) async fn new_remote_driver( + endpoint: AcquiredRemoteDriverEndpoint, store: Arc, sandbox_index: SandboxIndex, sandbox_watch_bus: SandboxWatchBus, tracing_log_bus: TracingLogBus, supervisor_sessions: Arc, ) -> Result { - let driver: SharedComputeDriver = Arc::new(RemoteComputeDriver::new(channel)); + let driver: SharedComputeDriver = Arc::new(RemoteComputeDriver::new(endpoint.channel)); Self::from_driver( - None, + endpoint.name, driver, None, None, - None, + endpoint.driver_process, store, sandbox_index, sandbox_watch_bus, @@ -443,7 +436,7 @@ impl ComputeRuntime { .map_err(|err| ComputeError::Message(err.to_string()))?; let driver: SharedComputeDriver = Arc::new(PodmanDriverService::new(driver)); Self::from_driver( - Some(ComputeDriverKind::Podman), + ComputeDriverKind::Podman.as_str().to_string(), driver, None, None, @@ -466,7 +459,7 @@ impl ComputeRuntime { #[must_use] pub fn driver_kind(&self) -> Option { - self.driver_kind + self.driver_name.parse().ok() } #[must_use] @@ -476,7 +469,7 @@ impl ComputeRuntime { pub async fn validate_sandbox_create(&self, sandbox: &Sandbox) -> Result<(), Status> { let driver_sandbox = - 
driver_sandbox_from_public(sandbox, self.driver_kind).map_err(|status| *status)?; + driver_sandbox_from_public(sandbox, &self.driver_name).map_err(|status| *status)?; self.driver .validate_sandbox_create(Request::new(ValidateSandboxCreateRequest { sandbox: Some(driver_sandbox), @@ -492,7 +485,7 @@ impl ComputeRuntime { ) -> Result { let sandbox_id = sandbox.object_id().to_string(); let mut driver_sandbox = - driver_sandbox_from_public(&sandbox, self.driver_kind).map_err(|status| *status)?; + driver_sandbox_from_public(&sandbox, &self.driver_name).map_err(|status| *status)?; // Create with MustCreate condition to prevent duplicate creation race self.sandbox_index.update_from_sandbox(&sandbox); @@ -1418,18 +1411,21 @@ impl ComputeRuntime { } } -/// Connect to an out-of-tree compute driver that is already listening on -/// `socket_path` and return a tonic `Channel` speaking `compute_driver.proto`. +/// Connect to an unmanaged remote compute driver that is already listening on +/// `socket_path` and return the acquired endpoint. /// /// The gateway does not spawn or own the driver process — the operator is /// responsible for placing the driver alongside the gateway and granting the /// gateway uid read/write on the socket. The host portion of the URL is /// ignored because the connector resolves to the UDS rather than DNS. 
#[cfg(unix)] -pub async fn connect_external_compute_driver(socket_path: &Path) -> Result { +pub async fn connect_remote_compute_driver( + name: impl Into, + socket_path: &Path, +) -> Result { let socket_path: PathBuf = socket_path.to_path_buf(); let display_path = socket_path.clone(); - Endpoint::from_static("http://[::]:50051") + let channel = Endpoint::from_static("http://[::]:50051") .connect_with_connector(service_fn(move |_: tonic::transport::Uri| { let socket_path = socket_path.clone(); async move { UnixStream::connect(socket_path).await.map(TokioIo::new) } @@ -1437,22 +1433,26 @@ pub async fn connect_external_compute_driver(socket_path: &Path) -> Result Result { +pub async fn connect_remote_compute_driver( + _name: impl Into, + _socket_path: &Path, +) -> Result { Err(ComputeError::Message( - "the external compute driver requires unix domain socket support".to_string(), + "remote compute driver endpoints require unix domain socket support".to_string(), )) } fn driver_sandbox_from_public( sandbox: &Sandbox, - driver_kind: Option, + driver_name: &str, ) -> Result> { Ok(DriverSandbox { id: sandbox.object_id().to_string(), @@ -1461,7 +1461,7 @@ fn driver_sandbox_from_public( spec: sandbox .spec .as_ref() - .map(|spec| driver_sandbox_spec_from_public(spec, driver_kind)) + .map(|spec| driver_sandbox_spec_from_public(spec, driver_name)) .transpose()?, status: sandbox.status.as_ref().map(driver_status_from_public), }) @@ -1469,7 +1469,7 @@ fn driver_sandbox_from_public( fn driver_sandbox_spec_from_public( spec: &SandboxSpec, - driver_kind: Option, + driver_name: &str, ) -> Result> { Ok(DriverSandboxSpec { log_level: spec.log_level.clone(), @@ -1477,7 +1477,7 @@ fn driver_sandbox_spec_from_public( template: spec .template .as_ref() - .map(|template| driver_sandbox_template_from_public(template, driver_kind)) + .map(|template| driver_sandbox_template_from_public(template, driver_name)) .transpose()?, gpu: spec.gpu, sandbox_token: String::new(), @@ -1486,7 +1486,7 @@ fn 
driver_sandbox_spec_from_public( fn driver_sandbox_template_from_public( template: &SandboxTemplate, - driver_kind: Option, + driver_name: &str, ) -> Result> { Ok(DriverSandboxTemplate { image: template.image.clone(), @@ -1495,21 +1495,17 @@ fn driver_sandbox_template_from_public( environment: template.environment.clone(), resources: extract_typed_resources(&template.resources), platform_config: build_platform_config(template), - driver_config: select_driver_config(&template.driver_config, driver_kind)?, + driver_config: select_driver_config(&template.driver_config, driver_name)?, }) } fn select_driver_config( config: &Option, - driver_kind: Option, + driver_name: &str, ) -> Result, Box> { let Some(config) = config else { return Ok(None); }; - let Some(driver_kind) = driver_kind else { - return Ok(None); - }; - let driver_name = driver_kind.as_str(); let Some(value) = config.fields.get(driver_name) else { return Ok(None); }; @@ -2004,7 +2000,7 @@ impl ComputeDriver for NoopTestDriver { pub async fn new_test_runtime(store: Arc) -> ComputeRuntime { ComputeRuntime { driver: Arc::new(NoopTestDriver), - driver_kind: None, + driver_name: "test".to_string(), shutdown_cleanup: None, startup_resume: None, _driver_process: None, @@ -2074,8 +2070,7 @@ mod tests { .collect(), }; - let selected = - select_driver_config(&Some(config), Some(ComputeDriverKind::Kubernetes)).unwrap(); + let selected = select_driver_config(&Some(config), "kubernetes").unwrap(); let selected = selected.expect("kubernetes config should be selected"); assert!(selected.fields.contains_key("node")); @@ -2092,12 +2087,27 @@ mod tests { .collect(), }; - let selected = - select_driver_config(&Some(config), Some(ComputeDriverKind::Kubernetes)).unwrap(); + let selected = select_driver_config(&Some(config), "kubernetes").unwrap(); assert!(selected.is_none()); } + #[test] + fn select_driver_config_forwards_named_remote_driver_block() { + let config = prost_types::Struct { + fields: std::iter::once(( + 
"kyma".to_string(), + struct_value([("pool", string_value("gpu"))]), + )) + .collect(), + }; + + let selected = select_driver_config(&Some(config), "kyma").unwrap(); + let selected = selected.expect("named remote config should be selected"); + + assert!(selected.fields.contains_key("pool")); + } + #[test] fn select_driver_config_rejects_non_object_matching_driver_block() { let config = prost_types::Struct { @@ -2105,8 +2115,7 @@ mod tests { .collect(), }; - let err = - select_driver_config(&Some(config), Some(ComputeDriverKind::Kubernetes)).unwrap_err(); + let err = select_driver_config(&Some(config), "kubernetes").unwrap_err(); assert_eq!(err.code(), Code::InvalidArgument); assert!(err.message().contains("template.driver_config.kubernetes")); @@ -2226,7 +2235,7 @@ mod tests { let store = Arc::new(Store::connect("sqlite::memory:").await.unwrap()); ComputeRuntime { driver, - driver_kind: None, + driver_name: "test-driver".to_string(), shutdown_cleanup: None, startup_resume, _driver_process: None, diff --git a/crates/openshell-server/src/compute/vm.rs b/crates/openshell-server/src/compute/vm.rs index efdc9daab..be88047f3 100644 --- a/crates/openshell-server/src/compute/vm.rs +++ b/crates/openshell-server/src/compute/vm.rs @@ -29,6 +29,7 @@ //! trait implementation registering the VM driver against the generic //! interface. 
+use super::AcquiredRemoteDriverEndpoint; #[cfg(unix)] use super::ManagedDriverProcess; #[cfg(unix)] @@ -37,7 +38,7 @@ use hyper_util::rt::TokioIo; use openshell_core::proto::compute::v1::{ GetCapabilitiesRequest, compute_driver_client::ComputeDriverClient, }; -use openshell_core::{Config, Error, Result}; +use openshell_core::{ComputeDriverKind, Config, Error, Result}; #[cfg(unix)] use std::os::unix::fs::{FileTypeExt, MetadataExt, PermissionsExt}; #[cfg(unix)] @@ -451,7 +452,7 @@ pub fn compute_driver_guest_tls_paths( pub async fn spawn( config: &Config, vm_config: &VmComputeConfig, -) -> Result<(Channel, Arc)> { +) -> Result { if vm_config.grpc_endpoint.trim().is_empty() { return Err(Error::config( "grpc_endpoint is required when using the vm compute driver", @@ -507,14 +508,18 @@ pub async fn spawn( })?; let channel = wait_for_compute_driver(&socket_path, &mut child).await?; let process = Arc::new(ManagedDriverProcess::new(child, socket_path)); - Ok((channel, process)) + Ok(AcquiredRemoteDriverEndpoint::managed_builtin( + ComputeDriverKind::Vm, + channel, + process, + )) } #[cfg(not(unix))] pub async fn spawn( _config: &Config, _vm_config: &VmComputeConfig, -) -> Result<(Channel, std::sync::Arc)> { +) -> Result { Err(Error::config( "the vm compute driver requires unix domain socket support", )) diff --git a/crates/openshell-server/src/config_file.rs b/crates/openshell-server/src/config_file.rs index 39cf02bba..3875756dc 100644 --- a/crates/openshell-server/src/config_file.rs +++ b/crates/openshell-server/src/config_file.rs @@ -87,7 +87,7 @@ pub struct GatewayFileSection { // ── Drivers ────────────────────────────────────────────────────────── #[serde(default)] - pub compute_drivers: Option>, + pub compute_drivers: Option>, // ── Sandbox / SSH ──────────────────────────────────────────────────── #[serde(default)] @@ -631,7 +631,7 @@ version = 2 .expect("compute_drivers must be explicitly set in the RPM default config"); assert_eq!( drivers, - 
&[ComputeDriverKind::Podman], + &["podman".to_string()], "RPM default must pin compute_drivers to [podman] to prevent unexpected \ driver selection when Docker is also installed" ); diff --git a/crates/openshell-server/src/lib.rs b/crates/openshell-server/src/lib.rs index 97bb1673e..2c8893c1a 100644 --- a/crates/openshell-server/src/lib.rs +++ b/crates/openshell-server/src/lib.rs @@ -10,14 +10,18 @@ //! - mTLS support //! //! TODO(driver-abstraction): `build_compute_runtime` still switches on -//! [`ComputeDriverKind`] and calls driver-specific constructors -//! ([`ComputeRuntime::new_kubernetes`], [`compute::vm::spawn`] + -//! [`ComputeRuntime::new_remote_vm`]). Once we have a generalized compute -//! driver interface, the per-arm wiring here should collapse to a single -//! driver-agnostic path that asks each registered driver to produce a -//! [`Channel`](tonic::transport::Channel) and hands the rest of the gateway a -//! uniform [`ComputeRuntime`]. The remaining VM plumbing now lives in -//! [`compute::vm`]; keep this file driver-agnostic going forward. +//! built-in driver names and calls driver-specific constructors +//! ([`ComputeRuntime::new_kubernetes`], [`ComputeRuntime::new_docker`], +//! [`compute::vm::spawn`] + [`ComputeRuntime::new_remote_driver`], +//! [`ComputeRuntime::new_podman`]). Endpoint-backed drivers now share the +//! remote `compute_driver.proto` path, so new remote drivers should enter +//! through named endpoint acquisition rather than gateway-wide socket side +//! channels. Once we have a generalized compute-driver registry, the remaining +//! per-arm wiring here should collapse to driver construction records that +//! produce either an in-process `SharedComputeDriver` or an acquired remote +//! endpoint, then hand the rest of the gateway a uniform [`ComputeRuntime`]. +//! The VM launch plumbing now lives in [`compute::vm`]; keep this file limited +//! to selecting and acquiring drivers. 
mod auth; pub mod certgen; @@ -45,6 +49,7 @@ mod ws_tunnel; use metrics_exporter_prometheus::PrometheusBuilder; use openshell_core::{ComputeDriverKind, Config, Error, Result}; +use serde::Deserialize; use std::collections::HashMap; use std::io::ErrorKind; use std::net::SocketAddr; @@ -709,32 +714,14 @@ async fn build_compute_runtime( tracing_log_bus: TracingLogBus, supervisor_sessions: Arc, ) -> Result { - if let Some(socket_path) = config.external_compute_driver_socket.as_deref() { - info!( - socket = %socket_path.display(), - "Using external compute driver" - ); - let channel = compute::connect_external_compute_driver(socket_path) - .await - .map_err(|e| Error::execution(format!("failed to create compute runtime: {e}")))?; - return ComputeRuntime::new_remote_external( - channel, - store, - sandbox_index, - sandbox_watch_bus, - tracing_log_bus, - supervisor_sessions, - ) - .await - .map_err(|e| Error::execution(format!("failed to create compute runtime: {e}"))); + let driver = configured_compute_driver(config, file)?; + info!(driver = %driver.name(), "Using compute driver"); + if let ConfiguredComputeDriver::Builtin(kind) = &driver { + warn_if_kubernetes_sandbox_jwt_expiry_disabled(config, *kind); } - let driver = configured_compute_driver(config)?; - info!(driver = %driver, "Using compute driver"); - warn_if_kubernetes_sandbox_jwt_expiry_disabled(config, driver); - match driver { - ComputeDriverKind::Kubernetes => { + ConfiguredComputeDriver::Builtin(ComputeDriverKind::Kubernetes) => { let mut k8s = kubernetes_config_from_file(file)?; if let Ok(size) = std::env::var("OPENSHELL_K8S_WORKSPACE_DEFAULT_STORAGE_SIZE") { k8s.workspace_default_storage_size = size; @@ -750,7 +737,7 @@ async fn build_compute_runtime( .await .map_err(|e| Error::execution(format!("failed to create compute runtime: {e}"))) } - ComputeDriverKind::Docker => ComputeRuntime::new_docker( + ConfiguredComputeDriver::Builtin(ComputeDriverKind::Docker) => ComputeRuntime::new_docker( config.clone(), 
docker_config.clone(), store, @@ -761,21 +748,7 @@ async fn build_compute_runtime( ) .await .map_err(|e| Error::execution(format!("failed to create compute runtime: {e}"))), - ComputeDriverKind::Vm => { - let (channel, driver_process) = compute::vm::spawn(config, vm_config).await?; - ComputeRuntime::new_remote_vm( - channel, - Some(driver_process), - store, - sandbox_index, - sandbox_watch_bus, - tracing_log_bus, - supervisor_sessions, - ) - .await - .map_err(|e| Error::execution(format!("failed to create compute runtime: {e}"))) - } - ComputeDriverKind::Podman => { + ConfiguredComputeDriver::Builtin(ComputeDriverKind::Podman) => { let mut podman = podman_config_from_file(file)?; podman.gateway_port = config.bind_address.port(); if let Ok(p) = std::env::var("OPENSHELL_PODMAN_SOCKET") { @@ -797,6 +770,40 @@ async fn build_compute_runtime( .await .map_err(|e| Error::execution(format!("failed to create compute runtime: {e}"))) } + ConfiguredComputeDriver::Builtin(ComputeDriverKind::Vm) => { + let endpoint = compute::vm::spawn(config, vm_config).await?; + ComputeRuntime::new_remote_driver( + endpoint, + store, + sandbox_index, + sandbox_watch_bus, + tracing_log_bus, + supervisor_sessions, + ) + .await + .map_err(|e| Error::execution(format!("failed to create compute runtime: {e}"))) + } + ConfiguredComputeDriver::Remote(remote) => { + let RemoteComputeDriverSelection { name, socket_path } = remote; + info!( + driver = %name, + socket = %socket_path.display(), + "Using remote compute driver endpoint" + ); + let endpoint = compute::connect_remote_compute_driver(name, &socket_path) + .await + .map_err(|e| Error::execution(format!("failed to create compute runtime: {e}")))?; + ComputeRuntime::new_remote_driver( + endpoint, + store, + sandbox_index, + sandbox_watch_bus, + tracing_log_bus, + supervisor_sessions, + ) + .await + .map_err(|e| Error::execution(format!("failed to create compute runtime: {e}"))) + } } } @@ -876,35 +883,117 @@ fn apply_podman_local_tls_defaults( 
Ok(()) } -fn configured_compute_driver(config: &Config) -> Result { +#[derive(Debug, Clone)] +enum ConfiguredComputeDriver { + Builtin(ComputeDriverKind), + Remote(RemoteComputeDriverSelection), +} + +impl ConfiguredComputeDriver { + fn name(&self) -> &str { + match self { + Self::Builtin(kind) => kind.as_str(), + Self::Remote(remote) => &remote.name, + } + } +} + +#[derive(Debug, Clone)] +struct RemoteComputeDriverSelection { + name: String, + socket_path: PathBuf, +} + +#[derive(Debug, Deserialize)] +#[serde(deny_unknown_fields)] +struct RemoteComputeDriverConfig { + socket_path: PathBuf, +} + +fn configured_compute_driver( + config: &Config, + file: Option<&config_file::ConfigFile>, +) -> Result { match config.compute_drivers.as_slice() { [] => match openshell_core::config::detect_driver() { Some(ComputeDriverKind::Vm) => Err(Error::config( "vm compute driver is opt-in only; set --drivers vm or OPENSHELL_DRIVERS=vm", )), - Some(driver) => Ok(driver), + Some(driver) => Ok(ConfiguredComputeDriver::Builtin(driver)), None => Err(Error::config( "no compute driver configured and auto-detection found no suitable driver; \ set --drivers or OPENSHELL_DRIVERS to kubernetes, podman, docker, or vm", )), }, - [ - driver @ (ComputeDriverKind::Kubernetes - | ComputeDriverKind::Vm - | ComputeDriverKind::Docker - | ComputeDriverKind::Podman), - ] => Ok(*driver), + [driver] => resolve_configured_compute_driver(driver, config, file), drivers => Err(Error::config(format!( "multiple compute drivers are not supported yet; configured drivers: {}", - drivers - .iter() - .map(ToString::to_string) - .collect::>() - .join(",") + drivers.join(",") ))), } } +fn resolve_configured_compute_driver( + driver_name: &str, + config: &Config, + file: Option<&config_file::ConfigFile>, +) -> Result { + let name = openshell_core::config::normalize_compute_driver_name(driver_name) + .map_err(Error::config)?; + let driver_kind = builtin_compute_driver(&name); + if let Some(socket_path) = 
config.compute_driver_endpoints.get(&name) { + if driver_kind.is_some() { + return Err(Error::config(format!( + "compute driver '{name}' is a reserved built-in driver and cannot be selected with a socket endpoint" + ))); + } + return Ok(ConfiguredComputeDriver::Remote( + RemoteComputeDriverSelection { + name, + socket_path: socket_path.clone(), + }, + )); + } + + if let Some(kind) = driver_kind { + return Ok(ConfiguredComputeDriver::Builtin(kind)); + } + + let socket_path = remote_driver_socket_from_file(&name, file)?; + Ok(ConfiguredComputeDriver::Remote( + RemoteComputeDriverSelection { name, socket_path }, + )) +} + +fn builtin_compute_driver(name: &str) -> Option { + name.parse().ok() +} + +fn remote_driver_socket_from_file( + name: &str, + file: Option<&config_file::ConfigFile>, +) -> Result { + let Some(file) = file else { + return Err(Error::config(format!( + "compute driver '{name}' is not a built-in driver; configure [openshell.drivers.{name}].socket_path or pass --compute-driver-socket" + ))); + }; + let Some(raw) = file.openshell.drivers.get(name) else { + return Err(Error::config(format!( + "compute driver '{name}' is not a built-in driver; configure [openshell.drivers.{name}].socket_path" + ))); + }; + let config = raw + .clone() + .try_into::() + .map_err(|err| { + Error::config(format!( + "invalid [openshell.drivers.{name}] table for remote compute driver: {err}" + )) + })?; + Ok(config.socket_path) +} + fn kubernetes_sandbox_jwt_expiry_disabled(config: &Config, driver: ComputeDriverKind) -> bool { matches!(driver, ComputeDriverKind::Kubernetes) && config @@ -924,7 +1013,7 @@ fn warn_if_kubernetes_sandbox_jwt_expiry_disabled(config: &Config, driver: Compu #[cfg(test)] mod tests { use super::{ - ConnectionProtocol, MultiplexService, ServerState, TlsAcceptor, + ConfiguredComputeDriver, ConnectionProtocol, MultiplexService, ServerState, TlsAcceptor, allow_plaintext_service_http, classify_initial_bytes, configured_compute_driver, 
gateway_listener_addresses, is_benign_tls_handshake_failure, kubernetes_config_for_k8s_sa_bootstrap, kubernetes_sandbox_jwt_expiry_disabled, @@ -937,6 +1026,7 @@ mod tests { use rcgen::{CertificateParams, IsCa, KeyPair}; use std::io::{Error, ErrorKind, Write}; use std::net::SocketAddr; + use std::path::PathBuf; use std::sync::Arc; use std::time::Duration; use tempfile::{TempDir, tempdir}; @@ -1265,14 +1355,14 @@ mod tests { #[test] fn configured_compute_driver_triggers_auto_detection_when_empty() { - let config = Config::new(None).with_compute_drivers([]); + let config = Config::new(None).with_compute_drivers(std::iter::empty::()); // Empty drivers triggers auto-detection, which may return Some or None // depending on the environment. This test verifies the auto-detection path // is taken rather than immediately returning an error. - let result = configured_compute_driver(&config); + let result = configured_compute_driver(&config, None); // Either we get a detected driver or an error about none being detected. 
match result { - Ok(driver) => { + Ok(ConfiguredComputeDriver::Builtin(driver)) => { assert!( matches!( driver, @@ -1283,6 +1373,9 @@ mod tests { "auto-detected unexpected driver: {driver:?}" ); } + Ok(ConfiguredComputeDriver::Remote(remote)) => { + panic!("auto-detection returned remote driver: {remote:?}"); + } Err(e) => { assert!( e.to_string() @@ -1297,7 +1390,7 @@ mod tests { fn configured_compute_driver_rejects_multiple_entries() { let config = Config::new(None) .with_compute_drivers([ComputeDriverKind::Kubernetes, ComputeDriverKind::Podman]); - let err = configured_compute_driver(&config).unwrap_err(); + let err = configured_compute_driver(&config, None).unwrap_err(); assert!( err.to_string() .contains("multiple compute drivers are not supported yet") @@ -1308,27 +1401,90 @@ mod tests { #[test] fn configured_compute_driver_accepts_podman() { let config = Config::new(None).with_compute_drivers([ComputeDriverKind::Podman]); - assert_eq!( - configured_compute_driver(&config).unwrap(), - ComputeDriverKind::Podman - ); + let driver = configured_compute_driver(&config, None).unwrap(); + assert!(matches!( + driver, + ConfiguredComputeDriver::Builtin(ComputeDriverKind::Podman) + )); } #[test] fn configured_compute_driver_accepts_vm() { let config = Config::new(None).with_compute_drivers([ComputeDriverKind::Vm]); - assert_eq!( - configured_compute_driver(&config).unwrap(), - ComputeDriverKind::Vm - ); + let driver = configured_compute_driver(&config, None).unwrap(); + assert!(matches!( + driver, + ConfiguredComputeDriver::Builtin(ComputeDriverKind::Vm) + )); } #[test] fn configured_compute_driver_accepts_docker() { let config = Config::new(None).with_compute_drivers([ComputeDriverKind::Docker]); - assert_eq!( - configured_compute_driver(&config).unwrap(), - ComputeDriverKind::Docker + let driver = configured_compute_driver(&config, None).unwrap(); + assert!(matches!( + driver, + ConfiguredComputeDriver::Builtin(ComputeDriverKind::Docker) + )); + } + + #[test] + fn 
configured_compute_driver_resolves_named_remote_from_file() { + let file: super::config_file::ConfigFile = toml::from_str( + r#" +[openshell.gateway] +compute_drivers = ["kyma"] + +[openshell.drivers.kyma] +socket_path = "/run/openshell/kyma.sock" +"#, + ) + .unwrap(); + let config = Config::new(None).with_compute_drivers(["kyma"]); + + let driver = configured_compute_driver(&config, Some(&file)).unwrap(); + + match driver { + ConfiguredComputeDriver::Remote(remote) => { + assert_eq!(remote.name, "kyma"); + assert_eq!( + remote.socket_path, + PathBuf::from("/run/openshell/kyma.sock") + ); + } + ConfiguredComputeDriver::Builtin(other) => { + panic!("expected remote driver, got builtin driver {other:?}") + } + } + } + + #[test] + fn configured_compute_driver_rejects_vm_endpoint_from_config() { + let config = Config::new(None) + .with_compute_drivers([ComputeDriverKind::Vm]) + .with_compute_driver_endpoint("vm", "/run/openshell/vm.sock"); + + let err = configured_compute_driver(&config, None).unwrap_err(); + + assert!( + err.to_string() + .contains("reserved built-in driver and cannot be selected with a socket endpoint"), + "unexpected error: {err}" + ); + } + + #[test] + fn configured_compute_driver_rejects_builtin_endpoint() { + let config = Config::new(None) + .with_compute_drivers([ComputeDriverKind::Docker]) + .with_compute_driver_endpoint("docker", "/run/openshell/docker.sock"); + + let err = configured_compute_driver(&config, None).unwrap_err(); + + assert!( + err.to_string() + .contains("cannot be selected with a socket endpoint"), + "unexpected error: {err}" ); } diff --git a/docs/reference/gateway-config.mdx b/docs/reference/gateway-config.mdx index ff4542136..8b6e695d5 100644 --- a/docs/reference/gateway-config.mdx +++ b/docs/reference/gateway-config.mdx @@ -306,3 +306,24 @@ guest_tls_ca = "/var/lib/openshell/guest-tls/ca.pem" guest_tls_cert = "/var/lib/openshell/guest-tls/client.pem" guest_tls_key = "/var/lib/openshell/guest-tls/client-key.pem" ``` + +### 
Extension Driver
+
+Extension drivers run outside the gateway and expose the
+`compute_driver.proto` gRPC service on a Unix socket. Use a non-reserved driver
+name; built-in names such as `vm`, `docker`, `podman`, and `kubernetes` cannot
+be selected through unmanaged socket endpoints. The selected driver name is the
+key used for driver-owned sandbox config such as `template.driver_config.`.
+
+```toml
+[openshell]
+version = 1
+
+[openshell.gateway]
+bind_address = "127.0.0.1:17670"
+log_level = "info"
+compute_drivers = ["kyma"]
+
+[openshell.drivers.kyma]
+socket_path = "/run/openshell/kyma-compute-driver.sock"
+```
diff --git a/docs/reference/sandbox-compute-drivers.mdx b/docs/reference/sandbox-compute-drivers.mdx
index 95a319c37..a79a3147a 100644
--- a/docs/reference/sandbox-compute-drivers.mdx
+++ b/docs/reference/sandbox-compute-drivers.mdx
@@ -21,7 +21,9 @@ Configure the compute driver on the gateway. Current releases accept one driver
 compute_drivers = ["docker"]
 ```
 
-Supported values are `docker`, `podman`, `kubernetes`, and `vm`.
+Reserved built-in values are `docker`, `podman`, `kubernetes`, and `vm`.
+Non-reserved names select an extension driver and require a
+`socket_path` in `[openshell.drivers.]`.
 
 When `compute_drivers` is unset, the gateway auto-detects Kubernetes, then Podman, then Docker by CLI availability or a local Unix socket. The VM driver is never auto-detected; configure it explicitly with `compute_drivers = ["vm"]` or set `OPENSHELL_DRIVERS=vm` in the launch environment.
 
@@ -29,10 +31,26 @@ Common gateway options:
 
 | Gateway TOML option | Description |
 |---|---|
-| `compute_drivers = [""]` | Select the compute driver. Supported values are `docker`, `podman`, `kubernetes`, and `vm`. |
+| `compute_drivers = [""]` | Select the compute driver. Built-in values are `docker`, `podman`, `kubernetes`, and `vm`; custom names require `[openshell.drivers.].socket_path`.
|
 
 Set driver-specific values such as sandbox images, callback endpoints, network names, TLS material, and VM sizing in the gateway TOML file. See the [Gateway Configuration File](./gateway-config) reference for the full `[openshell.drivers.]` schema.
 
+Extension drivers use the same `compute_driver.proto` gRPC surface as the
+managed VM driver. For an out-of-tree driver, choose a driver name and point
+the gateway at the Unix socket the operator has already provisioned:
+
+```toml
+[openshell.gateway]
+compute_drivers = ["kyma"]
+
+[openshell.drivers.kyma]
+socket_path = "/run/openshell/kyma.sock"
+```
+
+The gateway does not spawn, supervise, or delete extension drivers. The
+operator must protect the socket so only the gateway uid can access it.
+Reserved built-in names cannot be selected through unmanaged socket endpoints.
+
 Sandbox create supports `--cpu` and `--memory` for per-sandbox compute sizing.
 Docker and Podman apply them as runtime limits. Kubernetes applies them as both
 container requests and limits. The VM driver accepts the fields but currently
@@ -225,7 +243,7 @@ compute_drivers = ["vm"]
 
 For a launch-time override, set `OPENSHELL_DRIVERS=vm` in the gateway environment and restart the service.
 
-Configure VM driver values such as `grpc_endpoint`, `driver_dir`, `state_dir`, `default_image`, `bootstrap_image`, `vcpus`, `mem_mib`, `overlay_disk_mib`, `krun_log_level`, and `guest_tls_*` in `[openshell.drivers.vm]`. The VM `state_dir` stores overlay disks, console logs, runtime state, image-rootfs cache, and the private `run/compute-driver.sock` socket.
+Configure VM driver values such as `grpc_endpoint`, `driver_dir`, `state_dir`, `default_image`, `bootstrap_image`, `vcpus`, `mem_mib`, `overlay_disk_mib`, `krun_log_level`, and `guest_tls_*` in `[openshell.drivers.vm]`. The VM `state_dir` stores overlay disks, console logs, runtime state, image-rootfs cache, and the private `run/compute-driver.sock` socket.
The VM socket path is managed by the gateway and is not configurable through remote endpoint settings.
 
 The gateway starts `openshell-driver-vm` over a private Unix socket and passes its process ID so the driver can reject unexpected local clients. The driver's standalone TCP listener is disabled unless `--allow-unauthenticated-tcp` is set for local development.
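
The selection rule this change documents (reserved built-in names resolve to in-tree drivers, any other name needs a socket path, and a reserved name paired with a socket endpoint is rejected) can be sketched in isolation. This is a minimal illustration, not the gateway's code: `Selection`, `resolve`, and the two maps are hypothetical stand-ins for `ConfiguredComputeDriver`, `resolve_configured_compute_driver`, and the config sources, and the trim-and-lowercase normalization is an assumption, since the patch does not show what `normalize_compute_driver_name` does.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Reserved built-in driver names, as listed in the docs in this change.
const RESERVED: [&str; 4] = ["docker", "podman", "kubernetes", "vm"];

/// Hypothetical stand-in for the patch's `ConfiguredComputeDriver`.
#[derive(Debug, PartialEq)]
enum Selection {
    Builtin(String),
    Remote { name: String, socket_path: PathBuf },
}

/// Resolve one configured driver name.
/// `endpoints` models launch-time socket endpoints; `file_sockets` models
/// `[openshell.drivers.<name>].socket_path` entries from the TOML file.
fn resolve(
    name: &str,
    endpoints: &HashMap<String, PathBuf>,
    file_sockets: &HashMap<String, PathBuf>,
) -> Result<Selection, String> {
    // Assumption: normalization is trim + lowercase; the real rules may differ.
    let name = name.trim().to_ascii_lowercase();
    let reserved = RESERVED.contains(&name.as_str());

    // An explicit endpoint wins, but never for a reserved built-in name.
    if let Some(path) = endpoints.get(&name) {
        if reserved {
            return Err(format!(
                "compute driver '{name}' is a reserved built-in driver \
                 and cannot be selected with a socket endpoint"
            ));
        }
        return Ok(Selection::Remote { name, socket_path: path.clone() });
    }

    // Reserved names without an endpoint select the in-tree driver.
    if reserved {
        return Ok(Selection::Builtin(name));
    }

    // Any other name must have a socket path in the config file.
    if let Some(path) = file_sockets.get(&name) {
        return Ok(Selection::Remote { name, socket_path: path.clone() });
    }
    Err(format!(
        "compute driver '{name}' is not a built-in driver; \
         configure [openshell.drivers.{name}].socket_path"
    ))
}
```

Checking the endpoint map before the built-in shortcut means a conflicting configuration such as `vm` plus a socket path fails loudly instead of silently ignoring the socket, which matches the behavior the new `rejects_vm_endpoint_from_config` test expects.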