6 changes: 3 additions & 3 deletions architecture/compute-runtimes.md
@@ -33,8 +33,8 @@ when a sandbox create request asks for GPU resources.
| Docker | Local development with Docker available. | Container plus nested sandbox namespace. | Uses host networking so loopback gateway endpoints work from the supervisor. |
| Podman | Rootless or single-machine deployments. | Container plus nested sandbox namespace. | Uses the Podman REST API, OCI image volumes, and CDI GPU devices when available. |
| Kubernetes | Cluster deployment through Helm. | Pod plus nested sandbox namespace. | Uses Kubernetes API objects, service accounts, secrets, PVC-backed workspace storage, and GPU resources. |
| VM | Experimental microVM isolation. | Per-sandbox libkrun VM. | Gateway spawns `openshell-driver-vm` as a subprocess over a private, state-local Unix socket. The VM driver boots a cached bootstrap `rootfs.ext4`, prepares requested OCI images inside a bootstrap VM with `umoci`, attaches the prepared image disk read-only, and gives each sandbox a writable `overlay.ext4` for merged-root changes and runtime material. The driver persists each accepted launch request beside the overlay and restarts those VMs on driver startup without recreating the overlay. |
| External | Out-of-tree drivers operated alongside the gateway. | Whatever boundary the driver implements. | Activated by `--compute-driver-socket=<path>` (env `OPENSHELL_COMPUTE_DRIVER_SOCKET`). The gateway connects to a UDS the operator already provisioned, runs `GetCapabilities`, logs the advertised `driver_name`, and dispatches all sandbox lifecycle calls through the same `compute_driver.proto` surface as the in-tree drivers. The driver process and socket lifecycle are operator-owned; the gateway does not spawn, supervise, or remove the driver. The trust boundary is the socket's filesystem permissions the operator must ensure only the gateway uid can read/write it. |
| VM | Experimental microVM isolation. | Per-sandbox libkrun VM. | Managed endpoint-backed driver. The gateway spawns `openshell-driver-vm`, waits for its Unix socket, and then consumes it through the same remote `compute_driver.proto` path used by unmanaged endpoint drivers. The VM driver boots a cached bootstrap `rootfs.ext4`, prepares requested OCI images inside a bootstrap VM with `umoci`, attaches the prepared image disk read-only, and gives each sandbox a writable `overlay.ext4` for merged-root changes and runtime material. The driver persists each accepted launch request beside the overlay and restarts those VMs on driver startup without recreating the overlay. |
| Extension | Out-of-tree drivers operated alongside the gateway. | Whatever boundary the driver implements. | Selected by a non-reserved custom `compute_drivers = ["<name>"]` entry with `[openshell.drivers.<name>].socket_path`, or by `--compute-driver-socket=<path>` as launch-time shorthand for the `extension` driver ID. Reserved built-in names such as `vm`, `docker`, `podman`, and `kubernetes` cannot be used as unmanaged socket endpoints. The gateway connects to a UDS the operator already provisioned, runs `GetCapabilities`, logs the advertised `driver_name`, and dispatches all sandbox lifecycle calls through `compute_driver.proto`. The driver process and socket lifecycle are operator-owned; the gateway does not spawn, supervise, or remove unmanaged extension drivers. The trust boundary is the socket's filesystem permissions: the operator must ensure only the gateway uid can read/write it. |

Per-sandbox CPU and memory values currently enter the driver layer through
template resource limits. Docker and Podman apply them as runtime limits.
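Review note on the Extension row above: the row places the trust boundary on the socket's filesystem permissions. A minimal sketch of how an operator-side preflight could check that, assuming a Unix host; `socket_is_private_to` is a hypothetical helper and is not part of this PR or of the gateway.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::{MetadataExt, PermissionsExt};
use std::path::Path;

/// Hypothetical helper (assumption, not in the PR): returns `true` when `path`
/// is owned by `expected_uid` and grants no permission bits to group or others.
fn socket_is_private_to(path: &Path, expected_uid: u32) -> io::Result<bool> {
    let meta = fs::metadata(path)?;
    // The owner must be the uid the gateway runs as.
    let owner_matches = meta.uid() == expected_uid;
    // 0o077 masks the group and other permission bits; all must be clear.
    let no_group_or_other_access = meta.permissions().mode() & 0o077 == 0;
    Ok(owner_matches && no_group_or_other_access)
}
```

This only inspects the socket inode itself. Permissions on the parent directories also affect who can reach the socket, and the sketch does not cover them.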
@@ -79,7 +79,7 @@ The supervisor must be available inside each sandbox workload:
| Podman | Read-only OCI image volume containing the supervisor binary. |
| Kubernetes | Sandbox pod image or pod template configuration. |
| VM | Embedded in the guest rootfs bundle. |
| External | Defined by the out-of-tree driver. |
| Extension | Defined by the out-of-tree driver. |

Driver-controlled environment variables must override sandbox image or template
values for sandbox ID, sandbox name, gateway endpoint, relay socket path, TLS
74 changes: 59 additions & 15 deletions crates/openshell-core/src/config.rs
@@ -4,6 +4,7 @@
//! Configuration management for `OpenShell` components.

use serde::{Deserialize, Serialize};
use std::collections::BTreeMap;
use std::fmt;
#[cfg(unix)]
use std::io::{Read, Write};
@@ -69,6 +70,27 @@ impl ComputeDriverKind {
}
}

/// Normalize a configured compute driver name.
///
/// Built-in driver names and custom remote driver names share the same
/// selection namespace. The normalized value is lowercase ASCII and may contain
/// letters, digits, `-`, and `_`.
pub fn normalize_compute_driver_name(value: &str) -> Result<String, String> {
let value = value.trim();
if value.is_empty() {
return Err("compute driver name cannot be empty".to_string());
}
if !value
.bytes()
.all(|b| b.is_ascii_alphanumeric() || matches!(b, b'-' | b'_'))
{
return Err(format!(
"invalid compute driver name '{value}'. use ASCII letters, digits, '-' or '_'"
));
}
Ok(value.to_ascii_lowercase())
}
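
Review note: the function above validates before lowercasing, so mixed-case input is accepted and folded, while any byte outside ASCII letters, digits, `-`, and `_` is rejected (including interior spaces). A small sketch of how it behaves over a comma-separated list, roughly mirroring the `value_delimiter = ','` plus `value_parser` pairing declared in `cli.rs` later in this PR. The function body is copied from the diff so the sketch compiles on its own; `normalize_driver_list` is a hypothetical helper and does not exist in the PR.

```rust
/// Copied from the diff above so the sketch is self-contained.
pub fn normalize_compute_driver_name(value: &str) -> Result<String, String> {
    let value = value.trim();
    if value.is_empty() {
        return Err("compute driver name cannot be empty".to_string());
    }
    if !value
        .bytes()
        .all(|b| b.is_ascii_alphanumeric() || matches!(b, b'-' | b'_'))
    {
        return Err(format!(
            "invalid compute driver name '{value}'. use ASCII letters, digits, '-' or '_'"
        ));
    }
    Ok(value.to_ascii_lowercase())
}

/// Hypothetical helper (assumption, not in the PR): normalize every entry of a
/// comma-separated list and stop at the first invalid name.
pub fn normalize_driver_list(raw: &str) -> Result<Vec<String>, String> {
    raw.split(',').map(normalize_compute_driver_name).collect()
}
```

With this sketch, `normalize_driver_list(" VM ,Kyma_GPU-1")` yields `Ok(["vm", "kyma_gpu-1"])`, while `"kyma/gpu"` and `"vm,"` (which contains an empty trailing entry) both return an error.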

impl fmt::Display for ComputeDriverKind {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.write_str(self.as_str())
@@ -358,13 +380,14 @@ pub struct Config {
/// The config shape allows multiple drivers so the gateway can evolve
/// toward multi-backend routing. Current releases require exactly one
/// configured driver.
pub compute_drivers: Vec<ComputeDriverKind>,
pub compute_drivers: Vec<String>,

/// When set, the gateway dispatches sandbox lifecycle to an out-of-tree
/// compute driver process listening on this Unix domain socket and
/// speaking `compute_driver.proto`. Takes precedence over
/// `compute_drivers` and the auto-detection probe.
pub external_compute_driver_socket: Option<PathBuf>,
/// Operator-provided endpoints for named remote compute drivers.
///
/// This is populated by CLI/env inputs such as `--compute-driver-socket`.
/// TOML-authored endpoints live under `[openshell.drivers.<name>]` and are
/// resolved by the gateway config loader.
pub compute_driver_endpoints: BTreeMap<String, PathBuf>,

/// TTL for SSH session tokens, in seconds. 0 disables expiry.
pub ssh_session_ttl_secs: u64,
@@ -565,7 +588,7 @@ impl Config {
gateway_jwt: None,
database_url: String::new(),
compute_drivers: vec![],
external_compute_driver_socket: None,
compute_driver_endpoints: BTreeMap::new(),
ssh_session_ttl_secs: default_ssh_session_ttl_secs(),
grpc_rate_limit_requests: None,
grpc_rate_limit_window_secs: None,
@@ -621,18 +644,27 @@ impl Config {

/// Create a new configuration with the configured compute drivers.
#[must_use]
pub fn with_compute_drivers<I>(mut self, drivers: I) -> Self
pub fn with_compute_drivers<I, D>(mut self, drivers: I) -> Self
where
I: IntoIterator<Item = ComputeDriverKind>,
I: IntoIterator<Item = D>,
D: ToString,
{
self.compute_drivers = drivers.into_iter().collect();
self.compute_drivers = drivers
.into_iter()
.map(|driver| driver.to_string())
.collect();
self
}

/// Pin an external compute driver by Unix domain socket path.
/// Register a Unix domain socket endpoint for a named remote driver.
#[must_use]
pub fn with_external_compute_driver_socket(mut self, socket: Option<PathBuf>) -> Self {
self.external_compute_driver_socket = socket;
pub fn with_compute_driver_endpoint(
mut self,
name: impl Into<String>,
socket: impl Into<PathBuf>,
) -> Self {
self.compute_driver_endpoints
.insert(name.into(), socket.into());
self
}
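
Review note: a sketch of how the two reworked builders compose. `SketchConfig` is a stand-in that carries only the two fields this hunk touches; the real `Config` has many more. The driver name and socket paths are illustrative assumptions. As written in the diff, `with_compute_drivers` stores names via `to_string()` without normalizing them, so callers that bypass the CLI parser would need to normalize first.

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

/// Stand-in for the two `Config` fields changed in this hunk (assumption:
/// everything else on the real struct is omitted).
#[derive(Debug, Default)]
pub struct SketchConfig {
    pub compute_drivers: Vec<String>,
    pub compute_driver_endpoints: BTreeMap<String, PathBuf>,
}

impl SketchConfig {
    /// Same shape as the new builder: any `ToString` item is accepted,
    /// so both `&str` names and `Display`-able enums can be passed.
    #[must_use]
    pub fn with_compute_drivers<I, D>(mut self, drivers: I) -> Self
    where
        I: IntoIterator<Item = D>,
        D: ToString,
    {
        self.compute_drivers = drivers.into_iter().map(|d| d.to_string()).collect();
        self
    }

    /// Registers one endpoint; inserting the same name again replaces the
    /// earlier path because the map is keyed by driver name.
    #[must_use]
    pub fn with_compute_driver_endpoint(
        mut self,
        name: impl Into<String>,
        socket: impl Into<PathBuf>,
    ) -> Self {
        self.compute_driver_endpoints
            .insert(name.into(), socket.into());
        self
    }
}
```

Chaining `.with_compute_drivers(["kyma_gpu-1"])` with two `.with_compute_driver_endpoint("kyma_gpu-1", ...)` calls leaves a single map entry holding the second path.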

@@ -780,8 +812,8 @@ mod tests {
use super::is_reachable_unix_socket;
use super::{
ComputeDriverKind, Config, DEFAULT_SERVICE_ROUTING_DOMAIN, GatewayJwtConfig, detect_driver,
docker_host_unix_socket_path, is_unix_socket, podman_socket_candidates_from_env,
podman_socket_responds,
docker_host_unix_socket_path, is_unix_socket, normalize_compute_driver_name,
podman_socket_candidates_from_env, podman_socket_responds,
};
#[cfg(unix)]
use std::io::{Read as _, Write as _};
@@ -817,6 +849,18 @@ mod tests {
assert!(err.contains("unsupported compute driver 'firecracker'"));
}

#[test]
fn compute_driver_name_normalization_accepts_builtin_and_custom_names() {
assert_eq!(normalize_compute_driver_name(" VM ").unwrap(), "vm");
assert_eq!(
normalize_compute_driver_name("Kyma_GPU-1").unwrap(),
"kyma_gpu-1"
);

let err = normalize_compute_driver_name("kyma/gpu").unwrap_err();
assert!(err.contains("invalid compute driver name"));
}

#[test]
fn config_defaults_to_loopback_bind_address() {
let expected: SocketAddr = "127.0.0.1:17670".parse().expect("valid address");
128 changes: 99 additions & 29 deletions crates/openshell-server/src/cli.rs
@@ -109,15 +109,15 @@ struct RunArgs {
value_delimiter = ',',
value_parser = parse_compute_driver
)]
drivers: Vec<ComputeDriverKind>,
drivers: Vec<String>,

/// Path to a Unix domain socket served by an out-of-tree compute driver
/// Path to a Unix domain socket served by a remote compute driver
/// implementing `compute_driver.proto`.
///
/// When set, the gateway dispatches sandbox lifecycle to that driver
/// instead of one of the in-tree backends, skipping both the `--drivers`
/// list and the auto-detection probe. The driver name advertised in
/// `GetCapabilities` is logged for diagnostics.
/// When set, the socket is associated with the single configured driver
/// name. If no driver name is configured, the gateway uses `extension`.
/// Reserved built-in driver names such as Docker, Podman, Kubernetes, and
/// VM do not accept socket endpoints.
#[arg(long, env = "OPENSHELL_COMPUTE_DRIVER_SOCKET")]
compute_driver_socket: Option<PathBuf>,

@@ -245,6 +245,7 @@ async fn run_from_args(mut args: RunArgs, matches: ArgMatches) -> Result<()> {
if let Some(file) = file.as_ref() {
merge_file_into_args(&mut args, &file.openshell.gateway, &matches);
}
normalize_compute_driver_socket_args(&mut args)?;

let local_tls = apply_runtime_defaults(&mut args)?;
let local_jwt = defaults::complete_local_jwt_config()?;
@@ -375,13 +376,19 @@ async fn run_from_args(mut args: RunArgs, matches: ArgMatches) -> Result<()> {
args.grpc_rate_limit_requests,
args.grpc_rate_limit_window_seconds,
)
.with_external_compute_driver_socket(args.compute_driver_socket.clone())
.with_server_sans(args.server_sans.clone())
.with_loopback_service_http(args.enable_loopback_service_http);
validate_grpc_rate_limit_args(
args.grpc_rate_limit_requests,
args.grpc_rate_limit_window_seconds,
)?;
if let Some(socket) = args.compute_driver_socket.clone() {
let driver = args
.drivers
.first()
.expect("normalize_compute_driver_socket_args sets a driver for socket endpoints");
config = config.with_compute_driver_endpoint(driver.clone(), socket);
}

if let Some(ttl) = file
.as_ref()
@@ -468,8 +475,8 @@ async fn run_from_args(mut args: RunArgs, matches: ArgMatches) -> Result<()> {
.into_diagnostic()
}

fn parse_compute_driver(value: &str) -> std::result::Result<ComputeDriverKind, String> {
value.parse()
fn parse_compute_driver(value: &str) -> std::result::Result<String, String> {
openshell_core::config::normalize_compute_driver_name(value)
}

fn resolve_config_path(args: &RunArgs) -> Result<Option<PathBuf>> {
@@ -668,16 +675,51 @@ fn validate_grpc_rate_limit_args(requests: Option<u64>, window_seconds: Option<u
Ok(())
}

fn effective_single_driver(args: &RunArgs) -> Option<ComputeDriverKind> {
// An external-driver socket pins dispatch to the out-of-tree path and
// bypasses both the `--drivers` list and auto-detection probe; callers
// that key off the in-tree `ComputeDriverKind` get `None` here.
if args.compute_driver_socket.is_some() {
return None;
fn normalize_compute_driver_socket_args(args: &mut RunArgs) -> Result<()> {
let Some(socket) = args.compute_driver_socket.as_ref() else {
return Ok(());
};
if socket.as_os_str().is_empty() {
return Err(miette::miette!(
"--compute-driver-socket must not be an empty path"
));
}

match args.drivers.as_slice() {
[] => {
args.drivers.push("extension".to_string());
Ok(())
}
[driver] => {
let driver = openshell_core::config::normalize_compute_driver_name(driver)
.map_err(|err| miette::miette!("{err}"))?;
if matches!(
driver.parse::<ComputeDriverKind>().ok(),
Some(
ComputeDriverKind::Docker
| ComputeDriverKind::Podman
| ComputeDriverKind::Kubernetes
| ComputeDriverKind::Vm
)
) {
return Err(miette::miette!(
"--compute-driver-socket cannot be combined with reserved built-in compute driver '{driver}'"
));
}
args.drivers[0] = driver;
Ok(())
}
drivers => Err(miette::miette!(
"--compute-driver-socket requires exactly one compute driver name, got: {}",
drivers.join(",")
)),
}
}
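
Review note: the new function reduces to a small decision table over the driver list and the socket flag. A self-contained sketch of that table as a pure function, under these assumptions: `RESERVED` is a hard-coded stand-in for the `ComputeDriverKind` match in the diff, the name check is simplified to trim plus lowercase (the real code calls `normalize_compute_driver_name`), and `resolve_socket_driver` is a hypothetical name.

```rust
/// Stand-in for the reserved built-ins matched in the diff (assumption:
/// the real check parses into `ComputeDriverKind`).
const RESERVED: [&str; 4] = ["docker", "podman", "kubernetes", "vm"];

/// Hypothetical pure-function model of the socket/driver pairing rules.
/// Returns the driver list the gateway would end up with.
fn resolve_socket_driver(
    drivers: &[String],
    socket: Option<&str>,
) -> Result<Vec<String>, String> {
    // No socket: the driver list is left untouched.
    let Some(socket) = socket else {
        return Ok(drivers.to_vec());
    };
    if socket.is_empty() {
        return Err("--compute-driver-socket must not be an empty path".to_string());
    }
    match drivers {
        // Socket without a name falls back to the `extension` driver ID.
        [] => Ok(vec!["extension".to_string()]),
        // Exactly one name: it must not be a reserved built-in.
        [driver] => {
            let driver = driver.trim().to_ascii_lowercase();
            if RESERVED.contains(&driver.as_str()) {
                return Err(format!(
                    "--compute-driver-socket cannot be combined with reserved built-in compute driver '{driver}'"
                ));
            }
            Ok(vec![driver])
        }
        // More than one name is ambiguous for a single socket.
        many => Err(format!(
            "--compute-driver-socket requires exactly one compute driver name, got: {}",
            many.join(",")
        )),
    }
}
```

Under this model, a socket with no drivers resolves to `["extension"]`, a socket with `docker` or `vm` is rejected, and a socket with a custom name such as `Kyma_GPU-1` resolves to `["kyma_gpu-1"]`.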

fn effective_single_driver(args: &RunArgs) -> Option<ComputeDriverKind> {
match args.drivers.as_slice() {
[] => openshell_core::config::detect_driver(),
[driver] => Some(*driver),
[driver] => driver.parse().ok(),
_ => None,
}
}
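
Review note: with `drivers` now a `Vec<String>`, the helper above maps names back to the built-in enum through `parse().ok()`, so a custom name such as `extension` yields `None`. A sketch of that behavior, assuming a stand-in enum `BuiltinDriver` in place of `ComputeDriverKind` and an injected `detect` closure in place of `detect_driver()`; both are illustrative and the real `FromStr` implementation may accept more spellings.

```rust
use std::str::FromStr;

/// Stand-in for `ComputeDriverKind` (assumption: exact variants and accepted
/// spellings of the real enum may differ).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BuiltinDriver {
    Docker,
    Podman,
    Kubernetes,
    Vm,
}

impl FromStr for BuiltinDriver {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "docker" => Ok(Self::Docker),
            "podman" => Ok(Self::Podman),
            "kubernetes" => Ok(Self::Kubernetes),
            "vm" => Ok(Self::Vm),
            other => Err(format!("unsupported compute driver '{other}'")),
        }
    }
}

/// Hypothetical model of the helper: auto-detect only when no driver is
/// configured, and return `None` for custom or multiple names.
fn effective_builtin(
    drivers: &[String],
    detect: impl FnOnce() -> Option<BuiltinDriver>,
) -> Option<BuiltinDriver> {
    match drivers {
        [] => detect(),
        [driver] => driver.parse().ok(),
        _ => None,
    }
}
```

Because `run_from_args` calls `normalize_compute_driver_socket_args` first, a socket-only invocation reaches this helper with `["extension"]` rather than an empty list, which is why the updated test expects `None` instead of an auto-detected driver.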
@@ -1585,41 +1627,67 @@ ssh_session_ttl_secs = 1234
.unwrap_or_else(std::sync::PoisonError::into_inner);
let _g = EnvVarGuard::remove("OPENSHELL_COMPUTE_DRIVER_SOCKET");

let (args, _) = parse_with_args(&[
let (mut args, _) = parse_with_args(&[
"openshell-gateway",
"--db-url",
"sqlite::memory:",
"--compute-driver-socket",
"/run/openshell/external.sock",
"/run/openshell/extension.sock",
]);
super::normalize_compute_driver_socket_args(&mut args).unwrap();
assert_eq!(
args.compute_driver_socket.as_deref(),
Some(std::path::Path::new("/run/openshell/external.sock"))
Some(std::path::Path::new("/run/openshell/extension.sock"))
);
// External socket pins dispatch off the in-tree enum, so the
// single-driver helper must return None even when no --drivers given.
assert_eq!(args.drivers, ["extension"]);
assert!(super::effective_single_driver(&args).is_none());
}

#[test]
fn compute_driver_socket_overrides_drivers_flag() {
fn compute_driver_socket_rejects_reserved_builtin_drivers() {
let _lock = ENV_LOCK
.lock()
.unwrap_or_else(std::sync::PoisonError::into_inner);
let _g = EnvVarGuard::remove("OPENSHELL_COMPUTE_DRIVER_SOCKET");

let (args, _) = parse_with_args(&[
let (mut args, _) = parse_with_args(&[
"openshell-gateway",
"--db-url",
"sqlite::memory:",
"--drivers",
"docker",
"--compute-driver-socket",
"/run/openshell/external.sock",
"/run/openshell/extension.sock",
]);
let err = super::normalize_compute_driver_socket_args(&mut args).unwrap_err();
assert!(
err.to_string()
.contains("cannot be combined with reserved built-in compute driver 'docker'"),
"unexpected error: {err}"
);
}

#[test]
fn compute_driver_socket_rejects_vm_endpoint() {
let _lock = ENV_LOCK
.lock()
.unwrap_or_else(std::sync::PoisonError::into_inner);
let _g = EnvVarGuard::remove("OPENSHELL_COMPUTE_DRIVER_SOCKET");

let (mut args, _) = parse_with_args(&[
"openshell-gateway",
"--db-url",
"sqlite::memory:",
"--drivers",
"vm",
"--compute-driver-socket",
"/run/openshell/vm.sock",
]);
let err = super::normalize_compute_driver_socket_args(&mut args).unwrap_err();
assert!(
super::effective_single_driver(&args).is_none(),
"external socket must short-circuit --drivers"
err.to_string()
.contains("cannot be combined with reserved built-in compute driver 'vm'"),
"unexpected error: {err}"
);
}

@@ -1630,14 +1698,16 @@ ssh_session_ttl_secs = 1234
.unwrap_or_else(std::sync::PoisonError::into_inner);
let _g = EnvVarGuard::set(
"OPENSHELL_COMPUTE_DRIVER_SOCKET",
"/var/run/openshell/external.sock",
"/var/run/openshell/extension.sock",
);

let (args, _) = parse_with_args(&["openshell-gateway", "--db-url", "sqlite::memory:"]);
let (mut args, _) = parse_with_args(&["openshell-gateway", "--db-url", "sqlite::memory:"]);
super::normalize_compute_driver_socket_args(&mut args).unwrap();
assert_eq!(
args.compute_driver_socket.as_deref(),
Some(std::path::Path::new("/var/run/openshell/external.sock"))
Some(std::path::Path::new("/var/run/openshell/extension.sock"))
);
assert_eq!(args.drivers, ["extension"]);
}

#[test]