feat(bluefield): add bluefield compute driver#1899
Draft
cheese-head wants to merge 14 commits into
Introduce the openshell-driver-bluefield package as a set of private workspace member crates, starting with bf-core: the shared VF handle, role, claim, and lifecycle contracts the rest of the BlueField driver builds on. No behavior is wired into the gateway yet.
bf-inventory turns host sysfs into a set of claimable VF slots and owns the per-sandbox claim/release bookkeeping (VfPool) used by the lifecycle extension to hand out one VF per sandbox.
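The claim/release bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual API: `Slot`, `SlotPool`, `PoolError`, and the idempotent-claim behavior are all assumptions introduced here to show the one-VF-per-sandbox idea.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for a claimable VF slot, identified only by PCI address.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Slot {
    pub pci_address: String,
}

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum PoolError {
    /// No free slot is left to hand out.
    Exhausted,
}

/// Hypothetical pool that hands out at most one slot per sandbox.
#[derive(Debug, Default)]
pub struct SlotPool {
    free: BTreeSet<Slot>,
    claimed: HashMap<String, Slot>,
}

impl SlotPool {
    pub fn new(slots: impl IntoIterator<Item = Slot>) -> Self {
        Self {
            free: slots.into_iter().collect(),
            claimed: HashMap::new(),
        }
    }

    /// Claim a slot for `sandbox_id`.
    /// Assumption: a repeated claim for the same sandbox returns the same slot.
    pub fn claim(&mut self, sandbox_id: &str) -> Result<Slot, PoolError> {
        if let Some(slot) = self.claimed.get(sandbox_id) {
            return Ok(slot.clone());
        }
        // Take the lowest-ordered free slot so allocation is deterministic.
        let slot = self.free.pop_first().ok_or(PoolError::Exhausted)?;
        self.claimed.insert(sandbox_id.to_string(), slot.clone());
        Ok(slot)
    }

    /// Return the sandbox's slot to the free set; `None` if it held nothing.
    pub fn release(&mut self, sandbox_id: &str) -> Option<Slot> {
        let slot = self.claimed.remove(sandbox_id)?;
        self.free.insert(slot.clone());
        Some(slot)
    }
}
```

Keying the claimed map by sandbox ID keeps release cheap and makes a retried claim harmless, which is one plausible way to support the release-on-failure path mentioned for the lifecycle extension.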
bf-vm plugs into the VM driver's lifecycle-extension seam. For each sandbox it claims a VF, checks host passthrough readiness, binds the VF to vfio-pci, persists the binding for restart recovery, and releases it on launch failure or delete. It also selects the BlueField guest kernel and wires the static guest-egress env contract.
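For readers unfamiliar with the bind step: one common way to hand a PCI device to `vfio-pci` on Linux is the kernel's `driver_override` mechanism in sysfs. The sketch below illustrates that mechanism only; it is not necessarily how bf-vm performs the bind. The function name `bind_to_vfio` and the `sysfs_root` parameter (normally `/sys`) are assumptions made for illustration, and the sketch assumes the `vfio-pci` module is already loaded.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Bind the PCI device at `bdf` (e.g. "0000:03:00.2") to vfio-pci.
/// Hypothetical helper: `sysfs_root` is a parameter only so the sketch
/// can be exercised against a scratch directory instead of the real /sys.
pub fn bind_to_vfio(sysfs_root: &Path, bdf: &str) -> io::Result<()> {
    let dev = sysfs_root.join("bus/pci/devices").join(bdf);

    // 1. Restrict which driver is allowed to claim this device.
    fs::write(dev.join("driver_override"), "vfio-pci")?;

    // 2. Detach the currently bound driver, if there is one.
    let unbind = dev.join("driver/unbind");
    if unbind.exists() {
        fs::write(&unbind, bdf)?;
    }

    // 3. Ask the PCI core to re-probe; the override makes vfio-pci match.
    fs::write(sysfs_root.join("bus/pci/drivers_probe"), bdf)
}
```

Restoring the device on teardown would typically reverse these steps: clear `driver_override`, unbind from `vfio-pci`, and re-probe so the original driver can claim the VF again.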
bf-driver is the external compute driver process. It parses the BlueField CLI/env surface, installs the bf-vm lifecycle extension for the workload-running roles, and serves the ComputeDriver gRPC API over an authenticated Unix socket (or unauthenticated TCP for local dev).
… seam

Introduce a generic guest-resource model on LaunchPlan so lifecycle extensions can request host PCI passthrough without growing the shared plan type per device class:

- lifecycle: LaunchPlan now carries an opaque `resources: Vec<GuestResource>` with an `add_resource` writer. `GuestResource::PciPassthrough` is the only variant today; new kinds (e.g. volume mounts) become new variants without touching the plan's shape or its constructors.
- runtime: render a `pcie-root-port` + `vfio-pci` pair per passthrough device and make the GPU device block optional, so a sandbox can carry a GPU plus one or more VF NICs at once.
- driver: relax the non-GPU QEMU guard when a concrete passthrough device backs the launch, and forward each device to the launched child via `--vm-pci-passthrough`.

The PCI-specific projection lives in the driver layer (which renders QEMU), keeping LaunchPlan generic; the exhaustive match forces that site to be revisited when a variant is added.

Wire the BlueField VM extension onto the seam: declare the claimed VF as a passthrough device in `configure_launch` (so the backend resolves to QEMU and the guard sees a concrete device) and bind it in `before_launch`, attaching it to the guest as an egress NIC alongside any GPU.
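The resource seam and the per-device QEMU rendering can be sketched as below. This is an illustration under assumptions, not the driver's code: the `host_address` field, the `render_qemu_args` function, and the `rp{i}` port naming are hypothetical, while `pcie-root-port` and `vfio-pci` are standard QEMU device types.

```rust
/// Hypothetical mirror of the resource model described in the commit message.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum GuestResource {
    /// A host PCI device, identified by its address (domain:bus:device.function).
    /// The field name `host_address` is an assumption.
    PciPassthrough { host_address: String },
}

/// Hypothetical, cut-down launch plan carrying only the opaque resource list.
#[derive(Debug, Default)]
pub struct LaunchPlan {
    resources: Vec<GuestResource>,
}

impl LaunchPlan {
    pub fn add_resource(&mut self, resource: GuestResource) {
        self.resources.push(resource);
    }
}

/// Render one `pcie-root-port` + `vfio-pci` pair per passthrough device.
pub fn render_qemu_args(plan: &LaunchPlan) -> Vec<String> {
    let mut args = Vec::new();
    for (i, resource) in plan.resources.iter().enumerate() {
        // Exhaustive match: adding a variant forces this site to be revisited.
        match resource {
            GuestResource::PciPassthrough { host_address } => {
                let port = format!("rp{i}");
                // Each device gets its own root port with a unique chassis/slot.
                args.push("-device".to_string());
                args.push(format!(
                    "pcie-root-port,id={port},chassis={},slot={}",
                    i + 1,
                    i + 1
                ));
                args.push("-device".to_string());
                args.push(format!("vfio-pci,host={host_address},bus={port}"));
            }
        }
    }
    args
}
```

Because the plan only stores `GuestResource` values, a GPU and several VF NICs are just multiple entries in the same list, and the QEMU-specific knowledge stays in the rendering function.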
Signed-off-by: Patrick Riel <priel@nvidia.com>
…nctions
Rename the VF-specific inventory/handle types into runtime-neutral
network-function types so SF and future container/Kubernetes adapters can
reuse the discovery, allocation, and assignment contracts:
- bf-core: VfRef -> NetFunction, VfSlot -> FunctionSlot (with a FunctionKind
{ Vf, Sf } discriminant and a generic `index`), drop VM-centric guest_*
field names (guest_mac -> mac, guest_datapath_address -> datapath_address,
AttachSpec guest_ip -> endpoint_ip). BluefieldAssignment carries `kind` and
uses generalized label keys.
- bf-inventory: VfInventory -> FunctionInventory, VfPool -> FunctionPool,
StaticVfInventory -> StaticFunctionInventory, VfError/VfResult ->
InventoryError/InventoryResult. Sysfs VF/representor impls keep their
kind-specific names.
- bf-vm: update all call sites; VFIO binding mechanism names retained.
Also restructure the bluefield README into a package-marker overview that
links the bf-vm implementation guide.
Signed-off-by: Patrick Riel <priel@nvidia.com>
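A rough sketch of what the kind-aware slot type from the rename above might look like. Only the names `FunctionKind`, `FunctionSlot`, the `Vf`/`Sf` variants, and the `index` field come from the commit message; the `u32` index type, the `label` helper, and `slots_of_kind` are assumptions added for illustration.

```rust
use std::fmt;

/// Discriminant distinguishing virtual functions from scalable functions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum FunctionKind {
    Vf,
    Sf,
}

impl fmt::Display for FunctionKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            FunctionKind::Vf => "vf",
            FunctionKind::Sf => "sf",
        })
    }
}

/// A claimable slot; `index` is generic across kinds (type `u32` is assumed).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct FunctionSlot {
    pub kind: FunctionKind,
    pub index: u32,
}

impl FunctionSlot {
    /// Hypothetical helper: a kind-qualified label such as "vf3".
    pub fn label(&self) -> String {
        format!("{}{}", self.kind, self.index)
    }
}

/// Hypothetical helper: select the slots of one kind from a mixed inventory.
pub fn slots_of_kind(slots: &[FunctionSlot], kind: FunctionKind) -> Vec<&FunctionSlot> {
    slots.iter().filter(|s| s.kind == kind).collect()
}
```

With the kind carried as data rather than baked into type names, an SF or container adapter could reuse the same inventory and pool code and filter on `kind` where it matters.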
Summary
Adds the host-side BlueField compute driver stack for OpenShell: a kind-aware network-function abstraction (`bf-core`), sysfs VF discovery and allocation (`bf-inventory`), a VM lifecycle extension that claims a BlueField VF, binds it to `vfio-pci`, passes it into a QEMU sandbox guest, and wires VF-backed egress (`bf-vm`), plus the external driver binary (`bf-driver`). The contracts are function-kind aware (`Vf`/`Sf`) so future SF, container, and Kubernetes adapters can reuse the same discovery, allocation, and assignment layers.
Related Issue
Not linked.
Changes
- `bf-core` contracts crate (kind-aware `NetFunction`/`FunctionSlot`/`FunctionKind`, lifecycle and runtime traits, and the label-based `BluefieldAssignment` contract) plus workspace wiring.
- `bf-inventory`: sysfs host-VF and DPU-representor discovery, a static inventory for tests, and the `FunctionPool` claim/release allocator.
- `bf-vm`: a BlueField lifecycle extension over the VM compute driver, covering per-sandbox VF claim, `vfio-pci` bind with restore-on-teardown, host-passthrough preflight gating, guest-egress env contract and init drop-in, host PF auto-resolution, and QEMU guest-kernel resolution.
- `bf-driver`: the external `openshell-driver-bluefield` compute-driver binary the gateway spawns.
- Extend `openshell-driver-vm` with a generic PCI passthrough resource seam so host PCI devices can be passed into the guest.
- Generalize the VF-specific types to be function-kind aware (`FunctionKind { Vf, Sf }`; drop VM-centric `guest_*` field names) so SF/container/Kubernetes adapters can reuse the contracts.
- Restructure the `openshell-driver-bluefield` README into a package-marker overview that links the `bf-vm` implementation guide.

Testing
- `mise run pre-commit` passes

Checklist