feat(sdk): Go SDK for in-process SeiNetwork/SeiNode orchestration (WS-E)#421
…tion (WS-E)
Lands the WS-E integration refactor in-module under sdk/ so CICD harnesses can provision a genesis SeiNetwork + follower SeiNode fleet and read endpoints in ONE Go program (replacing the bash in platform's k8s_nightly):
- sdk/sei: database/sql-style Open(ctx, name) + provider registry with blank-import flavor selection (SEI_NODE_CLUSTER⇒k8s, SEI_LOCAL⇒local; both-set fail-fast). Class error enum (Usage/Timeout/Failed/Infra) + IsTimeout/IsFailed. ProvisionNetwork / ProvisionFleet / typed Endpoints / idempotent Teardown.
- Canonical readiness probe (the SDK is the single source of truth): the CometBFT unwrapped-envelope /status decode + the consensus-honest gate (height>1 AND catching_up==false), then eth_blockNumber. Canonical sei.io/role=node + sei.io/seinetwork label constants.
- sdk/sei/provider/k8s: SSA apply + object-label stamping + .status.endpoint typed reads + serial fan-out; sdk/sei/provider/local: registered stub.
- sdk/cmd/sei: thin `up` shell dogfooding the API.
In-module landing (Brandon's call): imports api/v1alpha1 directly, so there is no cross-module version skew and no replace. seitask convergence onto the canonical probe is now an in-module import (tracked follow-up). External importers (seictl/harnesses) pull the full controller graph (accepted tradeoff); the api/ leaf-module split (#175) would lighten that later.
Design + xreview: WS-E LLD RESOLVED (R1→R2); idiom review clean (cross-namespace peer-wiring bug fixed + verified). build/vet/test -race + golangci(sdk) + the full module suite all green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
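The consensus-honest gate named in this commit can be sketched as follows. This is a minimal illustration rather than the SDK's code: statusEnvelope and consensusReady are hypothetical names, and the sketch assumes the unwrapped /status body carries sync_info.latest_block_height as a decimal string and sync_info.catching_up as a boolean, which is how CometBFT usually encodes them. The eth_blockNumber follow-up probe is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// statusEnvelope is a hypothetical view of the unwrapped /status body
// (no JSON-RPC "result" wrapper); only the gated fields are decoded.
type statusEnvelope struct {
	SyncInfo struct {
		LatestBlockHeight string `json:"latest_block_height"`
		CatchingUp        bool   `json:"catching_up"`
	} `json:"sync_info"`
}

// consensusReady applies the gate: height > 1 AND catching_up == false.
func consensusReady(body []byte) (bool, error) {
	var st statusEnvelope
	if err := json.Unmarshal(body, &st); err != nil {
		return false, fmt.Errorf("decode /status: %w", err)
	}
	h, err := strconv.ParseInt(st.SyncInfo.LatestBlockHeight, 10, 64)
	if err != nil {
		return false, fmt.Errorf("parse latest_block_height: %w", err)
	}
	return h > 1 && !st.SyncInfo.CatchingUp, nil
}

func main() {
	body := []byte(`{"sync_info":{"latest_block_height":"42","catching_up":false}}`)
	ok, err := consensusReady(body)
	fmt.Println(ok, err)
}
```

A node at height 1, or one still catching up, fails the gate even though its RPC port already answers, which is the point of gating on consensus state rather than on reachability.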
PR Summary
Medium Risk
Overview
The k8s provider SSA-applies
Reviewed by Cursor Bugbot for commit 2b96fc8. Bugbot is set up for automated code reviews on this repo.
…default
Bugbot (High): FleetSpec.Namespace="" defaulted follower creation to the provider default (p.defaultNS) while peer discovery targets the network's namespace, so a network in a non-default namespace got followers created where discovery couldn't find them, failing readiness with a misleading ClassTimeout. spec.go documents "" => same as Network.
Default nodeNS to networkNS when FleetSpec.Namespace is empty (explicit still wins). All downstream uses (create, wait, status re-read, fleetHandle) already flow through nodeNS.
Adds TestProvisionFleet_DefaultsNamespaceToNetwork (fail-before/pass-after).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two Bugbot Medium findings:
- ProvisionFleet best-effort deletes the SeiNodes it created on any error path after the first apply (named-error-return + deferred cleanup), so partial fleets don't orphan (the SDK's nodes have no Workflow ownerRef to cascade). The original provisioning error stays primary (IsTimeout/IsFailed still branch); a cleanup failure is surfaced as annotated context and never masks it. Tests: cleanup-on-failure (fail-before/pass-after) + annotate-not-mask.
- Remove NetworkSpec.Set / FleetSpec.Set: documented strategic-merge escape hatches that render* never read (silent no-op public fields). Deferred to when a consumer needs seictl --set parity; Overrides is the MVP config path (confirmed genuinely applied: genesis.overrides + spec.overrides).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
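The named-error-return + deferred-cleanup shape, with the cleanup failure annotated rather than masking the original error, might look like the sketch below. All names here (provision, errNotReady, errDelete, isNotReady) are hypothetical stand-ins, not the SDK's identifiers.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinels standing in for a provisioning and a delete failure.
var (
	errNotReady = errors.New("fleet not ready")
	errDelete   = errors.New("delete forbidden")
)

// isNotReady plays the role of an IsTimeout/IsFailed-style predicate.
func isNotReady(err error) bool { return errors.Is(err, errNotReady) }

// provision is a stand-in for ProvisionFleet. The named return lets the
// deferred function observe and annotate the final error.
func provision(apply, wait, cleanup func() error) (err error) {
	if err = apply(); err != nil {
		return err // nothing was created, so nothing to roll back
	}
	defer func() {
		if err == nil {
			return
		}
		if cerr := cleanup(); cerr != nil {
			// %w keeps the original error matchable; the cleanup
			// failure is attached with %v as context only.
			err = fmt.Errorf("%w (cleanup also failed: %v)", err, cerr)
		}
	}()
	return wait()
}

func main() {
	err := provision(
		func() error { return nil },
		func() error { return errNotReady },
		func() error { return errDelete },
	)
	fmt.Println(err, isNotReady(err))
}
```

Wrapping only the original error with %w is what keeps predicate-style checks branching on the provisioning failure even when the rollback itself fails.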
…context
Two Bugbot Medium findings (cleanup correctness):
- ProvisionNetwork now best-effort deletes the SeiNetwork it created on any post-apply error path (named-return + deferred cleanupNetwork), mirroring ProvisionFleet: networks have no owner ref, so a failed ready-wait would otherwise orphan one. Original error stays primary; cleanup failure annotated.
- SDK-internal rollback (cleanupFleet/cleanupNetwork) and cmd/sei's deferred Teardown now run under a FRESH context.Background()-derived timeout, not the provisioning ctx. On a deadline/SIGINT exit the provisioning ctx is already canceled, so reusing it made the deletes silently no-op exactly when needed. Teardown doc comments advise callers to pass a fresh ctx post-cancellation (signatures unchanged; the caller owns that ctx).
Audited all Delete sites (4: 2 caller Teardowns, 2 internal rollbacks); only the internal ones use the fresh ctx.
Tests: network cleanup-on-failure + network/fleet cleanup-runs-on-canceled-ctx (interceptor asserts the Delete ctx is live; non-vacuous).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
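Why the rollback needs its own context can be shown with the standard library alone. In this sketch rollback, ctxAwareDelete and canceledCtx are hypothetical helpers; ctxAwareDelete merely imitates a client call that refuses to run on a dead context.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// rollback runs del under a fresh, time-boxed context instead of the
// (possibly already canceled) provisioning context.
func rollback(del func(context.Context) error, budget time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel()
	return del(ctx)
}

// ctxAwareDelete imitates a client call that does nothing on a dead context.
func ctxAwareDelete(ctx context.Context) error {
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("delete skipped: %w", err)
	}
	return nil
}

// canceledCtx simulates the provisioning ctx after SIGINT or a deadline.
func canceledCtx() context.Context {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return ctx
}

func main() {
	fmt.Println("reusing provisioning ctx:", ctxAwareDelete(canceledCtx()))
	fmt.Println("fresh ctx:", rollback(ctxAwareDelete, 30*time.Second))
}
```

The 30-second budget is an arbitrary value chosen for the example.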
…eout
Bugbot (Medium): poll/probeReady wrapped a canceled context (SIGINT/SIGTERM or caller cancel) as ClassTimeout, so IsTimeout / cmd/sei exit-3 treated an explicit abort like a readiness timeout.
Complete the error model (additive; the enum is unpublished): add ClassCanceled + IsCanceled. poll/probeReady now split context.Canceled -> ClassCanceled vs context.DeadlineExceeded -> ClassTimeout (reliable: PollUntilContextTimeout returns Canceled on parent cancel, DeadlineExceeded on budget elapse). cmd/sei maps a canceled error to exit 130 (distinct from timeout's 3).
Note: exit 130 becomes part of cmd/sei's exit-code contract (sibling to D4).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
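The cancel-versus-deadline split can be illustrated with plain errors.Is checks. This sketch skips the Class enum and maps straight to exit codes; exitCode, waitDone, abortErr and budgetErr are hypothetical names, and the catch-all code 1 is an assumption made for the example (only 3 and 130 come from the commit message).

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// exitCode maps a wrapped context error to a process exit code.
func exitCode(err error) int {
	switch {
	case err == nil:
		return 0
	case errors.Is(err, context.Canceled):
		return 130 // explicit abort (SIGINT/SIGTERM or caller cancel)
	case errors.Is(err, context.DeadlineExceeded):
		return 3 // readiness budget elapsed
	default:
		return 1 // assumed catch-all, not from the source
	}
}

// waitDone blocks until ctx ends and returns its error, wrapped.
func waitDone(ctx context.Context) error {
	<-ctx.Done()
	return fmt.Errorf("wait ready: %w", ctx.Err())
}

// abortErr produces the error seen after a parent cancel.
func abortErr() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return waitDone(ctx)
}

// budgetErr produces the error seen after the budget d elapses.
func budgetErr(d time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()
	return waitDone(ctx)
}

func main() {
	fmt.Println(exitCode(abortErr()))
	fmt.Println(exitCode(budgetErr(time.Millisecond)))
}
```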
…wline
Two Bugbot Medium findings:
- Rollback regression: ProvisionNetwork/ProvisionFleet deleted the created
resource on ANY post-apply error, including post-Ready/post-Running re-reads
and the readiness probe — so a transient failure after the resource was
healthy destroyed it. Adopt a principled rule: a `provisioned` flag (set
after waitNetworkReady/waitFleetRunning) gates the deferred cleanup to
`err != nil && !provisioned`. Failure to come up still rolls back (round-4
orphan case intact); a later error returns and leaves the healthy resource
for the caller to Teardown. Doc comments state the rule.
- SA-namespace file carried a trailing newline ("nightly\n") → defaulted
namespace 404'd on Get. TrimSpace it; defaultNamespace(saFile) made testable.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
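The `provisioned` gate from the first finding could be sketched like this. The provision function and both sentinel errors are hypothetical, and the rollback is reduced to a flag so the rule itself stays visible.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinels for the two failure stages.
var (
	errNeverReady = errors.New("never became ready")
	errProbe      = errors.New("transient probe failure")
)

// provision stands in for ProvisionNetwork. It reports whether the deferred
// rollback fired, alongside the error.
func provision(wait, probe func() error) (rolledBack bool, err error) {
	provisioned := false
	defer func() {
		// Roll back only when the resource never came up.
		if err != nil && !provisioned {
			rolledBack = true // a real implementation would Delete here
		}
	}()
	if err = wait(); err != nil {
		return
	}
	provisioned = true // healthy from here on; later errors leave it alone
	err = probe()
	return
}

func main() {
	ok := func() error { return nil }
	fmt.Println(provision(func() error { return errNeverReady }, ok))
	fmt.Println(provision(ok, func() error { return errProbe }))
}
```

In the second call the error is still returned, but the healthy resource survives for the caller to tear down.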
Per the honed purpose: the SDK is a thin, stateless, multi-mode Go-native CRUD API for SeiNetwork/SeiNode lifecycle, NOT an orchestrator. Flow: create network -> WaitReady -> create rpc nodes as peers -> WaitReady -> run tests against the returned handles. Cleanup/GC/rollback/composition belong to the caller.
- Open(ctx, mode) selects k8s|local|docker (arg, else exactly one of SEI_NODE_CLUSTER/SEI_LOCAL/SEI_DOCKER; never guesses). Providers self-register via blank import. Stateless: holds only the mode connection, tracks no resources (runtime owns state).
- Mode-agnostic Network/Node handles: WaitReady (phase + ONE light serve-probe), endpoint getters from .status, caller-invoked Delete (SDK never auto-deletes), Object() any escape to the raw *v1alpha1 CR in k8s mode.
- docker provider stub added alongside local (both registered, "not implemented") so the mode seam exists from day one.
Removed (the orchestration the thin layer must not own): auto-rollback/cleanup (cleanupFleet/Network, provisioned-disarm, fresh-ctx machinery), ProvisionFleet composite + N-node fan-out + workflow-vars, cmd/sei `up`, the Class/ClassCanceled error taxonomy (-> plain wrapped errors + IsTimeout), the heavy two-stage catching_up+EVM gate, the typed Endpoints/FleetEndpoints leaves. Caller loops CreateNode for N rpc nodes.
build/vet/test -race + golangci(sdk)=0 + gofmt all green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
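The "arg, else exactly one env var; never guesses" rule for Open, combined with self-registration, can be sketched as below. The registry shape, register and resolveMode are hypothetical; in the real SDK the registration would presumably happen in each provider package's init() on blank import, whereas here a single init() stands in for all three.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
)

// registered records which modes have a provider (hypothetical registry).
var registered = map[string]bool{}

func register(mode string) { registered[mode] = true }

// envToMode lists the presence variables named in the commit message.
var envToMode = map[string]string{
	"SEI_NODE_CLUSTER": "k8s",
	"SEI_LOCAL":        "local",
	"SEI_DOCKER":       "docker",
}

// resolveMode returns the explicit mode if given, else the single mode
// implied by the environment. Zero or several set variables is an error.
func resolveMode(explicit string, getenv func(string) string) (string, error) {
	mode := explicit
	if mode == "" {
		var found []string
		for env, m := range envToMode {
			if getenv(env) != "" {
				found = append(found, m)
			}
		}
		sort.Strings(found) // stable error text despite map iteration order
		if len(found) != 1 {
			return "", fmt.Errorf("need exactly one mode, got %d [%s]",
				len(found), strings.Join(found, ", "))
		}
		mode = found[0]
	}
	if !registered[mode] {
		return "", fmt.Errorf("mode %q not registered (missing blank import?)", mode)
	}
	return mode, nil
}

// init stands in for the per-provider init() functions.
func init() { register("k8s"); register("local"); register("docker") }

func main() {
	mode, err := resolveMode("", os.Getenv)
	fmt.Println(mode, err)
}
```

Passing getenv as a parameter keeps the resolution rule testable without mutating the process environment.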
Bugbot (Medium): renderNode hardwired the synthesized LabelPeerSource.Namespace
to the node's own namespace, so a follower in a different namespace than its
SeiNetwork searched for genesis validators in the wrong place and never wired up.
NodeSpec had no way to express the network's namespace.
Add NodeSpec.NetworkNamespace ("" => same as Namespace, the co-located common
case); CreateNode threads it to the peer selector while the node's own
metadata.namespace stays NodeSpec.Namespace. Test: cross-ns (node rpc-ns,
NetworkNamespace genesis-ns) wires the selector to genesis-ns; co-located
default wires to the node's ns. (fail-before/pass-after)
Idiom nits folded: merge stray stdlib import-group split (handle.go); soften the
mode-const "MUST equal" comment to "kept in sync"; nil-guard the endpoint getters.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
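The defaulting rule for NodeSpec.NetworkNamespace reduces to a few lines. In this sketch NodeSpec carries only the two fields discussed above, and peerNamespace is a hypothetical helper name, not necessarily what renderNode uses.

```go
package main

import "fmt"

// NodeSpec mirrors only the two fields discussed above.
type NodeSpec struct {
	Namespace        string // the node's own metadata.namespace
	NetworkNamespace string // "" means same as Namespace
}

// peerNamespace returns the namespace the peer selector should search.
func peerNamespace(s NodeSpec) string {
	if s.NetworkNamespace != "" {
		return s.NetworkNamespace
	}
	return s.Namespace
}

func main() {
	// Cross-namespace: the selector targets the network's namespace.
	fmt.Println(peerNamespace(NodeSpec{Namespace: "rpc-ns", NetworkNamespace: "genesis-ns"}))
	// Co-located default: the selector targets the node's own namespace.
	fmt.Println(peerNamespace(NodeSpec{Namespace: "rpc-ns"}))
}
```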
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
        return
    }
    defer func() { _ = node.Delete(ctx) }()
    nodes = append(nodes, node)
Loop defer deletes wrong nodes
Medium Severity
In Example_lifecycle, each loop iteration registers defer func() { _ = node.Delete(ctx) }() but every closure captures the same node variable. When defers run, only the last created node is deleted (possibly multiple times); earlier RPC nodes are left on the cluster.
Bugbot's last finding ("Loop defer deletes wrong nodes", example_test.go:55) is a false positive. Merging: the reshaped thin-CRUD SDK is design-confirmed, idiom-reviewed clean (zero findings), and CI-green; the earlier (real) Bugbot findings were all in the orchestration complexity that this reshape removed.
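One plausible reading of why the finding does not apply: the excerpt does not show how node is declared, but if it is declared with := inside the loop body, each iteration gets its own variable and each deferred closure captures a different one. That holds in every Go version (and since Go 1.22 the variables in the for clause are per-iteration as well). The sketch below uses hypothetical names and records deletions instead of calling a real Delete.

```go
package main

import "fmt"

type node struct{ name string }

// createAndDefer mirrors the flagged shape: nd is declared with := inside
// the loop body, so every deferred closure sees its own node.
func createAndDefer(names []string) (deleted []string) {
	for _, n := range names {
		nd := &node{name: n}
		defer func() { deleted = append(deleted, nd.name) }()
	}
	return
}

func main() {
	// Defers run last-in, first-out, so deletion order is reversed.
	fmt.Println(createAndDefer([]string{"rpc-0", "rpc-1", "rpc-2"}))
}
```

Every node is deleted exactly once; the shared-variable failure mode would require node to be declared once outside the loop and reassigned inside it.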


What
Lands the WS-E integration refactor (a Go SDK under sdk/) so CICD chaos/test harnesses can provision a genesis SeiNetwork + a follower SeiNode fleet, wait for readiness, read endpoints, and tear down in one Go program (replacing the bash in platform's k8s_nightly).

API (one-way-door surface)
- sei.Open(ctx, name) (*Client, error): database/sql-style; provider registry with blank-import flavor selection (_ ".../sdk/sei/provider/k8s"), env precedence (explicit → SEI_PROVIDER → presence: SEI_NODE_CLUSTER⇒k8s / SEI_LOCAL⇒local; both-set fail-fast).
- ProvisionNetwork / ProvisionFleet / typed Endpoints() (per-pod, read verbatim from .status.endpoint, never reconstructed) / idempotent Teardown.
- Class error enum + IsTimeout / IsFailed.
- Canonical readiness probe: the CometBFT unwrapped-envelope /status decode + the consensus-honest gate (height>1 AND catching_up==false), then eth_blockNumber.
- Canonical sei.io/role=node / sei.io/seinetwork label constants.
- provider/k8s (SSA + label stamping + .status.endpoint reads + serial fan-out); provider/local (registered stub); cmd/sei thin up shell that dogfoods the API.

Why in-module (not a standalone repo)
Per the decision to land in an existing repo: importing api/v1alpha1 is now in-module, so there is no cross-module version skew and no replace. seitask lives here too, so its convergence onto the canonical probe is a trivial in-module import (follow-up issue). External importers (seictl/harnesses) pull the full controller graph; that is the accepted tradeoff, and the api/ leaf-module split (#175) would lighten it later.

Validation
- go build ./..., go vet ./..., go test ./... -race, golangci-lint run ./sdk/... (0 issues), gofmt -s -l sdk/ (clean), and make test over the whole module: all green.
- ctrlclient.Apply intentionally matches the controller's internal/task pattern (one //nolint:staticcheck SA1019 with that rationale; module-wide migration tracked separately).
- .golangci.yml: extended the existing internal/* / api/* lll/dupl exclusion to sdk/* (peer application code).

Provenance
WS-E LLD signed off (xreview RESOLVED R1→R2; SDK-canonical-probe decision); implementation idiom-reviewed clean (the cross-namespace peer-wiring bug it caught is fixed + regression-tested). Convergence follow-ups (seitask imports the canonical probe; #175; the provider-key drift guard) filed separately.

🤖 Generated with Claude Code