Add Autobahn fullnode (CON-309)#3525

Open
wen-coding wants to merge 96 commits into main from wen/autobahn_rpc_write_side

@wen-coding

@wen-coding wen-coding commented May 30, 2026

Contributor

Summary

Adds a non-validator fullnode role to Autobahn. A fullnode loads the committee from autobahn.json as a routing table, dials a single committee member at a time over giga to pull finalized blocks (StreamFullCommitQCs + GetBlock), executes them locally via runExecute (FinalizeBlock + Commit + PushAppHash), and accepts inbound block-sync from other peers. Read endpoints (eth_call, eth_getBalance, eth_getTransactionReceipt, /block, /status) are served from the locally-executed state; eth_sendRawTransaction is the only thing forwarded — via EvmProxy to the shard-owning validator, because submission needs the producer's mempool.

Role follows cfg.Mode: mode = "validator" runs the validator path (requires the local key to be in the committee, and rejects priv-validator.laddr since autobahn signs in-process); any other mode runs as a fullnode. Mode is the operator's explicit role declaration, kept separate from committee membership so a newly-joined committee member can finish catch-up as a fullnode before the operator flips to mode = "validator". An ERROR is logged at startup if mode and committee membership disagree, so operator misconfiguration is visible.
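
The role rule above can be sketched as a small decision function. This is an illustrative sketch, not the PR's code: `pickRole`, `nodeRole`, and the returned `mismatch` flag are hypothetical names, and treating a validator-mode node outside the committee as a hard error is an assumption drawn from the word "requires" in the description.

```go
package main

import (
	"errors"
	"fmt"
)

// nodeRole is a hypothetical enum used only in this sketch.
type nodeRole string

const (
	roleValidator nodeRole = "validator"
	roleFullnode  nodeRole = "fullnode"
)

// pickRole lets the operator's mode decide the role and only cross-checks
// committee membership. mismatch reports the case that would be logged as an
// ERROR at startup (mode and membership disagree).
func pickRole(mode string, inCommittee bool) (role nodeRole, mismatch bool, err error) {
	if mode == "validator" {
		if !inCommittee {
			// Assumption: the validator path refuses to start without membership.
			return "", true, errors.New("mode=validator but local key is not in the committee")
		}
		return roleValidator, false, nil
	}
	// Any other mode runs as a fullnode, including a committee member that is
	// still catching up before the operator flips the mode.
	return roleFullnode, inCommittee, nil
}

func main() {
	for _, tc := range []struct {
		mode        string
		inCommittee bool
	}{
		{"validator", true},
		{"full", true},
		{"full", false},
	} {
		role, mismatch, err := pickRole(tc.mode, tc.inCommittee)
		fmt.Printf("mode=%s inCommittee=%v -> role=%q mismatch=%v err=%v\n",
			tc.mode, tc.inCommittee, role, mismatch, err)
	}
}
```

Keeping the mismatch as a separate return value, rather than an error, matches the stated intent that a committee member may deliberately run as a fullnode during catch-up.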

Type split. GigaRouter is now built by two separate constructors:

  • NewGigaValidatorRouter(*GigaValidatorConfig, ...) — committee members. Embeds GigaRouterCommonConfig and adds ValidatorKey, ViewTimeout, Producer. Runs consensus + producer + the full giga service.
  • NewGigaFullnodeRouter(*GigaRouterCommonConfig, ...) — non-validators. No fullnode-specific config fields — the common config is sufficient. No consensus, no producer, no mempool. Single-active-subscriber dial loop: shuffled-once committee walk, cfg.DialInterval between attempts.
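
A single-active-subscriber dial loop of the kind described for fullnodes might look like the sketch below. It is an assumption-laden illustration: `dialLoop`, `dialFunc`, and `runDemo` are hypothetical names, peers are plain strings, and the stub dialer stands in for the real giga connection; only the shuffle-once walk and the fixed interval between attempts come from the description.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// dialFunc connects to one peer and blocks until that connection ends.
type dialFunc func(ctx context.Context, addr string) error

// dialLoop shuffles the committee once, then walks it round-robin with at
// most one active connection, waiting interval between attempts.
func dialLoop(ctx context.Context, committee []string, interval time.Duration, dial dialFunc) error {
	if len(committee) == 0 {
		return errors.New("empty committee")
	}
	order := append([]string(nil), committee...) // copy so the caller's slice is untouched
	rand.Shuffle(len(order), func(i, j int) { order[i], order[j] = order[j], order[i] })
	for i := 0; ; i = (i + 1) % len(order) {
		if err := dial(ctx, order[i]); err != nil {
			fmt.Printf("dial %s: %v\n", order[i], err)
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(interval):
		}
	}
}

// runDemo drives the loop with a stub dialer that cancels after three attempts.
func runDemo() (attempts int, cancelled bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	dial := func(_ context.Context, addr string) error {
		attempts++
		if attempts >= 3 {
			cancel()
		}
		return errors.New("stub: connection closed")
	}
	err := dialLoop(ctx, []string{"val-0", "val-1", "val-2", "val-3"}, time.Millisecond, dial)
	return attempts, errors.Is(err, context.Canceled)
}

func main() {
	attempts, cancelled := runDemo()
	fmt.Println("attempts:", attempts, "cancelled:", cancelled)
}
```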

GigaRouter interface exposes the read path (Run, RunInboundConn, LastCommittedBlockNumber, MaxGasEstimatedPerBlock, BlockByNumber, BlockByHash, EvmProxy, Mempool). Mempool() utils.Option[*producer.State] returns Some(r.producer) on validators and None on fullnodes — the producer-backed mempool that internal/rpc/core/mempool.go and rpc/client/local/local.go reach through. dialAndRunConn takes utils.Option[NodePublicKey] for the expected peer key: validators pass Some(addr.Key) (consensus voting requires peer identity), fullnodes pass None (block-sync data is QC-signed, so peer identity doesn't need a dial-layer check).

Inbound block-sync on both roles. Validators and fullnodes both expose RunInboundConn. Committee peers get the full giga RPC on validators (RunServer); non-committee peers get the block-sync subset (RunBlockSyncServer: StreamFullCommitQCs + GetBlock), capped per node by autobahn-max-inbound-fullnode-peers (default 10; *int pointer in TOML so absent and explicit-zero are distinguishable). Cap is an atomic.Int64 counter on the hot path. Enables tree topologies — operators can deploy a relay fullnode that pulls from committee members and serves a fleet of downstream fullnodes.
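
The two mechanisms named above, a pointer-typed setting that separates "absent" from "explicit zero" and an atomic counter enforcing the cap, can be sketched as follows. The names `resolveCap`, `inboundLimiter`, `tryAcquire`, and `release` are hypothetical; the default of 10 and the use of `atomic.Int64` come from the description.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// defaultMaxInbound mirrors the documented default of 10.
const defaultMaxInbound = 10

// resolveCap treats a nil pointer as "key absent" (use the default) and any
// non-nil value, including 0, as an explicit operator choice.
func resolveCap(configured *int) int64 {
	if configured == nil {
		return defaultMaxInbound
	}
	return int64(*configured)
}

// inboundLimiter caps concurrent non-committee inbound peers.
type inboundLimiter struct {
	max    int64
	active atomic.Int64
}

// tryAcquire reserves a slot and reports false when the cap is reached.
// It increments first and rolls back on overflow, so no lock is needed.
func (l *inboundLimiter) tryAcquire() bool {
	if l.active.Add(1) > l.max {
		l.active.Add(-1)
		return false
	}
	return true
}

// release frees a slot when the inbound connection ends.
func (l *inboundLimiter) release() { l.active.Add(-1) }

func main() {
	zero := 0
	fmt.Println("absent:", resolveCap(nil), "explicit zero:", resolveCap(&zero))

	l := &inboundLimiter{max: 2}
	fmt.Println(l.tryAcquire(), l.tryAcquire(), l.tryAcquire()) // third call exceeds the cap
	l.release()
	fmt.Println(l.tryAcquire()) // a slot is free again
}
```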

EVM tx writes. eth_sendRawTransaction is sender-shard-mapped via Committee.EvmShard and forwarded over HTTP to the shard owner. Every committee member must expose an evmrpc URL (required field on AutobahnFileConfig, validated at parse time). Validators short-circuit on their own shard to local mempool; fullnodes always forward.
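
The forwarding decision can be sketched with an Option-style return, in the spirit of the `EvmProxy(sender).Get()` call shape mentioned later in this PR. Everything here is illustrative: `Option` is a minimal stand-in for `utils.Option`, the `router` and `member` types are hypothetical, and the shard function (last address byte modulo committee size) is an invented placeholder, since the real `Committee.EvmShard` mapping is not shown in this conversation.

```go
package main

import (
	"fmt"
	"net/url"
)

// Option is a minimal stand-in for utils.Option; only Get is modelled.
type Option[T any] struct {
	v  T
	ok bool
}

func Some[T any](v T) Option[T]    { return Option[T]{v: v, ok: true} }
func None[T any]() Option[T]       { return Option[T]{} }
func (o Option[T]) Get() (T, bool) { return o.v, o.ok }

type member struct {
	name   string
	evmRPC *url.URL
}

type router struct {
	committee []member
	self      int // index of the local node in the committee, -1 on a fullnode
}

// evmShard is a HYPOTHETICAL mapping used only for this sketch.
func (r router) evmShard(sender [20]byte) int {
	return int(sender[19]) % len(r.committee)
}

// EvmProxy returns Some(url) when the tx must be forwarded to the shard
// owner, and None when the local node owns the shard and handles it itself.
func (r router) EvmProxy(sender [20]byte) Option[*url.URL] {
	owner := r.evmShard(sender)
	if owner == r.self {
		return None[*url.URL]()
	}
	return Some(r.committee[owner].evmRPC)
}

func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}

// demoCommittee builds a four-member committee with made-up URLs.
func demoCommittee() []member {
	out := make([]member, 4)
	for i := range out {
		name := fmt.Sprintf("val-%d", i)
		out[i] = member{name: name, evmRPC: mustParse("http://" + name + ":8545")}
	}
	return out
}

func main() {
	var sender [20]byte
	sender[19] = 1
	for _, r := range []router{{demoCommittee(), 1}, {demoCommittee(), -1}} {
		if u, ok := r.EvmProxy(sender).Get(); ok {
			fmt.Println("forward to", u)
		} else {
			fmt.Println("handle locally")
		}
	}
}
```

Because a fullnode's `self` index never matches a shard owner, it always forwards, while a validator short-circuits only on its own shard.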

LastCommittedBlockNumber reads app.LastBlockHeight() directly. New method on the ABCI Application interface for a fast in-memory read of the app's executed height (Info() is too heavy for the hot path). Both validator and fullnode return the same value, matching CometBFT /status — clients querying receipts at the reported height never see a height the app hasn't reached. Single source of truth (the app), no parallel cache.

Run() lifecycle. GigaRouter is constructed in node/setup.go (buildGigaRouter) and spawned by node.go's OnStart via SpawnCritical, alongside the transport. Router.Run does not spawn giga.Run — the rule is that whoever constructs an object owns calling its Run.

Block-sync-only Service. giga.Service.state has type utils.Option[*consensus.State]. NewService constructs the full validator service (Some); NewBlockSyncService constructs the fullnode-side service (None). Consensus / avail handlers reach state through a validatorState() helper that OrPanics on the (structurally impossible) None branch.

Config. autobahn-config-file and autobahn-max-inbound-fullnode-peers are top-level TOML keys (placed above any [section] header so viper sees them at root scope, where mapstructure expects them). PersistentStateDir is rootified against the node's --home dir, matching how CometBFT handles relative paths elsewhere.

CI. Adopts main's GHCR-based integration-test image distribution (matrix jobs pull from ghcr.io/sei-protocol/sei-chain-integration-test-{localnode,rpcnode}:<run_id> rather than loading a per-job tar artifact); adds an autobahn-integration-tests matrix job alongside the existing groups. The RPC node init script passes --overwrite to seid init so re-runs in a recycled container don't abort on existing config, and uses explicit if [ ! -f ] fail-fast checks instead of set -e (which made the autobahn-discovery curl probes fatal); polls for both genesis and per-validator dirs with explicit timeout errors.

Note: parallel multi-upstream block-sync + peer discovery (full mesh) is deferred to a follow-up PR — fullnodes here are single-active outbound + open inbound. That's enough for tree topologies (operator-configured relay fullnodes).

Test plan

  • gofmt -s + go vet clean
  • golangci-lint run clean
  • go test ./sei-tendermint/internal/p2p/... ./sei-tendermint/config/... ./sei-tendermint/node/... ./sei-tendermint/internal/proxy/... pass
  • make autobahn-integration-test: the whole client-facing EVM flow (balance / chainId / nonce / send / receipt) runs through a fullnode sidecar that catches up via giga and forwards writes; halt/liveness/permanent-fault scenarios verified by polling helpers against the fullnode's CometBFT RPC.
  • CI green

🤖 Generated with Claude Code

Adds an autobahn-role config (validator|rpc-only). With autobahn-role=
"rpc-only", a non-validator RPC node loads the committee from autobahn.json
as a routing table only — no consensus participation, no block execution,
no validator key required.

eth_sendRawTransaction submitted to such a node is recovered, sender-shard-
mapped via Committee.EvmShard, and forwarded over HTTP to the shard owner's
EVM RPC. The rest of the giga stack (consensus, producer, data, service)
stays nil; Run is a no-op; block-read methods return a sentinel error.

InitRPCOnly bootstraps the app once at startup so x/evm params (chain ID,
signer config) are populated. app.go pre-fires the EVM HTTP/WS start gate
since rpc-only nodes don't call ProcessBlock in the current milestone — see
TODO(autobahn-read-path) in NewGigaRouter for the read-side scope.

CI: wires PR #3234's make autobahn-integration-test into the workflow as a
new top-level job (it owns its own cluster via TestMain, so can't share the
matrix's cluster), and adds a TestAutobahn/RPCOnlyForwarding sub-test that
verifies an actual signed tx round-trips through the proxy: rpc-only sidecar
→ shard owner → block inclusion → receipt on validator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cursor

cursor Bot commented May 30, 2026

PR Summary

High Risk
Large refactor of Autobahn networking (validator vs fullnode, inbound caps, EVM tx proxying) and ABCI surface changes; misconfiguration or routing bugs could affect block execution, RPC correctness, or tx submission.

Overview
Introduces a non-validator Autobahn fullnode path: nodes with mode != "validator" build NewGigaFullnodeRouter, block-sync from one committee peer at a time, execute blocks locally, and expose read RPC from local state while forwarding eth_sendRawTransaction to shard owners via EvmProxy. Validators use NewGigaValidatorRouter with consensus/producer/mempool unchanged in spirit but refactored behind a shared GigaRouter interface; buildGigaRouter in setup picks the constructor from cfg.Mode and logs when mode disagrees with committee membership.

Giga / P2P: Inbound connections are split—committee peers get full giga on validators; others get block-sync only, with a configurable autobahn-max-inbound-fullnode-peers cap. EvmProxy / LocalClient.EvmProxy now return utils.Option[*url.URL]; evmrpc and mocks updated accordingly. LastCommittedBlockNumber uses new Application.LastBlockHeight() instead of a CommitQC cache. Autobahn JSON requires evmrpc per validator (no optional URL).

Ops / CI: autobahn-config-file moves to top-level TOML (with tests); rpc-node init polls for genesis and validator dirs, generates autobahn config when AUTOBAHN=true, and uses seid init --overwrite. Makefile/docker use DOCKER_PLATFORM and GHCR-oriented integration image messaging. New autobahn-integration-tests workflow job runs make autobahn-integration-test; integration tests boot an RPC fullnode sidecar and use height polling instead of fixed sleeps for halt/liveness checks.

Reviewed by Cursor Bugbot for commit cf7709e.

@github-actions

github-actions Bot commented May 30, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

Build | Format | Lint | Breaking | Updated (UTC)
✅ passed | ✅ passed | ✅ passed | ✅ passed | Jun 20, 2026, 5:01 AM

@codecov

codecov Bot commented May 30, 2026

Codecov Report

❌ Patch coverage is 52.57511% with 221 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.22%. Comparing base (8d59181) to head (6f902a8).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
sei-tendermint/internal/p2p/giga_router_common.go 61.96% 35 Missing and 27 partials ⚠️
sei-tendermint/node/setup.go 38.55% 51 Missing ⚠️
sei-tendermint/internal/rpc/core/mempool.go 3.70% 26 Missing ⚠️
...ei-tendermint/internal/p2p/giga_router_fullnode.go 45.94% 19 Missing and 1 partial ⚠️
sei-tendermint/internal/p2p/giga/service.go 42.85% 14 Missing and 2 partials ⚠️
sei-tendermint/abci/types/mocks/application.go 0.00% 10 Missing ⚠️
sei-tendermint/internal/p2p/giga/consensus.go 33.33% 5 Missing and 5 partials ⚠️
sei-tendermint/internal/p2p/giga/avail.go 53.84% 1 Missing and 5 partials ⚠️
...i-tendermint/internal/p2p/giga_router_validator.go 88.88% 4 Missing and 2 partials ⚠️
sei-tendermint/rpc/client/local/local.go 20.00% 4 Missing ⚠️
... and 6 more
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3525      +/-   ##
==========================================
- Coverage   59.02%   58.22%   -0.80%     
==========================================
  Files        2215     2144      -71     
  Lines      182513   175148    -7365     
==========================================
- Hits       107720   101972    -5748     
+ Misses      65101    64109     -992     
+ Partials     9692     9067     -625     
Flag Coverage Δ
sei-chain-pr 62.91% <52.06%> (?)
sei-db 70.41% <ø> (ø)
sei-db-state-db ?

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
evmrpc/send.go 51.04% <100.00%> (ø)
evmrpc/tx.go 83.28% <100.00%> (ø)
sei-cosmos/client/context.go 86.07% <ø> (ø)
sei-tendermint/config/config.go 77.08% <ø> (+3.56%) ⬆️
sei-tendermint/config/toml.go 64.38% <ø> (+9.38%) ⬆️
sei-tendermint/internal/p2p/giga_router.go 100.00% <ø> (+31.95%) ⬆️
sei-tendermint/internal/p2p/router.go 90.65% <100.00%> (+0.57%) ⬆️
sei-tendermint/internal/p2p/routeroptions.go 81.25% <ø> (ø)
sei-tendermint/internal/proxy/proxy.go 91.93% <100.00%> (+0.26%) ⬆️
sei-tendermint/internal/rpc/core/env.go 71.07% <100.00%> (-7.16%) ⬇️
... and 16 more

... and 313 files with indirect coverage changes

@wen-coding wen-coding changed the title from "feat(autobahn): rpc-only mode forwards eth_sendRawTransaction (CON-309)" to "Implement tx write in Autobahn rpc-only node (CON-309)" May 30, 2026
Comment thread app/app.go Outdated
Comment thread app/app.go Outdated
Comment thread sei-tendermint/internal/p2p/giga_router_test.go Outdated
wen-coding and others added 2 commits May 29, 2026 18:14
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Guard autobahnRPCOnly on AutobahnConfigFile != "" so a stray
  autobahn-role without a config file doesn't pre-fire the EVM gate
  (matches node.go's gigaRPCOnly construction and the AutobahnRole
  godoc, which already says the role is ignored when the config file
  is empty).
- Drop time.Sleep + time.After from the rpc-only Run-cancel test; a
  pre-cancelled context proves the unblock path without any goroutine-
  timing synchronization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread sei-tendermint/node/setup.go
wen-coding and others added 2 commits May 29, 2026 19:57
ProcessBlock's deferred gate-fire didn't cover rpc-only nodes because
they never execute a block. Factor the gate-fire into a helper and
call it from InitChainer as well — fresh-start validators reach it via
the handshaker / runExecute InitChain call, rpc-only nodes via
GigaRouter.InitRPCOnly. Both paths converge on the same chain event.
The *Sent flags keep the second fire a no-op.

Drops the autobahnRPCOnly field on App and the consensus-mode branch in
RegisterLocalServices that bugbot flagged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bugbot caught the leftover copy-paste from buildGigaConfig — the rpc-only
variant intentionally skips the membership check, so nodeKey was never
read.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread sei-tendermint/internal/p2p/giga_router.go Outdated
wen-coding and others added 2 commits May 29, 2026 20:42
Bot caught a latent issue: InitRPCOnly's early-return (when the app
already has committed state) skipped the InitChain call, so the
InitChainer defer never fired and the EVM RPC goroutines would block
forever. Today the path is unreachable (rpc-only never commits) but
read-side scope changes that.

Wrap BaseApp.Info to also fire the gate when LastBlockHeight > 0. The
trigger is the app's own committed state, not a consensus-engine flag,
so no cross-layer branching. Pairs naturally with InitChainer's defer:
fresh start fires via InitChain, restart-with-state via Info, steady-
state via ProcessBlock.

Verified: make autobahn-integration-test passes all 6 sub-tests
including RPCOnlyForwarding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread sei-tendermint/node/setup.go
wen-coding and others added 5 commits May 29, 2026 21:04
Bugbot caught the validator-address-map construction was copy-pasted
between buildGigaConfig and buildRPCOnlyGigaConfig. Pull it into a
single loadAutobahnCommittee helper that returns the parsed file
config + the committee map; both callers compose the rest of their
config from there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The override is a sei-tendermint-specific concession (rpc-only nodes
have no ProcessBlock to fire the gate from), not a general improvement
to Info. Calling that out in the header so a future reader doesn't
wonder why we touched an ABCI method that looks innocent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review caught that the unconditional Info wrapper fires the gate
before CometBFT Handshaker's ReplayBlocks runs, binding EVM HTTP/WS
while replay is mid-flight — strictly worse staleness window than the
original ProcessBlock-defer trigger (which fires after the first
replayed block commits). Re-introduce autobahnRPCOnly as a single
bool on App, set from tmConfig (guarded on AutobahnConfigFile != ""),
and gate the Info-side fire on it. Autobahn nodes skip the Handshaker
entirely, so the gating is also what makes the override safe for the
mode it exists for.

Also addresses smaller review feedback:
- LastCommittedBlockNumber: reword to match /status's actual
  committed > 0 guard; the "LastCommitted >= Latest" framing was
  overstated and fragile.
- rpc_only_test.go: rename identical-string constants to expose the
  routing intent at call sites (validatorEVMRPCURLOnHost vs
  evmRPCURLOnContainerLocalhost — one is host-curled, the other goes
  through docker exec into the rpc-only sidecar).

Verified: make autobahn-integration-test passes all 6 sub-tests
including RPCOnlyForwarding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI fresh-cluster runs failed at TestAutobahn/RPCOnlyForwarding with
"sei_associate error: : unknown" because the V/R/S hex encoding
differed from what `seid tx evm associate-address` produces:
crypto.Sign returns sig[64] as a raw byte and hex.EncodeToString of
[]byte{0x00} produces "00", but the CLI uses big.Int.Bytes() which
strips leading zeros to "" for V=0. The chain's signature
verification rejects the encoding mismatch (CheckTx code != 0).

Match the CLI exactly: round-trip through big.Int.Bytes() for R, S,
and V. Local runs previously passed because they hit the
"balance > 0 → skip associate" branch against a long-lived cluster
where admin was already associated from prior runs.
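
The encoding mismatch is easy to reproduce in isolation. The sketch below uses only the Go standard library; `rawHex` and `cliHex` are hypothetical helper names, and the byte values are made-up examples rather than a real signature.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"math/big"
)

// rawHex encodes the bytes as-is, so leading zero bytes are preserved.
func rawHex(b []byte) string { return hex.EncodeToString(b) }

// cliHex round-trips through big.Int, whose Bytes() drops leading zero
// bytes; a zero value therefore encodes to the empty string.
func cliHex(b []byte) string {
	return hex.EncodeToString(new(big.Int).SetBytes(b).Bytes())
}

func main() {
	v0 := []byte{0x00}
	fmt.Printf("V=0:  raw=%q cli=%q\n", rawHex(v0), cliHex(v0)) // raw="00" cli=""

	v1 := []byte{0x01}
	fmt.Printf("V=1:  raw=%q cli=%q\n", rawHex(v1), cliHex(v1)) // raw="01" cli="01"

	padded := []byte{0x00, 0xab}
	fmt.Printf("R/S:  raw=%q cli=%q\n", rawHex(padded), cliHex(padded)) // raw="00ab" cli="ab"
}
```

The two encodings agree whenever the value has no leading zero byte, which is consistent with the failure only showing up for some signatures.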

Also silence `git describe --tags 2>/dev/null` in Makefile — the
"fatal: No names found, cannot describe anything" line was cluttering
every CI log and surfaced from a shallow clone with no tags.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- app/app.go: replace racy *Sent flag pair on signalEVMRPCReady with
  sync.Once. The Info-side fire site makes the race reachable from any
  concurrent /abci_info RPC call once read-path lands; cheap to fix
  now. Also restore the InitChainer doc comment that an earlier edit
  orphaned onto the helper above it.

- sei-tendermint/internal/p2p/giga_router.go: change LastCommittedBlockNumber's
  rpc-only sentinel from -1 to 0 so /status's JSON response carries a
  non-negative integer for downstream clients. status.go's "committed >
  0" guard still silently skips it, so the invariant warning stays
  quiet. Update unit test.

- integration_test/autobahn: fold teardownRPCOnlyNode into
  teardownCluster. TestMain no longer repeats the kill in the two error
  paths + the success path; adding future sidecars goes in the same
  centralized teardown.
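
As a rough illustration of the first item, a fire-once gate built on `sync.Once` avoids the check-then-set race that a plain bool flag has under concurrent callers. This is a sketch under assumptions: `readyGate`, `fire`, and `ready` are hypothetical names, and closing a channel as the signal is assumed rather than taken from app.go.

```go
package main

import (
	"fmt"
	"sync"
)

// readyGate signals readiness exactly once, no matter how many callers fire it.
type readyGate struct {
	once sync.Once
	ch   chan struct{}
}

func newReadyGate() *readyGate { return &readyGate{ch: make(chan struct{})} }

// fire closes the channel on the first call; later or concurrent calls are
// no-ops. Without the Once, a second close would panic.
func (g *readyGate) fire() { g.once.Do(func() { close(g.ch) }) }

// ready returns a channel that is closed once the gate has fired.
func (g *readyGate) ready() <-chan struct{} { return g.ch }

func main() {
	g := newReadyGate()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ { // several fire sites racing, as with concurrent RPC calls
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.fire()
		}()
	}
	wg.Wait()
	<-g.ready()
	fmt.Println("gate fired once")
}
```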

Verified: make autobahn-integration-test passes all 6 sub-tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread app/app.go Outdated
httpServerStartSignalSent bool
wsServerStartSignalSent bool
httpServerStartSignal chan struct{}
wsServerStartSignal chan struct{}

Contributor

I wonder if these really are intended to be independent, or whether we can have just one serverStartSignal channel that is closed via evmRPCReadyOnce. Optionally, note that you can wrap evmRPCReadyOnce and serverStartSignal into one sei-tendermint/libs/utils.Once object.

Contributor Author

We don't need to touch app.go now that we are doing the read side.

Comment thread sei-tendermint/config/config.go Outdated
Comment thread sei-tendermint/internal/p2p/giga_router.go Outdated
// FinalizeBlock responses are not stored on disk) without reaching into
// the unexported router.cfg.
func (r *GigaRouter) MaxGasPerBlock() int64 {
if r.cfg.RPCOnly {

Contributor

The pattern of returning dummy values depending on the mode is not type-safe. Neither is returning errors when the mode is incompatible. Instead, you could, for example, create a wrapper type, something like GigaValidatorRouter{}, with a cast method GigaRouter.AsValidator() Option[GigaValidatorRouter], and move all validator-only methods there. This is just an example; there are probably other safe patterns available as well. The general idea is to conditionally provide a wider role-specific interface, instead of custom mismatch handling in each method.

Contributor Author

changed

// before RPC begins serving. See the TODO(autobahn-read-path) in
// NewGigaRouter for the loops the read side will pull back in.
<-ctx.Done()
return ctx.Err()

Contributor

"return nil" would be totally fine. There is no obligation for Run to block if it has nothing to do.

Contributor Author

done

@pompon0

pompon0 commented Jun 1, 2026

Contributor

I might be misunderstanding the goal here, but if the point is to support just an RPC forwarding endpoint, then it should not create GigaRouter (and App?) at all. If this is a stopgap measure until we support fullnodes (nodes that actually process blocks), then I'd like to understand why we cannot implement that right now (i.e. skip the stopgap).

Fresh-eyes review noted the prior framing overstated the handshaker-
replay concern: autobahn nodes skip the Handshaker entirely, so that
risk applies to non-autobahn CometBFT validators instead — which is
precisely what the autobahnRPCOnly guard scopes out. Rephrase to
center the guard's actual purpose (scoping the fire to rpc-only)
rather than the autobahn-specific replay risk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…orNextBlock

watchProgress no longer polls data.NextBlock on a 5 s tick. data.State
gets a WaitForNextBlock(ctx, n) primitive that blocks on the existing
ctrl.WaitUntil predicate, and watchProgress wraps the wait in
utils.WithTimeout(stallTimeout). A timeout (and only a timeout) returns
the "no block-sync progress" error; outer ctx cancellations propagate
unchanged.
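
The distinction between a stall timeout and an outer cancellation can be sketched with the standard library alone. This is not the PR's code: it uses `context.WithTimeout` in place of `utils.WithTimeout` (whose signature is not shown here), and `waitWithStallTimeout`, `neverAdvances`, `stallCase`, and `cancelCase` are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

var errNoProgress = errors.New("no block-sync progress")

// waitWithStallTimeout runs wait under a stall deadline. Only the stall
// deadline maps to errNoProgress; an outer cancellation is returned unchanged.
func waitWithStallTimeout(ctx context.Context, stall time.Duration, wait func(context.Context) error) error {
	tctx, cancel := context.WithTimeout(ctx, stall)
	defer cancel()
	err := wait(tctx)
	if err == nil {
		return nil
	}
	if ctx.Err() != nil {
		return ctx.Err() // the caller's context ended: propagate as-is
	}
	if errors.Is(err, context.DeadlineExceeded) {
		return errNoProgress // only our own deadline fired
	}
	return err
}

// neverAdvances models a wait for a block that is never produced.
func neverAdvances(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// stallCase hits the stall deadline with a live outer context.
func stallCase() error {
	return waitWithStallTimeout(context.Background(), 10*time.Millisecond, neverAdvances)
}

// cancelCase cancels the outer context before the stall deadline can fire.
func cancelCase() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return waitWithStallTimeout(ctx, time.Hour, neverAdvances)
}

func isCancel(err error) bool { return errors.Is(err, context.Canceled) }

func main() {
	fmt.Println("stall: ", stallCase())
	fmt.Println("cancel:", cancelCase())
}
```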

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread sei-tendermint/internal/p2p/giga_router_fullnode.go Outdated
wen-coding and others added 2 commits June 18, 2026 10:22
Router.Run no longer spawns GigaRouter.Run. The GigaRouter is constructed
by createRouter (in node/setup.go) before Router; ownership of its Run()
belongs to the same scope. node.go now SpawnCriticals giga.Run alongside
the transport in OnStart, and the giga router's lifecycle is independent
of Router's.

The giga_router_test mesh test now also spawns giga.Run alongside the
router goroutine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…directly

ABCI Application gains LastBlockHeight() int64 — a fast in-memory accessor
the app already maintains. proxy.Proxy passes through; sei-chain's BaseApp
already satisfies it.

LastCommittedBlockNumber now calls r.app.LastBlockHeight() each time.
Drop the lastExecutedBlock atomic.Int64 field, the seedLastExecuted method
called from both Run paths, and the Store in executeBlock — they were a
parallel cache of state the app already owns.

Single source of truth, and the atomicity argument against
lastExecutedBlock (Commit + Store not atomic so the guarantee was already
illusory) goes away.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread docker/rpcnode/scripts/step1_configure_init.sh
wen-coding and others added 4 commits June 18, 2026 11:05
Move RunInboundConn, poolIn, and the inbound-fullnode counter from
*gigaValidatorRouter to gigaRouterCommon. Fullnodes now accept inbound
peers (always non-committee) and serve the block-sync subset
(StreamFullCommitQCs + GetBlock), capped at MaxInboundFullnodePeers
(the same operator setting used on validators today).

Service.HasConsensusState lets RunInboundConn dispatch between RunServer
(validator + committee peer) and RunBlockSyncServer (everything else).

MaxInboundFullnodePeers moves from GigaValidatorConfig to
GigaRouterCommonConfig; setup.go wires it through buildFullnodeGigaConfig.

This is variant A of the validator/fullnode mesh discussion (CON-309
review): enables tree topologies — operators can deploy relay fullnodes
that pull from committee and serve other fullnodes — without taking on
discovery or parallel-subscription work. Outbound subscriber is unchanged;
fullnodes still dial a single committee member at a time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The watchdog disconnected healthy peers when the chain was idle at tip —
no validator was producing blocks, but we treated 60 s of no NextBlock
advance as a stall. Under trusted-committee, the auditor's "peer holds
the connection open without sending" concern is operator-observable but
unactionable; the simpler thing is to not auto-disconnect.

Drop watchProgress, the stall constants, and data.State.WaitForNextBlock
that existed only for the watchdog.

Docker rpcnode: gen-autobahn-config races the validator containers
populating build/generated/node_*. Poll each dir's evmrpc_url.txt (the
last autobahn-specific file each validator step writes) before invoking
gen-autobahn-config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tubs

The Go/Lint CI step re-runs go generate and diffs; my hand-edited mock
placed the new method out of the alphabetical order mockery produces.
Regenerate via the mockery_generate.sh path so the layout matches.

Also add LastBlockHeight to sei-cosmos/server's mockApplication so
rollback_test still satisfies the Application interface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- docker/rpcnode/scripts/step1_configure_init.sh: add set -e and fail
  fast with a clear error if build/generated/genesis.json is missing
  after the 5-minute poll, instead of letting cp fail silently.
- integration_test/autobahn/autobahn_test.go: replace the hardcoded
  CLUSTER_SIZE=4 in setupFullnodeNode with the count from
  countSeiContainers(), so non-four-node test runs produce a committee
  that matches the running validators.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread docker/rpcnode/scripts/step1_configure_init.sh
wen-coding and others added 2 commits June 18, 2026 12:45
Same pattern as the genesis poll: if evmrpc_url.txt is still missing in
any node_* dir after the 5-minute wait, exit with a clear error rather
than fall through to gen-autobahn-config and produce an opaque failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Validators pass utils.Some(addr.Key) — they need to know exactly who
they're talking to (consensus votes). Fullnodes pass utils.None — the
block-sync data they receive is QC-signed by the committee, so peer
identity doesn't need to be checked at the dial layer.

Restored the "Filter unwanded connections." comment above the inbound
cap check in RunInboundConn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread sei-tendermint/internal/p2p/giga_router.go Outdated
Comment thread sei-tendermint/internal/p2p/giga_router_fullnode.go
Comment thread sei-tendermint/abci/types/application.go
wen-coding and others added 7 commits June 18, 2026 13:29
…er set -e

The init script now runs under set -e; if seid init's "configuration files
already exist" error fires the script aborts, which is what just broke the
Autobahn integration job. --overwrite makes the init step idempotent, in
line with the script's intent (re-init every run).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kvstore embeds BaseApplication, which makes its inherited
LastBlockHeight return 0 even after blocks commit. Info already returns
app.state.Height; expose the same value via the fast accessor so any
consumer using LastBlockHeight() against a kvstore-backed app sees the
right height.
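
The underlying Go behaviour is method promotion through embedding: a type that embeds a base with a default method keeps that default until it declares its own. The sketch below uses simplified stand-in types (`BaseApplication`, `appState`, `inheritingApp`, `overridingApp`), not the real ABCI or kvstore definitions.

```go
package main

import "fmt"

// BaseApplication is a simplified stand-in whose default accessor returns 0.
type BaseApplication struct{}

func (BaseApplication) LastBlockHeight() int64 { return 0 }

type appState struct{ Height int64 }

// inheritingApp embeds the base and does not override the accessor, so the
// promoted method ignores its own state.
type inheritingApp struct {
	BaseApplication
	state appState
}

// overridingApp declares its own method, which shadows the promoted one.
type overridingApp struct {
	BaseApplication
	state appState
}

func (a *overridingApp) LastBlockHeight() int64 { return a.state.Height }

func main() {
	a := &inheritingApp{state: appState{Height: 7}}
	b := &overridingApp{state: appState{Height: 7}}
	fmt.Println("inherited: ", a.LastBlockHeight())
	fmt.Println("overridden:", b.LastBlockHeight())
}
```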

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ck changes

- GigaRouter interface doc said "RunInboundConn errors on fullnodes",
  but variant A in 8a566a5 made fullnodes accept inbound block-sync.
  Restate: both impls serve, non-committee peers get the block-sync
  subset only.
- giga_router_fullnode_test referenced lastExecutedBlock, which was
  dropped in 4e27aab. Describe what the test actually checks
  (LastCommittedBlockNumber returns 0 before InitChain).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(*url.URL, bool) → utils.Option[*url.URL]. Same semantics — Some =
forward to this URL, None = handle locally — but matches the Option
pattern used elsewhere on the GigaRouter surface (Mempool,
gigaRouter()). Touches the LocalClient interface in sei-cosmos and the
two evmrpc call sites; each now does
`url, ok := tmClient.EvmProxy(sender).Get()`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cover the test clients that satisfy LocalClient by hand:
heightTestClient, blockNotFoundTestClient (via embedding), fakeTMClient,
fixedBlockClient, parityTxCountTMClient, pendingNonceClient,
lowLatestTMClient, sendProxyClient, and the MockClient in setup_test.

setup_test imports the existing sei-chain/utils package, so the new
sei-tendermint utils is aliased there as tmutils.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…State

The "is my consensus state present?" decision lived at the call site in
gigaRouterCommon.RunInboundConn via the HasConsensusState() probe.
Move it inside Service.RunInbound(ctx, server, isCommittee): the
encapsulated dispatch picks RunServer when committee + state, else
RunBlockSyncServer. Removes HasConsensusState from the public Service
surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Instead of falling back to RunBlockSyncServer when a committee peer
dials a fullnode, return an error so the connection closes. A panic
here would be reachable from a stale autobahn.json entry — one bad
config could DoS every RPC node. Closing the connection logs the
anomaly without giving the dialer a kill switch.
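
Taken together with the earlier dispatch change, the inbound decision reduces to a small table: a committee peer on a node with consensus state gets the full service, a non-committee peer gets block-sync, and a committee peer on a node without consensus state is refused. The sketch below is illustrative only; `pickInbound`, `handler`, and the error value are hypothetical names, and the handler strings merely label which server would run.

```go
package main

import (
	"errors"
	"fmt"
)

// handler names the server that would be run for an inbound connection.
type handler string

const (
	fullGiga  handler = "RunServer"
	blockSync handler = "RunBlockSyncServer"
)

var errCommitteeOnFullnode = errors.New("committee peer dialed a node without consensus state")

// pickInbound returns an error instead of panicking, so a stale committee
// entry closes one connection rather than taking the node down.
func pickInbound(hasConsensusState, peerInCommittee bool) (handler, error) {
	switch {
	case peerInCommittee && hasConsensusState:
		return fullGiga, nil
	case peerInCommittee:
		return "", errCommitteeOnFullnode
	default:
		return blockSync, nil
	}
}

func main() {
	for _, tc := range [][2]bool{{true, true}, {true, false}, {false, false}, {false, true}} {
		h, err := pickInbound(tc[0], tc[1])
		fmt.Printf("state=%v committee=%v -> handler=%q err=%v\n", tc[0], tc[1], h, err)
	}
}
```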

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using default effort and found 2 potential issues.

Reviewed by Cursor Bugbot for commit aa81ec8.

Comment thread sei-tendermint/node/setup.go
Comment thread docker/rpcnode/scripts/step1_configure_init.sh Outdated
wen-coding and others added 9 commits June 19, 2026 18:06
…oles

Variant A wired the cap into the fullnode path too (relay fullnodes
serving downstream block-sync are subject to the same limit), but the
docs in config.go and toml.go still said "validator-only knob". Restate
the docs to match — applied on both validators and fullnodes, capping
non-committee inbound block-sync from peers.

Also wrap the RunInbound error with the peer key so the operator log
identifies who dialed when a connection is refused.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
set -e was added to address the bot's "genesis wait never fails" concern,
but the explicit `if [ ! -f ]; exit 1` checks after each poll already
handle that, so set -e was redundant. It also caused cascading problems: it
aborted seid init on a duplicate config (worked around with --overwrite) and
would abort the state-sync curl probes if those raced cluster startup.

Drop set -e, keep the targeted exit-1 checks for the genesis and
evmrpc_url.txt polls, leave --overwrite (still useful for recycled
containers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Document that this is an integration-test setup script, that curl probes
against the cluster RPC are intentionally allowed to fail, and that
fail-fast is opt-in via explicit checks (not a blanket set -e).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fresh-eyes review punch list:

- giga_router.go RunInboundConn doc said committee peers on fullnodes
  "also get block-sync"; since aa81ec8 the connection is refused.
  Restate.
- giga/service.go RunBlockSyncServer doc said "used by validators on
  inbound from non-committee peers"; since 2af0f50 it dispatches on
  both roles via Service.RunInbound. Broaden.
- gigaFullnodeRouter.MaxGasEstimatedPerBlock guarded a ConsensusParams
  nil branch that buildFullnodeGigaConfig already rejects via
  genesisMaxGas. Drop the dead guard.
- MaxInboundFullnodePeers validation only checked >= 0; an operator-set
  value above MaxInt32 would wrap negative on the int32 cast and
  silently reject every inbound. Cap at MaxInt32 explicitly so the
  gosec nolint matches reality.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four files now hold the production code:
- giga_router.go: GigaNodeAddr, config types, GigaRouter interface (~70 lines).
- giga_router_common.go: gigaRouterCommon struct, buildDataState, shared
  methods (LastCommittedBlockNumber, BlockBy{Number,Hash},
  translateGlobalBlock, executeBlock, runExecute, dialAndRunConn,
  RunInboundConn, default EvmProxy).
- giga_router_validator.go: gigaValidatorRouter, NewGigaValidatorRouter,
  Run, MaxGasEstimatedPerBlock, Mempool, validator EvmProxy override.
- giga_router_fullnode.go: gigaFullnodeRouter, NewGigaFullnodeRouter
  (moved in from giga_router.go for cohesion with the struct), Run,
  MaxGasEstimatedPerBlock, Mempool, runFullnodeSubscriber.

No behavioural change. Each file is <450 lines and a reader navigating
by role lands on one cohesive file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror the production-side split (fdb328e): move
TestGigaRouter_FinalizeBlocks and TestGigaRouter_EvmProxy
into a dedicated giga_router_validator_test.go, leaving
shared testApp helpers + TestInitChainCommitThenFinalize
in giga_router_test.go. No behavioral changes.

After the role-based test split, this file holds shared
testApp scaffolding (used by both validator and fullnode
test files) plus a contract test for testApp itself —
no actual giga_router tests remain. Rename to reflect
that.

- Init log now reads "GigaRouter initialized (validator)" vs
  "(fullnode)" instead of an asymmetric tag, and the fullnode log
  surfaces dial_interval (used by runFullnodeSubscriber).
- Introduce maxInboundFullnodePeers = 10000 (practical machine-level
  ceiling; per-peer cost + NIC bandwidth bind well before this) and
  validate against it inside buildDataState, so both constructors
  drop the duplicated MaxInt32 cast-safety check, the math import,
  and the nolint:gosec on the int32 cast.

The 10000 bound makes the int→int32 cast safe by construction,
but gosec's G115 check is flow-insensitive and can't see across
buildDataState's validation. Switching the counter to atomic.Int64
and the cap to int64 eliminates the cast entirely.
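
A cast-free inbound limiter along these lines could look like the sketch below. Only the 10000 ceiling and the choice of atomic.Int64 with an int64 cap come from the commit messages; `inboundLimiter` and its methods are hypothetical, and a cap of 0 admits no peers here, which may differ from the PR's semantics.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// maxInboundFullnodePeers mirrors the ceiling named in the commit message.
const maxInboundFullnodePeers = 10000

// inboundLimiter keeps the counter and the cap in the same integer
// width, so comparing them needs no narrowing cast.
type inboundLimiter struct {
	active atomic.Int64
	limit  int64
}

// newInboundLimiter validates the cap once, at construction time.
func newInboundLimiter(limit int64) (*inboundLimiter, error) {
	if limit < 0 || limit > maxInboundFullnodePeers {
		return nil, fmt.Errorf("limit %d out of range [0, %d]", limit, maxInboundFullnodePeers)
	}
	return &inboundLimiter{limit: limit}, nil
}

// tryAcquire reserves a slot and reports false when the cap is reached.
// It increments first and rolls back on overflow, so concurrent callers
// can never hold more than limit slots at once.
func (l *inboundLimiter) tryAcquire() bool {
	if l.active.Add(1) > l.limit {
		l.active.Add(-1)
		return false
	}
	return true
}

// release returns a slot previously obtained with tryAcquire.
func (l *inboundLimiter) release() { l.active.Add(-1) }

func main() {
	lim, err := newInboundLimiter(2)
	if err != nil {
		panic(err)
	}
	fmt.Println(lim.tryAcquire(), lim.tryAcquire(), lim.tryAcquire())
	lim.release()
	fmt.Println(lim.tryAcquire())
}
```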

3 participants