
tool: add full-scan EVM logical digest#3611

Open
blindchaser wants to merge 1 commit into main from yiren/flatkv-full-scan


@blindchaser


Summary

Add an evm-logical-digest seidb operation for comparing EVM state across FlatKV and memIAVL at the same height. The command normalizes both backends into FlatKV physical keys, strips height-dependent value metadata, reports per-bucket bucket_digest values, and emits one FINAL_DIGEST line for backend comparison.

  • sei-db/tools/cmd/seidb/operations/evm_logical_digest.go: Adds the evm-logical-digest command with FlatKV native scanning and memIAVL snapshot scanning. FlatKV reads use RawGlobalIterator; memIAVL reads stream snapshot kvs records sequentially so scan order does not affect correctness.
  • sei-db/tools/cmd/seidb/operations/evm_logical_digest.go: Implements an order-independent bucket accumulator over sha256(len(key)||key||len(value)||value). The final digest combines account, code, storage, and marker-adjusted legacy bucket digests so a FlatKV-only migration-version row does not create a false mismatch.
  • sei-db/tools/cmd/seidb/operations/evm_logical_digest.go: Supports semantic memIAVL normalization by default and an opt-in translator mode through --memiavl-normalization translator. Semantic mode decodes raw EVM leaves directly; translator mode routes leaves through flatkv.ImportTranslator to validate the migration mapping.
  • sei-db/tools/cmd/seidb/operations/evm_logical_digest.go: Adds --inspect-bucket, prefix sharding, row listing, backend metadata details, and --find-hash support for isolating mismatched entries. memIAVL inspect honors the same normalization flag as the global digest, so diagnostics match the selected digest path.

Test plan

  • sei-db/tools/cmd/seidb/operations/evm_logical_digest_test.go: TestSemanticMemiavlDigestMatchesTranslatorForCoreEVMKeys verifies semantic normalization matches translator normalization for account, code, storage, and legacy buckets, including delete-equivalent zero storage and empty code rows.
  • sei-db/tools/cmd/seidb/operations/evm_logical_digest_test.go: TestSemanticMemiavlInspectMatchesTranslatorForCoreEVMKeys verifies inspect bucket results match translator output for all normalized buckets.
  • sei-db/tools/cmd/seidb/operations/evm_logical_digest_test.go: TestInspectMemiavlRejectsUnknownNormalizationBeforeOpeningSnapshot guarantees invalid --memiavl-normalization values are rejected before filesystem access.
  • Manual verification: go test ./sei-db/tools/cmd/seidb/operations.

@cursor

cursor Bot commented Jun 18, 2026


PR Summary

Low Risk
Read-only operator tooling under sei-db/tools with no consensus or runtime path changes; incorrect normalization logic could mislead migration debugging but does not affect live nodes.

Overview
Adds a new seidb evm-logical-digest CLI that computes a backend-independent digest of EVM logical state (account, code, storage, legacy) so FlatKV and memIAVL snapshots at the same height can be compared without false mismatches from height-stamped physical values.

Both backends are normalized to FlatKV-style physical keys and digested via an order-independent XOR-of-SHA256 over len(key)||key||len(logical)||logical. FlatKV scans use RawGlobalIterator (with WAL replay to --height); memIAVL scans stream snapshot kvs sequentially instead of walking the mmap tree. FINAL_DIGEST folds the four buckets and XORs out the FlatKV-only migration/migration-version legacy row when present.
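
The XOR-of-SHA256 accumulator described above can be sketched as follows. This is an illustration, not the PR's implementation: the type and function names are made up, and the 8-byte big-endian length prefix is an assumption, since the thread only gives the shape `len(key)||key||len(logical)||logical`.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// bucketDigest is an order-independent accumulator: the XOR of one SHA-256
// per entry. XOR is commutative and associative, so scan order is irrelevant.
type bucketDigest [sha256.Size]byte

// entryHash hashes len(key)||key||len(value)||value. The length prefixes keep
// ("ab","c") and ("a","bc") distinct. The 8-byte big-endian encoding is assumed.
func entryHash(key, value []byte) [sha256.Size]byte {
	h := sha256.New()
	var n [8]byte
	binary.BigEndian.PutUint64(n[:], uint64(len(key)))
	h.Write(n[:])
	h.Write(key)
	binary.BigEndian.PutUint64(n[:], uint64(len(value)))
	h.Write(n[:])
	h.Write(value)
	var out [sha256.Size]byte
	copy(out[:], h.Sum(nil))
	return out
}

// Add folds one entry into the digest. Adding the same entry a second time
// cancels it, which is how a backend-only row can be XORed back out.
func (d *bucketDigest) Add(key, value []byte) {
	e := entryHash(key, value)
	for i := range d {
		d[i] ^= e[i]
	}
}

func main() {
	var a, b bucketDigest
	a.Add([]byte("k1"), []byte("v1"))
	a.Add([]byte("k2"), []byte("v2"))
	b.Add([]byte("k2"), []byte("v2"))
	b.Add([]byte("k1"), []byte("v1"))
	fmt.Println(a == b) // same entries in a different order
	fmt.Println(hex.EncodeToString(a[:]))
}
```

One property worth keeping in mind with this construction: because XOR is self-inverse, an entry that appears an even number of times contributes nothing, so it relies on each physical key appearing at most once per scan.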

memIAVL supports --memiavl-normalization semantic (default, independent raw-key decode + local account merge) and translator (leaves through flatkv.ImportTranslator). Inspect mode adds --inspect-bucket, prefix sharding, listing, optional metadata, and --find-hash to locate a single diverging entry.

Unit tests assert semantic digest/inspect parity with translator for core EVM keys and that invalid normalization fails before opening snapshots.

Reviewed by Cursor Bugbot for commit b7bb0fc. Bugbot is set up for automated code reviews on this repo.

@github-actions

github-actions Bot commented Jun 18, 2026


The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | Jun 18, 2026, 9:36 PM |

@codecov

codecov Bot commented Jun 18, 2026


Codecov Report

❌ Patch coverage is 23.95543% with 546 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.00%. Comparing base (4f5889e) to head (b7bb0fc).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...b/tools/cmd/seidb/operations/evm_logical_digest.go | 23.95% | 522 Missing and 24 partials ⚠️ |
Additional details and impacted files


@@            Coverage Diff             @@
##             main    #3611      +/-   ##
==========================================
- Coverage   59.02%   58.00%   -1.03%     
==========================================
  Files        2215     2142      -73     
  Lines      182521   174692    -7829     
==========================================
- Hits       107734   101329    -6405     
+ Misses      65091    64343     -748     
+ Partials     9696     9020     -676     
| Flag | Coverage Δ |
| --- | --- |
| sei-chain-pr | 24.87% <23.95%> (?) |
| sei-db | 70.41% <ø> (ø) |
| sei-db-state-db | ? |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...b/tools/cmd/seidb/operations/evm_logical_digest.go | 23.95% <23.95%> (ø) |

... and 74 files with indirect coverage changes


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.


	return fmt.Errorf("open kvs %s: %w", kvsPath, err)
}
defer func() { _ = f.Close() }()
r := bufio.NewReaderSize(f, 16*1024*1024)


Use unit constants for buffers

Low Severity

New bufio.NewReaderSize calls use raw 16*1024*1024 and 1024*1024 literals for buffer sizes. In sei-db, byte sizes should use sei-db/common/unit constants (for example 16 * unit.MB and unit.MB) instead of raw numeric expressions.

Additional Locations (4)

Triggered by learned rule: sei-db: use unit.MB/GB constants for byte sizes, not bit-shift literals


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b7bb0fc7a5


Comment on lines +98 to +103
func EvmLogicalDigestCmd() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "evm-logical-digest",
		Short: "Backend-independent digest of EVM logical state (account/code/storage) for memiavl vs flatkv comparison",
		RunE:  runEvmLogicalDigest,
	}


P1 Badge Register the digest command with the root CLI

The new EvmLogicalDigestCmd() factory is never added to rootCmd.AddCommand in sei-db/tools/cmd/seidb/main.go (checked the existing command list at lines 18-33), so users cannot run the advertised seidb evm-logical-digest ... command at all. Please add this command to the root command registration; the unit tests call runEvmLogicalDigest directly, so they do not catch the CLI being unreachable.


Comment on lines +367 to +370
for ; iter.Valid(); iter.Next() {
	k := iter.Key()
	seen++
	if err := d.consume(k, iter.Value()); err != nil {


P2 Badge Filter FlatKV rows to the EVM module before digesting

When the FlatKV backend contains non-EVM module rows (for example in later migration modes such as MigrateAllButBank/FlatKVOnly), this loop digests every RawGlobalIterator row into the legacy bucket, while the memIAVL path resolves only <snapshot>/evm via resolveMemIAVLEvmSnapshotDir. That makes the advertised EVM-only comparison report mismatches caused solely by bank/staking/etc. rows in FlatKV; skip non-evm/ physical keys except the migration marker adjustment.
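
One possible shape for the suggested filter is sketched below. It assumes physical keys carry a module prefix such as `evm/` and that the FlatKV-only marker row has a single known key; both byte strings are placeholders chosen for illustration, and `classifyRow` is a hypothetical helper rather than anything in the PR.

```go
package main

import (
	"bytes"
	"fmt"
)

var (
	evmPrefix       = []byte("evm/")                        // assumed module prefix on physical keys
	migrationMarker = []byte("migration/migration-version") // assumed key of the FlatKV-only marker row
)

type rowClass int

const (
	rowSkip   rowClass = iota // non-EVM module row: leave out of every bucket
	rowEVM                    // EVM row: digest as usual
	rowMarker                 // marker row: handled by the final-digest adjustment
)

// classifyRow decides what the scan loop should do with one physical key.
func classifyRow(key []byte) rowClass {
	switch {
	case bytes.Equal(key, migrationMarker):
		return rowMarker
	case bytes.HasPrefix(key, evmPrefix):
		return rowEVM
	default:
		return rowSkip
	}
}

func main() {
	keys := [][]byte{
		[]byte("evm/storage/0x01"),
		[]byte("bank/balances/addr"),
		[]byte("migration/migration-version"),
	}
	counts := map[rowClass]int{}
	for _, k := range keys {
		counts[classifyRow(k)]++
	}
	fmt.Println(counts[rowEVM], counts[rowSkip], counts[rowMarker])
}
```

With a check like this at the top of the loop, rows from other modules never reach the legacy bucket, so both backends digest the same EVM-only key set.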


bdchatham added a commit to sei-protocol/seictl that referenced this pull request Jun 20, 2026
…sk panic isolation (#211)

Adds the **full-keyspace digest gate**
(sei-protocol/sei-chain#3611,
`seidb evm-logical-digest`) as a discrete sidecar task — the per-segment
boundary seal that closes the touched-key comparator's **cold-state
blind spot** (a key migrated wrong and never touched again is invisible
to per-block Layer 2). Plus three seams the systems-engineering review
called for. No "ShadowResultProducer" abstraction — that's deferred to
the 3rd producer (YAGNI).

### What's here
- **`sidecar/s3/emit.go`** — one S3 emission helper
(`StreamGzipNDJSON`/`StreamGzipJSON`/`StreamGzipFunc`), collapsing 3
duplicated gzip-pipe paths. Twofold integrity seal: an aws-chunked
SHA-256 **wire** checksum over the compressed body (io.Pipe
streaming/backpressure preserved) + an **uncompressed-payload** SHA-256
surfaced via `EmitResult` for out-of-band verification.
`result_compare`/`result_export` refactored onto it (no behavior
change).
- **`sidecar/engine`** — `recover()` in `runTask` turns a handler panic
into a failed `TaskResult` (+ `seictl_task_panics_total`) instead of
crashing the sidecar.
- **`sidecar/tasks/evm_logical_digest.go`** — the discrete task: shells
out to `seidb` for flatkv + memiavl (`semantic` + `translator`), asserts
**both** backends' opened version `== height` (fail-closed — no
wrong-height false match), parses the `FINAL_DIGEST`/per-bucket
contract, publishes an `EndpointDigestRecord`. `axes_proved`
deliberately omits **balance** (the semantic account digest zeroes it —
that axis stays the per-block comparator's job).

### Cross-review (systems-engineer + idiomatic-reviewer) — applied
- **Symmetric memiavl version assertion** (the flatkv-only check left a
wrong-height false-match hole if seidb clamps to the nearest snapshot).
- **`recover()` inside the s3 writer goroutine** — a panic there (e.g.
`MarshalJSON` over chain data) runs on a task-spawned goroutine
*outside* the engine's handler recover; converted to a returned error so
the upload aborts (no truncated-but-valid object) and the process
survives.
- **Dropped the empty-by-construction `uncompressed_sha256`** from the
published record (a record can't carry the hash of its own bytes; the
seal is out-of-band in the log/TaskResult).
- comment-precision fixes (memory bound is the uploader part-pool, not
"gzip window"; S3 checksum is per-part-composite for multipart) + a
`version:`-line length guard.

### Notes
- `seidb` is **shelled out to** (configurable `seidbPath`), not
vendored. #3611 also needs a one-line registration fix
(`EvmLogicalDigestCmd()` isn't in `seidb`'s root `AddCommand`) — flagged
to the author.
- Trigger is the **out-of-band task API**; no controller/CRD change
(consistent with the avoided `ResultExportConfig` one-way door).
- **One-way-door surfaces for confirmation before a consumer reads
them:** the task-type string `"evm-logical-digest"`, the param field
names, and the `EndpointDigestRecord` schema.
- `GOWORK=off go build ./...` clean; `go test ./sidecar/...` green
(incl. new memiavl-version-mismatch + writer-panic regression tests);
`gofmt -s` clean.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>