feat(tbtc/signer): Phase 7.1 hardened interactive signing session#4051
Merged
mswilkison merged 9 commits intoJun 13, 2026
Conversation
Implements sections 4-5 of the frozen Phase 7 spec: the interactive two-round signing path with engine-held nonce custody. Round-1 nonces are generated from OS randomness, live only in in-memory session state bound to (session_id, attempt_id), zeroize on consumption, abort, expiry, and replacement, and never appear in a request, response, or persisted state. The only durable artifact is the per-attempt consumption marker, persisted BEFORE the signature share leaves the engine; a persist failure rolls the marker back with the nonces left live, and a marker without a released share just kills the attempt - fail closed. A restart can therefore never produce a second share under one nonce pair, and the cloned-state nonce-reuse class is structurally gone. Entry points (strict-mode attempt contexts only, no legacy fallback): InteractiveSessionOpen (key package supplied once per session, validated against the member; idempotent by request fingerprint; conflicting reopen fails closed; a newer attempt implicitly aborts the prior live one), InteractiveRound1 (idempotent until consumed), InteractiveRound2 (verification precedes consumption: message binding, subset-of-included, exactly-threshold size, own membership, and the own-commitment byte-identity check (f) that defeats coordinator framing of honest members), InteractiveSessionAbort (idempotent; destroys nonces without a marker, so a never-consumed attempt may reopen with fresh nonces). Live sessions are capacity-capped (fail-closed at TBTC_SIGNER_MAX_LIVE_INTERACTIVE_SESSIONS, default 64) and TTL-swept lazily with abort semantics (TBTC_SIGNER_INTERACTIVE_SESSION_TTL_SECONDS, default 3600); both knobs ride the init-config surface. New structured error code consumed_nonce_replay; call/success counters for all four operations and latency tracking for the two cryptographic rounds. The four FFI exports (frost_tbtc_interactive_*) ship additively per the established pattern - the Go host adopts them in Phase 7.3. Tests: 10 engine tests pin the e2e round trip (one member through the session API, one through the stateless primitive, aggregating to a verified BIP-340 signature), the framing-attack rejection and verify-before-consume recovery, package-shape rejections, replay and restart-marker durability, persist-fault marker rollback, open lifecycle, abort, TTL expiry, and capacity; one lib test pins FFI dispatch for all four exports. Full suite 255 passed / 1 ignored, clippy -D warnings clean across all targets including the bench-restart-hook shape, Phase 5 chaos suite green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…ject Two resource issues found reviewing the 7.1 session layer, both of my own making: 1. Round2 took the nonces but left interactive_signing = Some with the key package and message still resident, so a completed session held its live-session capacity slot AND kept key material in memory until the TTL sweep. A node doing many sequential signings could exhaust the cap of 64 with completed-but-unswept sessions and fail-close new opens. Round2 now clears interactive_signing once the attempt is terminal (success or post-marker share failure); the durable marker carries all further replay protection. 2. interactive_session_open inserted an empty SessionState via entry().or_default() BEFORE the consumed-marker / conflict / capacity reject checks, so a flood of rejected opens for distinct session_ids accumulated empty sessions against the global 1024 session cap, which could then starve DKG. Open now decides from a read-only view and only inserts once committed to creating the attempt. Tests extended: the e2e test asserts the live state is freed (marker retained) after Round2; the capacity test asserts a rejected open leaves no empty session. Full suite 255 passed / 1 ignored, clippy -D warnings clean, chaos suite green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The four frost_tbtc_interactive_* symbols were exported from Rust but absent from the hand-maintained public header, so a cgo consumer could not compile against the 7.1 ABI without hand-declaring them - blocking the Phase 7.3 Go adoption path (review finding). Adds the declarations with a header comment drawing the custody contrast against the stateless nonce block above: here secret nonces never cross the boundary in either direction. Verified: header symbols now exactly match the Rust exports, header parses as valid C. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Three findings on the 7.1 interactive path, all mine:
P1 (security) - InteractiveSessionOpen bypassed the signing-policy
firewall. The coarse start_sign_round binds the signed message to a
prior policy-checked build_taproot_tx
(enforce_signing_message_binding_to_policy_checked_build_tx); the
interactive open never called it, so with the firewall enabled a
caller holding a key package could open a fresh session and sign an
arbitrary message - a direct violation of frozen spec section 5 ("Open
checks policy gates"). Open now enforces the same binding before
creating any state; a session with no policy-checked tx fails closed.
P2 (replay) - validate_attempt_context accepts the attempt_id hash
field case-insensitively (eq_ignore_ascii_case), but the marker
registry and live-state lookups keyed on the raw string. A consumed
attempt retried with different hex casing missed the marker and could
be signed again. Open now canonicalizes the whole attempt context via
canonical_attempt_context (matching the coarse path), and the round
entry points canonicalize the incoming attempt_id, so all keying is on
the lowercase canonical form.
P3 (TTL) - InteractiveSessionAbort was the only entry point that did
not sweep expired interactive state, so an abort for an unrelated
session left other sessions' expired nonces in memory past the TTL.
Abort now sweeps like every other locked entry point.
Tests: firewall reject-without-build-tx (the security assertion) +
bound-message accept/mismatch-reject; consumed-marker case
insensitivity across Round2 and reopen; abort sweeps an unrelated
session's expired state. Full suite 259 passed / 1 ignored, clippy -D
warnings clean, chaos suite green.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…eractive open The interactive path accepted an open after only the signing-policy firewall check, so on a session start_sign_round would refuse it could still emit a share - bypassing the established lifecycle/quarantine gates (review finding). InteractiveSessionOpen now enforces, before installing any interactive state, the same gates the coarse path does: - emergency_rekey_event on an existing session -> LifecyclePolicyRejected (emergency_rekey_required), - a terminally finalized session -> SessionFinalized, - an auto-quarantined member_identifier (absent a DAO allowlist override) -> QuarantinePolicyRejected, reusing enforce_not_quarantined_identifiers so the allowlist override is honored identically. The quarantine check targets this node's own member_identifier - the member it is about to produce a share for - rather than the whole included set, since under t-of-included a quarantined included member simply will not be among the responsive subset. Tests: an emergency-rekey session and a finalized session both refuse interactive open; a quarantined member is rejected and a DAO allowlist override restores signing. Full suite 261 passed / 1 ignored, clippy -D warnings clean, chaos suite green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The coarse start_sign_round checks the lifecycle/quarantine/firewall gates and produces the signature share in one atomic call. The interactive path splits Open from Round2, so a kill switch recorded in that window - emergency rekey, finalization, quarantine, or a re-bound policy-checked tx - was not seen when the share actually left the engine at Round2 (review finding: gates checked only at Open are stale at release time). The gate logic is extracted into enforce_interactive_signing_gates and called at BOTH Open and Round2, so the two sites cannot drift and the share is gated at the moment it leaves the engine. The Round2 recheck runs before consumption (verify-before-consume): a gate rejection leaves the nonces live and the attempt recoverable, so a transient kill switch does not burn the attempt. The message rechecked is the one bound at Open (held in interactive state), and the signing package's message is independently verified equal to it. Test: open + round1 + build package, record an emergency rekey, then Round2 is rejected (emergency_rekey_required) WITHOUT consuming the attempt; clearing the rekey lets the same attempt complete. Full suite 262 passed / 1 ignored, clippy -D warnings clean, chaos suite green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…eshold Two findings on the 7.1 path: P2 (liveness/DoS) - an interactive open that later expired or was aborted before Round2 left an otherwise-empty SessionState in the registry. Since open inserts new session IDs and ensure_session_insert_capacity counts every map entry, a caller could churn unique sessions until TBTC_SIGNER_MAX_SESSIONS filled, after which DKG / build_tx / new interactive sessions were rejected until restart. The TTL sweep and abort now drop a session that holds nothing durable once its live attempt is cleared, via a new SessionState::is_disposable that checks EVERY field so a session still carrying consumed markers or DKG material is never removed. P2 (verify-before-consume) - Open accepted a threshold below the key package's min_signers; Round2 would then accept a too-small signing package, persist the consumed marker, and only then have frost::round2::sign fail on the commitment count - burning the nonce for a validation error. Open now rejects threshold != key package min_signers before storing the session. Tests: open-then-abort churn under a 2-session cap stays bounded (no accumulation); the abort-sweep test now asserts the empty session is dropped, not just cleared; threshold-below-min_signers is rejected and the matching threshold opens. Full suite 264 passed / 1 ignored, clippy -D warnings clean, chaos suite green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…t the request
Two findings, the first central to the whole effort:
P2 (secret boundary) - InteractiveSessionOpenRequest carried
key_package_hex, forcing the private FROST key package through the
host/FFI request buffer on every open. That directly contradicts the
frozen spec section 4 ("key shares already env/command-only ... no
secret signing material transits the Go/Rust interface") and would
leave the sidecar unable to provide the signing-secret boundary that
is the entire point of Phase 7. Open now resolves the member's key
package from the session's own DKG state (run_dkg-populated), exactly
like the coarse start_sign_round: the session must already exist with
completed DKG, the request carries no key material, and the threshold
is validated against the DKG threshold. The key_package fields are
removed from the request; the spec section 5 is corrected accordingly.
Because interactive sessions now always ride a DKG-populated session
and never create registry entries, the empty-session-churn fixed last
round is impossible by construction, so the is_disposable disposal
logic (and its churn test) is reverted as dead code; sweep/abort just
clear the live attempt and retain the DKG session.
P2 (rollback) - a delayed InteractiveSessionOpen for an older attempt
could replace a newer live attempt and wipe its nonces. Open now
replaces a different live attempt ONLY when the incoming
attempt_number strictly advances the live one; an older-or-equal
attempt is rejected.
Tests: aggregation, framing, replay, restart, persist-fault, TTL,
capacity, lifecycle, quarantine, firewall, and the new TOCTOU recheck
all rebuilt on DKG-seeded sessions (key material from engine state);
added open-requires-DKG-session and non-participant rejection;
threshold mismatch now reports the DKG threshold. Full suite 264
passed / 1 ignored, clippy -D warnings clean, chaos green.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ncluded IDs Two findings: P1 (quarantine bypass) - the Round2 gate recheck only quarantine-checked this node's own member_identifier. The chosen signing subset (the package's participants) is known at Round2, so this node could release a share into a package that includes a quarantined co-signer, bypassing the all-signing-participants quarantine the coarse path enforces. Round2 now computes the package's u16 subset (after verify confirms it is a threshold-sized subset of the included set) and quarantine-checks ALL of it before consuming the nonce. The Open gate keeps checking only this member (the responsive subset is not chosen until Round2); the gate helper now takes the identifier set to check so the two sites stay aligned. P2 (phantom participants) - Open validated the attempt context's internal consistency but never checked that the included participants are real DKG members. A caller could pad the included set with phantom ids to bias the RFC-21 coordinator/attempt derivation, with Round2 then releasing a share under an attempt context that is not a genuine DKG subset. Open now rejects any included participant absent from the session's dkg_key_packages. Tests: a co-signer quarantined after round 1 blocks the Round2 share without consuming the attempt (clearing it lets the attempt complete); a phantom included id is rejected at Open even with a valid local member. Full suite 266 passed / 1 ignored, clippy -D warnings clean, chaos green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
e4a8734
into
extraction/frost-signer-mirror-2026-05-26
19 checks passed
mswilkison
added a commit
that referenced
this pull request
Jun 13, 2026
…ifying) (#4052) ## What Phase 7.2a of the frozen spec: coordinator-side **InteractiveAggregate**, the step that combines the responsive subset's signature shares into the final taproot-tweaked BIP-340 signature. Builds on the merged 7.1 session layer (#4051). ## Design - **Verifying material from session DKG state, never the request** — same no-secret-on-the-FFI discipline as 7.1 (the verifying shares are public, but resolving them from engine state prevents a caller substituting them). Session must exist with completed DKG. - **No policy gate here, deliberately.** Aggregation is public computation over shares already gated at each signer's Round2 (lifecycle/quarantine-of-subset/firewall). The lock is dropped before the crypto — no secret is released, so a kill switch racing the aggregation is irrelevant. Documented inline. - **Mandatory self-verification** of the aggregate against the taproot-tweaked group verifying key before release, matching coarse `finalize`. Message binding is enforced cryptographically: a share produced over one message won't verify against a package over another. - FFI export `frost_tbtc_interactive_aggregate` + C header; call/success counters + latency telemetry consistent with the other interactive ops. ## Attributable blame is deferred to 7.2b (review outcome) The first cut emitted a per-member `InvalidSignatureShare { culprits }` on a bad share. Review (Codex P1) correctly flagged that as **forgeable**: the engine can't yet bind the aggregate's public inputs (signing package, taproot root) to what each member actually signed at Round2, so a coordinator aggregating against a different package/root would make honest shares fail and frame their members. Binding "to what the members signed" needs the per-member signed-package envelopes (spec §6) — which are 7.2b. So this PR **fails closed with a generic error** on a verification failure (no signature, no culprit naming). Attributable blame returns in 7.2b together with the envelope binding that makes the attribution sound, and with the FFI structured-culprit payload (Codex P2 — the current FFI error carries only code/message/recovery_class). ## Verification - Engine tests: interactive + stateless member shares aggregate to a **verified BIP-340 signature**; an invalid co-signer share **fails closed** (generic error, no signature); FFI dispatch smoke. - Full suite **268 passed / 1 ignored**, clippy `-D warnings` clean, C header parses, Phase 5 chaos green. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Phase 7.1 of the frozen spec (#4049,
docs/phase-7-interactive-session-spec-freeze.md§§4–5): the engine session layer for the interactive two-round signing path, with engine-held nonce custody as the defining property.The custody contract (spec §4)
(session_id, attempt_id), zeroized on consumption/abort/expiry/replacement, never in any request, response, or persisted state.The API (spec §5, strict-mode only)
InteractiveSessionOpenInteractiveRound1InteractiveRound2tsize, own membership, and check (f) — own-commitment byte-identity, defeating coordinator framing of honest membersInteractiveSessionAbortLive-state bounds per the spec: fail-closed capacity cap (
TBTC_SIGNER_MAX_LIVE_INTERACTIVE_SESSIONS, default 64) + lazy TTL sweep with abort semantics (TBTC_SIGNER_INTERACTIVE_SESSION_TTL_SECONDS, default 3600); both knobs ride the init-config surface. New structured errorconsumed_nonce_replay. Telemetry: call/success counters ×4, latency for the two cryptographic rounds.The four
frost_tbtc_interactive_*FFI exports ship additively per the established pattern (Go adoption is Phase 7.3; nothing breaks until the host calls them).Verification
clippy -D warningsclean on all targets incl. the bench shape, Phase 5 chaos suite green, persisted-state schema additive (existing state files load unchanged).Scope boundary
InteractiveAggregate+ package-envelope evidence + cross-language vectors are Phase 7.2 by the frozen phasing; Go orchestration is 7.3. Out of scope per the freeze: DKG secret-package custody (named follow-up).🤖 Generated with Claude Code