Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 17 additions & 5 deletions fuzz/src/chanmon_consistency.rs
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,7 @@ use std::sync::atomic;
use std::sync::{Arc, Mutex};

const MAX_FEE: u32 = 10_000;
const MAX_SETTLE_ITERATIONS: usize = 256;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codex's analysis on why it's needed:

• They are draining real protocol work in a very serialized harness loop.

  process_all_events() checks work in this order and restarts the loop after the
  first thing that makes progress:

  1. manager persistence / monitor completions
  2. node 0 messages
  3. node 1 messages
  4. node 2 messages
  5. node 0 events
  6. node 1 events
  7. node 2 events
  8. quiet pass

  So one commitment round can take several loop iterations: deliver
  commitment_signed, complete monitor update, deliver revoke_and_ack, complete
  another monitor update, process PaymentClaimable, send update_fulfill_htlc, then
  another commitment_signed/revoke_and_ack round.

  For payload 1, the long 4:102 loop is the final ff. It is settling:

  - probe HTLCs left by the previous settle_all() channel-liveness check
  - a0: A starts an A-B splice
  - 41: B pays A
  - 4d: C pays A via B
  - a3: C starts a B-C splice
  - 73: A pays C via B with MPP split over the B-C channels

  In that one settle region, the normal log shows this wire work:

  52 commitment_signed
  48 revoke_and_ack
  15 update_add_htlc
  15 update_fulfill_htlc
  8  tx_complete
  5  tx_add_input
  5  tx_add_output
  4  tx_signatures
  4  stfu
  1  tx_init_rbf
  1  tx_ack_rbf
  1  splice_init
  1  splice_ack

  So it is mostly HTLC commitment handshakes. The splice/RBF messages are there
  too, but the big multiplier is: multiple channels, MPP parts, previous settle
  probe payments, and each add/fulfill needing the commitment dance.

  For payload 2, the long 0:103 loop is the first explicit ff. Before it, the
  payload has already staged both A-B and B-C splices, partially delivered some
  splice messages manually, sent a tiny A->B payment, sent two MPP A->B->C
  payments, two B->C direct payments, and started another B-C splice. Its config
  byte also makes node B’s monitors start as InProgress and makes A/C use deferred
  monitor writes.

  That settle region delivers:

  26 commitment_signed
  18 revoke_and_ack
  15 update_add_htlc
  15 update_fulfill_htlc
  16 tx_complete
  10 tx_add_input
  10 tx_add_output
  8  tx_signatures
  4  stfu
  2  tx_init_rbf
  2  tx_ack_rbf
  2  splice_ack

  The first visible HTLC burst is especially dense: A sends 7 update_add_htlcs to B
  in one commitment update, from the tiny direct A->B payment plus the two 3-part
  MPP first hops. B later sends 8 update_add_htlcs to C from the two direct B->C
  payments plus the two 3-part MPP second hops. Then C claims, fulfills go back
  C->B, then B fulfills back B->A, each with more commitment handshakes and monitor
  completions.

  The last iterations are not extra messages. They are cleanup:

  - payload 1: iteration 99 is still monitor/persistence progress, then 100 and 101
    are the two quiet passes.

  - payload 2: iteration 100 is still monitor/persistence progress, then 101 and
    102 are the two quiet passes.

  That is why the old 100 cap was wrong: it fired while the harness was still
  making legitimate forward progress, right before quiescence.

struct FuzzEstimator {
ret_val: atomic::AtomicU32,
}
Expand Down Expand Up @@ -2814,9 +2815,20 @@ impl<'a, Out: Output + MaybeSend + MaybeSync> Harness<'a, Out> {
..
} => {
let signed_tx = nodes[node_idx].wallet.sign_tx(unsigned_transaction).unwrap();
nodes[node_idx]
.funding_transaction_signed(&channel_id, &counterparty_node_id, signed_tx)
.unwrap();
match nodes[node_idx].funding_transaction_signed(
&channel_id,
&counterparty_node_id,
signed_tx,
) {
Ok(()) => {},
Err(APIError::APIMisuseError { ref err })
if err.contains("not expecting funding signatures") =>
{
// A queued signing event can be invalidated by a later `tx_abort`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stricter invariant might check if the tx_abort has indeed happened.

// before the application handles it.
},
Err(e) => panic!("{e:?}"),
}
},
events::Event::SpliceNegotiated { new_funding_txo, .. } => {
let mut txs = nodes[node_idx].broadcaster.txn_broadcasted.borrow_mut();
Expand Down Expand Up @@ -2854,9 +2866,9 @@ impl<'a, Out: Output + MaybeSend + MaybeSync> Harness<'a, Out> {
fn process_all_events(&mut self) {
let mut last_pass_no_updates = false;
for i in 0..std::usize::MAX {
if i == 100 {
if i == MAX_SETTLE_ITERATIONS {
panic!(
"It may take may iterations to settle the state, but it should not take forever"
"It may take many iterations to settle the state, but it should not take forever"
);
}
let mut made_progress = self.checkpoint_manager_persistences();
Expand Down