Skip to content

Clear monitor-pending RAA once regenerated#4684

Open
wpaulino wants to merge 1 commit into
lightningdevkit:mainfrom
wpaulino:bogus-async-raa-regenerated
Open

Clear monitor-pending RAA once regenerated#4684
wpaulino wants to merge 1 commit into
lightningdevkit:mainfrom
wpaulino:bogus-async-raa-regenerated

Conversation

@wpaulino

Copy link
Copy Markdown
Contributor

The chanmon_consistency fuzz target found a reconnect ordering where signer_pending_revoke_and_ack and monitor_pending_revoke_and_ack could both describe the same owed revoke_and_ack.

The channel first received a commitment_signed whose monitor update completed, but the signer could not provide the next point or secret, leaving signer_pending_revoke_and_ack set. Later, receiving the peer revoke_and_ack freed holding-cell HTLCs and produced a held monitor update. While that monitor update was still blocked, channel_reestablish saw the peer one state behind and recorded monitor_pending_revoke_and_ack, plus the corresponding monitor-pending commitment_signed, so the messages could be replayed once monitor updating was restored.

If the signer unblocked before the held monitor update was released, signer_maybe_unblocked generated and sent the RAA using signer_pending_revoke_and_ack. The monitor-pending flag was not cleared at that point, so monitor_updating_restored later generated the same RAA again when the held update completed. The peer had already advanced after accepting the signer-unblocked RAA, so it rejected the duplicate secret as not corresponding to its current pubkey and force-closed.

Fix this by clearing monitor_pending_revoke_and_ack whenever get_last_revoke_and_ack successfully constructs an RAA, alongside signer_pending_revoke_and_ack. All resend paths regenerate RAAs through this helper, so successful generation through either pending path satisfies the other pending record. If generation fails, pending signer state is still left set and monitor-pending state remains available for monitor restoration to retry.

This failure was discovered in https://github.com/lightningdevkit/rust-lightning/actions/runs/26905971318/job/79370860747.

The `chanmon_consistency` fuzz target found a reconnect ordering where
`signer_pending_revoke_and_ack` and `monitor_pending_revoke_and_ack`
could both describe the same owed `revoke_and_ack`.

The channel first received a `commitment_signed` whose monitor update
completed, but the signer could not provide the next point or secret,
leaving `signer_pending_revoke_and_ack` set. Later, receiving the peer
`revoke_and_ack` freed holding-cell HTLCs and produced a held monitor
update. While that monitor update was still blocked,
`channel_reestablish` saw the peer one state behind and recorded
`monitor_pending_revoke_and_ack`, plus the corresponding monitor-pending
`commitment_signed`, so the messages could be replayed once monitor
updating was restored.

If the signer unblocked before the held monitor update was released,
`signer_maybe_unblocked` generated and sent the RAA using
`signer_pending_revoke_and_ack`. The monitor-pending flag was not
cleared at that point, so `monitor_updating_restored` later generated
the same RAA again when the held update completed. The peer had already
advanced after accepting the signer-unblocked RAA, so it rejected the
duplicate secret as not corresponding to its current pubkey and
force-closed.

Fix this by clearing `monitor_pending_revoke_and_ack` whenever
`get_last_revoke_and_ack` successfully constructs an RAA, alongside
`signer_pending_revoke_and_ack`. All resend paths regenerate RAAs
through this helper, so successful generation through either pending
path satisfies the other pending record. If generation fails, pending
signer state is still left set and monitor-pending state remains
available for monitor restoration to retry.
@wpaulino wpaulino added this to the 0.3 milestone Jun 11, 2026
@wpaulino wpaulino requested a review from TheBlueMatt June 11, 2026 21:38
@wpaulino wpaulino self-assigned this Jun 11, 2026
@ldk-reviews-bot

ldk-reviews-bot commented Jun 11, 2026

Copy link
Copy Markdown

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@ldk-claude-review-bot

Copy link
Copy Markdown
Collaborator

No concrete bugs found in the production code change. The fix is correct and well-targeted.

Summary

The one-line production change (lightning/src/ln/channel.rs:10346) clears monitor_pending_revoke_and_ack in the success branch of get_last_revoke_and_ack, alongside the existing signer_pending_revoke_and_ack clear. I verified:

  • get_last_revoke_and_ack is the single constructor for all resend RAAs (called from signer_maybe_unblocked, monitor_updating_restored, and channel_reestablish), so clearing both pending records on successful generation is sound.
  • Only one RAA can be owed at a time, so both flags always describe the same message — clearing both can't drop a distinct owed RAA.
  • On the failure paths the flag is left intact (signer-pending is re-set, and monitor_updating_restored already clears monitor-pending at line 10047 unconditionally), so retries remain possible.
  • If the generated RAA is dropped because the peer is disconnected (channelmanager signer_unblocked only sends when connected), channel_reestablish re-derives the owed RAA from commitment numbers on reconnect, so the cleared flag is recoverable.

Cross-cutting note (not a blocker)

The symmetric commitment_signed resend path is not given the analogous treatment: get_last_commitment_update_for_send (channel.rs:10449-10456) clears only signer_pending_commitment_update, never monitor_pending_commitment_signed. In this PR's test the signer block only affects the RAA (ReleaseCommitmentSecret), so the commitment update flows solely through monitor restoration and no duplicate arises. A scenario where the signer also blocks the commitment update could in principle produce the analogous duplicate-commitment_signed problem, but that is outside the bug this PR targets. Worth keeping in mind if a similar fuzz failure surfaces for commitment updates.

The added test exercises the fixed path; note that commitment_update = signer_commitment_update.or(monitor_commitment_update) and the absence of an assertion that monitor_commitment_update.is_none() means the test would not catch a duplicate commitment_signed if one were ever produced via both paths — consistent with the above scoping observation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants