Clear monitor-pending RAA once regenerated #4684
The `chanmon_consistency` fuzz target found a reconnect ordering where `signer_pending_revoke_and_ack` and `monitor_pending_revoke_and_ack` could both describe the same owed `revoke_and_ack`.

The channel first received a `commitment_signed` whose monitor update completed, but the signer could not provide the next per-commitment point or secret, leaving `signer_pending_revoke_and_ack` set. Later, receiving the peer's `revoke_and_ack` freed holding-cell HTLCs and produced a held monitor update. While that monitor update was still blocked, `channel_reestablish` saw the peer one state behind and recorded `monitor_pending_revoke_and_ack`, plus the corresponding monitor-pending `commitment_signed`, so that both messages could be replayed once monitor updating was restored.

If the signer unblocked before the held monitor update was released, `signer_maybe_unblocked` generated and sent the RAA using `signer_pending_revoke_and_ack`. The monitor-pending flag was not cleared at that point, so `monitor_updating_restored` generated the same RAA again when the held update completed. The peer had already advanced after accepting the signer-unblocked RAA, so it rejected the duplicate secret as not corresponding to its current pubkey and force-closed.

Fix this by clearing `monitor_pending_revoke_and_ack` alongside `signer_pending_revoke_and_ack` whenever `get_last_revoke_and_ack` successfully constructs an RAA. All resend paths regenerate RAAs through this helper, so successful generation via either pending path satisfies the other pending record. If generation fails, the pending signer state is left set and the monitor-pending state remains available for monitor restoration to retry.
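To make the ordering concrete, here is a minimal Rust sketch of the two flags. It is a hypothetical toy model, not rust-lightning's code. `ToyChannel`, `signer_available`, `apply_fix`, `raas_sent`, and the two scenario functions are invented for illustration. The methods only borrow their names from the description above; the real methods have different signatures and do far more. The sketch assumes, as the description states, that every resend path goes through the `get_last_revoke_and_ack` stand-in.

```rust
// Hypothetical toy model of the two "RAA owed" flags described above.
// Names mirror the PR text, but this is NOT rust-lightning's real `Channel` code.
#[derive(Default)]
struct ToyChannel {
    signer_pending_revoke_and_ack: bool,
    monitor_pending_revoke_and_ack: bool,
    /// Whether the (async) signer can currently produce the secret/point.
    signer_available: bool,
    /// `true` models the fix; `false` models the pre-fix behavior.
    apply_fix: bool,
    /// Number of RAAs "sent" to the peer.
    raas_sent: u32,
}

impl ToyChannel {
    /// Stand-in for `get_last_revoke_and_ack`: the single helper every resend path uses.
    fn get_last_revoke_and_ack(&mut self) -> Option<()> {
        if !self.signer_available {
            // Signer blocked: remember that an RAA is still owed.
            self.signer_pending_revoke_and_ack = true;
            return None;
        }
        self.signer_pending_revoke_and_ack = false;
        if self.apply_fix {
            // The fix: a successfully built RAA also satisfies the monitor-pending record.
            self.monitor_pending_revoke_and_ack = false;
        }
        Some(())
    }

    /// Stand-in for `signer_maybe_unblocked`.
    fn signer_maybe_unblocked(&mut self) {
        self.signer_available = true;
        if self.signer_pending_revoke_and_ack && self.get_last_revoke_and_ack().is_some() {
            self.raas_sent += 1;
        }
    }

    /// Stand-in for `monitor_updating_restored`. Simplification: the monitor-pending
    /// flag is consumed here; on failure the helper leaves the signer flag set instead.
    fn monitor_updating_restored(&mut self) {
        if std::mem::take(&mut self.monitor_pending_revoke_and_ack)
            && self.get_last_revoke_and_ack().is_some()
        {
            self.raas_sent += 1;
        }
    }
}

/// Replays the fuzzer's ordering: the signer unblocks before the held monitor update completes.
fn signer_unblocks_first(apply_fix: bool) -> u32 {
    let mut chan = ToyChannel { apply_fix, ..Default::default() };
    // commitment_signed handled, but the signer cannot produce the RAA yet.
    assert!(chan.get_last_revoke_and_ack().is_none());
    // channel_reestablish while a monitor update is held: record the monitor-pending RAA.
    chan.monitor_pending_revoke_and_ack = true;
    chan.signer_maybe_unblocked();
    chan.monitor_updating_restored();
    chan.raas_sent
}

/// Opposite ordering: monitor restoration runs while the signer is still blocked.
fn monitor_restores_first(apply_fix: bool) -> u32 {
    let mut chan = ToyChannel { apply_fix, ..Default::default() };
    assert!(chan.get_last_revoke_and_ack().is_none());
    chan.monitor_pending_revoke_and_ack = true;
    chan.monitor_updating_restored(); // generation fails; signer flag stays set
    chan.signer_maybe_unblocked(); // the retry succeeds exactly once
    chan.raas_sent
}

fn main() {
    println!("signer first, no fix:    {} RAA(s)", signer_unblocks_first(false)); // 2 (duplicate)
    println!("signer first, with fix:  {} RAA(s)", signer_unblocks_first(true)); // 1
    println!("monitor first, with fix: {} RAA(s)", monitor_restores_first(true)); // 1
}
```

Without the fix, the signer-first ordering sends the RAA twice, which corresponds to the duplicate the peer rejected. With the fix it is sent once, and the monitor-first ordering still delivers exactly one RAA through the signer retry. Consuming the monitor-pending flag inside `monitor_updating_restored` is a simplification made for this sketch only.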
👋 Thanks for assigning @TheBlueMatt as a reviewer!
No concrete bugs found in the production code change. The fix is correct and well-targeted.

Summary

The one-line production change (

Cross-cutting note (not a blocker)

The symmetric

The added test exercises the fixed path; note that
This failure was discovered in https://github.com/lightningdevkit/rust-lightning/actions/runs/26905971318/job/79370860747.