Don't send init closing_signed too early after final HTLC removal #2529
Conversation
Codecov Report

@@            Coverage Diff             @@
##             main    #2529      +/-   ##
==========================================
+ Coverage   88.81%   89.16%    +0.34%
==========================================
  Files         113      113
  Lines       89116    91552     +2436
  Branches    89116    91552     +2436
==========================================
+ Hits        79152    81629     +2477
+ Misses       7722     7703       -19
+ Partials     2242     2220       -22
lightning/src/ln/shutdown_tests.rs
Outdated

#[test]
fn htlc_removal_no_early_closing_signed() {
	// Previously, if we have a pending inbound HTLC on a channel which has initiated shutdown,
I think the test coverage should also encompass a "pending outbound HTLC" to check that final HTLC removal works well in both directions. The current test is `send_payment(&nodes[0], &[&nodes[1]])`, so it's more of an outbound HTLC from the viewpoint of `nodes[0]`? Node 1 receives the `SendClosingSigned` (L1277).
We already have other coverage of HTLCs in the other direction in shutdown_tests.rs
Yes, the test coverage of the other HTLC direction in `shutdown_tests.rs` would have to be tested, though note I still think you're testing the pending outbound HTLC here?
I have no idea what you're asking for.
lightning/src/ln/channel.rs
Outdated

@@ -3363,6 +3379,7 @@ impl<SP: Deref> Channel<SP> where
if let OutboundHTLCState::LocalAnnounced(_) = htlc.state {
	log_trace!(logger, " ...promoting outbound LocalAnnounced {} to Committed", &htlc.payment_hash);
	htlc.state = OutboundHTLCState::Committed;
	*expecting_peer_cs = true;
Don't we also need to set this on channel reestablish if we only get to process the RAA but not the CS?
Yes, in theory, but by the time we get to a reestablish (if we got the RAA before shutdown/disconnect) we have forgotten about the HTLC so there's nothing we can do (except track the HTLC longer). Even after this patch we still risk force-closing if we restart right after the RAA but before the CS, but I'm not really sure that's worth fixing.
It could affect mobile nodes given the higher frequency of restarts, but I'm fine with delaying the full fix for now.
I think any peer should be sending those two messages back-to-back. So, yes, it could impact someone on mobile who races, but I'm still kinda meh on persisting this bool? I guess we could, but I really want to avoid persisting new things and move to deriving them from monitors universally.
Pretty much LGTM
Feel free to squash
20bfd30 to 52836ff (compare)
Squashed with a trivial test cleanup:
looks good
lightning/src/ln/channel.rs
Outdated

// If we're waiting on a monitor persistence, that implies we're also waiting to send some
// message to our counterparty (probably a `revoke_and_ack`). In such a case, we shouldn't
// initiate `closing_signed` negotiation until we're clear of all pending messages.
if (self.context.channel_state & (ChannelState::MonitorUpdateInProgress as u32)) != 0 {
Is there a hidden intention behind adding this check right now along with the check below? If so, we should add a small comment to the check below. Otherwise these look like, and are, two perfectly fine independent checks.
Not quite sure what you're asking here - it's not a "hidden intention", but rather an extension of the check below - if we think we're ready to go but are actually waiting on our own monitor update to send some message, we're not really ready to go.
1d14942 to dfef4a7 (compare)
Rebased to address a `use` conflict.
Sorry this somehow fell through, feel free to squash
@@ -3334,6 +3349,7 @@ impl<SP: Deref> Channel<SP> where
if let &InboundHTLCRemovalReason::Fulfill(_) = reason {
	value_to_self_msat_diff += htlc.amount_msat as i64;
}
*expecting_peer_commitment_signed = true;
Hmm, can we use the existing `mark_awaiting_response` here? That would also make sure we disconnect them if they take too long or don't send it at all.
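For context, here is a minimal standalone Rust sketch of the "mark awaiting response, disconnect on timeout" pattern being suggested. This is not LDK's actual `mark_awaiting_response` implementation; the type, constant value, and method bodies are illustrative assumptions only:

```rust
// Hypothetical, simplified version of the idea: remember when we start expecting a
// reply from the peer and disconnect if too many timer ticks pass without one.
const DISCONNECT_PEER_AWAITING_RESPONSE_TICKS: usize = 2;

#[derive(Default)]
struct ResponseTracker {
	// `Some(ticks_elapsed)` while we are waiting on a message from the peer.
	awaiting_response_ticks: Option<usize>,
}

impl ResponseTracker {
	fn mark_awaiting_response(&mut self) {
		self.awaiting_response_ticks = Some(0);
	}

	fn received_response(&mut self) {
		self.awaiting_response_ticks = None;
	}

	/// Called on each timer tick; returns true if the peer should be disconnected.
	fn should_disconnect_on_tick(&mut self) -> bool {
		match self.awaiting_response_ticks.as_mut() {
			Some(ticks) => {
				*ticks += 1;
				*ticks >= DISCONNECT_PEER_AWAITING_RESPONSE_TICKS
			}
			None => false,
		}
	}
}

fn main() {
	let mut tracker = ResponseTracker::default();
	// We just revoked the commitment that removed the final HTLC, so we expect the
	// counterparty's `commitment_signed` next.
	tracker.mark_awaiting_response();
	assert!(!tracker.should_disconnect_on_tick());
	assert!(tracker.should_disconnect_on_tick()); // still nothing after two ticks
	tracker.received_response();
	assert!(!tracker.should_disconnect_on_tick());
}
```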
Should we do this more generally? It's an indication of an HTLC hang on the remote end, which could still cause issues. I'd kinda like to do that as a followup, though, since it's a separate concept and a lot more stuff to test.
Sounds good
dfef4a7 to 168899f (compare)
Needs a rebase, CI is not happy.
Rebased.
168899f to 0fe7423 (compare)
Nothing blocking, LGTM
 use crate::chain::transaction::OutPoint;
-use crate::events::{MessageSendEvent, MessageSendEventsProvider, ClosureReason};
+use crate::events::{MessageSendEvent, HTLCDestination, MessageSendEventsProvider, ClosureReason};
nit: imports out of order
lightning/src/ln/channel.rs
Outdated

// message to our counterparty (probably a `revoke_and_ack`). In such a case, we shouldn't
// initiate `closing_signed` negotiation until we're clear of all pending messages.
if (self.context.channel_state & (ChannelState::MonitorUpdateInProgress as u32)) != 0 {
	return Ok((None, None, None));
Missing test coverage here, also seems logically separate from the rest of the commit
Oops, it's actually unreachable - `closing_negotiation_ready` (the first check we do) already checks that `MonitorUpdateInProgress` is not set. The test as written does hit this case. For clarity I went ahead and removed the redundant case and moved the comment to the top.
0fe7423 to 02c1070 (compare)
If we remove an HTLC (or fee update), commit, and receive our counterparty's `revoke_and_ack`, we remove all knowledge of said HTLC (or fee update). However, the latest local commitment transaction that we can broadcast still contains the HTLC (or old fee), thus we are not eligible for initiating the `closing_signed` negotiation if we're shutting down and are generally expecting a counterparty `commitment_signed` immediately.

Because we don't have any tracking of these updates in the `Channel` (only the `ChannelMonitor` is aware of the HTLC being in our latest local commitment transaction), we'd previously send a `closing_signed` too early, causing LDK<->LDK channels with an HTLC pending towards the channel initiator at the time of `shutdown` to always fail to cooperatively close.

To fix this race, we add an additional unpersisted bool to `Channel` and use that to gate sending the initial `closing_signed`.
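To illustrate the gating described above, here is a minimal, self-contained Rust sketch. It is not the actual LDK `Channel` API; the struct, its fields, and the `maybe_propose_closing_signed` body below are simplified stand-ins for the real logic in `channel.rs`:

```rust
// Minimal model of the gate described in the commit message. This is NOT the real LDK
// `Channel`; all names here are hypothetical simplifications for illustration.
struct ClosingSigned; // stand-in for the real closing_signed message type

#[derive(Default)]
struct ChannelModel {
	// The real code derives readiness from `channel_state` flags; a plain bool suffices here.
	monitor_update_in_progress: bool,
	// Set when our `revoke_and_ack` removed the last HTLC (or a fee update), i.e. the
	// counterparty still owes us a `commitment_signed` that clears it from our
	// broadcastable local commitment transaction.
	expecting_peer_commitment_signed: bool,
	pending_htlcs: usize,
	both_sides_shutdown: bool,
	last_sent_closing_fee: Option<u64>,
}

impl ChannelModel {
	fn closing_negotiation_ready(&self) -> bool {
		self.both_sides_shutdown && self.pending_htlcs == 0 && !self.monitor_update_in_progress
	}

	fn maybe_propose_closing_signed(&mut self) -> Option<ClosingSigned> {
		if self.last_sent_closing_fee.is_some() || !self.closing_negotiation_ready() {
			return None;
		}
		// The fix: even though `Channel` no longer tracks the removed HTLC, our latest
		// local commitment may still contain it until the counterparty's next
		// `commitment_signed`, so don't start fee negotiation yet.
		if self.expecting_peer_commitment_signed {
			return None;
		}
		self.last_sent_closing_fee = Some(1_000);
		Some(ClosingSigned)
	}
}

fn main() {
	let mut chan = ChannelModel {
		both_sides_shutdown: true,
		expecting_peer_commitment_signed: true,
		..Default::default()
	};
	// Shutdown is complete and no HTLCs are pending, but the final removal has not yet
	// been committed by the peer: hold off.
	assert!(chan.maybe_propose_closing_signed().is_none());
	// Once the counterparty's `commitment_signed` arrives, we may propose.
	chan.expecting_peer_commitment_signed = false;
	assert!(chan.maybe_propose_closing_signed().is_some());
}
```

In the PR itself, the flag is set while processing the counterparty's `revoke_and_ack` (as in the diffs above) and, per the commit message, it is not persisted.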
02c1070 to 70b1866 (compare)
Removed the redundant check so that we don't screw up and get confused later.

$ git diff-tree -U1 0fe7423d 70b18663
diff --git a/lightning/src/ln/channel.rs b/lightning/src/ln/channel.rs
index 5c341b264..f36777467 100644
--- a/lightning/src/ln/channel.rs
+++ b/lightning/src/ln/channel.rs
@@ -4400,2 +4400,6 @@ impl<SP: Deref> Channel<SP> where
{
+ // If we're waiting on a monitor persistence, that implies we're also waiting to send some
+ // message to our counterparty (probably a `revoke_and_ack`). In such a case, we shouldn't
+ // initiate `closing_signed` negotiation until we're clear of all pending messages. Note
+ // that closing_negotiation_ready checks this case (as well as a few others).
if self.context.last_sent_closing_fee.is_some() || !self.closing_negotiation_ready() {
@@ -4411,9 +4415,2 @@ impl<SP: Deref> Channel<SP> where
- // If we're waiting on a monitor persistence, that implies we're also waiting to send some
- // message to our counterparty (probably a `revoke_and_ack`). In such a case, we shouldn't
- // initiate `closing_signed` negotiation until we're clear of all pending messages.
- if (self.context.channel_state & (ChannelState::MonitorUpdateInProgress as u32)) != 0 {
- return Ok((None, None, None));
- }
-
	// If we're waiting on a counterparty `commitment_signed` to clear some updates from our
I think this can land since it's not too different from when it was last ACK'd by @wpaulino
So land it :p