-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
multi: set zero bandwidth hint for channels that do not have any available slots #5355
Merged
Merged
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,229 @@ | ||
package itest | ||
|
||
import ( | ||
"context" | ||
"testing" | ||
|
||
"github.com/lightningnetwork/lnd/lnrpc" | ||
"github.com/lightningnetwork/lnd/lnrpc/invoicesrpc" | ||
"github.com/lightningnetwork/lnd/lnrpc/routerrpc" | ||
"github.com/lightningnetwork/lnd/lntest" | ||
"github.com/lightningnetwork/lnd/lntypes" | ||
"github.com/stretchr/testify/require" | ||
) | ||
|
||
// testMaxHtlcPathfind tests the case where we try to send a payment over a | ||
// channel where we have already reached the limit of the number of htlcs that | ||
// we may add to the remote party's commitment. This test asserts that we do | ||
// not attempt to use the full channel at all in our pathfinding. | ||
func testMaxHtlcPathfind(net *lntest.NetworkHarness, t *harnessTest) { | ||
ctxb := context.Background() | ||
|
||
// Setup a channel between Alice and Bob where Alice will only allow | ||
// Bob to add a maximum of 5 htlcs to her commitment. | ||
maxHtlcs := 5 | ||
|
||
ctxt, _ := context.WithTimeout(ctxb, channelOpenTimeout) | ||
chanPoint := openChannelAndAssert( | ||
ctxt, t, net, net.Alice, net.Bob, | ||
lntest.OpenChannelParams{ | ||
Amt: 1000000, | ||
PushAmt: 800000, | ||
RemoteMaxHtlcs: uint16(maxHtlcs), | ||
}, | ||
) | ||
|
||
// Wait for Alice and Bob to receive the channel edge from the | ||
// funding manager. | ||
ctxt, _ = context.WithTimeout(ctxb, defaultTimeout) | ||
err := net.Alice.WaitForNetworkChannelOpen(ctxt, chanPoint) | ||
require.NoError(t.t, err, "alice does not have open channel") | ||
|
||
ctxt, _ = context.WithTimeout(ctxb, defaultTimeout) | ||
err = net.Bob.WaitForNetworkChannelOpen(ctxt, chanPoint) | ||
require.NoError(t.t, err, "bob does not have open channel") | ||
|
||
// Alice and bob should have one channel open with each other now. | ||
assertNodeNumChannels(t, net.Alice, 1) | ||
assertNodeNumChannels(t, net.Bob, 1) | ||
|
||
// Send our maximum number of htlcs from Bob -> Alice so that we get | ||
// to a point where Alice won't accept any more htlcs on the channel. | ||
subscriptions := make([]*holdSubscription, maxHtlcs) | ||
cancelCtxs := make([]func(), maxHtlcs) | ||
|
||
for i := 0; i < maxHtlcs; i++ { | ||
subCtx, cancel := context.WithTimeout(ctxb, defaultTimeout) | ||
cancelCtxs[i] = cancel | ||
|
||
subscriptions[i] = acceptHoldInvoice( | ||
subCtx, t.t, i, net.Bob, net.Alice, | ||
) | ||
} | ||
|
||
// Cancel all of our subscriptions on exit. | ||
defer func() { | ||
for _, cancel := range cancelCtxs { | ||
cancel() | ||
} | ||
}() | ||
|
||
err = assertNumActiveHtlcs([]*lntest.HarnessNode{ | ||
net.Alice, net.Bob, | ||
}, maxHtlcs) | ||
require.NoError(t.t, err, "htlcs not active") | ||
|
||
// Now we send a payment from Alice -> Bob to sanity check that our | ||
// commitment limit is not applied in the opposite direction. | ||
subCtx, cancel := context.WithTimeout(ctxb, defaultTimeout) | ||
defer cancel() | ||
aliceBobSub := acceptHoldInvoice( | ||
subCtx, t.t, maxHtlcs, net.Alice, net.Bob, | ||
) | ||
err = assertNumActiveHtlcs([]*lntest.HarnessNode{ | ||
net.Alice, net.Bob, | ||
}, maxHtlcs+1) | ||
require.NoError(t.t, err, "htlcs not active") | ||
|
||
// Now, we're going to try to send another payment from Bob -> Alice. | ||
// We've hit our max remote htlcs, so we expect this payment to spin | ||
// out dramatically with pathfinding. | ||
ctxt, _ = context.WithTimeout(ctxb, defaultTimeout) | ||
payment, err := net.Bob.RouterClient.SendPaymentV2( | ||
ctxt, &routerrpc.SendPaymentRequest{ | ||
Amt: 1000, | ||
Dest: net.Alice.PubKey[:], | ||
TimeoutSeconds: 60, | ||
FeeLimitSat: 1000000, | ||
MaxParts: 10, | ||
Amp: true, | ||
}, | ||
) | ||
require.NoError(t.t, err, "send payment failed") | ||
|
||
update, err := payment.Recv() | ||
require.NoError(t.t, err, "no payment in flight update") | ||
require.Equal(t.t, lnrpc.Payment_IN_FLIGHT, update.Status, | ||
"payment not inflight") | ||
|
||
update, err = payment.Recv() | ||
require.NoError(t.t, err, "no payment failed update") | ||
require.Equal(t.t, lnrpc.Payment_FAILED, update.Status) | ||
require.Len(t.t, update.Htlcs, 0, "expected no htlcs dispatched") | ||
|
||
// Now that we're done, we cancel all our pending htlcs so that we | ||
// can cleanup the channel with a coop close. | ||
for _, sub := range subscriptions { | ||
ctxt, _ = context.WithTimeout(ctxb, defaultTimeout) | ||
sub.cancel(ctxt, t.t) | ||
carlaKC marked this conversation as resolved.
Show resolved
Hide resolved
|
||
} | ||
|
||
ctxt, _ = context.WithTimeout(ctxb, defaultTimeout) | ||
aliceBobSub.cancel(ctxt, t.t) | ||
|
||
err = assertNumActiveHtlcs([]*lntest.HarnessNode{ | ||
net.Alice, net.Bob, | ||
}, 0) | ||
require.NoError(t.t, err, "expected all htlcs canceled") | ||
|
||
ctxt, _ = context.WithTimeout(ctxb, channelCloseTimeout) | ||
closeChannelAndAssert(ctxt, t, net, net.Alice, chanPoint, false) | ||
} | ||
|
||
type holdSubscription struct { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍 |
||
recipient invoicesrpc.InvoicesClient | ||
hash lntypes.Hash | ||
invSubscription invoicesrpc.Invoices_SubscribeSingleInvoiceClient | ||
paymentSubscription routerrpc.Router_SendPaymentV2Client | ||
} | ||
|
||
// cancel updates a hold invoice to cancel from the recipient and consumes | ||
// updates from the payer until it has reached a final, failed state. | ||
func (h *holdSubscription) cancel(ctx context.Context, t *testing.T) { | ||
_, err := h.recipient.CancelInvoice(ctx, &invoicesrpc.CancelInvoiceMsg{ | ||
PaymentHash: h.hash[:], | ||
}) | ||
require.NoError(t, err, "invoice cancel failed") | ||
|
||
invUpdate, err := h.invSubscription.Recv() | ||
require.NoError(t, err, "cancel invoice subscribe failed") | ||
require.Equal(t, lnrpc.Invoice_CANCELED, invUpdate.State, | ||
"expected invoice canceled") | ||
|
||
// We expect one in flight update when our htlc is canceled back, and | ||
// another when we fail the payment as a whole. | ||
payUpdate, err := h.paymentSubscription.Recv() | ||
require.NoError(t, err, "cancel payment subscribe failed") | ||
require.Len(t, payUpdate.Htlcs, 1) | ||
carlaKC marked this conversation as resolved.
Show resolved
Hide resolved
|
||
require.Equal(t, lnrpc.Payment_IN_FLIGHT, payUpdate.Status) | ||
|
||
payUpdate, err = h.paymentSubscription.Recv() | ||
require.NoError(t, err, "cancel payment subscribe failed") | ||
require.Equal(t, lnrpc.Payment_FAILED, payUpdate.Status, | ||
"expected payment failed") | ||
require.Equal(t, lnrpc.PaymentFailureReason_FAILURE_REASON_INCORRECT_PAYMENT_DETAILS, | ||
payUpdate.FailureReason, "expected unknown details") | ||
} | ||
|
||
// acceptHoldInvoice adds a hold invoice to the recipient node, pays it from | ||
// the sender and asserts that we have reached the accepted state where htlcs | ||
// are locked in for the payment. | ||
func acceptHoldInvoice(ctx context.Context, t *testing.T, idx int, sender, | ||
receiver *lntest.HarnessNode) *holdSubscription { | ||
|
||
hash := [lntypes.HashSize]byte{byte(idx + 1)} | ||
|
||
invoice, err := receiver.AddHoldInvoice( | ||
ctx, &invoicesrpc.AddHoldInvoiceRequest{ | ||
ValueMsat: 10000, | ||
Hash: hash[:], | ||
}, | ||
) | ||
require.NoError(t, err, "couldn't add invoice") | ||
|
||
invStream, err := receiver.InvoicesClient.SubscribeSingleInvoice( | ||
ctx, &invoicesrpc.SubscribeSingleInvoiceRequest{ | ||
RHash: hash[:], | ||
}, | ||
) | ||
require.NoError(t, err, "could not subscribe to invoice") | ||
|
||
inv, err := invStream.Recv() | ||
require.NoError(t, err, "invoice open stream failed") | ||
require.Equal(t, lnrpc.Invoice_OPEN, inv.State, | ||
"expected open") | ||
|
||
payStream, err := sender.RouterClient.SendPaymentV2( | ||
ctx, &routerrpc.SendPaymentRequest{ | ||
PaymentRequest: invoice.PaymentRequest, | ||
TimeoutSeconds: 60, | ||
FeeLimitSat: 1000000, | ||
}, | ||
) | ||
require.NoError(t, err, "send payment failed") | ||
|
||
// Finally, assert that we progress to an accepted state. We expect | ||
// the payer to get one update for the creation of the payment, and | ||
// another when a htlc is dispatched. | ||
payment, err := payStream.Recv() | ||
require.NoError(t, err, "payment in flight stream failed") | ||
require.Equal(t, lnrpc.Payment_IN_FLIGHT, payment.Status) | ||
require.Len(t, payment.Htlcs, 0) | ||
|
||
payment, err = payStream.Recv() | ||
require.NoError(t, err, "payment in flight stream failed") | ||
require.Equal(t, lnrpc.Payment_IN_FLIGHT, payment.Status) | ||
require.Len(t, payment.Htlcs, 1) | ||
|
||
inv, err = invStream.Recv() | ||
require.NoError(t, err, "invoice accepted stream failed") | ||
require.Equal(t, lnrpc.Invoice_ACCEPTED, inv.State, | ||
"expected accepted invoice") | ||
|
||
return &holdSubscription{ | ||
recipient: receiver.InvoicesClient, | ||
hash: hash, | ||
invSubscription: invStream, | ||
paymentSubscription: payStream, | ||
} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this if we already have
CheckHtlcTransit
which callscanSendHTLC
that eventually callsBandwidth
to ensure that the channel has enough idle bandwidth to carry the HTLC? In this case, one could say that if a channel has no free HTLC slots, then it effectively has zero bandwidth.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this a bit more, I think as is this addresses the case of backing off when we're the sender, but not the case where an outgoing link is fully saturated (in terms of idle HTLCs), and a new forward come across destined to that link. With the current control flow, we'll go all the way through and call
AddHTLC
on the link with that eventually failing and returning back a temp chan failure. However if we add this check toCheckHtlcForward
, then we skip sending a message (with channel overhead, etc) to the link and can fail things back earlier at the forwarding site within the switch itself.If we like this alternative direction, then "mock validation" via the new
validateAddHtlc
can be used within the link implementation (we don't need to extend the main interface) to apply this extra step of validation before the multi-hop related checks (fee and timelocks, etc).One tradeoff with this alternative direction is that we'll allow an HTLC to proceed to the switch (it'll fail at the call to
getLocalLink
) vs it failing earlier at the router level (it'll get a nil policy as the bandwidth will show up as being zero). FWIW if it gets all the way here, then we'll emit a link failure event for the subscription RPC which might be useful for alerts to keep an eye on persistently full HTLC slots.Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of moving this check into
CheckHtlcForward
to get theLinkFailure
event, but it doesn't solve the issue of pathfinding becoming aware of the link failure and excluding the link as a possible route (which is what's making us spin out at present).This endless storm of htlcs happens in the following case:
The
queryBandwidth
check felt like the best way to communicate this unavailability toRequestRoute
? The alternative would be to update handleSendError's error processing to specifically understand this type of switch error and report it to mission control differently, since right now we're just logging it. This is more involved (previously discussed and decided against it iirc), but it does seem like the less hacky way to go.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also would just like to point out that
Bandwidth()
is only concerned with bandwidth, and doesn't take into account space on the commitment. So the commitment could be full, but not all of its bandwidth consumed, which would be misleading if we relied oncanSendHtlc
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeh this makes more sense, thanks for the breakdown.
Don't think this is the correct route either as we'd end up "penalizing" our local channels using our existing observation system when we we're actually able to have an up-to-date view of the state since it's a local channel.
The thing that made be to a double take was the fact that we're extending a fundamental interface in the switch with a method that was effectively only put in place to solve a one-off issue. However looking at this addition from another angle, it forces callers that handle a link to be aware of that fact that we actually have two constraints w.r.t being able to forward HTLCs: the set of available bandwidth, and the current flow control constraints.
With that said, how about renaming the newly added method from
MayAddOutgoingHtlc
to something likefunc PaymentSlotAvailable(amt) bool
(just a suggestion)? This reflects that the method is a best-effort attempt at claiming an available payment slots (which encapsulates things like the future dust limits, max htlc, max in flight, etc). It's best-effort as no strong guarantees are made as it's possible that a forwarded HTLC instead claims the slots, and the caller would need to be aware of that. With this interface change, we'd also start to thread through the amount as then we ensure we're able to factor in all the other flow control constraints. Actually without this, I think other infinite loops exist where adding an HTLC violates another constraint (aside from max HTLCs).Doesn't look like we have the amount available here when we go to query for bandwidth hints, but with a small tweak we could have the
bandwidthHintsMap
map to a function closure (or interface) to allow it to accept an amount (and possibly later other parmaters). Looking at this section of the codebase with fresh eyes, it doesn't look like we attempt to adhere to the link level constraints enforced by the remote party w.r.t what type of HTLCs they'll accept (like the link level min HTLC param). Taking this final step to make the bandwidth hints (from the PoV of the router) aware (via direction) of these link level constraints should address this gap and any other lingering ones.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah def, also didn't love this approach
Took a look at this approach today and it looks like a good direction to go in - there's a working branch here, although I haven't fully tested it.
One thing I ran into is that there are two cases where we use our hints:
Ended up using
*lnwire.Millisatoshi
to infer whether we want to check flow as well, could possibly be a different method in thebandwidthHints
interface.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah in that case, I'd expect that the map look up is
!ok
(we don't define a function for that edge) so we instead fall through to using the existing balance.