
Batch on-chain claims more aggressively per channel #3340

Merged
merged 5 commits into lightningdevkit:main on Dec 11, 2024

Conversation

wvanlint (Contributor)

When batch claiming was first added, it was only done so for claims which were not pinnable, i.e. those which can only be claimed by us.

This was the conservative choice - pinning of outputs claimed by a batch would leave the entire batch unable to confirm on-chain. However, if pinning is considered an attack that can be executed with a high probability of success, then there is no reason not to batch claims of pinnable outputs together, separate from unpinnable outputs.

Whether specific outputs are pinnable can change over time - those that are not pinnable will eventually become pinnable at the height at which our counterparty can spend them. Thus, outputs are treated as pinnable if they're within COUNTERPARTY_CLAIMABLE_WITHIN_BLOCKS_PINNABLE of that height.

Aside from outputs being pinnable or not, locktimes are also a factor for batching claims. HTLC-Timeout claims have locktimes fixed by the counterparty's signature and thus can only be aggregated with other HTLCs of the same CLTV, which we have to check for.

The complexity required here is worth it - aggregation can save users a significant amount of fees in the case of a force-closure, and directly impacts the number of UTXOs needed as a reserve for anchors.
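As a rough sketch of the height-based classification described above (the helper below is hypothetical and the constant's value is illustrative, not LDK's actual number; the real logic lives in `lightning/src/chain/package.rs`):

```rust
// Hypothetical sketch of the pinnability cutoff described above.
const COUNTERPARTY_CLAIMABLE_WITHIN_BLOCKS_PINNABLE: u32 = 12; // illustrative value only

/// Treat an output as pinnable once we are within the buffer of the height at
/// which the counterparty can also spend it.
fn treat_as_pinnable(current_height: u32, counterparty_spendable_height: u32) -> bool {
    counterparty_spendable_height <= current_height + COUNTERPARTY_CLAIMABLE_WITHIN_BLOCKS_PINNABLE
}
```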

@wvanlint (Contributor, author)

This change depends on #3297.


codecov bot commented Sep 25, 2024

Codecov Report

Attention: Patch coverage is 98.11617% with 12 lines in your changes missing coverage. Please review.

Project coverage is 90.46%. Comparing base (726dd5c) to head (5355ab9).
Report is 29 commits behind head on main.

Files with missing lines            Patch %   Lines
lightning/src/chain/package.rs      97.98%    4 Missing and 1 partial ⚠️
lightning/src/chain/onchaintx.rs    88.88%    3 Missing and 1 partial ⚠️
lightning/src/ln/monitor_tests.rs   98.88%    2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3340      +/-   ##
==========================================
+ Coverage   89.69%   90.46%   +0.77%     
==========================================
  Files         130      130              
  Lines      107335   112294    +4959     
  Branches   107335   112294    +4959     
==========================================
+ Hits        96273   101591    +5318     
+ Misses       8660     8343     -317     
+ Partials     2402     2360      -42     

☔ View full report in Codecov by Sentry.

@TheBlueMatt (Collaborator)

Needs rebase now 🎉

@TheBlueMatt linked an issue Oct 19, 2024 that may be closed by this pull request
requests[j].merge_package(merge);
break;
if let Err(rejected) = requests[j].merge_package(merge) {
requests.insert(i, rejected);
Collaborator

Hmm, removing then inserting at every step is kinda annoying cause it generally requires a vec shift...I'm not entirely convinced by this commit. If we want to reduce the risk of accidental panic introductions with code changes maybe we rename merge_package to make it clearer that it assumes can_merge_with?

Contributor (author)

Yeah, the re-inserts were only introduced as an alternative to panicking on the Err(_) here. However, those errors should never occur as we call can_merge_with beforehand. Added a debug_assert!(false, _). I can change the re-inserts to a panic! as well to maintain the previous behavior.

The main goal was to push panic!s up in the stack, while avoiding preconditions on merge_package. Since the Result of merge_package is determined by can_merge_with, the latter can be used beforehand to optimize any calls.

I can remove the commit as well though.
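A minimal sketch of the pattern described here, with simplified stand-in types (not the real `PackageTemplate` API): `can_merge_with` is checked by the caller, so an `Err` from `merge_package` is surfaced with a `debug_assert!` and handled gracefully rather than panicking.

```rust
// Simplified stand-in for PackageTemplate; only the merge-related pieces are sketched.
struct Package { weight: u64 }

impl Package {
    fn can_merge_with(&self, _other: &Package) -> bool {
        true // the real check compares pinnability, locktimes, and tx-tree compatibility
    }

    /// Fallible merge: hands the other package back instead of panicking.
    fn merge_package(&mut self, other: Package) -> Result<(), Package> {
        if !self.can_merge_with(&other) {
            return Err(other);
        }
        self.weight += other.weight;
        Ok(())
    }
}

/// Merge `src` into `dst`, returning the rejected package if the (supposedly
/// unreachable) error path is hit.
fn merge_or_keep(dst: &mut Package, src: Package) -> Option<Package> {
    match dst.merge_package(src) {
        Ok(()) => None,
        Err(rejected) => {
            // Callers check can_merge_with first, so this should never happen.
            debug_assert!(false, "merge_package failed after can_merge_with returned true");
            Some(rejected)
        }
    }
}
```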

Collaborator

This is fine. Note that the fixup commit will need to go on the Make PackageTemplate::merge_package fallible commit, not at the end where it currently sits.

Contributor (author)

Moved the fixup commit into the right place.

node_txn.swap_remove(0);
// The unpinnable, revoked to_self output, and the pinnable, revoked htlc output will
// be claimed in separate transactions.
assert_eq!(node_txn.len(), 2);
Collaborator

Care to check that the transactions spend different inputs? (similar elsewhere)

Contributor (author)

Added additional checks to verify that they're spending different outputs here and throughout.
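One way such a check can be written in a test (a sketch, not necessarily the exact assertion added in the PR):

```rust
use std::collections::HashSet;

use bitcoin::Transaction;

/// Assert that two claim transactions spend disjoint sets of outpoints, i.e. the
/// unpinnable and pinnable claims were actually split into separate packages.
fn assert_spends_disjoint_inputs(a: &Transaction, b: &Transaction) {
    let a_inputs: HashSet<_> = a.input.iter().map(|txin| txin.previous_output).collect();
    let b_inputs: HashSet<_> = b.input.iter().map(|txin| txin.previous_output).collect();
    assert!(a_inputs.is_disjoint(&b_inputs), "claim transactions share an input");
}
```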

/// Checks if this and `other` are spending types of inputs which could have descended from the
/// same commitment transaction(s) and thus could both be spent without requiring a
/// double-spend.
fn is_possibly_from_same_tx_tree(&self, other: &PackageSolvingData) -> bool {
Collaborator

incremental-mutants thinks that replacing this entire function with true doesn't cause any tests to fail. If it's easy, we should consider a reorg test that hits this (I think that's the only way to hit this?)

Contributor (author)

I added an additional test in package.rs around merging of packages from different transaction trees if that's sufficient.
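For readers unfamiliar with the helper, the idea is roughly the following (a simplified sketch with a stand-in enum, not the actual `PackageSolvingData` match):

```rust
// Simplified stand-in: which commitment-transaction tree a claimable output
// descends from. The real code matches on PackageSolvingData variants.
#[derive(Clone, Copy, PartialEq)]
enum ClaimOrigin {
    CounterpartyCommitment, // revoked outputs and counterparty HTLC outputs
    HolderCommitment,       // holder HTLC outputs
}

/// Claims can only be batched into one transaction if they could both be valid at
/// the same time, i.e. they descend from the same commitment transaction tree.
fn is_possibly_from_same_tx_tree(a: ClaimOrigin, b: ClaimOrigin) -> bool {
    a == b
}
```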

@@ -340,6 +344,124 @@ fn sorted_vec<T: Ord>(mut v: Vec<T>) -> Vec<T> {
v
}

fn verify_claimable_balances(mut balances_1: Vec<Balance>, mut balances_2: Vec<Balance>, margin: u64) {
Collaborator

I'm confused why we can't keep asserting the balance set matches a predefined list exactly? We should be able to calculate the exact fees paid, no?

@wvanlint (Contributor, author), Oct 30, 2024

I think I got confused with the varying size of the signatures and the fee calculation of spend_spendable_outputs. The weight of the transaction multiplied by the fee rate didn't line up with the actual transaction fee.

Calculated the fee of the transaction exactly now by looking up the input values.
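A sketch of the exact-fee calculation mentioned here, assuming a hypothetical test helper that has the spent outputs available and a rust-bitcoin version where `TxOut::value` is an `Amount`:

```rust
use std::collections::HashMap;

use bitcoin::{OutPoint, Transaction, TxOut};

/// Exact fee of `tx`: the sum of the values of the outputs it spends minus the sum
/// of its own output values. `prev_outputs` is looked up from previously broadcast
/// transactions in the test.
fn exact_fee_sats(tx: &Transaction, prev_outputs: &HashMap<OutPoint, TxOut>) -> u64 {
    let input_sats: u64 = tx.input.iter()
        .map(|txin| prev_outputs[&txin.previous_output].value.to_sat())
        .sum();
    let output_sats: u64 = tx.output.iter().map(|txout| txout.value.to_sat()).sum();
    input_sats - output_sats
}
```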

@wvanlint marked this pull request as ready for review November 21, 2024 22:14
@wvanlint requested a review from TheBlueMatt November 21, 2024 22:14
@TheBlueMatt added this to the 0.1 milestone Nov 25, 2024
@dunxen self-requested a review November 25, 2024 15:42
/// When we go to force-close a channel because an HTLC is expiring, we should ensure that the
/// HTLC(s) expiring are not considered pinnable, allowing us to aggregate them with other HTLC(s)
/// expiring at the same time.
const _HTLCS_NOT_PINNABLE_ON_CLOSE: u32 = CLTV_CLAIM_BUFFER - COUNTERPARTY_CLAIMABLE_WITHIN_BLOCKS_PINNABLE;
Contributor

Would it be fine to get rid of this since it's unused?

Contributor (author)

This is an assertion to verify that CLTV_CLAIM_BUFFER > COUNTERPARTY_CLAIMABLE_WITHIN_BLOCKS_PINNABLE; the same pattern is used elsewhere as well. I just found out that assert! can be used in const contexts since Rust 1.57 though, so I used that for clarity instead.

Resolved (outdated) review threads: lightning/src/ln/functional_tests.rs, lightning/src/chain/package.rs, lightning/src/chain/package.rs
let package_locktime = req.package_locktime(cur_height);
if package_locktime > cur_height + 1 {
log_info!(logger, "Delaying claim of package until its timelock at {} (current height {}), the following outpoints are spent:", package_locktime, cur_height);
for outpoint in req.outpoints() {
Contributor

Pre-existing, but it looks like PackageTemplate::outpoints might be able to return an iterator and save some allocations. Probably not for this PR though.

@wvanlint (Contributor, author), Dec 5, 2024

I took a quick stab at it, but it did not turn out to be entirely straightforward. Is it okay to postpone to another PR?
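For context, the suggested refactor would look roughly like the following (a sketch with a simplified struct; lifetime interactions with existing callers are part of what makes it less than trivial):

```rust
use bitcoin::OutPoint;

// Simplified stand-in for PackageTemplate's input list.
struct PackageSketch {
    inputs: Vec<(OutPoint, ())>,
}

impl PackageSketch {
    /// Current shape: allocates a fresh Vec on every call.
    fn outpoints_vec(&self) -> Vec<OutPoint> {
        self.inputs.iter().map(|(outpoint, _)| *outpoint).collect()
    }

    /// Suggested shape: let callers iterate over the outpoints without allocating.
    fn outpoints(&self) -> impl Iterator<Item = OutPoint> + '_ {
        self.inputs.iter().map(|(outpoint, _)| *outpoint)
    }
}
```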

Comment on lines 948 to 952
fn minimum_locktime(&self) -> u32 {
self.inputs.iter().filter_map(|(_, outp)| outp.minimum_locktime()).max().unwrap_or(0)
}
pub(crate) fn package_locktime(&self, current_height: u32) -> u32 {
let minimum_locktime = self.inputs.iter().filter_map(|(_, outp)| outp.minimum_locktime()).max().unwrap_or(0);
let minimum_locktime = self.minimum_locktime();
Contributor

I think this can be reverted since it looks like minimum_locktime is only used in package_locktime.

Contributor (author)

Done.


locktime
if let Some(signed_locktime) = self.signed_locktime() {
debug_assert!(signed_locktime >= minimum_locktime);
Contributor

signed_locktime and minimum_locktime should never be set at the same time (ignoring the unwrap_or(0) above), so I think it would be clearer to assert minimum_locktime.is_none() here, something like this:

		let minimum_locktime =
			self.inputs.iter().filter_map(|(_, outp)| outp.minimum_locktime()).max();
		if let Some(signed_locktime) = self.signed_locktime() {
			debug_assert!(minimum_locktime.is_none());
			signed_locktime
		} else {
			core::cmp::max(current_height, minimum_locktime.unwrap_or(0))
		}

Contributor (author)

Initially, I was thinking about cross-channel aggregation where one claim could have a signed locktime, and another a minimum locktime. However, this is not possible right now, so I added that verification.

Contributor

Gotcha, thanks for that explanation and the one below, makes sense

@valentinewallace (Contributor) commented Dec 4, 2024

Would like to get confirmation on one case --

In test_bump_penalty_txn_on_revoked_commitment, we have 1 revoked HTLC output where the counterparty can claim it at height 41 and another where the counterparty can claim it at 81. The current height is 25.

Currently we'll aggregate these outputs, but it seems like the former output is more urgent and may warrant more aggressive fee-bumping, so aggregating them might result in paying more fees due to the overall increased transaction size? Just want to make sure this is the intended behavior.

In the next commit we'll be changing the order some transactions
get spent in packages, causing some tests to spuriously fail. Here
we update a few tests to avoid that by checking sets of inputs
rather than specific ordering.

Currently our package merging logic is strewn about between
`package.rs` (which decides various flags based on the package
type) and `onchaintx.rs` (which does the actual merging based on
the derived flags as well as its own logic), making the logic hard
to follow.

Instead, here we consolidate the package merging logic entirely
into `package.rs` with a new `PackageTemplate::can_merge_with`
method that decides if merging can happen. We also simplify the
merge pass in `update_claims_view_from_requests` to try to
maximally merge by testing each pair of `PackageTemplate`s we're
given to see if they can be merged.

This is overly complicated (and inefficient) for today's merge
logic, but over the coming commits we'll expand when we can merge
and not having to think about the merge pass' behavior makes that
much simpler (and O(N^2) for <1000 elements done only once when a
commitment transaction confirms is fine).
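In pseudocode form, the merge pass described above looks roughly like this (a sketch with stand-in types, not the actual `update_claims_view_from_requests` code):

```rust
// Stand-in for PackageTemplate with only the pieces the merge pass needs.
struct Pkg { outpoint_ids: Vec<u32> }

impl Pkg {
    fn can_merge_with(&self, _other: &Pkg) -> bool {
        true // real logic compares pinnability, locktimes, and tx-tree compatibility
    }
    fn merge_package(&mut self, other: Pkg) -> Result<(), Pkg> {
        if !self.can_merge_with(&other) { return Err(other); }
        self.outpoint_ids.extend(other.outpoint_ids);
        Ok(())
    }
}

/// O(N^2) pass: try to fold each new request into any already-accepted package it
/// can merge with, otherwise keep it as its own package.
fn merge_requests(new_requests: Vec<Pkg>, accepted: &mut Vec<Pkg>) {
    for request in new_requests {
        let mut request = Some(request);
        for existing in accepted.iter_mut() {
            let candidate = request.take().expect("set back on every non-merging path");
            if existing.can_merge_with(&candidate) {
                match existing.merge_package(candidate) {
                    Ok(()) => break, // merged; move on to the next request
                    // Unreachable in practice since can_merge_with was just checked.
                    Err(rejected) => request = Some(rejected),
                }
            } else {
                request = Some(candidate);
            }
        }
        if let Some(unmerged) = request {
            accepted.push(unmerged);
        }
    }
}
```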
@wvanlint force-pushed the claim_batching branch 2 times, most recently from b6d894e to 9153c6b on December 5, 2024 04:28
@wvanlint (Contributor, author) commented Dec 5, 2024

> Would like to get confirmation on one case --
>
> In test_bump_penalty_txn_on_revoked_commitment, we have 1 revoked HTLC output where the counterparty can claim it at height 41 and another where the counterparty can claim it at 81. The current height is 25.
>
> Currently we'll aggregate these outputs, but it seems like the former output is more urgent and may warrant more aggressive fee-bumping, so aggregating them might result in paying more fees due to the overall increased transaction size? Just want to make sure this is the intended behavior.

I definitely think there is some trade-off to be made there that we can optimize further at the cost of some complexity, and such cases would indeed exist right now. Between always having separate transactions and always aggregating, aggregating aggressively seems to be the better choice though - the total weight across all transactions will decrease and perhaps transactions will confirm quickly before additional fee bumping.
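As rough, illustrative arithmetic for the weight argument (the byte figures below are approximate assumptions for P2WPKH fee-bumping inputs and change outputs, not exact LDK numbers):

```rust
// Approximate vbyte costs that every *additional* claim transaction carries on top
// of its actual claim input(s). Figures are rough assumptions for illustration.
const TX_OVERHEAD_VBYTES: u64 = 11; // version, locktime, marker/flag, in/out counts
const FEE_BUMP_INPUT_VBYTES: u64 = 68; // extra P2WPKH input to pay fees (anchors)
const CHANGE_OUTPUT_VBYTES: u64 = 31; // P2WPKH change output

/// Approximate vbytes saved by claiming `n` outputs in one transaction rather than
/// in `n` separate transactions.
fn approx_vbytes_saved(n: u64) -> u64 {
    n.saturating_sub(1) * (TX_OVERHEAD_VBYTES + FEE_BUMP_INPUT_VBYTES + CHANGE_OUTPUT_VBYTES)
}
```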

@valentinewallace (Contributor) left a comment

LGTM after @TheBlueMatt takes another look

@TheBlueMatt (Collaborator) left a comment

One question, a few nits, and some really small comments. Otherwise ACK. Feel free to squash when you address.

Resolved review thread: lightning/src/chain/package.rs

/// When we go to force-close a channel because an HTLC is expiring, we should ensure that the
/// HTLC(s) expiring are not considered pinnable, allowing us to aggregate them with other HTLC(s)
/// expiring at the same time.
const _: () = assert!(CLTV_CLAIM_BUFFER > COUNTERPARTY_CLAIMABLE_WITHIN_BLOCKS_PINNABLE);
Collaborator

TIL you can do this in our MSRV...lol we've got some code cleanup to do...


// Check if the packages have signed locktimes. If they do, we only want to aggregate
// packages with the same, signed locktime.
if self.signed_locktime() != other.signed_locktime() {
Collaborator

Probably shouldn't change it in this PR, but I guess in theory we can merge two packages where one has a signed locktime and the other has no signed locktime as long as the min locktime lines up?

Contributor (author)

Yeah, I think that should be possible in the future! Perhaps not currently without cross-channel aggregation though, as we are aggregating claims from a single commitment transaction. The claims from a single commitment transaction would consistently have a locktime or not I believe?

@@ -945,17 +945,14 @@ impl PackageTemplate {
}
signed_locktime
}
fn minimum_locktime(&self) -> u32 {
Collaborator

Presumably this fixup needs to move up a few commits?

Contributor (author)

Moved up.

Resolved (outdated) review thread: lightning/src/ln/monitor_tests.rs
// The HTLC timeout claim corresponding to the counterparty preimage claim is removed from the
// aggregated package.
handle_bump_htlc_event(&nodes[0], 1);
timeout_htlc_txn = nodes[0].tx_broadcaster.unique_txn_broadcast();
}
Collaborator

Can we add some checking of timeout_htlc_txn here? ie that the length is as expected and that it spends the outputs we want and that it is the right witness script length (ie check that its actually an HTLC timeout tx)?

Contributor (author)

Added additional checks here.

@@ -1341,7 +1356,8 @@ fn do_test_revoked_counterparty_commitment_balances(anchors: bool, confirm_htlc_

// Prior to channel closure, B considers the preimage HTLC as its own, and otherwise only
// lists the two on-chain timeout-able HTLCs as claimable balances.
assert_eq!(sorted_vec(vec![Balance::ClaimableOnChannelClose {
verify_claimable_balances(
Collaborator

nit: this seems like unnecessary diff. I'm alright with moving to verify_claimable_balances over the previous explicit sorting (though I'm not excited about it), but we should either do it everywhere or nowhere, not in a few places, and if we do use it we should introduce it and use it in a separate commit.

Contributor (author)

Removed verify_claimable_balances.

check_spends!(as_second_htlc_claim_tx[1], revoked_local_txn[0]);
(as_second_htlc_claim_tx.remove(0), as_second_htlc_claim_tx.remove(0))
}
assert_eq!(as_second_htlc_claim_tx.len(), 1);
Collaborator

What happened to the revoked_to_self_claim? Shouldn't we still be claiming that? Similarly later in the test.

Contributor (author)

The unpinnable revoked to_self claims are aggregated separately from the pinnable revoked HTLC output claims.

In this specific test, it is still there, but not rebroadcasted as it was separated from the other claims.

In the test further below, it was removed as an Option<_> as it is always separated from other claims now; it's now claim_txn[0]. Added additional comments there to clarify which transaction claims what.

There are multiple factors affecting the locktime of a package:
- HTLC transactions rely on a fixed timelock due to the counterparty's
  signature.
- HTLC timeout claims on the counterparty's commitment transaction
  require satisfying a CLTV timelock.
- The locktime can be set to the latest height to avoid fee sniping.
These factors were combined in a single method, making the separate
factors less clear.

This moves panics to a higher level, allows failures to be handled
gracefully in some cases, and supports more explicit testing without
using `#[should_panic]`.
@TheBlueMatt (Collaborator) left a comment

LGTM, feel free to squash the fixups into the relevant commits.

// DER-encoded ECDSA signatures vary in size.
// https://github.com/lightning/bolts/blob/master/03-transactions.md#expected-weight-of-htlc-timeout-and-htlc-success-transactions
assert!(
timeout_htlc_txn[0].input[0].witness.size() >= 284 &&
Collaborator

Ah, well this is a fine way to do it too. In most of the rest of the file/codebase we just check that the last witness stack element is the expected length (for which we have various constants, e.g. ACCEPTED_HTLC_SCRIPT_WEIGHT and ACCEPTED_HTLC_SCRIPT_WEIGHT_ANCHORS).

Contributor (author)

Ah that does look cleaner, changed to that approach. Thanks!
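The check style being referred to looks roughly like this (a sketch; the constant's value is assumed here for illustration, and which constant applies depends on the HTLC type and whether anchors are in use):

```rust
use bitcoin::Transaction;

// Assumed value for illustration; LDK defines ACCEPTED_HTLC_SCRIPT_WEIGHT (and an
// _ANCHORS variant) with the real script sizes.
const ACCEPTED_HTLC_SCRIPT_WEIGHT: usize = 139;

/// Check that a transaction's first input is an HTLC claim by verifying that the
/// last witness element (the HTLC script) has the expected length.
fn assert_htlc_claim(tx: &Transaction) {
    let htlc_script = tx.input[0].witness.last().expect("HTLC claims always have a witness");
    assert_eq!(htlc_script.len(), ACCEPTED_HTLC_SCRIPT_WEIGHT);
}
```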

When batch claiming was first added, it was only done so for claims
which were not pinnable, i.e. those which can only be claimed by us.

This was the conservative choice - pinning of outputs claimed by a batch
would leave the entire batch unable to confirm on-chain. However, if
pinning is considered an attack that can be executed with a high
probability of success, then there is no reason not to batch claims of
pinnable outputs together, separate from unpinnable outputs.

Whether specific outputs are pinnable can change over time - those that
are not pinnable will eventually become pinnable at the height at which
our counterparty can spend them. Outputs are treated as pinnable if
they're within `COUNTERPARTY_CLAIMABLE_WITHIN_BLOCKS_PINNABLE` of that
height.

Aside from outputs being pinnable or not, locktimes are also a factor
for batching claims. HTLC-timeout claims have locktimes fixed by the
counterparty's signature and thus can only be aggregated with other
HTLCs of the same CLTV, which we have to check for.

The complexity required here is worth it - aggregation can save users a
significant amount of fees in the case of a force-closure, and directly
impacts the number of UTXOs needed as a reserve for anchors.

Co-authored-by: Matt Corallo <[email protected]>
@wvanlint (Contributor, author)

Squashed the fixups.

@valentinewallace merged commit ddeaab6 into lightningdevkit:main on Dec 11, 2024
18 of 19 checks passed
@wvanlint deleted the claim_batching branch December 11, 2024 18:24
@arik-so (Contributor)

arik-so commented Dec 12, 2024

This seriously helped with previously broken unit tests for a PR I'm working on, thank you so much!

Successfully merging this pull request may close these issues.

Batching of HTLC transactions for anchor output channels
4 participants