Bridges - revert-back and improve congestion #6231

Open
wants to merge 87 commits into
base: master

@bkontur bkontur commented Oct 25, 2024

Closes: #5551
Closes: #5550

Context

Before permissionless lanes, bridges only supported hard-coded, static lanes. The congestion mechanism was based on sending Transact(report_bridge_status(is_congested)) from pallet-xcm-bridge-hub to pallet-xcm-bridge-hub-router. Depending on is_congested, we adjusted the fee factor to increase or decrease fees. This congestion mechanism relied on monitoring XCMP queues, which could cause issues like suspending the entire XCMP queue rather than just the affected bridge.
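To make the older fee-factor behaviour concrete, here is a minimal sketch of how a factor could be raised while `is_congested` is set and lowered once it clears. The fixed-point scale, the step sizes and the function names are assumptions for illustration only, not the values or API used by the pallets.

```rust
/// Fixed-point scale: 1_000_000 represents a factor of 1.0 (assumption for this sketch).
pub const SCALE: u128 = 1_000_000;
/// Lower bound: the fee is never discounted below the base fee.
pub const MIN_FACTOR: u128 = SCALE;
/// Hypothetical multiplicative step applied while the bridge is congested (1.05x).
const INCREASE_NUM: u128 = 105;
const INCREASE_DEN: u128 = 100;
/// Hypothetical linear step removed on each update once congestion clears (0.1).
const DECREASE_STEP: u128 = SCALE / 10;

/// Returns the next fee factor given the current one and the congestion flag.
pub fn next_fee_factor(current: u128, is_congested: bool) -> u128 {
    if is_congested {
        // Grow multiplicatively so sustained congestion becomes expensive quickly.
        current.saturating_mul(INCREASE_NUM) / INCREASE_DEN
    } else {
        // Decay linearly, but never below the minimum factor.
        current.saturating_sub(DECREASE_STEP).max(MIN_FACTOR)
    }
}

/// Applies a fixed-point factor to a base fee.
pub fn apply_factor(base_fee: u128, factor: u128) -> u128 {
    base_fee.saturating_mul(factor) / SCALE
}
```

Starting from `MIN_FACTOR`, one congested update yields a factor of 1.05, and a following uncongested update clamps back to 1.0.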

Additionally, we are progressing with deploying bridge message pallets/routing directly on AssetHub, where we don’t interact with XCMP to perform ExportXcm locally.

Description

This PR re-introduces and improves congestion for bridges:

  • Enhanced Bridge Congestion Mechanism: The bridge queue mechanism has been restructured to operate independently of XCMP, with a refined protocol for congestion detection and suspension management.

  • Bridge-Specific Channel Suspension: pallet-xcm-bridge-hub and pallet-xcm-bridge-hub-router now use BridgeId to identify specific bridges, enabling selective suspension and resumption of individual bridge channels.

  • Dynamic Congestion Detection: pallet-xcm-bridge-hub now includes callbacks for fn suspend_bridge and fn resume_bridge based on congestion status:

    • For sibling chains, the router sends xcm::Transact(report_bridge_status(bridge_id, is_congested)) using the stored callback information.
    • For local chain deployments, the router manages state directly.
  • New Stop Threshold: A stop_threshold limit in pallet-xcm-bridge-hub enables or disables ExportXcm::validate, providing a fallback mechanism when the router does not adhere to the suspend signal.

  • Flexible Message Routing: pallet-xcm-bridge-hub-router has been refactored to support message routing for both sibling chains (ExportMessage) and local deployment (ExportXcm).

These updates improve modularity, allow more granular bridge congestion handling, and support diverse deployment scenarios.
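The per-bridge suspension and the stop_threshold fallback can be sketched roughly as follows. This is a simplified model under stated assumptions: `BridgeId` is reduced to an integer, queue depth is counted in messages, and `Exporter`, `Thresholds`, `Signal` and `QueueStopped` are hypothetical names, not types from the pallets.

```rust
use std::collections::BTreeMap;

/// Stand-in for the real `BridgeId` (assumption: any ordered, copyable key works here).
pub type BridgeId = u32;

/// Signal that would be passed to the router for one specific bridge.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Signal {
    Suspend,
    Resume,
}

/// Error returned when the stop threshold rejects a message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct QueueStopped;

/// Hypothetical thresholds, expressed as numbers of queued messages.
pub struct Thresholds {
    pub suspend: u32,
    pub resume: u32,
    pub stop: u32,
}

#[derive(Default)]
pub struct BridgeState {
    pub queued: u32,
    pub suspended: bool,
}

pub struct Exporter {
    pub thresholds: Thresholds,
    pub bridges: BTreeMap<BridgeId, BridgeState>,
}

impl Exporter {
    pub fn new(thresholds: Thresholds) -> Self {
        Self { thresholds, bridges: BTreeMap::new() }
    }

    /// Plays the role of the validation step: refuse new messages once the
    /// stop threshold is reached, even if the router ignored the suspend signal.
    pub fn validate(&self, id: BridgeId) -> bool {
        self.bridges.get(&id).map_or(true, |b| b.queued < self.thresholds.stop)
    }

    /// Queues one message and reports whether this bridge should be suspended.
    pub fn enqueue(&mut self, id: BridgeId) -> Result<Option<Signal>, QueueStopped> {
        if !self.validate(id) {
            return Err(QueueStopped);
        }
        let suspend = self.thresholds.suspend;
        let bridge = self.bridges.entry(id).or_default();
        bridge.queued += 1;
        if !bridge.suspended && bridge.queued >= suspend {
            bridge.suspended = true;
            return Ok(Some(Signal::Suspend));
        }
        Ok(None)
    }

    /// Marks one message as delivered and reports whether this bridge can resume.
    pub fn deliver(&mut self, id: BridgeId) -> Option<Signal> {
        let resume = self.thresholds.resume;
        let bridge = self.bridges.get_mut(&id)?;
        bridge.queued = bridge.queued.saturating_sub(1);
        if bridge.suspended && bridge.queued <= resume {
            bridge.suspended = false;
            return Some(Signal::Resume);
        }
        None
    }
}
```

Because state is keyed by bridge, a congested bridge 1 does not affect `validate` or `enqueue` for bridge 2, which is the point of moving away from suspending a whole XCMP channel.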

@bkontur bkontur added the T15-bridges This PR/Issue is related to bridges. label Oct 25, 2024
@bkontur bkontur self-assigned this Oct 25, 2024

bkontur commented Oct 25, 2024

bot fmt
/cmd prdoc --audience runtime_dev --bump patch

@bkontur bkontur force-pushed the bko-bridges-congestion branch 2 times, most recently from 659be89 to b48b8a5 Compare October 25, 2024 21:14
prdoc/pr_6231.prdoc (outdated, resolved)

bkontur commented Oct 26, 2024

bot fmt

bkontur commented Oct 28, 2024

bot fmt

@bkontur bkontur force-pushed the bko-bridges-congestion branch 7 times, most recently from c78e707 to 152389a Compare November 5, 2024 12:33

bkontur commented Nov 5, 2024

bot fmt

@bkontur bkontur force-pushed the bko-bridges-congestion branch 3 times, most recently from edd9c5c to 38f1bb3 Compare November 7, 2024 13:33

bkontur commented Nov 7, 2024

/cmd bench --runtime asset-hub-westend asset-hub-rococo --pallet pallet_xcm_bridge_hub_router

bkontur commented Nov 7, 2024

bot bench cumulus-assets --runtime=asset-hub-westend --pallet=pallet_xcm_bridge_hub_router
bot bench cumulus-assets --runtime=asset-hub-rococo --pallet=pallet_xcm_bridge_hub_router

bkontur commented Nov 7, 2024

bot bench -v PIPELINE_SCRIPTS_REF=bko-fix cumulus-assets --runtime=asset-hub-westend --pallet=pallet_xcm_bridge_hub_router
bot bench -v PIPELINE_SCRIPTS_REF=bko-fix cumulus-assets --runtime=asset-hub-rococo --pallet=pallet_xcm_bridge_hub_router

bot bench -v PIPELINE_SCRIPTS_REF=bko-fix cumulus-bridge-hubs --runtime=bridge-hub-rococo --pallet=pallet_bridge_messages
bot bench -v PIPELINE_SCRIPTS_REF=bko-fix cumulus-bridge-hubs --runtime=bridge-hub-westend --pallet=pallet_bridge_messages

bkontur commented Nov 7, 2024

bot bench -v PIPELINE_SCRIPTS_REF=bko-fix cumulus-bridge-hubs --runtime=bridge-hub-rococo --pallet=pallet_bridge_messages
bot bench -v PIPELINE_SCRIPTS_REF=bko-fix cumulus-bridge-hubs --runtime=bridge-hub-westend --pallet=pallet_bridge_messages
bot bench -v PIPELINE_SCRIPTS_REF=bko-fix cumulus-bridge-hubs --subcommand=xcm --runtime=bridge-hub-rococo --pallet=pallet_xcm_benchmarks::generic
bot bench -v PIPELINE_SCRIPTS_REF=bko-fix cumulus-bridge-hubs --subcommand=xcm --runtime=bridge-hub-westend --pallet=pallet_xcm_benchmarks::generic

bkontur commented Nov 8, 2024

bot bench -v PIPELINE_SCRIPTS_REF=bko-fix cumulus-bridge-hubs --runtime=bridge-hub-rococo --pallet=pallet_bridge_messages
bot bench -v PIPELINE_SCRIPTS_REF=bko-fix cumulus-bridge-hubs --runtime=bridge-hub-westend --pallet=pallet_bridge_messages
bot bench -v PIPELINE_SCRIPTS_REF=bko-fix cumulus-bridge-hubs --runtime=bridge-hub-rococo --pallet=pallet_xcm_bridge_hub
bot bench -v PIPELINE_SCRIPTS_REF=bko-fix cumulus-bridge-hubs --runtime=bridge-hub-westend --pallet=pallet_xcm_bridge_hub

bot bench -v PIPELINE_SCRIPTS_REF=bko-fix cumulus-bridge-hubs --subcommand=xcm --runtime=bridge-hub-rococo --pallet=pallet_xcm_benchmarks::generic
bot bench -v PIPELINE_SCRIPTS_REF=bko-fix cumulus-bridge-hubs --subcommand=xcm --runtime=bridge-hub-westend --pallet=pallet_xcm_benchmarks::generic

bot bench -v PIPELINE_SCRIPTS_REF=bko-fix cumulus-assets --runtime=asset-hub-westend --pallet=pallet_xcm_bridge_hub_router
bot bench -v PIPELINE_SCRIPTS_REF=bko-fix cumulus-assets --runtime=asset-hub-rococo --pallet=pallet_xcm_bridge_hub_router

@bkontur bkontur added the A4-needs-backport Pull request must be backported to all maintained releases. label Nov 11, 2024
@command-bot command-bot bot deleted a comment from github-actions bot Nov 16, 2024
@command-bot command-bot bot deleted a comment from github-actions bot Nov 16, 2024

@serban300 serban300 left a comment

Did just a first pass

{
break;
}
bridges_to_update.push((bridge_id, previous_factor, bridge_state));
Contributor

Why not process it on the spot?

Contributor Author

> Why not process it on the spot?

Well, at least for bridges_to_remove I can't/shouldn't do that according to the documentation:

/// Enumerate all elements in the map in no particular order.
///
/// If you alter the map while doing this, you'll get undefined results.

I don't know; maybe inserting the same key with a different value while iterating would work (I didn't try), but I assume that also counts as "altering the map", so I used the same pattern for bridges_to_update.
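For readers following the thread, the collect-then-apply pattern being discussed looks roughly like this. It is a sketch on a plain `BTreeMap` rather than a FRAME storage map, and `Bridge`, `MIN_FACTOR`, the step of 10 and the message budget standing in for weight metering are all assumptions for illustration.

```rust
use std::collections::BTreeMap;

/// Stand-in for the real `BridgeId`.
pub type BridgeId = u32;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Bridge {
    pub fee_factor: u32,
    pub is_congested: bool,
}

/// Hypothetical minimum factor; entries that decay to it are dropped.
pub const MIN_FACTOR: u32 = 100;

/// Decays fee factors of uncongested bridges, visiting at most `budget`
/// entries. Returns the number of entries visited.
pub fn decay_fee_factors(bridges: &mut BTreeMap<BridgeId, Bridge>, mut budget: u32) -> u32 {
    let mut to_update = Vec::new();
    let mut to_remove = Vec::new();
    let mut visited = 0;

    // Pass 1: read-only iteration, recording the intended changes.
    for (id, bridge) in bridges.iter() {
        if budget == 0 {
            // Out of budget (a stand-in for remaining weight): stop early.
            break;
        }
        budget -= 1;
        visited += 1;
        if bridge.is_congested {
            continue;
        }
        let next = bridge.fee_factor.saturating_sub(10).max(MIN_FACTOR);
        if next == MIN_FACTOR {
            to_remove.push(*id);
        } else {
            to_update.push((*id, next));
        }
    }

    // Pass 2: apply the changes once iteration has finished.
    for (id, next) in to_update {
        if let Some(bridge) = bridges.get_mut(&id) {
            bridge.fee_factor = next;
        }
    }
    for id in to_remove {
        bridges.remove(&id);
    }
    visited
}
```

With an in-memory map the borrow checker already forbids mutating during `iter()`; with a storage map nothing stops it at compile time, which is why the documentation warning quoted above matters.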

Contributor

Oh, yes, you're right. How about translate()?

Contributor Author

Hmm, the documentation says that translate() iterates over all elements (and removes those that map to None), so then we would not need this weight metering, which @franciscoaguirre reported here: #6231 (comment).

I think I've seen translate used only for migrations, I don't know :)
@serban300 so what do you suggest? If I also want to trigger events, should I do it inside the translate function?

Contributor

> I think I've seen translate used only for migrations, I don't know :)

I don't know either. I never used translate. It just seemed to permit editing items on the spot.

> @serban300 so what do you suggest? if I want to also trigger events, should I do it inside translate function?

I guess so. I don't know. Is there a reason not to do it?
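As a rough analogue of the translate() idea, the sketch below edits and removes entries in a single pass and collects events inside the closure. It uses `BTreeMap::retain` from the standard library, not the FRAME storage API, and the `Event` variants, `MIN_FACTOR` and the step of 10 are hypothetical.

```rust
use std::collections::BTreeMap;

/// Stand-in for the real `BridgeId`.
pub type BridgeId = u32;

/// Hypothetical events emitted while walking the map.
#[derive(Debug, PartialEq, Eq)]
pub enum Event {
    FeeFactorDecreased { bridge: BridgeId, new_factor: u32 },
    BridgeRemoved { bridge: BridgeId },
}

/// Hypothetical minimum factor; entries that decay to it are dropped.
pub const MIN_FACTOR: u32 = 100;

/// Decays every fee factor in place. Returning `false` from the closure
/// removes the entry, playing the role of `None` in a translate-style API.
pub fn decay_in_place(factors: &mut BTreeMap<BridgeId, u32>) -> Vec<Event> {
    let mut events = Vec::new();
    factors.retain(|id, factor| {
        let next = factor.saturating_sub(10).max(MIN_FACTOR);
        if next == MIN_FACTOR {
            events.push(Event::BridgeRemoved { bridge: *id });
            false
        } else {
            *factor = next;
            events.push(Event::FeeFactorDecreased { bridge: *id, new_factor: next });
            true
        }
    });
    events
}
```

Note the trade-off raised in the thread: a pass like this always visits every entry, so there is no natural place for an early `break` when a weight budget runs out.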

bridges/modules/xcm-bridge-hub-router/src/lib.rs (outdated, resolved)
bridges/modules/xcm-bridge-hub-router/src/lib.rs (outdated, resolved)
bridges/modules/xcm-bridge-hub-router/src/impls.rs (outdated, resolved)
bridges/modules/xcm-bridge-hub-router/src/impls.rs (outdated, resolved)
bridges/modules/xcm-bridge-hub-router/src/lib.rs (outdated, resolved)
pub fn open_bridge(
	origin: OriginFor<T>,
	bridge_destination_universal_location: Box<VersionedInteriorLocation>,
	maybe_notify: Option<Receiver>,
Contributor

maybe_notify doesn't seem very suggestive. Maybe something like congestion_notif_receiver would be better.

Contributor Author

Well, actually, I reused the maybe_notify name/pattern from pallet_xcm's QueryStatus (maybe_notify: Option<(u8, u8)>) :), which does exactly the same thing; Receiver is basically the same as (u8, u8).
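To illustrate what a (u8, u8)-style receiver buys you, here is a sketch that treats the pair as a pallet index and a call index and builds the bytes of a status callback from it. The `Receiver` struct below, the 32-byte `BridgeId`, and the argument layout are assumptions for illustration; the actual definition and encoding in the PR may differ.

```rust
/// Hypothetical stand-in for `Receiver`: which pallet and which call to notify.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Receiver {
    pub pallet_index: u8,
    pub call_index: u8,
}

/// Assumption for this sketch: a bridge id is 32 raw bytes.
pub type BridgeId = [u8; 32];

/// Builds the bytes of a status callback: pallet index, call index, then the
/// arguments (bridge id followed by the congestion flag as a single byte).
pub fn encode_status_call(receiver: Receiver, bridge_id: BridgeId, is_congested: bool) -> Vec<u8> {
    let mut call = Vec::with_capacity(2 + bridge_id.len() + 1);
    call.push(receiver.pallet_index);
    call.push(receiver.call_index);
    call.extend_from_slice(&bridge_id);
    call.push(is_congested as u8);
    call
}

/// Produces a callback only when the bridge was opened with a receiver.
pub fn congestion_call(
    maybe_notify: Option<Receiver>,
    bridge_id: BridgeId,
    is_congested: bool,
) -> Option<Vec<u8>> {
    maybe_notify.map(|receiver| encode_status_call(receiver, bridge_id, is_congested))
}
```

Storing only the two indices means the exporting chain does not need to know the sibling's call types; it only needs to know where to send the notification.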

@paritytech-review-bot paritytech-review-bot bot requested a review from a team November 19, 2024 22:05

@serban300 serban300 left a comment

As far as I understand, the mechanism definitely works. It solves the congestion problem. But I don't like that it adds complexity and in some aspects we have to duplicate the XCMP congestion ideas.

Personally, I liked more the idea of adding XCMP logical channels and rely on the XCMP congestion logic. Not sure if it's still applicable or what happened to it.

bkontur commented Nov 20, 2024

> As far as I understand, the mechanism definitely works. It solves the congestion problem. But I don't like that it adds complexity and in some aspects we have to duplicate the XCMP congestion ideas.

Well, before the permissionless lanes PR, we used this exact mechanism with report/update_bridge_status. However, it was hard-coded and specifically adjusted to support the AH<>BH lane. Additionally, we expected to use HrmpXcmpSignal::Suspend/Resume here. There are concerns from SA that when the bridge queue is congested and we suspend HRMP, we inadvertently disable all other non-bridging scenarios between the sibling parachain and the parachain where the bridge messages pallets are deployed. Yes, the solution would indeed be HRMP logical channels, as you mentioned:

> Personally, I liked more the idea of adding XCMP logical channels and rely on the XCMP congestion logic. Not sure if it's still applicable or what happened to it.

Yes, we discussed HRMP/XCMP logical channels, but I think this is not part of the near-, short-, or mid-term plan. Similarly, we discussed HRMP/XCMP protocol credits, but implementing that would require reworking the HRMP/XCMP queue system, which I would also say is not in the near-, short-, or mid-term plan.

Handling bridge congestion over XCMP is only half the story. The other important aspect is that we also want (and need) to manage bridge congestion beyond XCMP. We are moving towards deploying permissionless lanes directly on AssetHub (with just the messaging pallets). This approach would mean the following:

  • Other sibling parachains can use AssetHub as an XCM message exporter. In this case, we need to handle congestion over HRMP/XCMP using the update_bridge_status extrinsic with maybe_notify.
  • Additionally, we will have an AHP<>AHK lane deployed directly on AssetHub. When functionality like moving assets over the bridge is triggered, we won't touch any HRMP/XCMP since the messaging pallet will be directly on AssetHub. In this case, we also need to address bridge congestion.

This PR essentially:

  • Reverts report/update_bridge_status and extends it for use with permissionless lanes.
  • Adds support for handling congestion in both scenarios mentioned above: as a message exporter for sibling/relay chains and as a message exporter for the local chain.
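The two scenarios can be summarised with a small sketch of how a congestion change might be dispatched depending on who uses the bridge. `BridgeUser`, `Action` and `on_congestion_change` are hypothetical names, and the integer `BridgeId`/`ParaId` types are simplifications; this only models the decision, not the actual XCM sending.

```rust
/// Simplified stand-ins for the real identifier types.
pub type BridgeId = u32;
pub type ParaId = u32;

/// Who is exporting messages through the bridge.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BridgeUser {
    /// A sibling parachain, optionally with a (pallet index, call index) callback.
    Sibling { para_id: ParaId, maybe_notify: Option<(u8, u8)> },
    /// The chain that hosts the messaging pallets itself.
    Local,
}

/// What should happen when a bridge's congestion status changes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    /// Send a Transact over HRMP/XCMP that calls the sibling's status extrinsic.
    SendTransact {
        para_id: ParaId,
        pallet_index: u8,
        call_index: u8,
        bridge_id: BridgeId,
        is_congested: bool,
    },
    /// Update the local router state directly, without touching HRMP/XCMP.
    UpdateLocalRouter { bridge_id: BridgeId, is_congested: bool },
    /// No callback registered: nothing to notify.
    Nothing,
}

pub fn on_congestion_change(user: BridgeUser, bridge_id: BridgeId, is_congested: bool) -> Action {
    match user {
        BridgeUser::Sibling { para_id, maybe_notify: Some((pallet_index, call_index)) } => {
            Action::SendTransact { para_id, pallet_index, call_index, bridge_id, is_congested }
        }
        BridgeUser::Sibling { maybe_notify: None, .. } => Action::Nothing,
        BridgeUser::Local => Action::UpdateLocalRouter { bridge_id, is_congested },
    }
}
```

In the sibling case without a callback, a fallback such as the stop threshold from the PR description is what would still protect the queue.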

@paritytech-review-bot paritytech-review-bot bot requested a review from a team November 20, 2024 15:34

bkontur commented Nov 21, 2024

/cmd fmt

Command "fmt" has started 🚀 See logs here

Command "fmt" has finished ✅ See logs here

@paritytech-workflow-stopper

All GitHub workflows were cancelled due to the failure of one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/12049990133
Failed job name: fmt

bkontur commented Nov 27, 2024

/cmd fmt

Command "fmt" has started 🚀 See logs here

Command "fmt" has finished ✅ See logs here

Labels
A4-needs-backport Pull request must be backported to all maintained releases. T15-bridges This PR/Issue is related to bridges.
Projects
Status: In Progress
Status: Scheduled
Development

Successfully merging this pull request may close these issues.

Add LocalXcmChannelManager impls for XcmpQueue and BridgeHubs
Add benchmarks for pallet-xcm-bridge-hub
4 participants