Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit network actions per batch #3415

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

jvff
Copy link
Contributor

@jvff jvff commented Feb 26, 2025

Motivation

There's a risk of O(n^2) behavior in the chain worker, because when network actions are created (for cross-chain messages) all outboxes are inspected in order to figure out which actions should be created. However, the outboxes are only emptied after a confirmation of the receipt from the receiver. This means that until that confirmation is received, the same network actions might be created again and again, stressing the cross-chain channels.

Proposal

Limit the number of outboxes inspected to create the network actions. The limit is configured on server shards to be equal to the cross-shard communication channel queue size. If the limit is reached, random outboxes are skipped when creating a new batch of network actions.

Test Plan

CI should catch any regressions.
TODO: should we add a stress test for this scenario?

Release Plan

  • These changes should be backported to the latest devnet branch, then
    • be released in a validator hotfix.
  • These changes should be backported to the latest testnet branch, then
    • be released in a validator hotfix.

Because this could be related to a performance issue the testnet is running into.

Links

jvff added 6 commits February 26, 2025 01:44
Allow it to be read from other crates.
Prepare to limit the number of network actions created in a single
batch.
Add a builder method to set the configuration option.
Ensure that the server has a network actions batch limit set.
Truncate the number of network actions added to a batch.
Avoid retrying to send network actions to the same recipients endlessly.
@jvff jvff added the enhancement New feature or request label Feb 26, 2025
@jvff jvff added this to the Testnet #2 milestone Feb 26, 2025
@jvff jvff requested review from ma2bd and afck February 26, 2025 02:59
@jvff jvff self-assigned this Feb 26, 2025
@jvff jvff requested review from christos-h, deuszx and ndr-ds February 26, 2025 03:02
targets.truncate(limit);
}
}

if let Some(tracked_chains) = self.tracked_chains.as_ref() {
let publishers = self
.chain
Copy link
Contributor

@ma2bd ma2bd Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I was only worried about returning too many NetworkActions, so I'm a little bit on fence regarding the randomization idea. Let's see what others think...

if let Some(limit) = self.config.network_actions_batch_limit {
if limit > targets.len() / 2 {
let elements_to_discard = targets.len().saturating_sub(limit);
targets.partial_shuffle(&mut thread_rng(), elements_to_discard);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use choose_multiple? That should be faster, not require the case distinction, and the order of the selected elements doesn't really matter that much. I guess the downside is that it clones the entries, but they are small, I think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

3 participants