Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reproduce slowness of relayer when spawned before channels are bootstrapped #1977

Closed
wants to merge 24 commits into from

Conversation

soareschen
Copy link
Contributor

@soareschen soareschen commented Mar 17, 2022

Closes #1978.

This PR is based on #1976.

Description

This PR reproduces the relayer runnning very slowly in the test test_ternary_ibc_transfer when the relayer is spawned first before the channels are being bootstrapped. The test can be reproduced manually by running:

cargo test -p ibc-integration-test --features manual -- test_slow_ternary_ibc_transfer

The ternary IBC transfer test performs IBC transfer between 3 fully connected chains. i.e. there are IBC channels established from chains A to A, A to B, A to C, B to B, B to C, C to C. As compared to the binary transfer tests, due to the increased number of chains and channels, the relaying seems to slow down exponentially as a result.

In the original test, the relaying starts pretty much immediately after an IBC transfer is initiated. But in this modified test, relaying the first IBC transfer takes 41 seconds, and subsequently 71 seconds, 75 seconds, and 118 (!) seconds for each transfer to be relayed. This is a rather significant delay that should not be ignored.

There don't seem to be any significant slow down if the relayer supervisor is spawned first in the case of binary transfer test, which can be reproduced manually with:

cargo test -p ibc-integration-test --features manual -- test_slow_ibc_transfer

More investigation is needed to determine why the relaying slows down exponentially as more channels are added after the relayer starts, but not if the channels are added before the relayer is started.


PR author checklist:

  • Added changelog entry, using unclog.
  • Added tests: integration (for Hermes) or unit/mock tests (for modules).
  • Linked to GitHub issue.
  • Updated code comments and documentation (e.g., docs/).

Reviewer checklist:

  • Reviewed Files changed in the GitHub PR explorer.
  • Manually tested (in case integration/unit/mock tests are absent).

@soareschen soareschen marked this pull request as draft March 17, 2022 11:37
@romac
Copy link
Member

romac commented Mar 23, 2022

Here are the logs after applying the fix in #1991:

2022-03-23T08:34:40.622138Z  INFO ibc_test_framework::bootstrap::binary::channel: created new chain/client/connection/channel from ibc-2-965873d2/07-tendermint-2/connection-2/channel-2 to ibc-2-965873d2/07-tendermint-2/connection-3/channel-3

2022-03-23T08:34:40.708887Z  INFO ibc_integration_test::tests::ternary_transfer: Sending IBC transfer from chain ibc-0-a8e7a113 to chain ibc-1-b6f7c215 with amount of 5000 coin1f1f97b0
2022-03-23T08:34:40.920669Z  INFO ibc_integration_test::tests::ternary_transfer: Waiting for user on chain B to receive IBC transferred amount of 5000 ibc/5A2D383BA61059028264E96D5957CCD65A870ECFCC4B794CF088AA4921B47696
2022-03-23T08:34:43.173515Z  INFO ibc_test_framework::util::retry: task wallet reach cosmos14c4qgfc2x7980sqf2mkjmsz69exyxvwrqypj7u amount 6285325396237 coin1f1f97b0 succeed after 1 tries
2022-03-23T08:35:20.893174Z  INFO ibc_test_framework::util::retry: task wallet reach cosmos13etqdnranzs9ve75v2hee5gkxp74uyukf3690f amount 5000 ibc/5A2D383BA61059028264E96D5957CCD65A870ECFCC4B794CF088AA4921B47696 succeed after 34 tries
2022-03-23T08:35:20.893230Z  INFO ibc_integration_test::tests::ternary_transfer: successfully performed IBC transfer from chain ibc-0-a8e7a113 to chain ibc-1-b6f7c215

2022-03-23T08:35:21.071600Z  INFO ibc_integration_test::tests::ternary_transfer: Waiting for user on chain C to receive IBC transferred amount of 2500 ibc/431D4CEA93B9AF622A8795F3C5CB835A3A45F50955503750FE267BECD38E4328
2022-03-23T08:35:22.153898Z  INFO ibc_test_framework::util::retry: task wallet reach cosmos13etqdnranzs9ve75v2hee5gkxp74uyukf3690f amount 2500 ibc/5A2D383BA61059028264E96D5957CCD65A870ECFCC4B794CF088AA4921B47696 succeed after 0 tries
2022-03-23T08:36:19.440289Z  INFO ibc_test_framework::util::retry: task wallet reach cosmos1xnqv7hhgz077z6ug8weh3ufj3978t6vl2gfs44 amount 2500 ibc/431D4CEA93B9AF622A8795F3C5CB835A3A45F50955503750FE267BECD38E4328 succeed after 52 tries

2022-03-23T08:36:19.605170Z  INFO ibc_integration_test::tests::ternary_transfer: Waiting for user on chain A to receive IBC transferred amount of 800 ibc/1F1900700AF2156AC4AD75AB7D8CEE48F4426DBDA8763FF3D584AABA703676C1
2022-03-23T08:36:21.768615Z  INFO ibc_test_framework::util::retry: task wallet reach cosmos1xnqv7hhgz077z6ug8weh3ufj3978t6vl2gfs44 amount 1700 ibc/431D4CEA93B9AF622A8795F3C5CB835A3A45F50955503750FE267BECD38E4328 succeed after 1 tries
2022-03-23T08:37:20.053763Z  INFO ibc_test_framework::util::retry: task wallet reach cosmos14c4qgfc2x7980sqf2mkjmsz69exyxvwrqypj7u amount 800 ibc/1F1900700AF2156AC4AD75AB7D8CEE48F4426DBDA8763FF3D584AABA703676C1 succeed after 53 tries

2022-03-23T08:37:20.192995Z  INFO ibc_integration_test::tests::ternary_transfer: Waiting for user on chain B to receive IBC transferred amount of 500 ibc/5A2D383BA61059028264E96D5957CCD65A870ECFCC4B794CF088AA4921B47696
2022-03-23T08:37:21.278556Z  INFO ibc_test_framework::util::retry: task wallet reach cosmos1xnqv7hhgz077z6ug8weh3ufj3978t6vl2gfs44 amount 1200 ibc/431D4CEA93B9AF622A8795F3C5CB835A3A45F50955503750FE267BECD38E4328 succeed after 0 tries
2022-03-23T08:38:30.372944Z  INFO ibc_test_framework::util::retry: task wallet reach cosmos15ah2pn42kppy7sj9866wkps5h0gyez6f540j7g amount 500 ibc/5A2D383BA61059028264E96D5957CCD65A870ECFCC4B794CF088AA4921B47696 succeed after 63 tries

To no avail, the relayer still takes about 40 seconds to start.

@romac
Copy link
Member

romac commented Mar 25, 2022

Shall we keep this as a regression test now that #2007 is merged?

@romac
Copy link
Member

romac commented Mar 25, 2022

We can perhaps put an upper bound on how long the "slow" test can run and fail the test if it goes over that?

@soareschen
Copy link
Contributor Author

I'm not sure if the GitHub CI is suitable for detecting this kind of slow down. We could put a rough timeout on the workflow, but that is applied to running all tests together. Otherwise we could run it as a separate step with timeout after the main test, if that helps.

Base automatically changed from soares/supervisor-test-trait to master March 29, 2022 13:14
@soareschen
Copy link
Contributor Author

This has been fixed by #2007.

@soareschen soareschen closed this Mar 29, 2022
@adizere adizere added this to the v0.13.0 milestone Mar 29, 2022
@adizere adizere added the A: bug Admin: something isn't working label Mar 29, 2022
@adizere adizere removed this from the v0.13.0 milestone Mar 29, 2022
@adizere adizere removed the A: bug Admin: something isn't working label Mar 29, 2022
@romac romac deleted the soares/slow-ternary-transfer-test branch November 22, 2023 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Relaying of packets slows down exponentially if many channels are created after the relayer
3 participants