Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

availability-recovery: use IfDisconnected::TryConnect for chunks #6081

Merged
merged 7 commits into from
Oct 18, 2022

Conversation

ordian
Copy link
Member

@ordian ordian commented Sep 29, 2022

I was looking at how https://github.com/paritytech/cumulus/blob/master/client/pov-recovery works and it seems to me that it doesn't unless I'm overlooking something obvious.

Here we issue a recovery request without backing fast-path, so it will try to request chunks from all validators.

But you need to be connected over /validation/1 peerset, which has limited number of slots (up to 10) for non-validators. And these connection requests are only issued in gossip-support only for validators if IfDisconnected::ImmediateError is used. to at least 1/3 of validators for the recovery to work.

I've changed that to IfDisconnected::TryConnect (only for chunk requests).

Related paritytech/cumulus#1423.

cumulus companion: paritytech/cumulus#1711

  • fix tests
  • ensure companion works

@ordian ordian added A2-insubstantial Pull request requires no code review (e.g., a sub-repository hash update). B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Sep 29, 2022
@ordian ordian requested a review from bkchr September 29, 2022 13:50
@ordian ordian marked this pull request as draft September 29, 2022 14:18
@ordian ordian marked this pull request as ready for review September 29, 2022 15:41
@bkchr
Copy link
Member

bkchr commented Sep 29, 2022

Hmm, given the docs and your explanation it sounds sensible. So, this only works in our tests because all nodes are already connected?

We have a zombienet test, could you please make one validator being only connected to one other validator using --reserved-nodes? Then this test should fail and start working with your test.

@ordian
Copy link
Member Author

ordian commented Sep 29, 2022

We have a zombienet test, could you please make one validator being only connected to one other validator using --reserved-nodes? Then this test should fail and start working with your test.

I assume you mean this test. Interesting. Last time I talked to @eskimor about req/resp protocols, our understanding was that it needs a connections specifically over /validation/1 peerset, not just --reserved-nodes. Looks like that assumption is not correct. But even if not, we'd still need that change in polkadot to make it work for collators.

@bkchr
Copy link
Member

bkchr commented Sep 29, 2022

I mean the docs are talking about peer:

/// When sending a request, what to do on a disconnected recipient.
#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]
pub enum IfDisconnected {
	/// Try to connect to the peer.
	TryConnect,
	/// Just fail if the destination is not yet connected.
	ImmediateError,
}

What I meant with my changes is that currently on the zombienet test we are probably connected to all peers. So, when we do this request on the validation peerset, we actually already have a connection to the peer, but not yet on the validation peerset.

@bkchr
Copy link
Member

bkchr commented Sep 29, 2022

But even if not, we'd still need that change in polkadot to make it work for collators.

Which is fine and I'm not against the fix ;) I just want to have a regression test that this behavior not is coming back by accident.

@ordian
Copy link
Member Author

ordian commented Sep 29, 2022

But even if not, we'd still need that change in polkadot to make it work for collators.

Which is fine and I'm not against the fix ;) I just want to have a regression test that this behavior not is coming back by accident.

Yes, I'll tweak the test to see if it work with 1 peer connection.

@eskimor
Copy link
Member

eskimor commented Sep 30, 2022

No the peerset does not matter -as long as some connection exists, req/resp. works.

* master:
  Governance v2 (Kusama only) (#5205)
  Companion for substrate#12124: add BEEFY request response protocol (#6035)
  Remove orchestra and metered channel (#6086)
  Companion for pallet-mmr: generate historical proofs  (#6061)
  Fix: minor typo (#6088)
  add para_id to fetch_pov logging (#6084)
* master:
  Companion for BEEFY: Simplify hashing for pallet-beefy-mmr (#6098)
  Use active_leaves instead of known_leaves (#6068)
* master:
  Pass through `runtime-benchmark` feature (#6110)
  Properly migrate weights to v2 (#6091)
  Buffered connection management for collator-protocol (#6022)
  Add unknown words (#6105)
  Batch vote import in dispute-distribution (#5894)
  Bump lru from 0.7.8 to 0.8.0 (#6060)
  Keep sessions in window for the full unfinalized chain (#6054)
* master:
  service: use MmrRootProvider as custom BEEFY payload provider (companion for 12428) (#6112)
  Skip `unexpected metric type`
  update kvdb & co (#6111)
  Companion for #11649: Bound uses of `Call` (#5729)
* master:
  Maximum value for `MultiplierUpdate` (#6021)
@ordian ordian marked this pull request as ready for review October 16, 2022 21:53
@ordian
Copy link
Member Author

ordian commented Oct 17, 2022

bot merge

@paritytech-processbot
Copy link

Error: "Check reviews" status is not passing for paritytech/cumulus#1711

@ordian
Copy link
Member Author

ordian commented Oct 18, 2022

bot merge

@paritytech-processbot paritytech-processbot bot merged commit 4e4fb31 into master Oct 18, 2022
@paritytech-processbot paritytech-processbot bot deleted the ao-recovery-try-connect branch October 18, 2022 13:15
ordian added a commit that referenced this pull request Oct 21, 2022
* master: (21 commits)
  try and fix build (#6170)
  Companion for EPM duplicate submissions (#6115)
  Bump docker/setup-buildx-action from 2.0.0 to 2.1.0 (#6141)
  companion for #12212 (#6162)
  Bump substrate (#6164)
  BlockId removal: refactor: StorageProvider (#6160)
  availability-recovery: use `IfDisconnected::TryConnect` for chunks (#6081)
  Update clap to version 4 (#6128)
  Add `force_open_hrmp_channel` Call (#6155)
  Fix fuzzing builds xcm-fuzz and erasure-coding fuzzer (#6153)
  BlockId removal refactor: Backend::state_at (#6149)
  First round of implementers guide fixes (#6146)
  bump zombienet version (#6142)
  lingua.dic is not managed by CI team (#6148)
  pallet-mmr: RPC and Runtime APIs work with block numbers (#6072)
  Separate preparation timeouts for PVF prechecking and execution (#6139)
  Malus: add disputed block percentage (#6100)
  refactor grid topology to expose more info to subsystems (#6140)
  Manual Para Lock (#5451)
  Expose node subcommands in Malus CLI (#6135)
  ...
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A2-insubstantial Pull request requires no code review (e.g., a sub-repository hash update). B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants