
Implement basic equivocations detection loop #2367

Merged (6 commits, Aug 23, 2023)

Conversation

serban300 (Collaborator) commented:

Related to #2400

First version of the equivocation detection loop.

Known limitations that will be addressed in future PRs:

  • the loop is implemented but not used anywhere yet; it probably needs to be started together with the finality loop
  • no unit tests
  • limited error management: in case of an error, we log it, try to reconnect and skip the item; maybe we should retry instead
  • no integration test

@serban300 serban300 self-assigned this Aug 22, 2023
@serban300 serban300 changed the title Implement first version of the equivocations detection loop Implement basic equivocations detection loop Aug 22, 2023
@acatangiu (Collaborator) left a comment:
Looks good so far, left some comments

Comment on lines +71 to +73
async fn best_finalized_header_number(
    &self,
) -> Result<BlockNumberOf<P::TargetChain>, Self::Error> {
Collaborator:

nit: it's a bit inconsistent that this method doesn't allow interrogating the target chain at a particular BlockNumber, while the other methods do.

Contributor:

It returns the best finalized target chain block, which is a property of the target chain itself - it doesn't access the runtime code or storage at all. The other methods work with the target chain runtime code/storage/state that was active at some target block. So this method returns an anchor that is later used as the `at` parameter in the other method calls.
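
For illustration, a minimal sketch of the anchor pattern being described - the trait name, method signatures and associated types below are assumptions, not the crate's actual client trait (only best_finalized_header_number and best_synced_header_hash appear in this PR):

// Illustrative only: a simplified stand-in for the target client trait,
// showing the "anchor" pattern described above.
#[async_trait::async_trait]
trait TargetClientSketch {
    type BlockNumber;
    type Hash;
    type Error;

    // Chain-level property: no `at` parameter; the result is the anchor.
    async fn best_finalized_header_number(&self) -> Result<Self::BlockNumber, Self::Error>;

    // State queries take the anchor as `at` and read the state active at that block.
    async fn best_synced_header_hash(
        &self,
        at: Self::BlockNumber,
    ) -> Result<Option<Self::Hash>, Self::Error>;
}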

Collaborator (author):

Yes, exactly.

Comment on lines 72 to 76
impl<P: FinalityPipeline, SC: SourceClientBase<P>> Default for FinalityProofsStream<P, SC> {
    fn default() -> Self {
        Self::new()
    }
}
Collaborator:

nit: FinalityProofsStream seems to only contain an Option<...>; I think you can just derive Default.
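
A quick sketch of the idea on a simplified, non-generic stand-in (the real struct is generic; note that std's #[derive(Default)] would also add Default bounds on the type parameters, which may or may not matter here):

// Sketch only: since Option<T> is Default for any T, the manual impl above
// collapses to a derive. The field type below is a placeholder.
#[derive(Default)]
struct FinalityProofsStreamSketch {
    stream: Option<Vec<u8>>,
}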

Collaborator (author):

You're right. Done.

relays/equivocation/src/reporter.rs (resolved)
relays/equivocation/src/equivocation_loop.rs (resolved)
Comment on lines 266 to 275
if self.from_block_num <= self.until_block_num {
    let mut context = match self.build_context(self.from_block_num).await {
        Some(context) => context,
        None => return,
    };

    self.check_block(self.from_block_num, &mut context).await;

    self.from_block_num = self.from_block_num.saturating_add(1.into());
}
Collaborator:

This could be a loop while self.from_block_num <= self.until_block_num {} and do full catch-up checks in a single tick.

Doesn't look like enough work to raise latency concerns.

Also I would actually suggest to skip checking the full (sub)chain and just check the latest (source highest/best finalized as seen by target) - if that has a different hash than the block on the source at the same height, then you know it forked somewhere since the last tick and you can iterate to find the fork. WDYT?
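
A minimal sketch of the suggested change, rewriting the snippet quoted above in place (same fragment, not standalone code; field and method names are taken from the snippet):

// Catch up over the whole range in a single tick instead of one block per tick.
while self.from_block_num <= self.until_block_num {
    let mut context = match self.build_context(self.from_block_num).await {
        Some(context) => context,
        None => return,
    };

    self.check_block(self.from_block_num, &mut context).await;

    self.from_block_num = self.from_block_num.saturating_add(1.into());
}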

Contributor:

while self.from_block_num <= self.until_block_num {}

👍 Also @serban300 please note that from_block_num is initialized to zero, so it'll start from genesis - I think it should be initialized to the best target block at the beginning of the loop?

Also I would actually suggest to skip checking the full (sub)chain and just check the latest (source highest/best finalized as seen by target) - if that has a different hash than the block on the source at the same height, then you know it forked somewhere since the last tick and you can iterate to find the fork. WDYT?

I think an attacker may be able to submit { submit_finality_proof(forked_header#100), submit_message(malicious_message), submit_finality_proof(good_header#101) } in a single block. So if you just look at the latest header, you may skip this forked_header submission. So imo we should look at all headers here

Collaborator:

I think an attacker may be able to submit { submit_finality_proof(forked_header#100), submit_message(malicious_message), submit_finality_proof(good_header#101) } in a single block. So if you just look at the latest header, you may skip this forked_header submission. So imo we should look at all headers here

but how would good_header#101 have the correct hash if it's built on top of bad_header#100? Once a relayer introduces any forked header, the whole subsequent child chain will be a fork, no?

Contributor:

We don't check the connection between headers anywhere, so it could be built on top of another "normal" header. E.g. in the following schema:

99 ---> 100 ---> 101
     \--> 100' 

relayer1 may track the bad fork and submit 100' first, and relayer2 may then submit 101.

@acatangiu (Collaborator), Aug 23, 2023:

We don't check the connection between headers anywhere.

oh, ok, I thought (without checking the code) we did. (I now realize we don't check chain continuity because we don't want to import every header and we're happy with every Nth header as long as it has a GRANDPA justification.)

valid scenario then, let's keep the equivocation check for every submitted header 👍

Collaborator (author):

This could be a loop while self.from_block_num <= self.until_block_num {} and do full catch-up checks in a single tick.

Sounds good. Done.

Also @serban300 please note that from_block_num is initialized to zero, so it'll start from genesis - I think it should be initialized to the best target block at the beginning of the loop?

Yes, you're right. That was the intention, but somehow I forgot to implement it.

Also I would actually suggest to skip checking the full (sub)chain and just check the latest (source highest/best finalized as seen by target) - if that has a different hash than the block on the source at the same height, then you know it forked somewhere since the last tick and you can iterate to find the fork. WDYT?

I think an attacker may be able to submit { submit_finality_proof(forked_header#100), submit_message(malicious_message), submit_finality_proof(good_header#101) } in a single block. So if you just look at the latest header, you may skip this forked_header submission. So imo we should look at all headers here

That's true. Also I think it's good to check for equivocations even if the synced hash == source hash; there might be cases where we could still find equivocations, and it shouldn't add a big overhead.

@svyatonik (Contributor) left a comment:

Looks good - I've left some comments. Most are for future PRs and ideas, so maybe we should just log them somewhere (some may be ignored, I suppose :))

relays/equivocation/src/equivocation_loop.rs (outdated, resolved)
        continue
    },
    false => {
        // Irrecoverable error. End the loop.
Contributor:

I know - you'll be polishing the error handling in future PRs, I just wanted to leave a comment here because it may be a bit confusing (and I already see that a non-connection error is called "irrecoverable" here). In relays we have two kinds of errors. Connection errors are likely network errors or jsonrpsee internal errors; if you see a connection error, you need to reconnect the client - no other solution is possible. Other (non-connection) errors are assumed to be recoverable - they are caused e.g. by us submitting an obsolete transaction, or by the server rejecting our RPC because it is overloaded.

The relay_utils::relay_loop function is supposed to handle the connection errors. So when you start a loop using relay_utils::relay_loop, your "run_loop" function is expected to return an error when a connection error is met; the RelayLoop itself then reconnects to the failed client and restarts run_loop. Other (non-connection) errors are supposed to be handled by the run_loop function itself (e.g. by retrying with some exponential backoff or simply sleeping for some time). Right now the error handling here is implemented the opposite way - you yield to RelayLoop on non-connection errors and do the reconnect yourself.

This isn't a big issue - let's see the future error-handling PR, I just wanted to share that ^^^. Maybe it'll allow you to remove some code here in favor of already existing code.
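
For context, a hedged sketch of the described split - relay_utils::relay_loop is real, but every other name below (the error type, do_one_tick, the backoff) is illustrative and not the crate's actual API:

use std::time::Duration;

// Illustrative error type: the real relays classify connection errors differently.
#[derive(Debug)]
enum LoopError {
    Connection(String),
    Other(String),
}

impl LoopError {
    fn is_connection_error(&self) -> bool {
        matches!(self, LoopError::Connection(_))
    }
}

// Placeholder for one iteration of the actual loop work.
async fn do_one_tick() -> Result<(), LoopError> {
    Ok(())
}

// A run_loop-style function following the description above: connection errors
// are returned so the outer relay loop can reconnect the client and restart us;
// non-connection errors are handled locally with a simple backoff.
async fn run_loop_sketch() -> Result<(), LoopError> {
    loop {
        match do_one_tick().await {
            Ok(()) => {},
            Err(error) if error.is_connection_error() => return Err(error),
            Err(error) => {
                eprintln!("recoverable error: {error:?}, retrying after backoff");
                async_std::task::sleep(Duration::from_secs(5)).await;
            },
        }
    }
}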

Collaborator (author):

Thanks for the explanations. Personally I thought that any error other than a connection error is irrecoverable. For example:

  • we can't generate the key ownership proof, because we're too many blocks ahead;
  • let's say that the chains were updated and the relayer has old data structures, so it can't encode/decode them; etc.

I think we should take this into consideration, because otherwise we risk getting stuck in a retry loop. But it's true that other errors could be recoverable, so I don't know what would be best here. It seems like error handling will be a complex problem for a future PR.

For the moment I'm changing the strategy to consider every error recoverable: just sleep a bit and skip the item that led to the error when possible.

Contributor:

we risk getting stuck in a retry loop

That's true. Normally (in other relay loops) we eventually break out of this loop when we read the updated state and realize that something has changed. But until then we'll keep retrying, which is fine (imo), unless we spam the RPC server with failing RPC requests without some backoff.

In your examples:

  • we can't generate the key ownership proof, because we're too many blocks ahead: then the loop should just log an error and go further (imo);
  • let's say that the chains were updated and the relayer has old data structures, so it can't encode/decode them: that's definitely a maintenance flaw - the best we could do here is to have a backoff mechanism. Your current loop implementation would just exit the process if you encountered any non-connection error - is that good? E.g. if we can't generate an ownership proof, we could just keep working with the next justifications.

But as I said - in the end it is up to you how to handle issues :)

relays/equivocation/src/reporter.rs (resolved)
Poll::Ready(tx_status) => {
    match tx_status {
        TrackedTransactionStatus::Lost => {
            log::error!(target: "bridge", "Equivocation report tx was lost");
Contributor:

Possible improvement for the future PRs:

  1. let's also log some report-related info here and when submitting - at least which block/round/validator caused the misbehavior;
  2. apart from having a justifications db (still talking about future updates), we may also write all submitted reports there, to be able to resubmit them manually if the automatic submission has failed (see the sketch after this list)
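
A purely hypothetical sketch of what such a store could look like - none of these types exist in the repo, they only illustrate the "write all submitted reports to a db for manual resubmission" idea:

use std::collections::BTreeMap;

// Hypothetical record of one equivocation report, kept so a failed automatic
// submission can be retried manually later.
#[derive(Clone)]
struct StoredReport {
    round: u64,             // GRANDPA round in which the equivocation was seen
    encoded_proof: Vec<u8>, // encoded equivocation proof (encoding is an assumption)
    submitted: bool,        // whether the automatic submission succeeded
}

#[derive(Default)]
struct ReportsDb {
    // (round, block number) -> report; an in-memory map stands in for a real db.
    reports: BTreeMap<(u64, u32), StoredReport>,
}

impl ReportsDb {
    fn record(&mut self, block: u32, report: StoredReport) {
        self.reports.insert((report.round, block), report);
    }

    // Reports that still need manual resubmission.
    fn pending(&self) -> impl Iterator<Item = &StoredReport> {
        self.reports.values().filter(|report| !report.submitted)
    }
}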

Collaborator (author):

  1. Totally agree. I was also planning this for a future PR.
  2. Could you expand on this please? I'm not sure what the justifications db is.

Contributor:

  1. I mean that in the future I think we should keep (store) all seen justifications in some database (something I've been talking about here). This would allow the ED loop to outlive connection issues with the target node (we wouldn't simply lose justifications on reconnect/restart), and if we failed to submit a justification (don't know the reason - just imagine it has happened, e.g. we failed to generate a key ownership proof), we could then just get an alert (based on ED logs) and submit it manually using the justifications db. Please note that justifications are not stored anywhere, so if we've missed one, it is just lost. And even if we see that something bad has happened, we have no means to submit an equivocation proof - it requires a justification (actually a vote).

But that's a stuff for the future.

    at: P::Hash,
    equivocation: P::EquivocationProof,
) -> Result<(), SC::Error> {
    let pending_report = source_client.report_equivocation(at, equivocation).await?;
Contributor:

A side note (maybe worth exploring and thinking about later, maybe not): if report_equivocation transactions are signed and we submit such transactions from multiple ED loops, we'll lose funds proportional to the number of such loops (i.e. tx_cost * count(loops)). Of course, that is not that important - because we're talking about validator misbehavior here - but maybe we need to check e.g. whether the validator has already been slashed (i.e. the report has already been submitted). Or maybe there's already such a check in the report_equivocation call?

Collaborator (author):

It looks like report_equivocation() doesn't refund in case of duplicate equivocation reports. And it makes sense for it to be like this, because otherwise it could be a DOS vector.

And currently I don't think there is any way to check if a report was already submitted - there doesn't seem to be any runtime API for getting this information. But it's a good point. Maybe we could add a runtime API method for this.
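
A hypothetical sketch of what such a runtime API could look like - this API does not exist anywhere, and the key type is an assumption (GRANDPA authorities use ed25519 keys):

// Hypothetical runtime API for asking whether an equivocation offence has
// already been reported; not part of Substrate or this repo.
sp_api::decl_runtime_apis! {
    pub trait GrandpaEquivocationStatusApi {
        /// Returns true if an equivocation by `offender` in the given GRANDPA
        /// `set_id` and `round` has already been reported on-chain.
        fn is_equivocation_reported(
            set_id: u64,
            round: u64,
            offender: sp_core::ed25519::Public,
        ) -> bool;
    }
}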

@acatangiu (Collaborator), Aug 23, 2023:

afaict valid reports are free even if duplicates: https://github.com/paritytech/substrate/blob/e806cefa1ebf0edfe493dafa184f7e50d8772a68/frame/grandpa/src/lib.rs#L212 (as long as they're valid)

the offences pallet will record all offence reports within a session and, at the end of it, apply slashing once to all offending validators, with the slash amount growing exponentially with the number of offenders in the same session

L.E.: actually, it seems duplicate reports are treated as "errors" -> invalid report -> not free
(https://github.com/paritytech/substrate/blob/db6ebf564bcdfdfaf5fd026bb321de6f7d7e6fc0/frame/offences/src/lib.rs#L121C24-L121C52)

Contributor:

Maybe we could read it from here. In any case - it isn't a big deal, just some possible improvement :)

Contributor:

afaict valid reports are free even if duplicates: https://github.com/paritytech/substrate/blob/e806cefa1ebf0edfe493dafa184f7e50d8772a68/frame/grandpa/src/lib.rs#L212 (as long as they're valid)

If so, then anyone could just spam the network for free with report_equivocation transactions. Either that is true (and then the issue is in report_equivocation), or it internally checks for duplicates and returns an error here.


@serban300 (Collaborator, author):
@svyatonik @acatangiu thanks for the review. I addressed the comments and also created #2377 for tracking the things that need to be done in future PRs. PTAL when you have time.


let mut context =
    match self.build_equivocation_reporting_context(current_block_number).await {
        Some(context) => context,
        None => return,
Collaborator:

I don't think we want to give up completely if try_read_from_target() in self.build_equivocation_reporting_context() returns Err(_) or Ok(None); we could just do another tick and try again from where we left off.

Suggested change:
-    None => return,
+    None => break

Contributor:

👍 Maybe even continue instead of break?

Collaborator (author):

Done. Sorry, that was a leftover from before, when I was treating all non-connection errors as unrecoverable.

Collaborator:

With continue this becomes a busy-wait loop, continuously retrying try_read_from_target() until it returns Ok(Some(_)); with break we'd at least get some rate limiting from the tick...

I guess both are fine if we expect try_read_from_target() to return Ok(Some(_)) eventually...

@serban300 (Collaborator, author), Aug 23, 2023:

True. Sorry, I changed it to continue in the end and merged the PR in the meantime. I will fix it when implementing better error management - that's the next step for the equivocation loop. Will keep this in mind.

@svyatonik (Contributor) left a comment:

LGTM, modulo @acatangiu's suggestion

@serban300 serban300 merged commit b995ac0 into paritytech:master Aug 23, 2023
serban300 added a commit to serban300/parity-bridges-common that referenced this pull request Aug 23, 2023
* FinalityProofsBuf adjustments

- store a Vec<FinalityProof>
- transform prune `buf_limit` to Option

* FinalityProof: add target_header_hash()

* Target client: implement best_synced_header_hash()

* Implement first version of the equivocations detection loop

* Address code review comments

* Leftover
serban300 added a commit that referenced this pull request Aug 23, 2023
* Implement basic equivocations detection loop (#2367)

* FinalityProofsBuf adjustments

- store a Vec<FinalityProof>
- transform prune `buf_limit` to Option

* FinalityProof: add target_header_hash()

* Target client: implement best_synced_header_hash()

* Implement first version of the equivocations detection loop

* Address code review comments

* Leftover

* polkadot-staging adjustments
@serban300 serban300 deleted the master-equivocation-3 branch September 5, 2023 11:06
@Polkadot-Forum

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/polkadot-kusama-bridge/2971/5

serban300 added a commit to serban300/parity-bridges-common that referenced this pull request Mar 27, 2024
serban300 added a commit to serban300/parity-bridges-common that referenced this pull request Apr 8, 2024
bkontur pushed a commit that referenced this pull request May 7, 2024