
Node batch verification #94

Merged · 29 commits · Dec 15, 2023

Conversation

@bxue-l2 (Contributor) commented Nov 30, 2023:

Why are these changes needed?

This PR adds batch chunk verification to reduce a node's processing time when receiving a batch.

The technical details are well documented in https://ethresear.ch/t/a-universal-verification-equation-for-data-availability-sampling/13240#step-by-step-derivation-of-the-universal-equation-11 (an Ethereum Research post).

Important struct:

  • Sample: the basic unit of batch verification; for EigenDA it is a chunk
  • SubBatch: a group of Samples that share the same dimensions

Big changes:

  • Add interfaces to both encoding.go and validator.go for batch verification
  • Replace the previous blob-by-blob verification in node/node.go with the new batch verification
  • Process each SubBatch in parallel

Some feedback on the following would be helpful:

  • Since batch verification is defined inside core/validator, the parallelization is done inside core (some feedback appreciated)
  • Since batch verification is defined inside core/validator, I removed "n.Config.NumBatchValidators", which let clients specify how many worker threads to use for verification. Instead I simply spin up a goroutine for each SubBatch; otherwise I would have to put the worker count in the core interface (maybe that is fine?)
  • I could use a workerpool for parallelization, but it feels unnatural to add that dependency to core

Checks

  • I've made sure the lint is passing in this PR.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; in that case, please comment that they are not relevant.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

@bxue-l2 bxue-l2 self-assigned this Dec 1, 2023
@ian-shim (Contributor) left a comment:

Can you take a look at the unit test failures?
Also, do the existing unit tests cover the new verification logic well?

node/node.go Outdated
out <- err
return
}
//func (n *Node) validateBlob(ctx context.Context, blob *core.BlobMessage, operatorState *core.OperatorState, out chan error) {
Contributor:

remove?

@bxue-l2 (Contributor, Author) commented Dec 5, 2023:

> Can you take a look at the unit test failures? Also, do the existing unit tests cover the new verification logic well?

The unit tests are added at the library level (pkg/encoding/kzgEncoder) and at the core level (core/test/core_test).

In the core test, batch verification is exercised across multiple setups (security params, blob length, quantization factor).

@bxue-l2 (Contributor, Author) commented Dec 6, 2023:

Added some benchmark tests. In general, we get a 4.5-25x speedup.

The variable parameters are the quantization factor, number of operators, adversary/quorum threshold ratio, blob length, and stake distribution.

The stake distribution is either uniform or quadratic (stake = i^2*6+32, where i is the operator index). The table is shown below; the full data is accessible at the link.

For non-uniform stake, there is a histogram of times; please refer to the doc for more info.

[screenshot: benchmark results table, 2023-12-06]

One additional thing we could optimize is batch verification of the length proofs. Currently it takes 0.63*n milliseconds of the entire verification, effectively 2/3 of the batch verification time, so in principle we could get an additional 3x speedup.

@ian-shim (Contributor) left a comment:

lgtm, but this should get a stamp from @mooselumph

for params, subBatch := range subBatchMap {
params := params
subBatch := subBatch
go v.universalVerifyWorker(params, subBatch, out)
Contributor:

I agree that it's better to cap the threading with a pool.
I don't have a strong objection to using a threadpool here. But if that's a concern, an alternative is to have the core lib create the SubBatch map, and have the Node run a threadpool to invoke the execution for each SubBatch.

Contributor (Author):

For the second approach, we would need to export the encoder interface from the validator. That needs discussion and a proper code restructure. I will choose option 1. @mooselumph, what do you think?

Collaborator:

I don't think the refactoring would be needed... just give the validator a public method to be called by the node worker. But I don't really have a strong opinion about whether the threadpool should go here or in the node.

Contributor (Author):

I added the workerPool to the call; common exposes an interface.


var randomFr bls.Fr

err := bls.HashToSingleField(&randomFr, buffer.Bytes())
Contributor:

Does it need a random salt to actually make it random (right now it's a deterministic hash)?

Collaborator:

Does this need to be deterministic? The gob encoder seems to be stateful (it will always encode/decode properly, but the encoded bytes can vary).

Contributor (Author):

No, it does not need to be deterministic.

Contributor:

I haven't read the details of Fiat-Shamir, but intuitively, if this "randomness" is actually deterministic and determined by the input samples, which are supplied by the prover, can the verifier be fooled by the prover?
It looks like the verifier needs some randomness that the prover can never guess.

bls.CopyG1(&commits[row], &s.Commitment)
}

ftG1 := bls.LinCombG1(commits, ftCoeffs)
Contributor:

It would be more readable to name ft, st, ... more properly. One option I see is: combinedProof, combinedCommitment, combinedInterpolation, etc.

Contributor (Author):

I am a bit against it, because "combined commitment" is not specific enough. You could call the addition of two commitments a combined commitment.

Collaborator:

I feel like it would be nice to find a way to show the hierarchy among the sections. It's unclear without prior knowledge that //second term refers to //rhs g1, for instance

Contributor (Author):

One thing that could improve readability is to break some parts into functions. I will rewrite and decompose some of the logic.

Contributor (Author):

@jianoaix @mooselumph I refactored the code, please take a look

@jianoaix (Contributor) commented Dec 12, 2023:

I would like this type of documentation in code, which makes the connection between code and theory clear:
[screenshot: example code documentation]

I remember one highly useful piece of documentation for me was a comment noting that the coefficients in a chunk are for the interpolation polynomial, which connects the whole thing.


// for each quorum
for _, quorumHeader := range blob.BlobHeader.QuorumInfos {
// Check if the operator is a member of the quorum
Collaborator:

Add comment: NumChunks=0 so we can skip this blob

Contributor (Author):

It is part of the preprocessBlob logic.

Proof bls.G1Point
RowIndex int // corresponds to a row in the verification matrix
Coeffs []bls.Fr
X uint // X is the same assignment index of chunk in EigenDa
Contributor:

nit: EigenDa -> EigenDA

Also, if it's the chunk assignment index, can it be named "ChunkIndex" for readability? "X" does not seem like a proper name.

Contributor (Author):

I will refactor X a bit. The reason I avoid using "assignment" is that the pkg should be relatively standalone; X is the evaluation index.

Contributor (Author):

refactored


// for each quorum
for _, quorumHeader := range blob.BlobHeader.QuorumInfos {
// Check if the operator is a member of the quorum
Contributor:

nit: this comment is redundant (it's in preprocessBlob where the check is actually happening)

@jianoaix (Contributor) left a comment:

LGTM

// Check the received chunks against the commitment
err = v.encoder.VerifyChunks(chunks, assignment.GetIndices(), blob.BlobHeader.BlobCommitments, params)
for i := 0; i < numResult; i++ {
err := <-out
if err != nil {
return err
Contributor:

Do you want to stop() the workerpool before returning, since there is no point for the remaining threads to continue?

Contributor (Author):

The workerpool is created by the node thread, so it does not make a lot of sense to stop it here.



// generate a random value using the Fiat-Shamir transform
// we could also use pseudo-randomness generated locally, but we would have to ensure no adversary can manipulate it
// Hashing everything takes about 1ms, so the Fiat-Shamir transform does not incur much cost
func GenRandomness(samples []Sample) (bls.Fr, error) {
Contributor:

Randomness -> RandomFactor? (it's the terminology used by the research post anyway)

Contributor (Author):

The prover could fool the verifier for a fixed randomness by modifying its input. However, since there is a deterministic relation between the input and the randomness (the randomness is recomputed from whatever input the prover supplies), the adversarial disperser cannot perform this attack.

// m is number of blob, samples is a list of chunks
//
// The order of samples do not matter.
// Each sample need not have unique row, it is possible that multiple chunks of the same blob are validated altogether
Contributor:

it is possible -> it is always? The chunks from the same blob always share the same encoding param.

Contributor (Author):

But an operator might be assigned only 1 chunk of a blob.

@jianoaix (Contributor) commented Dec 15, 2023:

The context here is that all chunks of the same blob assigned to this operator will always be validated together.

// All samples in a subBatch have identical chunkLen
aggPolyG1 := bls.LinCombG1(ks.Srs.G1[:D], aggPolyCoeffs)

// third term
Collaborator:

BTW I checked the code of these 3 terms against the research post and it looks good (quite impressive that you nailed those details rigorously!), except the 3rd term, which I cannot follow (I need to understand the coset shifting a bit more).

@bxue-l2 bxue-l2 merged commit 448a261 into Layr-Labs:master Dec 15, 2023
4 checks passed