
Node batch verification #94

Merged · 29 commits · Dec 15, 2023

Conversation

@bxue-l2 (Contributor) commented Nov 30, 2023:

Why are these changes needed?

This PR adds batch chunk verification to reduce a node's processing time when receiving a batch.

The technical details are well documented in https://ethresear.ch/t/a-universal-verification-equation-for-data-availability-sampling/13240#step-by-step-derivation-of-the-universal-equation-11 (an Ethereum Research post).

Important struct:

  • Sample: the basic unit of batch verification; for EigenDA it is a chunk
  • SubBatch: a group of Samples that share the same dimensions

Big changes:

  • Add interfaces to both encoding.go and validator.go for batch verification
  • Replace the previous blob-by-blob verification in node/node.go with the new batch verification
  • Process each SubBatch in parallel

Some feedback on the following would be helpful:

  • Since batch verification is defined inside core/validator, the parallelization is done inside core (some feedback appreciated)
  • Since batch verification is defined inside core/validator, I removed "n.Config.NumBatchValidators", which let clients specify how many worker threads to use for verification. Instead I simply spin up a goroutine for each SubBatch; otherwise I would have to put the worker count in the core interface (maybe that is fine?)
  • I could use a workerpool for parallelization, but it feels unnatural to add that dependency to core

Checks

  • I've made sure the lint is passing in this PR.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; in that case, please comment that they are not relevant.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

@bxue-l2 bxue-l2 self-assigned this Dec 1, 2023
@ian-shim (Contributor) left a comment:

Can you take a look at the unit test failures?
Also, do the existing unit tests cover the new verification logic well?

node/node.go Outdated
out <- err
return
}
//func (n *Node) validateBlob(ctx context.Context, blob *core.BlobMessage, operatorState *core.OperatorState, out chan error) {
Contributor:

remove?

@bxue-l2 (Contributor, Author) commented Dec 5, 2023:

> Can you take a look at the unit test failures? Also, do the existing unit tests cover the new verification logic well?

The unit tests are added at the library level (pkg/encoding/kzgEncoder) and at the core level (core/test/core_test).

In the core test, batch verification is exercised across multiple setups (security params, blob length, quantization factor).

@bxue-l2 (Contributor, Author) commented Dec 6, 2023:

Added some benchmark tests. In general, we get a 4.5-25x speedup.

The variable parameters are the quantization factor, number of operators, adversary/quorum threshold ratio, blob length, and stake distribution.

The stake distribution is either uniform or quadratic (stake = i^2*6+32, where i is the operator index). The table is shown below; the full data is accessible at the link.

For non-uniform stake, there is a histogram of times; please refer to the doc for more info.

[screenshot: benchmark results table, 2023-12-06]

One additional thing we could optimize is batch verification of the length proofs. Currently it takes 0.63*n milliseconds of the entire verification, effectively 2/3 of the batch verification time, so in principle we could get an additional 3x speedup.

@ian-shim (Contributor) left a comment:

lgtm, but this should get a stamp from @mooselumph

for params, subBatch := range subBatchMap {
params := params
subBatch := subBatch
go v.universalVerifyWorker(params, subBatch, out)
Contributor:

I agree that it's better to cap the threading with a pool.
I don't have a strong objection to using a threadpool here. But if that's a concern, an alternative is to have the core lib create the SubBatch map, and have the Node run a threadpool to invoke the execution for each SubBatch.

Contributor (Author):

For the second approach, we would need to export the encoder interface from the validator. That needs discussion and a proper code restructure. I will choose option 1. @mooselumph, what do you think?

Collaborator:

I don't think the refactoring would be needed... just give the validator a public method to be called by the node worker. But I don't really have a strong opinion about whether the threadpool should go here or in the node.

Contributor (Author):

I added the workerPool to the call; common exposes an interface.


var randomFr bls.Fr

err := bls.HashToSingleField(&randomFr, buffer.Bytes())
Contributor:

Does it need a random salt to actually make it random (right now it's a deterministic hash)?

Collaborator:

Does this need to be deterministic? The gob encoder seems to be stateful (it will always encode/decode properly, but the encoded bytes can vary).

Contributor (Author):

No, it does not need to be deterministic.

Contributor:

I haven't read the details of Fiat-Shamir, but intuitively, if this "randomness" is actually deterministic and determined by the input samples, which are supplied by the prover, can the verifier be fooled by the prover?
It looks like the verifier needs some randomness that the prover can never guess.

bls.CopyG1(&commits[row], &s.Commitment)
}

ftG1 := bls.LinCombG1(commits, ftCoeffs)
Contributor:

It would be more readable to name ft, st, ... more properly. One option I see is: combinedProof, combinedCommitment, combinedInterpolation, etc.

Contributor (Author):

I am a bit against it, because "combined commitment" is not specific enough. You could call the addition of two commitments a combined commitment.

Collaborator:

I feel like it would be nice to find a way to show the hierarchy among the sections. It's unclear without prior knowledge that //second term refers to //rhs g1, for instance

Contributor (Author):

One thing that could improve readability is to break some parts into functions. I will rewrite and decompose some of the logic.

Contributor (Author):

@jianoaix @mooselumph I refactored the code, please take a look

@jianoaix (Contributor) commented Dec 12, 2023:

I would like this type of documentation in code, which makes the connection between code and theory clear:
[screenshot: example code documentation]

I remember one highly useful piece of documentation for me was a comment noting that the coefficients in a chunk are for the interpolation polynomial, which connects the whole thing.


// for each quorum
for _, quorumHeader := range blob.BlobHeader.QuorumInfos {
// Check if the operator is a member of the quorum
Collaborator:

Add comment: NumChunks=0 so we can skip this blob

Contributor (Author):

It is part of the preprocessBlob logic.

Proof bls.G1Point
RowIndex int // corresponds to a row in the verification matrix
Coeffs []bls.Fr
X uint // X is the same assignment index of chunk in EigenDa
Contributor:

nit: EigenDa -> EigenDA

Also, if it's the chunk assignment index, can it be named "ChunkIndex" for readability? "X" does not seem like a proper name.

Contributor (Author):

I will refactor X a bit. The reason I avoid using "assignment" is that the pkg should be relatively standalone; X is the evaluation index.

Contributor (Author):

refactored


// for each quorum
for _, quorumHeader := range blob.BlobHeader.QuorumInfos {
// Check if the operator is a member of the quorum
Contributor:

nit: this comment is redundant (it's in preprocessBlob where the check is actually happening)

@jianoaix (Contributor) left a comment:

LGTM

// Check the received chunks against the commitment
err = v.encoder.VerifyChunks(chunks, assignment.GetIndices(), blob.BlobHeader.BlobCommitments, params)
for i := 0; i < numResult; i++ {
err := <-out
if err != nil {
return err
Contributor:

Do you want to stop() the workerpool before returning, since there is no point for the remaining threads to continue?

Contributor (Author):

The workerpool is created by the node thread, so it does not make a lot of sense to stop it here.



// generate a random value using the Fiat-Shamir transform
// we could also use pseudo-randomness generated locally, but we would have to ensure no adversary can manipulate it
// Hashing everything takes about 1ms, so the Fiat-Shamir transform does not incur much cost
func GenRandomness(samples []Sample) (bls.Fr, error) {
Contributor:

Randomness -> RandomFactor? (it's the terminology used by the research post anyway)

Contributor (Author):

The prover could fool the verifier for a fixed randomness by modifying its input. However, since there is a deterministic relation between the input and the randomness (the randomness is recomputed from whatever input the prover supplies), the adversarial disperser cannot perform this attack.

// m is number of blob, samples is a list of chunks
//
// The order of samples do not matter.
// Each sample need not have unique row, it is possible that multiple chunks of the same blob are validated altogether
Contributor:

it is possible -> it is always? The chunks from the same blob always share the same encoding param.

Contributor (Author):

But an operator might be assigned only 1 chunk of a blob.

@jianoaix (Contributor) commented Dec 15, 2023:

The context here is that all chunks of the same blob assigned to this operator will always be validated together.

// All samples in a subBatch have identical chunkLen
aggPolyG1 := bls.LinCombG1(ks.Srs.G1[:D], aggPolyCoeffs)

// third term
Collaborator:

BTW I checked the code of these 3 terms against the research post and it looks good (quite impressive that you nailed those details rigorously!), except the 3rd term, which I cannot follow (I need to understand the coset shifting a bit more).

@bxue-l2 bxue-l2 merged commit 448a261 into Layr-Labs:master Dec 15, 2023
4 checks passed