
Added blob table data structure. #677

Merged (8 commits) on Aug 6, 2024

Conversation

cody-littley (Contributor) commented on Jul 31, 2024:

Why are these changes needed?

This is code split off from another PR that got too big: #666

This PR adds a data structure called a BlobTable, which is used by the traffic generator to track blobs.

Checks

  • I've made sure the lint is passing in this PR.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; in that case, please comment that they are not relevant.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

// checksum of the blob.
checksum *[16]byte

// batchHeaderHash of the blob in bytes.
Reviewer:

this comment should be updated

cody-littley (author):

oops, fixed

defer table.lock.Unlock()

blob.index = table.size
table.blobs[table.size] = blob
Reviewer:

What if table.size is out of bounds?

cody-littley (author), Aug 1, 2024:

Fixed.

Good catch. It just so happened that the original capacity was 1024, which exceeded the number of blobs I was testing with.

I also made the initial capacity 0, thus making existing unit tests sensitive to this problem (as well as eliminating a magic number).

// NewBlobTable creates a new BlobTable instance.
func NewBlobTable() BlobTable {
return BlobTable{
blobs: make([]*BlobMetadata, 1024),
Reviewer:

why don't we make the size of this slice variable?

cody-littley (author):

That would work as long as we only add blobs. When blobs are removed from the table, the size of the table may actually be smaller than the size of this slice.

size uint

// lock is used to synchronize access to the requiredReads.
lock sync.Mutex
Reviewer:

we can use RWMutex for more granular control

cody-littley (author):

done

table.size++
}

// AddOrReplace adds a blob to the requiredReads if there is capacity or replaces an existing blob at random
Reviewer:

What's the requiredReads?

cody-littley (author):

Documentation was out of date. Fixed.

// AddOrReplace is equivalent to Add if there is capacity, or replaces an existing blob at random
// if there is no remaining capacity. This method is a no-op if maximumCapacity is 0.

defer table.lock.Unlock()

if table.size >= maximumCapacity {
// replace random existing blob
Reviewer:

what if the existing blob hasn't been retrieved the required number of times?

cody-littley (author):

Then it is removed.

This is not a problem for our use case though, since we keep two blob tables: one for required blobs, and another for optional blobs. The code never calls AddOrReplace() on the table that contains the required blobs.

blob := table.blobs[rand.Int31n(int32(table.size))]

removed := false
if decrement && blob.remainingReadPermits != -1 {
Reviewer:

shouldn't the condition be blob.remainingReadPermits != -1?
Say blob.remainingReadPermits is 0 before this call. Then it gets decremented to -1 inside this block and is never removed

cody-littley (author):

Read permits for blobs in the table are never 0. They are always -1 or greater than 0, and a blob is removed the moment its count reaches 0.

Just in case, I added an assertion in the NewBlobMetadata() method to validate that this invariant is not violated.

	if readPermits == 0 {
		panic("readPermits must be greater than 0, or -1 for unlimited reads")
	}

Reviewer:

ohh got it

}

// remove a blob from the requiredReads.
func (table *BlobTable) remove(blob *BlobMetadata) {
Reviewer:

Is there a particular reason why we need table to be a slice?
I feel like removing/adding elements would be a lot simpler and less brittle if we used a map

cody-littley (author):

Primary reason why I use a slice is to give an O(1) implementation of GetRandom(). Can you think of a good way to do this backed by a map?

Reviewer:

Do we need to access a random element from the slice?
For sampling, could we just use the first element from the map?

cody-littley (author):

The entire reason why this complex data structure exists in the first place is to facilitate random access. 😜

For the blobs with a number of required reads: yes, we could get away without random access. But I'm under the impression a random access pattern is preferable to a fixed one when simulating workloads like this.

For the pool of optional blobs to read, I think random access is necessary. Otherwise, we'd just be reading the same blob over and over until we get a new blob to start reading.

I'm open to discussing this more in depth if you are not convinced by my reasoning.

ian-shim (Contributor), Aug 1, 2024:

> For the blobs with a number of required reads: yes, we could get away without random access. But I'm under the impression a random access pattern is preferable to a fixed one when simulating workloads like this.

We could get semi-random access with a map in constant time (just generate a random number n < 10 and pick the nth element from the map) if the access doesn't require sampling from a uniform distribution.

> For the pool of optional blobs to read, I think random access is necessary. Otherwise, we'd just be reading the same blob over and over until we get a new blob to start reading.

I don't think optional blob reads were ever part of the spec. The primary goal for this observability tool is to monitor whether the network can handle a given retrieval traffic. Are there benefits of saturating the network with optional reads?

Not a big deal since you have it implemented already, but I would bias toward simplicity vs. optimization.

cody-littley (author):

Refactored to use a map based implementation.
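For reference, one common way to get a map-backed structure that still supports O(1) uniform random sampling (a sketch under assumed names, not necessarily what the refactor looks like) is to pair the map with a slice of keys and use swap-delete on removal:

```go
package main

import (
	"fmt"
	"math/rand"
)

// randomAccessMap pairs a map with a key slice so that insertion, removal,
// and uniform random sampling are all O(1). Names are illustrative.
type randomAccessMap struct {
	index map[string]int // key -> position in keys
	keys  []string
}

func newRandomAccessMap() *randomAccessMap {
	return &randomAccessMap{index: make(map[string]int)}
}

// Add appends the key and records its position.
func (r *randomAccessMap) Add(key string) {
	r.index[key] = len(r.keys)
	r.keys = append(r.keys, key)
}

// Remove swap-deletes: the last key moves into the removed slot, keeping
// the slice dense so sampling stays uniform.
func (r *randomAccessMap) Remove(key string) {
	pos, ok := r.index[key]
	if !ok {
		return
	}
	last := r.keys[len(r.keys)-1]
	r.keys[pos] = last
	r.index[last] = pos
	r.keys = r.keys[:len(r.keys)-1]
	delete(r.index, key)
}

// GetRandom samples a key uniformly at random in O(1).
func (r *randomAccessMap) GetRandom() string {
	return r.keys[rand.Intn(len(r.keys))]
}

func main() {
	m := newRandomAccessMap()
	m.Add("a")
	m.Add("b")
	m.Add("c")
	m.Remove("b")
	fmt.Println(len(m.keys))                      // 2
	got := m.GetRandom()
	fmt.Println(got == "a" || got == "c")         // true
}
```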

// BlobMetadata encapsulates various information about a blob written by the traffic generator.
type BlobMetadata struct {
// key of the blob, set when the blob is initially uploaded.
key *[]byte
Reviewer:

Here and below: you may not need the pointer, as a slice is already a reference type and cheap to copy around.

cody-littley (author):

Interesting, TIL. Will simplify code to not pass slice pointers.

// checksum of the blob.
checksum *[16]byte

// batchHeaderHash of the blob in bytes.
Reviewer:

batchHeaderHash is fixed 32 bytes so no need to track it

cody-littley (author):

Field removed.


// Size returns the total number of blobs currently tracked by the requiredReads.
func (table *BlobTable) Size() uint {
table.lock.Lock()
Reviewer:

Here and a few places below: can just use read lock

cody-littley (author):

Yup, I've switched to a read/write lock pattern.
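As a sketch of the read/write lock pattern adopted here (hypothetical names, not the PR's exact code): read-only methods such as `Size` take the shared lock so concurrent reads don't serialize, while mutating methods take the exclusive lock:

```go
package main

import (
	"fmt"
	"sync"
)

// blobTable guarded by an RWMutex: readers take the shared read lock,
// writers take the exclusive write lock.
type blobTable struct {
	lock  sync.RWMutex
	blobs map[string]int
}

// Size only reads state, so the shared lock suffices; many Size calls
// may run concurrently.
func (t *blobTable) Size() int {
	t.lock.RLock()
	defer t.lock.RUnlock()
	return len(t.blobs)
}

// Add mutates state, so it needs the exclusive lock, which blocks both
// readers and other writers.
func (t *blobTable) Add(key string) {
	t.lock.Lock()
	defer t.lock.Unlock()
	t.blobs[key] = len(t.blobs)
}

func main() {
	t := &blobTable{blobs: make(map[string]int)}
	t.Add("x")
	fmt.Println(t.Size()) // 1
}
```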

Signed-off-by: Cody Littley <[email protected]>
// remove a blob from the requiredReads.
func (table *BlobTable) remove(blob *BlobMetadata) {
if table.blobs[blob.index] != blob {
panic(fmt.Sprintf("blob %x is not present in the requiredReads at index %d", blob.Key(), blob.index))
Reviewer:

Should we log at error level and handle this case gracefully vs. crashing the whole program?

cody-littley (author):

This code was replaced with the simpler implementation.

}


readPermits int) *BlobMetadata {

if readPermits == 0 {
panic("readPermits must be greater than 0, or -1 for unlimited reads")
Reviewer:

We could return error instead of crashing when validation fails

cody-littley (author):

change made
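The change can be sketched as a constructor that returns an error instead of panicking, letting the caller decide how to recover (illustrative signature; the PR's actual code may differ):

```go
package main

import (
	"errors"
	"fmt"
)

// BlobMetadata sketch carrying the validated readPermits invariant.
type BlobMetadata struct {
	readPermits int
}

// NewBlobMetadata reports invalid input as an error rather than crashing
// the whole program. Valid values are -1 (unlimited) or any positive count.
func NewBlobMetadata(readPermits int) (*BlobMetadata, error) {
	if readPermits == 0 || readPermits < -1 {
		return nil, errors.New("readPermits must be greater than 0, or -1 for unlimited reads")
	}
	return &BlobMetadata{readPermits: readPermits}, nil
}

func main() {
	if _, err := NewBlobMetadata(0); err != nil {
		fmt.Println("validation failed:", err)
	}
	m, _ := NewBlobMetadata(-1)
	fmt.Println(m.readPermits) // -1
}
```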

ian-shim (Contributor) reviewed:

lgtm! Can we address this comment from last review?

key []byte

// checksum of the blob.
checksum *[16]byte
Reviewer:

Does this need to be a pointer?

cody-littley (author):

Changed to a non-pointer variable.

size uint

// blobIndex of the blob.
blobIndex uint
Reviewer:

What is blob index?

cody-littley (author):

Blob index is one of the arguments needed when retrieving a blob. I am unsure of the deeper meaning of this field; I figured out that I needed it through reverse engineering.

func (r *retrievalClient) RetrieveBlob(
	ctx context.Context,
	batchHeaderHash [32]byte,
	blobIndex uint32,  // <-- this one
	referenceBlockNumber uint,
	batchRoot [32]byte,
	quorumID core.QuorumID) ([]byte, error) {

Reviewer:

Yeah, this index is the position of the blob in the batch (the batch is a list of blobs).

// BlobMetadata encapsulates various information about a blob written by the traffic generator.
type BlobMetadata struct {
// key of the blob, set when the blob is initially uploaded.
key []byte
Reviewer:

Is this the blobKey (type BlobKey struct)?

cody-littley (author), Aug 5, 2024:

The key, in this context, is the []byte return value of this method:

func (c *disperserClient) DisperseBlob(ctx context.Context, data []byte, quorums []uint8) (*disperser.BlobStatus, []byte, error) {

The BlobKey seems like it is holding the same data, although it is in string form, not byte array form. I originally had the MetadataHash field as well, although that was removed based on a prior comment.

Should I be using the BlobKey struct here?

}, nil
}

// Key returns the key of the blob.
Reviewer:

What about making the member variables public? These getters are quite simple so seem not worth the verbosity.

cody-littley (author):

The intention was to make it possible to read these values without the capability of updating them. But maybe that's more of a Java design pattern. I've made the member variables public as you suggested.

Reviewer:

Yea it feels quite Java-ish

@cody-littley cody-littley merged commit 7020083 into Layr-Labs:master Aug 6, 2024
6 checks passed
@cody-littley cody-littley deleted the blob-table-fragment branch August 6, 2024 12:57