Added blob table data structure. #677
Conversation
Signed-off-by: Cody Littley <[email protected]>
tools/traffic/table/blob_metadata.go
Outdated
// checksum of the blob.
checksum *[16]byte

// batchHeaderHash of the blob in bytes.
this comment should be updated
oops, fixed
tools/traffic/table/blob_table.go
Outdated
defer table.lock.Unlock()

blob.index = table.size
table.blobs[table.size] = blob
what if the table.size is out of bounds?
Fixed.
Good catch. It just so happened that the original capacity was 1024, which exceeded the number of blobs I was testing with.
I also made the initial capacity 0, thus making existing unit tests sensitive to this problem (as well as eliminating a magic number).
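A minimal sketch of an Add that grows the backing slice on demand (hypothetical, not the PR's exact code) shows how a zero initial capacity avoids both the out-of-bounds write and the magic number:

```go
package main

import "fmt"

// BlobMetadata is a simplified stand-in for the PR's struct.
type BlobMetadata struct {
	index uint
}

// BlobTable tracks blobs in a slice; size can lag behind len(blobs)
// once removals happen, so it is tracked separately.
type BlobTable struct {
	blobs []*BlobMetadata
	size  uint
}

// Add appends a blob, growing the backing slice when full, so no fixed
// initial capacity (such as the original 1024) is required.
func (table *BlobTable) Add(blob *BlobMetadata) {
	blob.index = table.size
	if table.size == uint(len(table.blobs)) {
		table.blobs = append(table.blobs, blob)
	} else {
		table.blobs[table.size] = blob
	}
	table.size++
}

func main() {
	table := &BlobTable{} // initial capacity 0, as in the fix
	for i := 0; i < 3; i++ {
		table.Add(&BlobMetadata{})
	}
	fmt.Println(table.size)
}
```

Starting from capacity 0 also means any out-of-bounds assumption is exercised by even the smallest unit test.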
tools/traffic/table/blob_table.go
Outdated
// NewBlobTable creates a new BlobTable instance.
func NewBlobTable() BlobTable {
	return BlobTable{
		blobs: make([]*BlobMetadata, 1024),
why don't we make the size of this slice variable?
That would work as long as we only add blobs. When blobs are removed from the table, the size of the table may actually be smaller than the size of this slice.
tools/traffic/table/blob_table.go
Outdated
size uint

// lock is used to synchronize access to the requiredReads.
lock sync.Mutex
we can use RWMutex for more granular control
done
tools/traffic/table/blob_table.go
Outdated
table.size++
}

// AddOrReplace adds a blob to the requiredReads if there is capacity or replaces an existing blob at random
What's the requiredReads?
Documentation was out of date. Fixed.
// AddOrReplace is equivalent to Add if there is capacity, or replaces an existing blob at random
// if there is no remaining capacity. This method is a no-op if maximumCapacity is 0.
tools/traffic/table/blob_table.go
Outdated
defer table.lock.Unlock()

if table.size >= maximumCapacity {
	// replace random existing blob
what if the existing blob hasn't been retrieved the required number of times?
Then it is removed.
This is not a problem for our use case though, since we keep two blob tables: one for required blobs, and another for optional blobs. The code never calls AddOrReplace()
on the table that contains the required blobs.
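The eviction semantics described here can be sketched as follows (a hypothetical simplification, not the PR's actual code): at capacity, a random slot is overwritten regardless of remaining read permits, which is safe only because the method is reserved for the optional-blob table.

```go
package main

import (
	"fmt"
	"math/rand"
)

type BlobMetadata struct {
	index uint
}

type BlobTable struct {
	blobs []*BlobMetadata
	size  uint
}

func (table *BlobTable) add(blob *BlobMetadata) {
	blob.index = table.size
	table.blobs = append(table.blobs, blob)
	table.size++
}

// AddOrReplace behaves like add while under maximumCapacity. At capacity it
// overwrites a randomly chosen existing blob, which may evict a blob that has
// not yet been fully read. A no-op when maximumCapacity is 0.
func (table *BlobTable) AddOrReplace(blob *BlobMetadata, maximumCapacity uint) {
	if maximumCapacity == 0 {
		return
	}
	if table.size >= maximumCapacity {
		victim := uint(rand.Intn(int(table.size)))
		blob.index = victim
		table.blobs[victim] = blob
		return
	}
	table.add(blob)
}

func main() {
	table := &BlobTable{}
	for i := 0; i < 10; i++ {
		table.AddOrReplace(&BlobMetadata{}, 3)
	}
	fmt.Println(table.size) // capped at 3
}
```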
tools/traffic/table/blob_table.go
Outdated
blob := table.blobs[rand.Int31n(int32(table.size))]

removed := false
if decrement && blob.remainingReadPermits != -1 {
shouldn't the condition be blob.remainingReadPermits != -1? Say blob.remainingReadPermits is 0 before this call. Then it gets decremented to -1 inside this block and is never removed.
Read permits for blobs in the table are never 0. They are always -1 or greater than 0, and removed the moment the count reaches 0.
Just in case, I added an assertion in the NewBlobMetadata() method to validate that this invariant is not violated.
if readPermits == 0 {
panic("readPermits must be greater than 0, or -1 for unlimited reads")
}
ohh got it
tools/traffic/table/blob_table.go
Outdated
}

// remove a blob from the requiredReads.
func (table *BlobTable) remove(blob *BlobMetadata) {
Is there a particular reason why we need table to be a slice?
I feel like removing/adding elements would be a lot simpler and less brittle if we used a map
Primary reason why I use a slice is to give an O(1) implementation of GetRandom(). Can you think of a good way to do this backed by a map?
Do we need to access a random element from the slice?
For sampling, could we just use the first element from the map?
The entire reason why this complex data structure exists in the first place is to facilitate random access. 😜
For the blobs with a number of required reads: yes, we could get away without random access. But I'm under the impression a random access pattern is preferable to a fixed one when simulating workloads like this.
For the pool of optional blobs to read, I think random access is necessary. Otherwise, we'd just be reading the same blob over and over until we get a new blob to start reading.
I'm open to discussing this more in depth if you are not convinced by my reasoning.
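For context, the slice-backed design being defended here, with O(1) uniform GetRandom and O(1) removal by swapping the last element into the vacated slot, might look roughly like this sketch (hypothetical, not the PR's code; the per-blob index field exists to make the swap possible):

```go
package main

import (
	"fmt"
	"math/rand"
)

type BlobMetadata struct {
	index uint
}

type BlobTable struct {
	blobs []*BlobMetadata
	size  uint
}

func (table *BlobTable) Add(blob *BlobMetadata) {
	blob.index = table.size
	table.blobs = append(table.blobs, blob)
	table.size++
}

// GetRandom returns a uniformly random tracked blob in O(1).
func (table *BlobTable) GetRandom() *BlobMetadata {
	if table.size == 0 {
		return nil
	}
	return table.blobs[rand.Intn(int(table.size))]
}

// remove deletes a blob in O(1) by moving the last element into its slot.
func (table *BlobTable) remove(blob *BlobMetadata) {
	last := table.blobs[table.size-1]
	table.blobs[blob.index] = last
	last.index = blob.index
	table.size--
	table.blobs[table.size] = nil
}

func main() {
	table := &BlobTable{}
	a, b := &BlobMetadata{}, &BlobMetadata{}
	table.Add(a)
	table.Add(b)
	table.remove(a) // b is swapped into slot 0
	fmt.Println(table.size, b.index)
}
```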
For the blobs with a number of required reads: yes, we could get away without random access. But I'm under the impression a random access pattern is preferable to a fixed one when simulating workloads like this.
We could get semi-random access with a map in constant time (just generate a random number n < 10 and pick the _n_th element from the map) if the access doesn't require sampling from a uniform distribution.
For the pool of optional blobs to read, I think random access is necessary. Otherwise, we'd just be reading the same blob over and over until we get a new blob to start reading.
I don't think optional blob reads were ever part of the spec. The primary goal for this observability tool is to monitor if the network can handle a given retrieval traffic. Are there benefits of saturating the network with optional reads?
Not a big deal since you have it implemented already, but would bias toward simplicity vs. optimization
Refactored to use a map based implementation.
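A map-backed table along these lines might look like the sketch below (a hypothetical simplification, not the PR's refactored code). It leans on the fact that Go randomizes map iteration order, so taking the first element a range loop yields gives an arbitrary, though not uniformly distributed, pick:

```go
package main

import "fmt"

type BlobMetadata struct {
	key []byte
}

// BlobTable keyed by blob key; add and remove are plain map operations,
// with no index bookkeeping to keep consistent.
type BlobTable struct {
	blobs map[string]*BlobMetadata
}

func NewBlobTable() *BlobTable {
	return &BlobTable{blobs: make(map[string]*BlobMetadata)}
}

func (table *BlobTable) Add(blob *BlobMetadata) {
	table.blobs[string(blob.key)] = blob
}

func (table *BlobTable) Remove(blob *BlobMetadata) {
	delete(table.blobs, string(blob.key))
}

// GetRandom relies on Go's randomized map iteration order for an
// arbitrary (not uniform) pick, matching the "semi random" access
// discussed above.
func (table *BlobTable) GetRandom() *BlobMetadata {
	for _, blob := range table.blobs {
		return blob
	}
	return nil
}

func main() {
	table := NewBlobTable()
	table.Add(&BlobMetadata{key: []byte("a")})
	table.Add(&BlobMetadata{key: []byte("b")})
	fmt.Println(len(table.blobs), table.GetRandom() != nil)
}
```

The trade-off is exactly the one debated above: simpler, less brittle add/remove in exchange for giving up uniform sampling.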
tools/traffic/table/blob_metadata.go
Outdated
// BlobMetadata encapsulates various information about a blob written by the traffic generator.
type BlobMetadata struct {
	// key of the blob, set when the blob is initially uploaded.
	key *[]byte
here and below: may not need the pointer as slice is already a pointer and cheap to copy around
Interesting, TIL. Will simplify code to not pass slice pointers.
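The point about slices being pointer-like can be shown in a few lines (hypothetical example names): a []byte parameter is a small header copied by value, but it still references the caller's backing array, so writes through it are visible without a *[]byte.

```go
package main

import "fmt"

// mutate receives the slice header by value; the header still points at
// the caller's backing array, so the write is visible to the caller.
func mutate(key []byte) {
	key[0] = 0xFF
}

func main() {
	key := []byte{1, 2, 3}
	mutate(key) // no *[]byte needed
	fmt.Println(key[0])
}
```

The caveat: operations that replace the header itself (such as append that reallocates) are not visible to the caller; for reading and writing existing bytes, a plain []byte suffices.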
tools/traffic/table/blob_metadata.go
Outdated
// checksum of the blob.
checksum *[16]byte

// batchHeaderHash of the blob in bytes.
batchHeaderHash is fixed 32 bytes so no need to track it
Field removed.
tools/traffic/table/blob_table.go
Outdated
// Size returns the total number of blobs currently tracked by the requiredReads.
func (table *BlobTable) Size() uint {
	table.lock.Lock()
Here and a few places below: can just use read lock
Yup, I've switched to a read/write lock pattern.
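The read/write lock pattern being agreed on here can be sketched as follows (hypothetical, not the PR's code): read-only accessors take the shared RLock so they never block each other, while mutators take the exclusive Lock.

```go
package main

import (
	"fmt"
	"sync"
)

type BlobMetadata struct{}

type BlobTable struct {
	lock  sync.RWMutex
	blobs []*BlobMetadata
}

// Size only reads state, so it takes the shared read lock and can run
// concurrently with other readers.
func (table *BlobTable) Size() uint {
	table.lock.RLock()
	defer table.lock.RUnlock()
	return uint(len(table.blobs))
}

// Add mutates state, so it takes the exclusive write lock.
func (table *BlobTable) Add(blob *BlobMetadata) {
	table.lock.Lock()
	defer table.lock.Unlock()
	table.blobs = append(table.blobs, blob)
}

func main() {
	table := &BlobTable{}
	table.Add(&BlobMetadata{})
	table.Add(&BlobMetadata{})
	fmt.Println(table.Size())
}
```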
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
tools/traffic/table/blob_table.go
Outdated
// remove a blob from the requiredReads.
func (table *BlobTable) remove(blob *BlobMetadata) {
	if table.blobs[blob.index] != blob {
		panic(fmt.Sprintf("blob %x is not present in the requiredReads at index %d", blob.Key(), blob.index))
Should we log at error level and handle this case gracefully vs. crashing the whole program?
This code was replaced with the simpler implementation.
tools/traffic/table/blob_metadata.go
Outdated
readPermits int) *BlobMetadata {

if readPermits == 0 {
	panic("readPermits must be greater than 0, or -1 for unlimited reads")
We could return error instead of crashing when validation fails
change made
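The change being agreed on, returning an error instead of panicking when validation fails, might look like this sketch (hypothetical field set, not the PR's full struct):

```go
package main

import (
	"errors"
	"fmt"
)

type BlobMetadata struct {
	key                  []byte
	remainingReadPermits int
}

// NewBlobMetadata rejects an invalid readPermits value with an error,
// leaving the decision to abort or recover to the caller.
func NewBlobMetadata(key []byte, readPermits int) (*BlobMetadata, error) {
	if readPermits == 0 {
		return nil, errors.New("readPermits must be greater than 0, or -1 for unlimited reads")
	}
	return &BlobMetadata{key: key, remainingReadPermits: readPermits}, nil
}

func main() {
	if _, err := NewBlobMetadata([]byte("blob"), 0); err != nil {
		fmt.Println("rejected:", err)
	}
	blob, _ := NewBlobMetadata([]byte("blob"), -1)
	fmt.Println(blob.remainingReadPermits)
}
```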
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
lgtm! Can we address this comment from last review?
Signed-off-by: Cody Littley <[email protected]>
tools/traffic/table/blob_metadata.go
Outdated
key []byte

// checksum of the blob.
checksum *[16]byte
Does this need to be a pointer?
Changed to a non-pointer variable.
tools/traffic/table/blob_metadata.go
Outdated
size uint

// blobIndex of the blob.
blobIndex uint
What is blob index?
Blob index is one of the arguments needed when retrieving a blob. I am unsure of the deeper meaning of this field, I figured out I needed it through reverse engineering.
func (r *retrievalClient) RetrieveBlob(
ctx context.Context,
batchHeaderHash [32]byte,
blobIndex uint32, 👈
referenceBlockNumber uint,
batchRoot [32]byte,
quorumID core.QuorumID) ([]byte, error) {
Yea, this index is the position of the blob in the batch (the batch is a list of blobs)
tools/traffic/table/blob_metadata.go
Outdated
// BlobMetadata encapsulates various information about a blob written by the traffic generator.
type BlobMetadata struct {
	// key of the blob, set when the blob is initially uploaded.
	key []byte
Is this the blobKey?
eigenda/disperser/disperser.go, line 52 in 2566c21:
type BlobKey struct {
The key, in this context, is the []byte return value of this method:
func (c *disperserClient) DisperseBlob(ctx context.Context, data []byte, quorums []uint8) (*disperser.BlobStatus, []byte, error) {
The BlobKey seems like it is holding the same data, although it is in string form, not byte array form. I originally had the MetadataHash field as well, although that was removed based on a prior comment.
Should I be using the BlobKey struct here?
tools/traffic/table/blob_metadata.go
Outdated
}, nil
}

// Key returns the key of the blob.
What about making the member variables public? These getters are quite simple so seem not worth the verbosity.
Intention was to make it possible to read these values without the capability of updating them. But maybe that's more of a java design pattern. I've made the member variables public as you suggest.
Yea it feels quite Java-ish
Signed-off-by: Cody Littley <[email protected]>
Why are these changes needed?
This is code split off from another PR that got too big: #666
This PR adds a data structure called a BlobTable, which is used by the traffic generator to track blobs.