-
Notifications
You must be signed in to change notification settings - Fork 198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Chunk store #848
Chunk store #848
Conversation
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
@@ -0,0 +1,103 @@ | |||
package chunkstore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like chunkstore belongs outside disperser
. Should it be under relay
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved it to the relay
.
I was actually a little confused about where this should live. The relay reads from it, but the encoder will be writing to it. But I guess relay
is as good as anywhere (unless we decide to make it a top level directory, which I don't think it deserves to be).
encoding/rs/frame.go
Outdated
@@ -33,3 +37,61 @@ func Decode(b []byte) (Frame, error) { | |||
} | |||
return f, nil | |||
} | |||
|
|||
// EncodeFrames serializes a slice of frames into a byte slice. | |||
func EncodeFrames(frames []*Frame) ([]byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we use existing serialization methods like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From split encoder perspective we want to serialize the proofs and the coefficients separately. From node perspective we could make sure it receives the chunks in the expected serialized format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmanc requested that the chunk store had the capability of uploading the encoding.Proof
objects and rs.Frame
objects separately. The code you point to is capable of serializing both at the same time, but does provide a way to serialize/deserialize them separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. How about this one?
func (r *chunkReader) GetChunkCoefficients( | ||
ctx context.Context, | ||
blobKey disperser.BlobKey, | ||
metadata *ChunkCoefficientMetadata) ([]*rs.Frame, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does chunk reader need chunkMetadataStore
if this method takes metadata as input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good point, it won't need this to be passed in. I didn't notice because it's not actually used in this PR's iteration of the feature. Removed.
) | ||
|
||
// ChunkMetadataStore is an interface for storing and retrieving metadata about chunks. | ||
type ChunkMetadataStore interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we need another store abstraction for writing/reading chunk metadata.
Since chunk metadata lives inside blob metadata, I think write/read should happen via blob metadata store.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was something separate since we had originally discussed not putting the extra metadata into the regular blob metadata store. Now that this data has merged into the other blob matadata, I agree that it doesn't make sense to have a separate chunk metadata store. Removed.
// The total size of file containing all chunk coefficients for the blob. | ||
DataSize int | ||
// The maximum fragment size used to store the chunk coefficients. | ||
FragmentSize int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we be more specific here and say its uint64 or what's appropriate than generic int type ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. Let's go with uint64
for the sake of future compatibility (I can't imagine having >4gb files, but let's not limit ourselves here... it's not that much overhead).
As a side note, one of the quirks of golang that drives me up a wall is how they strongly encourage everybody to use the int
type everywhere. For example, why does len(x)
return a signed value? If I were in charge of the language design, I'd never have supported the types int
and uint
in the first place. /rant
bytes = append(bytes, proofBytes[:]...) | ||
} | ||
|
||
err := c.s3Client.UploadObject(ctx, c.bucketName, s3Key, bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is our s3 object versioning policy ? would we need to care for objects that already exist before uploading ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline, repeating conclusion of that talk here for others.
Since each key is unique and has a deterministic value, writing a value to a key more than once is harmless (i.e. the data is overwritten with the exact same data).
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! one comment re: serialization. Can you also take a look at the lint failure?
encoding/rs/frame.go
Outdated
@@ -33,3 +37,61 @@ func Decode(b []byte) (Frame, error) { | |||
} | |||
return f, nil | |||
} | |||
|
|||
// EncodeFrames serializes a slice of frames into a byte slice. | |||
func EncodeFrames(frames []*Frame) ([]byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. How about this one?
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, left a couple small comments
encoding/rs/frame.go
Outdated
// GnarkEncodeFrames serializes a slice of frames into a byte slice. | ||
func GnarkEncodeFrames(frames []*Frame) ([]byte, error) { | ||
|
||
// Serialization format: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move this to above the function so it shows up in the go docs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// GnarkEncodeFrames serializes a slice of frames into a byte slice.
//
// Serialization format:
// [number of frames: 4 byte uint32]
// [size of frame 1: 4 byte uint32][frame 1]
// [size of frame 2: 4 byte uint32][frame 2]
// ...
// [size of frame n: 4 byte uint32][frame n]
//
// Where relevant, big endian encoding is used.
func GnarkEncodeFrames(frames []*Frame) ([]byte, error) {
encoding/rs/frame.go
Outdated
return nil, 0, fmt.Errorf("invalid frame size: %d", len(serializedFrame)) | ||
} | ||
|
||
coeffs := make([]encoding.Symbol, frameCount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I feel like encoding.Symbol is not really used anywhere. Maybe it's worth deprecating it and just using fr.Element.
// Symbol is a symbol in the field used for polynomial commitments
type Symbol = fr.Element
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Signed-off-by: Cody Littley <[email protected]>
relay/chunkstore/chunk_reader.go
Outdated
|
||
func (r *chunkReader) GetChunkProofs( | ||
ctx context.Context, | ||
blobKey disperser.BlobKey) ([]*encoding.Proof, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this blobkey references the v1 blob key. In StoreBlob for V2 we use blobKey.Hex() = string
func (b *BlobStore) StoreBlob(ctx context.Context, blobKey string, data []byte) error { |
V2 blob key:
type BlobKey [32]byte
func (b BlobKey) Hex() string {
return hex.EncodeToString(b[:])
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also when we fetch for proofs vs coefficients don't we need a different S3 key to differentiate it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been assuming we'd use different buckets. Started a slack conversation to discuss. Will circle back on this prior to merging once we decide how we want to handle buckets and namespacing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've switched over to using v2.BlobKey
as recommended by @ian-shim.
relay/chunkstore/chunk_writer.go
Outdated
|
||
// ChunkCoefficientMetadata contains metadata about how chunk coefficients are stored. | ||
// Required for reading chunk coefficients using ChunkReader.GetChunkCoefficients(). | ||
type ChunkCoefficientMetadata struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these the same? encoding.FragmentInfo
type FragmentInfo struct {
TotalChunkSizeBytes uint32
NumFragments uint32
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They aren't currently the same, but should be. Now fixed.
The primary reason why I didn't originally enable fragmented read/write operations was because I wasn't initially sure how the metdata store would handle this data. Now that Ian merged his PR, I've unified ChunkCoefficientMetadata
with FragmentInfo
and have enabled chunk file fragmentation.
Signed-off-by: Cody Littley <[email protected]>
Signed-off-by: Cody Littley <[email protected]>
Why are these changes needed?
This PR adds a framework that will be used by encoders to upload data.
As requested by @dmanc, I split apart the logic for uploading proofs and uploading coefficients into separate methods.
Since this functionality is needed to unblock @dmanc, I'm pushing it in a partially completed form. Namely, the framework does not currently break large files into smaller ones when pushing to S3. I plan on adding that as a follow up task.
Checks