[Chunk Data Pack Pruner] Add Block Iterator #6858
base: master
}

// BlockIterator is an interface for iterating over blocks
type BlockIterator interface {
The BlockIterator interface can be implemented as either a height-based iterator or a view-based iterator.
The block iterator can be used not only by the chunk data pack pruner, but also, in the future, to implement a protocol state pruner.
The height-based iterator is easy to implement; however, it can't guarantee that all data is pruned, since it doesn't iterate over unfinalized blocks. The view-based iterator can guarantee that all blocks are pruned, but it's more complicated to implement.
In this PR, I implement the height-based iterator first. For chunk data packs, pruning by height alone is fine; for protocol state, however, it's better to prune by view to ensure more thorough pruning.
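To make the coverage difference concrete, here is a minimal sketch on a toy four-block chain where block C (view 3) is an unfinalized sibling of D at height 3. The Block type, the two index builders, and iterateRange are hypothetical stand-ins, not flow-go APIs; the point is only that a finalized-height index has no entry for C, while a view index does.

```go
package main

// Identifier is a stand-in for flow.Identifier.
type Identifier string

// Block is a toy header carrying only the fields this sketch needs.
type Block struct {
	ID        Identifier
	Height    uint64
	View      uint64
	Finalized bool
}

// heightIndex mimics a finalized height index: only finalized blocks are indexed.
func heightIndex(blocks []Block) map[uint64]Identifier {
	idx := make(map[uint64]Identifier)
	for _, b := range blocks {
		if b.Finalized {
			idx[b.Height] = b.ID
		}
	}
	return idx
}

// viewIndex indexes every block, finalized or not, by its view.
func viewIndex(blocks []Block) map[uint64]Identifier {
	idx := make(map[uint64]Identifier)
	for _, b := range blocks {
		idx[b.View] = b.ID
	}
	return idx
}

// iterateRange visits indices start..end (inclusive), skipping indices with no block.
func iterateRange(idx map[uint64]Identifier, start, end uint64) []Identifier {
	var visited []Identifier
	for i := start; i <= end; i++ {
		if id, ok := idx[i]; ok {
			visited = append(visited, id)
		}
	}
	return visited
}
```

With blocks A (height 1, view 1), B (height 2, view 2), C (height 3, view 3, unfinalized) and D (height 3, view 4), iterating heights 1..3 visits A, B, D, while iterating views 1..4 visits A, B, C, D, so only the view-based pass reaches C's data.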
	jobCreator IteratorJobCreator
}

func NewIteratorFactory(
Once the interfaces in the arguments are implemented, the logic for creating the BlockIterator can be reused. That's why I put this function here, alongside the interface definitions: it makes clear how the interfaces will be used to create the block iterator.
Why not just call this function NewBlockIterator and return a BlockIterator?
There will be two NewBlockIterator implementations, NewHeightBasedBlockIterator and NewViewBasedBlockIterator. Both need to initialize progress and create a job over a range of heights or views. That logic is the same for both, so I extracted it into the iterator factory for reuse.
In other words, there will be one IteratorFactory and many different BlockIterator creators.
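A rough sketch of the "one factory, many creators" split, assuming the factory's shared job is to load persisted progress and ask a creator for the range to iterate. ProgressReader, RangeCreator, memProgress, and heightRangeTo are hypothetical names (RangeCreator is loosely modeled on the jobCreator field, as a plain function type for brevity); the real NewIteratorFactory arguments may differ.

```go
package main

import "fmt"

// IterationRange is an inclusive [Start, End] range of heights or views (assumed shape).
type IterationRange struct {
	Start uint64
	End   uint64
}

// ProgressReader loads the next index to iterate (hypothetical interface).
type ProgressReader interface {
	LoadState() (uint64, error)
}

// RangeCreator decides where iteration stops, given where it starts (hypothetical).
type RangeCreator func(start uint64) (IterationRange, error)

// IteratorFactory holds the logic shared by height- and view-based iterators:
// read persisted progress, then create an iteration range from it.
type IteratorFactory struct {
	progress ProgressReader
	create   RangeCreator
}

func NewIteratorFactory(progress ProgressReader, create RangeCreator) *IteratorFactory {
	return &IteratorFactory{progress: progress, create: create}
}

// NextRange returns the range the next iterator should cover. This sketch
// treats End < Start as an error, per the "end >= start" requirement.
func (f *IteratorFactory) NextRange() (IterationRange, error) {
	start, err := f.progress.LoadState()
	if err != nil {
		return IterationRange{}, fmt.Errorf("failed to load progress: %w", err)
	}
	r, err := f.create(start)
	if err != nil {
		return IterationRange{}, fmt.Errorf("failed to create range: %w", err)
	}
	if r.End < r.Start {
		return IterationRange{}, fmt.Errorf("invalid range [%d, %d]", r.Start, r.End)
	}
	return r, nil
}

// memProgress is an in-memory ProgressReader for illustration.
type memProgress struct{ next uint64 }

func (m *memProgress) LoadState() (uint64, error) { return m.next, nil }

// heightRangeTo returns a creator that iterates up to a given finalized height.
func heightRangeTo(latestFinalized uint64) RangeCreator {
	return func(start uint64) (IterationRange, error) {
		return IterationRange{Start: start, End: latestFinalized}, nil
	}
}
```

A view-based creator would plug into the same factory by returning a range that ends at the latest view instead of the latest finalized height.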
The branch was force-pushed from 77fb95b to 5b15c09, and later from b688ac2 to 0af09d6.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6858      +/-   ##
==========================================
- Coverage   41.11%   41.08%   -0.04%
==========================================
  Files        2116     2120       +4
  Lines      185749   185895     +146
==========================================
- Hits        76378    76373       -5
- Misses     102954   103116     +162
+ Partials     6417     6406      -11
Flags with carried forward coverage won't be shown.
Added a couple of small comments, but otherwise this looks good.
func (b *HeightIterator) Next() (flow.Identifier, bool, error) {
	// exit when the context is done
	select {
	case <-b.ctx.Done():
Why do we need a context here? Since it's only used for this check, and only at the beginning of the function call, it seems like we should make the check the caller's responsibility.
	"github.com/onflow/flow-go/storage"
)

type HeightIterator struct {
This is not concurrency safe. Should it be? If not, can you add a warning to the godoc?
Yeah, I suppose we should have only one iterator per task.
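One way the godoc warning could read, shown on a minimal HeightIterator sketch. The Headers interface here exposes only BlockIDByHeight and, like mapHeaders and NewHeightIterator, is a hypothetical stand-in for storage.Headers and the real constructor.

```go
package main

import "fmt"

// Identifier is a stand-in for flow.Identifier.
type Identifier string

// Headers is a stand-in for the one storage.Headers method the iterator uses.
type Headers interface {
	BlockIDByHeight(height uint64) (Identifier, error)
}

// HeightIterator iterates finalized blocks over an inclusive height range.
//
// CAUTION: HeightIterator is NOT safe for concurrent use. Each task should
// create its own iterator and drive it from a single goroutine.
type HeightIterator struct {
	headers    Headers
	nextHeight uint64
	endHeight  uint64
}

func NewHeightIterator(headers Headers, start, end uint64) *HeightIterator {
	return &HeightIterator{headers: headers, nextHeight: start, endHeight: end}
}

// Next returns the next block ID; hasNext is false once the range is exhausted.
func (b *HeightIterator) Next() (Identifier, bool, error) {
	if b.nextHeight > b.endHeight {
		return "", false, nil
	}
	id, err := b.headers.BlockIDByHeight(b.nextHeight)
	if err != nil {
		return "", false, fmt.Errorf("failed to fetch block at height %v: %w", b.nextHeight, err)
	}
	b.nextHeight++
	return id, true, nil
}

// mapHeaders is an in-memory Headers for illustration.
type mapHeaders map[uint64]Identifier

func (m mapHeaders) BlockIDByHeight(h uint64) (Identifier, error) {
	id, ok := m[h]
	if !ok {
		return "", fmt.Errorf("no block at height %d", h)
	}
	return id, nil
}
```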
require.NoError(t, err)

// iterate through all blocks
visited := make(map[flow.Identifier]struct{})
Can you make this a slice instead of a map so the verification step can check that they were also visited in the correct order?
// verify we don't iterate too many blocks
count++
if count > len(bs) {
	t.Fatal("visited too many blocks")
}
If you use a slice, you can omit this and just compare the final length to len(bs).
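A sketch of the slice-based check: recording IDs in visit order lets a single comparison verify both the count and the order, so the running counter becomes unnecessary. collect, sameOrder, and sliceIterator are hypothetical helpers; in the actual test, require.Equal(t, expectedIDs, visited) would perform the same comparison.

```go
package main

// Identifier is a stand-in for flow.Identifier.
type Identifier string

// BlockIterator mirrors the Next() signature under review.
type BlockIterator interface {
	Next() (blockID Identifier, hasNext bool, exception error)
}

// collect drains the iterator, recording IDs in visit order.
func collect(it BlockIterator) ([]Identifier, error) {
	var visited []Identifier
	for {
		id, hasNext, err := it.Next()
		if err != nil {
			return visited, err
		}
		if !hasNext {
			return visited, nil
		}
		visited = append(visited, id)
	}
}

// sameOrder reports whether visited matches expected exactly:
// same length and same element at every position.
func sameOrder(expected, visited []Identifier) bool {
	if len(expected) != len(visited) {
		return false
	}
	for i := range expected {
		if expected[i] != visited[i] {
			return false
		}
	}
	return true
}

// sliceIterator is a toy iterator over a fixed list of IDs.
type sliceIterator struct {
	ids []Identifier
	pos int
}

func (s *sliceIterator) Next() (Identifier, bool, error) {
	if s.pos >= len(s.ids) {
		return "", false, nil
	}
	id := s.ids[s.pos]
	s.pos++
	return id, true, nil
}
```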
// if the iteration is interrupted (e.g. by a restart), the iterator can be
// resumed from the last checkpoint, which might result in the same block being
// iterated again.
Next() (blockID flow.Identifier, hasNext bool, exception error)
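To illustrate why a resumed iteration may repeat a block, here is a toy model assuming progress is checkpointed only every few blocks (the real checkpoint frequency isn't stated here). ProgressStore and run are hypothetical, and the "crash" is simulated by a block limit.

```go
package main

// ProgressStore stands in for persisted iterator progress (hypothetical).
type ProgressStore struct{ next uint64 }

func (p *ProgressStore) LoadState() uint64     { return p.next }
func (p *ProgressStore) SaveState(next uint64) { p.next = next }

// run resumes from the last checkpoint and processes heights up to end.
// Progress is checkpointed after every `every` processed blocks (an
// assumption for this sketch). If limit >= 0, the run stops after `limit`
// blocks to simulate a crash; a negative limit means no crash.
// It returns the heights processed during this run.
func run(store *ProgressStore, end, every uint64, limit int) []uint64 {
	var processed []uint64
	for h := store.LoadState(); h <= end; h++ {
		if len(processed) == limit {
			// simulated crash: progress since the last checkpoint is lost
			return processed
		}
		processed = append(processed, h)
		if uint64(len(processed))%every == 0 {
			store.SaveState(h + 1)
		}
	}
	store.SaveState(end + 1)
	return processed
}
```

With a checkpoint every 2 blocks, a first run that crashes after 3 blocks processes heights 0, 1, 2 and leaves the checkpoint at 2; the resumed run processes 2 through 5, so height 2 is handled twice. This is harmless as long as processing a block (e.g. pruning it) is idempotent.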
What do you think about making this a Go iterator? Instead of Next(), it could be:
func (b *HeightIterator) Range() iter.Seq2[flow.Identifier, error] {
	return func(yield func(flow.Identifier, error) bool) {
		for b.nextHeight <= b.endHeight {
			next, err := b.headers.BlockIDByHeight(b.nextHeight)
			if err != nil {
				yield(flow.ZeroID, fmt.Errorf("failed to fetch block at height %v: %w", b.nextHeight, err))
				return
			}
			b.nextHeight++
			if !yield(next, nil) {
				return
			}
		}
	}
}
Then:
for blockID, err := range heightIterator.Range() {
	...
}
I like this idea!
This is a Go 1.23 feature, right? It seems 1.22 doesn't have it yet.
Thanks, I'll add a TODO to adopt it once we upgrade to 1.23.
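Until the upgrade, a callback method with the same shape gives similar call-site ergonomics on Go 1.22 (where range-over-func exists only as a preview behind GOEXPERIMENT=rangefunc). This is a sketch: Identifier is a stand-in type, blockIDByHeight is a hypothetical field, and ForEach is not an existing method.

```go
package main

import "fmt"

// Identifier is a stand-in for flow.Identifier.
type Identifier string

// HeightIterator holds only what the callback form needs.
type HeightIterator struct {
	blockIDByHeight func(uint64) (Identifier, error)
	nextHeight      uint64
	endHeight       uint64
}

// ForEach calls yield for each block in height order and stops early when
// yield returns false, mirroring the body of the proposed Range().
func (b *HeightIterator) ForEach(yield func(Identifier, error) bool) {
	for b.nextHeight <= b.endHeight {
		id, err := b.blockIDByHeight(b.nextHeight)
		if err != nil {
			yield("", fmt.Errorf("failed to fetch block at height %v: %w", b.nextHeight, err))
			return
		}
		b.nextHeight++
		if !yield(id, nil) {
			return
		}
	}
}
```

On Go 1.23, the method value it.ForEach has the form of iter.Seq2[Identifier, error], so for id, err := range it.ForEach { ... } should work without changing the method, which keeps the later migration small.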

type HeightIterator struct {
	// dependencies
	headers storage.Headers
Only the BlockIDByHeight function is needed from headers. Consider just using a func(height) flow.Identifier.
That's a good idea. In fact, if I change it into a GetBlockIDByHeight function, the whole HeightIterator becomes almost identical to ViewIterator, meaning we could use HeightIterator as a ViewIterator by passing a GetBlockIDByView function as GetBlockIDByHeight.
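A sketch of the index-agnostic version: the iterator only knows a lookup function, so the same type can serve as a height iterator or a view iterator. The LookupFunc shape, including the found flag for indices with no block (for example, a view skipped after a consensus timeout), is an assumption; a finalized-height lookup would normally always find a block.

```go
package main

// Identifier is a stand-in for flow.Identifier.
type Identifier string

// LookupFunc maps an index (height or view) to a block ID; found is false
// when no block exists at that index (hypothetical signature).
type LookupFunc func(index uint64) (id Identifier, found bool, err error)

// IndexIterator iterates an inclusive index range using only a lookup function.
type IndexIterator struct {
	lookup LookupFunc
	next   uint64
	end    uint64
}

func NewIndexIterator(lookup LookupFunc, start, end uint64) *IndexIterator {
	return &IndexIterator{lookup: lookup, next: start, end: end}
}

// Next skips indices without a block and returns the next block ID, if any.
// The index only advances after a successful lookup, so a failed lookup is retried.
func (it *IndexIterator) Next() (Identifier, bool, error) {
	for it.next <= it.end {
		id, found, err := it.lookup(it.next)
		if err != nil {
			return "", false, err
		}
		it.next++
		if found {
			return id, true, nil
		}
	}
	return "", false, nil
}

// fromMap builds a LookupFunc over an in-memory index (illustration only).
func fromMap(m map[uint64]Identifier) LookupFunc {
	return func(i uint64) (Identifier, bool, error) {
		id, ok := m[i]
		return id, ok, nil
	}
}
```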
module/block_iterator.go (outdated)
// the range could be either view based range or height based range.
// when specifying the range, the start and end are inclusive, and the end must be greater than or
// equal to the start
type IterateJob struct {
Suggested change:
- type IterateJob struct {
+ type IterationRange struct {
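A sketch of the renamed type with the documented invariant (inclusive bounds, end >= start) enforced at construction. NewIterationRange and Len are hypothetical helpers; Len assumes the range does not span the entire uint64 domain, since End-Start+1 would then overflow to 0.

```go
package main

import "fmt"

// IterationRange is an inclusive range of heights or views: [Start, End].
type IterationRange struct {
	Start uint64
	End   uint64
}

// NewIterationRange rejects ranges where End < Start.
func NewIterationRange(start, end uint64) (IterationRange, error) {
	if end < start {
		return IterationRange{}, fmt.Errorf("invalid iteration range: end %d < start %d", end, start)
	}
	return IterationRange{Start: start, End: end}, nil
}

// Len returns the number of indices covered; inclusive bounds mean End-Start+1.
// It overflows to 0 for the full [0, MaxUint64] range.
func (r IterationRange) Len() uint64 {
	return r.End - r.Start + 1
}
```

Because both bounds are inclusive, a range with Start == End still covers exactly one block.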
module/block_iterator.go (outdated)
// ReadNext reads the next block to iterate
// caller must ensure the reader is created by the IterateProgressInitializer,
// otherwise ReadNext would return exception.
ReadNext() (uint64, error)
Suggested change:
- ReadNext() (uint64, error)
+ LoadState() (uint64, error)
module/block_iterator.go (outdated)
// IterateProgressWriter saves the progress of the iterator
type IterateProgressWriter interface {
	// SaveNext persists the next block to be iterated
	SaveNext(uint64) error
Suggested change:
- SaveNext(uint64) error
+ SaveState(uint64) error
}

// IterateProgressWriter saves the progress of the iterator
type IterateProgressWriter interface {
Maybe a bit overkill to have separate interfaces for read and save in this case, since you always need both for iterating.
You can see in this PR how the writer and reader are separated.
The reader is used by the job creator to read the start height and create a height range; it doesn't need the writer to update progress.
The writer is used by the iterator to save the iterated height. Since the iteration range is determined by the input (IteratorJob), the iterator doesn't need the reader to load progress from storage.
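A sketch of the reader/writer split described here, using the LoadState/SaveState names suggested in review: the job creator depends only on a reader, and the iterator depends only on a writer, even though a single store implements both. All names and signatures here (memStore, createJob, iterate) are hypothetical.

```go
package main

import "fmt"

// ProgressReader loads the next index to iterate.
type ProgressReader interface {
	LoadState() (uint64, error)
}

// ProgressWriter persists the next index to iterate.
type ProgressWriter interface {
	SaveState(next uint64) error
}

// memStore implements both sides; in practice this would be backed by storage.
type memStore struct{ next uint64 }

func (m *memStore) LoadState() (uint64, error)  { return m.next, nil }
func (m *memStore) SaveState(next uint64) error { m.next = next; return nil }

// createJob only reads progress: it turns the saved start into a range.
func createJob(r ProgressReader, latest uint64) (start, end uint64, err error) {
	start, err = r.LoadState()
	if err != nil {
		return 0, 0, fmt.Errorf("load progress: %w", err)
	}
	return start, latest, nil
}

// iterate only writes progress: its range comes from the job, not from storage.
func iterate(w ProgressWriter, start, end uint64, visit func(uint64)) error {
	for h := start; h <= end; h++ {
		visit(h)
		if err := w.SaveState(h + 1); err != nil {
			return fmt.Errorf("save progress: %w", err)
		}
	}
	return nil
}
```

Narrow interfaces also make each side easy to fake in tests: a job-creator test needs only a LoadState stub, and an iterator test needs only a SaveState recorder.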
This PR adds a height-based block iterator that iterates over finalized blocks by height. It does not iterate over siblings of finalized blocks; that will be handled later by a view-based block iterator.