[Chunk Data Pack Pruner] Add Block Iterator #6858

zhangchiqing · 2025-01-08T01:08:45Z

This PR adds a height-based block iterator that iterates over blocks by height. It does not iterate over siblings of finalized blocks; that will be done later by implementing a view-based block iterator.

zhangchiqing · 2025-01-08T17:14:28Z

module/block_iterator.go

+}
+
+// BlockIterator is an interface for iterating over blocks
+type BlockIterator interface {


The BlockIterator interface can be implemented as either a height-based iterator or a view-based iterator.

The block iterator can be used not only by the chunk data pack pruner, but also, in the future, to implement the protocol state pruner.

The height-based iterator is easy to implement; however, it can't guarantee that all data is pruned, since it doesn't iterate over unfinalized blocks. The view-based iterator can guarantee that all blocks are pruned, but it's more complicated to implement.

In this PR, I first implement the height-based iterator. For chunk data packs, it's OK to prune only by height; for protocol state, however, it's better to prune by view to ensure more thorough pruning.
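A minimal Go sketch of the height-based approach, under stated assumptions: Identifier is a hypothetical stand-in for flow.Identifier, the blockIDByHeight field is a hypothetical injected lookup in place of storage.Headers, and hasNext is assumed to mean "blockID is valid". Because each height maps to exactly one finalized block, this kind of iterator never visits unfinalized siblings, which is the pruning gap described above.

```go
package main

import "fmt"

// Identifier is a hypothetical stand-in for flow.Identifier.
type Identifier [32]byte

// BlockIterator mirrors the Next() signature discussed in this PR.
type BlockIterator interface {
	// Next returns the next block ID. hasNext is false once the range is exhausted.
	Next() (blockID Identifier, hasNext bool, err error)
}

// HeightIterator visits exactly one block per height in [nextHeight, endHeight],
// so unfinalized siblings are never visited. It is not safe for concurrent use.
type HeightIterator struct {
	blockIDByHeight func(height uint64) (Identifier, error) // hypothetical lookup
	nextHeight      uint64
	endHeight       uint64
}

// compile-time check that HeightIterator satisfies BlockIterator
var _ BlockIterator = (*HeightIterator)(nil)

func NewHeightIterator(lookup func(uint64) (Identifier, error), start, end uint64) *HeightIterator {
	return &HeightIterator{blockIDByHeight: lookup, nextHeight: start, endHeight: end}
}

func (b *HeightIterator) Next() (Identifier, bool, error) {
	if b.nextHeight > b.endHeight {
		return Identifier{}, false, nil
	}
	id, err := b.blockIDByHeight(b.nextHeight)
	if err != nil {
		return Identifier{}, false, fmt.Errorf("failed to fetch block at height %v: %w", b.nextHeight, err)
	}
	// only advance after a successful lookup, so a failed height can be retried
	b.nextHeight++
	return id, true, nil
}
```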

zhangchiqing · 2025-01-08T17:16:12Z

module/block_iterator.go

+	jobCreator     IteratorJobCreator
+}
+
+func NewIteratorFactory(


Once the interfaces in the arguments are implemented, the logic for creating the BlockIterator can be reused. That's why I put this function here along with the interface definitions: it makes clear how the interfaces will be used to create the block iterator.

Why not just call this function NewBlockIterator and return a BlockIterator?

There will be two NewBlockIterator implementations: NewHeightBasedBlockIterator and NewViewBasedBlockIterator. Both need to initialize progress and create a job with a range of heights or views. This logic is the same for both, so the iterator factory is extracted to reuse it.

In other words, there will be one IterationFactory and many different BlockIterator creators.
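The "one factory, many creators" split can be sketched roughly as follows. All names here (ProgressReader, LoadState, RangeCreator, memProgress, Create) are hypothetical stand-ins, not the PR's actual API; the point is only that loading progress and building the range happen once, while the kind-specific constructor is passed in.

```go
package main

import "fmt"

// IterationRange is an inclusive [Start, End] range of heights or views.
type IterationRange struct{ Start, End uint64 }

// ProgressReader loads the next height or view to iterate (hypothetical name).
type ProgressReader interface{ LoadState() (uint64, error) }

// memProgress is a hypothetical in-memory ProgressReader for illustration.
type memProgress struct{ next uint64 }

func (m *memProgress) LoadState() (uint64, error) { return m.next, nil }

// RangeCreator picks the end of the range for a given start, e.g. the latest
// finalized height or view (hypothetical).
type RangeCreator func(start uint64) (IterationRange, error)

// IteratorFactory holds the steps every iterator kind shares:
// load saved progress, then build the range to iterate.
type IteratorFactory struct {
	progress ProgressReader
	creator  RangeCreator
}

func NewIteratorFactory(progress ProgressReader, creator RangeCreator) *IteratorFactory {
	return &IteratorFactory{progress: progress, creator: creator}
}

// Create runs the shared steps and hands the range to a kind-specific
// constructor (height-based or view-based). Go methods cannot have type
// parameters, so this is a generic function rather than a method.
func Create[T any](f *IteratorFactory, newIterator func(IterationRange) T) (T, error) {
	var zero T
	start, err := f.progress.LoadState()
	if err != nil {
		return zero, fmt.Errorf("could not load progress: %w", err)
	}
	r, err := f.creator(start)
	if err != nil {
		return zero, fmt.Errorf("could not create range: %w", err)
	}
	return newIterator(r), nil
}
```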

codecov-commenter · 2025-01-15T16:25:09Z

Codecov Report

Attention: Patch coverage is 40.00000% with 27 lines in your changes missing coverage. Please review.

Project coverage is 41.08%. Comparing base (b740fc0) to head (a57ac00).
Report is 65 commits behind head on master.

Files with missing lines	Patch %	Lines
module/block_iterator.go	0.00%	21 Missing ⚠️
module/block_iterator/height_based/iterator.go	75.00%	4 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6858      +/-   ##
==========================================
- Coverage   41.11%   41.08%   -0.04%     
==========================================
  Files        2116     2120       +4     
  Lines      185749   185895     +146     
==========================================
- Hits        76378    76373       -5     
- Misses     102954   103116     +162     
+ Partials     6417     6406      -11

Flag	Coverage Δ
unittests	`41.08% <40.00%> (-0.04%)`	⬇️

Flags with carried forward coverage won't be shown.


peterargue

Added a couple of small comments, but otherwise this looks good.

peterargue · 2025-01-21T19:56:06Z

module/block_iterator/height_based/iterator.go

+func (b *HeightIterator) Next() (flow.Identifier, bool, error) {
+	// exit when the context is done
+	select {
+	case <-b.ctx.Done():


Why do we need a context here? Since it's only used for this check, and only at the beginning of the function call, it seems like we should make the check the caller's responsibility.

peterargue · 2025-01-21T20:01:25Z

module/block_iterator/height_based/iterator.go

+	"github.com/onflow/flow-go/storage"
+)
+
+type HeightIterator struct {


This is not concurrency safe. Should it be? If not, can you add a warning to the godoc?

Yeah, I suppose we should have only one iterator per task.

peterargue · 2025-01-21T20:03:55Z

module/block_iterator/height_based/iterator_test.go

+		require.NoError(t, err)
+
+		// iterate through all blocks
+		visited := make(map[flow.Identifier]struct{})


Can you make this a slice instead of a map, so the verification step can also check that the blocks were visited in the correct order?

peterargue · 2025-01-21T20:05:04Z

module/block_iterator/height_based/iterator_test.go

+			// verify we don't iterate too many blocks
+			count++
+			if count > len(bs) {
+				t.Fatal("visited too many blocks")
+			}


If you use a slice, you can omit this and just compare the final length to len(bs).
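A rough sketch of the slice-based verification suggested above: record visited IDs in order, then compare the whole slice against the expected blocks, which checks count and order in one step and makes the "visited too many blocks" counter unnecessary. The helpers collect, sameOrder and sliceIterator are hypothetical, and Identifier stands in for flow.Identifier; in the real test a testify assertion such as require.Equal would likely replace sameOrder.

```go
package main

import "slices"

// Identifier is a hypothetical stand-in for flow.Identifier.
type Identifier [32]byte

type BlockIterator interface{ Next() (Identifier, bool, error) }

// sliceIterator is a hypothetical in-memory iterator used for illustration.
type sliceIterator struct{ ids []Identifier }

func (s *sliceIterator) Next() (Identifier, bool, error) {
	if len(s.ids) == 0 {
		return Identifier{}, false, nil
	}
	id := s.ids[0]
	s.ids = s.ids[1:]
	return id, true, nil
}

// collect records every visited block ID in iteration order.
func collect(it BlockIterator) ([]Identifier, error) {
	var visited []Identifier
	for {
		id, ok, err := it.Next()
		if err != nil {
			return nil, err
		}
		if !ok {
			return visited, nil
		}
		visited = append(visited, id)
	}
}

// sameOrder is true only if both slices have the same length and the same
// elements at every position, so extra, missing or reordered blocks all fail.
func sameOrder(expected, visited []Identifier) bool {
	return slices.Equal(expected, visited)
}
```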

peterargue · 2025-01-21T21:02:26Z

module/block_iterator.go

+	// if the iteration is interrupted (e.g. by a restart), the iterator can be
+	// resumed from the last checkpoint, which might result in the same block being
+	// iterated again.
+	Next() (blockID flow.Identifier, hasNext bool, exception error)


What do you think about making this a Go iterator? Instead of Next(), it could be:

func (b *HeightIterator) Range() iter.Seq2[flow.Identifier, error] {
	return func(yield func(flow.Identifier, error) bool) {
		for b.nextHeight <= b.endHeight {
			next, err := b.headers.BlockIDByHeight(b.nextHeight)
			if err != nil {
				yield(flow.ZeroID, fmt.Errorf("failed to fetch block at height %v: %w", b.nextHeight, err))
				return
			}
			b.nextHeight++
			if !yield(next, nil) {
				return
			}
		}
	}
}

then

for blockID, err := range heightIterator.Range() { ... }

I like this idea!

This is a Go 1.23 feature, right? 1.22 doesn't seem to have it yet.

Thanks, I will add a TODO to adopt this once we upgrade to 1.23.

janezpodhostnik · 2025-01-22T14:34:23Z

module/block_iterator/height_based/iterator.go

+
+type HeightIterator struct {
+	// dependencies
+	headers  storage.Headers


Only the BlockIDByHeight method is needed from headers. Consider just using a func(height) flow.Identifier.

That's a good idea. Actually, if I change it into a GetBlockIDByHeight function, the whole HeightIterator becomes almost identical to a ViewIterator, meaning we could reuse HeightIterator as a ViewIterator by passing a GetBlockIDByView function in place of GetBlockIDByHeight.
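The idea of one iterator parameterized by a lookup function can be sketched as below. IndexIterator, NewIndexIterator and ErrNotFound are hypothetical, and Identifier stands in for flow.Identifier. One assumption worth flagging: unlike heights, views can be skipped (no block proposed at that view), so this sketch treats ErrNotFound from the lookup as "skip this index"; that is likely the "almost" in "almost identical".

```go
package main

import (
	"errors"
	"fmt"
)

// Identifier is a hypothetical stand-in for flow.Identifier.
type Identifier [32]byte

// ErrNotFound is a hypothetical sentinel a lookup returns when no block exists
// at an index (e.g. a view without a block); such indexes are skipped.
var ErrNotFound = errors.New("not found")

// IndexIterator walks an inclusive range of uint64 indexes and resolves each
// one with an injected lookup: a by-height lookup yields a height iterator,
// a by-view lookup yields a view iterator.
type IndexIterator struct {
	getBlockID func(index uint64) (Identifier, error)
	next, end  uint64
}

func NewIndexIterator(getBlockID func(uint64) (Identifier, error), start, end uint64) *IndexIterator {
	return &IndexIterator{getBlockID: getBlockID, next: start, end: end}
}

func (it *IndexIterator) Next() (Identifier, bool, error) {
	for it.next <= it.end {
		id, err := it.getBlockID(it.next)
		if errors.Is(err, ErrNotFound) {
			it.next++ // no block at this index, move on
			continue
		}
		if err != nil {
			return Identifier{}, false, fmt.Errorf("failed to get block at index %d: %w", it.next, err)
		}
		it.next++
		return id, true, nil
	}
	return Identifier{}, false, nil
}
```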

janezpodhostnik · 2025-01-22T14:35:19Z

module/block_iterator.go

+	// if the iteration is interrupted (e.g. by a restart), the iterator can be
+	// resumed from the last checkpoint, which might result in the same block being
+	// iterated again.
+	Next() (blockID flow.Identifier, hasNext bool, exception error)


I like this idea!

janezpodhostnik · 2025-01-22T14:42:13Z

module/block_iterator.go

+	jobCreator     IteratorJobCreator
+}
+
+func NewIteratorFactory(


Why not just call this function NewBlockIterator and return a BlockIterator?

janezpodhostnik · 2025-01-22T14:42:56Z

module/block_iterator.go

+// the range could be either view based range or height based range.
+// when specifying the range, the start and end are inclusive, and the end must be greater than or
+// equal to the start
+type IterateJob struct {


Suggested change:

- type IterateJob struct {
+ type IterationRange struct {
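The inclusive-range contract from the doc comment (start and end inclusive, end >= start) could be enforced at construction time. This is a sketch only: NewIterationRange and Len are hypothetical helpers, and the field names Start and End are assumptions.

```go
package main

import "fmt"

// IterationRange is an inclusive range of heights or views.
type IterationRange struct {
	Start uint64 // first height/view to iterate, inclusive
	End   uint64 // last height/view to iterate, inclusive
}

// NewIterationRange is a hypothetical constructor enforcing End >= Start.
func NewIterationRange(start, end uint64) (IterationRange, error) {
	if end < start {
		return IterationRange{}, fmt.Errorf("invalid range: end %d < start %d", end, start)
	}
	return IterationRange{Start: start, End: end}, nil
}

// Len returns how many indexes the range covers; Start == End covers one.
// (It wraps to 0 for the full uint64 range, which this sketch ignores.)
func (r IterationRange) Len() uint64 { return r.End - r.Start + 1 }
```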

janezpodhostnik · 2025-01-22T14:44:52Z

module/block_iterator.go

+	// ReadNext reads the next block to iterate
+	// caller must ensure the reader is created by the IterateProgressInitializer,
+	// otherwise ReadNext would return exception.
+	ReadNext() (uint64, error)


Suggested change:

- ReadNext() (uint64, error)
+ LoadState() (uint64, error)

janezpodhostnik · 2025-01-22T14:45:03Z

module/block_iterator.go

+// IterateProgressWriter saves the progress of the iterator
+type IterateProgressWriter interface {
+	// SaveNext persists the next block to be iterated
+	SaveNext(uint64) error


Suggested change:

- SaveNext(uint64) error
+ SaveState(uint64) error

janezpodhostnik · 2025-01-22T14:46:57Z

module/block_iterator.go

+}
+
+// IterateProgressWriter saves the progress of the iterator
+type IterateProgressWriter interface {


Maybe a bit overkill to have separate interfaces for read and save in this case, since you always need both for iterating.

You can see in this PR how the writer and reader are separated.

The reader is used by the job creator to read the start height and create a height range; it doesn't need the writer to update progress.

The writer is used by the iterator for saving the iterated height. Since the iteration range is decided by the input (IteratorJob), the iterator doesn't need the reader to read progress from storage.
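A minimal sketch of that separation, assuming hypothetical names (ProgressReader, ProgressWriter, memProgress, CreateRange, Iterate) and a simple lookup function in place of storage: range creation only depends on the reader, and iteration only depends on the writer, since the range is its input. Progress is saved after each block is processed, so an interrupted run may process the last block again on resume, matching the "same block being iterated again" note in the Next() doc comment.

```go
package main

import "fmt"

// Identifier is a hypothetical stand-in for flow.Identifier.
type Identifier [32]byte

type IterationRange struct{ Start, End uint64 }

// Hypothetical interfaces following the LoadState/SaveState naming.
type ProgressReader interface{ LoadState() (uint64, error) }
type ProgressWriter interface{ SaveState(next uint64) error }

// memProgress is an in-memory stand-in for persistent storage; it implements both.
type memProgress struct{ next uint64 }

func (m *memProgress) LoadState() (uint64, error)  { return m.next, nil }
func (m *memProgress) SaveState(next uint64) error { m.next = next; return nil }

// CreateRange only needs the reader: the saved next height becomes Start.
func CreateRange(r ProgressReader, latest uint64) (IterationRange, error) {
	start, err := r.LoadState()
	if err != nil {
		return IterationRange{}, fmt.Errorf("could not load progress: %w", err)
	}
	return IterationRange{Start: start, End: latest}, nil
}

// Iterate only needs the writer: it saves the next height after each block.
func Iterate(rg IterationRange, w ProgressWriter, lookup func(uint64) Identifier, process func(Identifier) error) error {
	for h := rg.Start; h <= rg.End; h++ {
		if err := process(lookup(h)); err != nil {
			return err
		}
		if err := w.SaveState(h + 1); err != nil {
			return fmt.Errorf("could not save progress: %w", err)
		}
	}
	return nil
}
```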

zhangchiqing requested review from fxamacker, janezpodhostnik and peterargue January 8, 2025 01:09

zhangchiqing commented Jan 8, 2025


zhangchiqing changed the base branch from leo/db-ops-dbstore to master January 9, 2025 17:25

zhangchiqing requested review from jordanschalm, AlexHentschel, durkmurder, Kay-Zee and tarakby as code owners January 9, 2025 17:25

zhangchiqing changed the base branch from master to leo/db-ops-dbstore January 10, 2025 17:18

zhangchiqing removed request for Kay-Zee, durkmurder, jordanschalm, AlexHentschel and tarakby January 10, 2025 17:18

zhangchiqing force-pushed the leo/db-ops-dbstore branch from 77fb95b to 5b15c09 January 10, 2025 18:39

zhangchiqing marked this pull request as draft January 10, 2025 18:39

Base automatically changed from leo/db-ops-dbstore to master January 13, 2025 19:55

zhangchiqing added 8 commits January 15, 2025 08:21

add block iterator

82ca252

add height iterator

b597f4c

move SaveIterator

2ee5cbf

add test cases

d629039

rename to save next

6a47d19

refactor block iterator

59c84b5

update comments

bc825dc

update block iterator

0af09d6

zhangchiqing force-pushed the leo/cdp-prune-block-iterator branch from b688ac2 to 0af09d6 January 15, 2025 16:21

zhangchiqing marked this pull request as ready for review January 15, 2025 16:23

zhangchiqing requested a review from a team as a code owner January 15, 2025 16:23

peterargue reviewed Jan 21, 2025


janezpodhostnik reviewed Jan 22, 2025


zhangchiqing added 4 commits January 22, 2025 08:22

remove context

b6ec7af

rename to IterateRange

5a6e18f

rename to SaveState/ReadState

4be86ff

fix test cases

a57ac00
