
[dbnode] Use Documents Builder And Persist Index Data In Peer BS #2078

Merged: 63 commits merged into master from bdu/bs-docs-builder on Feb 13, 2020

Conversation

@notbdu (Contributor) commented Dec 21, 2019

What this PR does / why we need it:
Removes usage of the mem segment in fs & peer bootstrapping. Uses the documents builder with batched writes instead. Multiplexes the building of index blocks across threads in peer bootstrapping and persists index data after blocks are built.

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

NONE

Does this PR require updating code package or user-facing documentation?:

NONE

 // allocator for a bootstrap result index block given index options.
-func NewBootstrapResultMutableSegmentAllocator(
+func NewBootstrapResultDocumentsBuilderAllocator(
Collaborator:

nit: Should we rename this NewBootstrapDocumentsBuilderAllocator(...) to match the new type?

@@ -642,73 +648,123 @@ func (s *peersSource) readIndex(
zap.Int("concurrency", concurrency),
)

groupFn := bootstrapper.NewShardTimeRangesTimeWindowGroups
groupedByBlockSize := groupFn(shardsTimeRanges, dataBlockSize)
Collaborator:

Hm, should this be index block size instead of data block size? Just comparing to the fs/source.go implementation.

Collaborator:

Hard to know; I see we used the data block size for fetching below.

if !inserted {
	// If the metadata wasn't inserted we finalize the metadata.
	dataBlock.Finalize()
}
metadata, err := session.FetchBootstrapBlocksMetadataFromPeers(ns.ID(),
Collaborator:

So, I think for the peers indexing read side we can just index from whatever we just persisted to disk.

if exists {
	return false, nil
}

d, err := convert.FromMetric(dataBlock.ID, dataBlock.Tags)
@notbdu (Contributor, Author):

TODO: Look into potentially pooling structs here.
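
For reference, a minimal sketch of the kind of pooling this TODO refers to, using the standard library's sync.Pool. The helper name and parameter types are illustrative assumptions, not part of the diff; it assumes convert.FromMetric returns a doc.Document by value, as used above.

// Hypothetical sketch only; not part of this PR.
var documentPool = sync.Pool{
	New: func() interface{} { return new(doc.Document) },
}

func fromMetricPooled(id ident.ID, tags ident.Tags) (*doc.Document, error) {
	d := documentPool.Get().(*doc.Document)
	converted, err := convert.FromMetric(id, tags)
	if err != nil {
		documentPool.Put(d)
		return nil, err
	}
	*d = converted
	return d, nil
}

// Callers reset and return the document when done with it:
//	*d = doc.Document{} // avoid retaining references via the pool
//	documentPool.Put(d)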

@codecov (bot) commented Jan 3, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@8ed08f1).
The diff coverage is 66.6%.

@@           Coverage Diff            @@
##             master   #2078   +/-   ##
========================================
  Coverage          ?   63.4%           
========================================
  Files             ?     853           
  Lines             ?   78234           
  Branches          ?       0           
========================================
  Hits              ?   49612           
  Misses            ?   24942           
  Partials          ?    3680
Flag         Coverage Δ
#aggregator  57.9% <66.6%> (?)
#cluster     100% <ø> (?)
#collector   33.3% <ø> (?)
#dbnode      100% <ø> (?)
#m3em        100% <ø> (?)
#m3ninx      100% <ø> (?)
#m3nsch      100% <ø> (?)
#metrics     100% <ø> (?)
#msg         100% <ø> (?)
#query       100% <ø> (?)
#x           68.1% <ø> (?)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 8ed08f1...2dbaf80.

)
defer func() {
	docsPool.Put(batch)
}()
Collaborator:

nit: I realize this can be abbreviated to defer docsPool.Put(batch) (I think this code was elsewhere too, but just FYI).
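
Side by side, the two forms (noting that the abbreviated form evaluates its argument when the defer statement runs, so it is equivalent only if batch is not reassigned afterwards):

// Current form:
defer func() {
	docsPool.Put(batch)
}()

// Abbreviated form suggested above:
defer docsPool.Put(batch)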

wg             sync.WaitGroup
count          = len(shardsTimeRanges)
concurrency    = s.opts.DefaultShardConcurrency()
indexBlockSize = ns.Options().RetentionOptions().BlockSize()
Collaborator:

This might be dataBlockSize? I think index block size would come from ns.Options().IndexOptions().BlockSize()?
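
For context, the two block sizes being compared come from different parts of the namespace options (both expressions as quoted in this thread):

// Data block size: granularity of data blocks, from retention options.
dataBlockSize := ns.Options().RetentionOptions().BlockSize()

// Index block size: granularity of index blocks, configured separately
// via the namespace's index options.
indexBlockSize := ns.Options().IndexOptions().BlockSize()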

currRange := xtime.Range{
	Start: blockStart,
	End:   blockStart.Add(size),
}
var (
@robskillington (Collaborator) commented Jan 7, 2020:

nit: Could we move this method into a sub-method? It's just a little harder to read when it's all in a workers.Go(func() { /* lots of content here */ })

Perhaps we could also deal with the error case inline in the inline func.

Something like:

workers.Go(func() {
	defer wg.Done()

	remainingRanges, timesWithErrors = s.processReaders(...)

	s.markRunResultErrorsAndUnfulfilled(resultLock, r, requestedRanges,
		remainingRanges, timesWithErrors)
})

}

// CreateFlushBatchFn creates a batch flushing fn for code reuse.
func CreateFlushBatchFn(
@robskillington (Collaborator) commented Jan 7, 2020:

Could we perhaps add this as a method on an IndexBuilder type that wraps the segment.DocumentsBuilder?

i.e.

type IndexBuilders struct {
	sync.Mutex
	builders map[xtime.UnixNano]*IndexBuilder
}

func (b *IndexBuilders) GetOrAddBuilder(st time.Time) *IndexBuilder {
	b.Lock()
	defer b.Unlock()

	key := xtime.ToUnixNano(st)
	builder, ok := b.builders[key]
	if !ok {
		// NB: construction of the wrapped segment.DocumentsBuilder elided.
		builder = &IndexBuilder{}
		b.builders[key] = builder
	}
	return builder
}

type IndexBuilder struct {
	sync.Mutex
	builder segment.DocumentsBuilder
}

func (b *IndexBuilder) FlushBatch(batch []doc.Document) ([]doc.Document, error) {
	if len(batch) == 0 {
		// Last flush might not have any docs enqueued.
		return batch, nil
	}

	b.Lock()
	err := b.builder.InsertBatch(index.Batch{
		Docs:                batch,
		AllowPartialUpdates: true,
	})
	b.Unlock()
	if err != nil {
		// NB: with AllowPartialUpdates the error may be partial;
		// handling of partial errors elided here.
		return batch, err
	}

	// Reset the batch for reuse by the caller.
	return batch[:0], nil
}
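
A possible call site for the wrapper, under the same assumptions (variable names hypothetical):

// Hypothetical usage of the suggested types.
builders := &IndexBuilders{
	builders: make(map[xtime.UnixNano]*IndexBuilder),
}

b := builders.GetOrAddBuilder(blockStart)
if batch, err = b.FlushBatch(batch); err != nil {
	return err
}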

// Useful for tests.
return
}
p.Lock()
Collaborator:

Somewhat related commentary: Hopefully with Go 1.14 we can just use defer everywhere to unlock locks with pretty much zero cost: https://twitter.com/bradfitz/status/1184547163235184640?lang=en
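
For illustration, the pattern the linked change makes cheap (receiver and field names hypothetical, not from this diff):

// Pre-1.14 hot-path style avoided defer for cheap critical sections:
p.Lock()
v := p.value
p.Unlock()

// With Go 1.14's open-coded defers, the safer deferred form costs
// roughly the same, so it can be used everywhere:
p.Lock()
defer p.Unlock()
v = p.value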

builder, ok := indexBuilders[xtime.ToUnixNano(blockStart)]
if !ok {
	// No-op if there is no index builder for this time block (nothing to persist).
	return nil
Collaborator:

Hm, wonder if we should error out in this case? I would expect that if we call BuildBootstrapIndexSegment(...) we really do want to build the segment.

Maybe we should avoid making this a precondition.

Collaborator:

Looks like we still need to return an error here I think?
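
A sketch of the suggested change (error wording illustrative):

builder, ok := indexBuilders[xtime.ToUnixNano(blockStart)]
if !ok {
	// Surface an error instead of silently no-op'ing: a caller asking to
	// build a segment should have a builder registered for this block.
	return fmt.Errorf("no index builder for block start: %v", blockStart)
}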

builder, ok := indexBuilders[xtime.ToUnixNano(blockStart)]
if !ok {
	// No-op if there are no index builders for this time block (nothing to persist).
	return nil
@robskillington (Collaborator) commented Jan 7, 2020:

Same here: wonder if we should error out in this case? I would expect that if we call PersistBootstrapIndexSegment(...) we really do want to persist the segment.

Maybe we should avoid making this a precondition.

Collaborator:

Looks like we still need to return an error here I think?

@robskillington changed the title from "Use Documents Builder And Persist Index Data In Peer BS" to "[dbnode] Use Documents Builder And Persist Index Data In Peer BS" on Feb 11, 2020
	idxOpts,
)
s.markRunResultErrorsAndUnfulfilled(resultLock, r, timeWindowReaders.Ranges,
	remainingRanges, timesWithErrors)
// NB(bodu): Since we are re-using the same builder for all bootstrapped index blocks,
// it is not thread safe and requires reset after every processed index block.
s.builder.Builder().Reset(0)
Collaborator:

Here too, I would call Reset just before processReaders(...).
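
i.e. something like (sketch, reusing the names from the diff above):

// Reset the shared builder before processing each index block so an
// early return on error can't leave stale documents behind.
s.builder.Builder().Reset(0)

remainingRanges, timesWithErrors = s.processReaders(...)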

@@ -183,6 +207,7 @@ func (s *peersSource) readData(
	shardRetrieverMgr block.DatabaseShardBlockRetrieverManager
	persistFlush      persist.FlushPreparer
	shouldPersist     = false
	// TODO(bodu): We should migrate to series.CacheLRU only.
Collaborator:

Agreed
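
For reference, a hedged sketch of what selecting that cache policy could look like when constructing storage options; the setter name is an assumption based on the options pattern used elsewhere in this thread, not verified against this diff:

// Hypothetical: default to the LRU series cache policy.
storageOpts := storage.NewOptions().
	SetSeriesCachePolicy(series.CacheLRU)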

@robskillington (Collaborator) left a review:

LGTM

@notbdu merged commit 914105c into master on Feb 13, 2020
@notbdu deleted the bdu/bs-docs-builder branch on February 13, 2020 at 17:46