[dbnode] Gracefully handle reads including documents with stale index state #3905

nbroyles · 2021-11-05T19:47:28Z

What this PR does / why we need it:

In db nodes, state is kept for cheap lookups to determine if a series is
indexed for a specific block start. This state exists on the document in the
index and in the shard object. On concurrent writes to the same time series,
it's possible for these two sources to get out of sync. Basically, the
state associated with the document on the index is a different instance
than the one that gets placed in the shard's lookup map. This can
result in issues where we don't accurately track that a series in
indexed for a specific block start as only the entry in the lookup
map will get updated in the future.

To resolve this, we introduce reconciliation methods when querying these
objects. If it's noticed on read that we're querying index state that
never got inserted into the shard lookup map, then we query the shard
for the appropriate entry and use the results on that object.

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

Does this PR require updating code package or user-facing documentation?:

nbroyles · 2021-11-06T20:52:29Z

src/m3ninx/doc/types.go

+	// Attempts to retrieve the most recent index entry from the shard if this entry
+	// was never inserted there. If there is an error, returns the value for this entry.
+	// Additionally, returns a boolean to denote if result came from reconciliation.
+	ReconciledIndexedForBlockStart(blockStart xtime.UnixNano) (bool, bool)


FWIW, this API works, but it'd be much friendlier, I think, if we could just return the storage.Entry from the lookup map as an OnIndexSeries here. The problem is we'd need to expose methods on this interface for incrementing and decrementing the reader+writer count which ends up feeling equally unfriendly and more error prone.

True yeah.

Although things we could do to make that nicer:

type CleanupFn func() ReconciledOnIndexSeries() (OnIndexSeries, CleanupFn)

Then to call it:

onIndexSeries, cleanup := entry.ReconciledOnIndexSeries() defer cleanup() // whatever with onIndexSeries

Or something which could be more allocation intensive is passing a function/lambda and doing the lookup and the decrement/etc afterwards explicitly (which might need to be dynamically allocated):

WithReconciledOnIndexSeries(callback func (OnIndexSeries))

I like the cleanup option since that's a convention we follow in a few different places. Going to rework this PR to do that since I think it'll allow us to get rid of specific Reconciled... functions.

Just saw this comment now - but my vote for making this less error prone for the consumer of metadata.OnIndexSeries is to instead have a separate interface that just provides a method for properly retrieving OnIndexSeries so then whoever has Metadata is forced to go through this reconciliation path when consuming

robskillington · 2021-11-06T21:11:14Z

src/dbnode/integration/index_block_orphaned_entry_test.go

+	require.NoError(t, err)
+}
+
+func generateTestSetup(t *testing.T) TestSetup {


nit: Would be nice if we were able to namespace these method names. I guess if other tests want to use the same kind of thing they can in the future pass in their own overrides and we can move this to a shared place. e.g. func generateTestSetup(t *testing.T, customOpts func(storage.Options) storage.Options) { ... }.

src/dbnode/storage/entry.go

src/dbnode/storage/index/block.go

codecov · 2021-11-08T02:04:57Z

Codecov Report

Merging #3905 (9fdd964) into master (f79f7e6) will increase coverage by 0.0%.
The diff coverage is 95.4%.

@@          Coverage Diff           @@
##           master   #3905   +/-   ##
======================================
  Coverage    56.6%   56.6%           
======================================
  Files         553     553           
  Lines       63275   63296   +21     
======================================
+ Hits        35815   35863   +48     
+ Misses      24256   24238   -18     
+ Partials     3204    3195    -9

Flag	Coverage Δ
aggregator	`62.1% <ø> (-0.2%)`	⬇️
cluster	`∅ <ø> (∅)`
collector	`58.4% <ø> (ø)`
dbnode	`60.4% <95.4%> (+0.1%)`	⬆️
m3em	`46.4% <ø> (ø)`
metrics	`19.7% <ø> (ø)`
msg	`74.4% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f79f7e6...9fdd964. Read the comment docs.

rallen090 · 2021-11-09T14:45:35Z

src/dbnode/integration/index_helpers.go

-			StartInclusive: xtime.Now(),
-			EndExclusive:   xtime.Now(),
+			StartInclusive: queryTime,
+			EndExclusive:   queryTime.Add(time.Nanosecond),


Why does this have an additional nanosecond?

So that the query will actually yield results. If StartInclusive and EndExclusive are the exact same value then nothing will be returned. This was happening implicitly by calling xtime.Now twice in the previous form.

Ah I see so that double Now call was functionally doing the same thing. Kinda scary lol but agreed w/ making this more explicit.

rallen090 · 2021-11-09T14:46:42Z

src/dbnode/storage/entry.go

+// is an error during retrieval, simply returns the current entry. Additionally,
+// returns a cleanup function to run once finished using the reconciled entry and
+// a boolean value indicating whether the result came from reconciliation or not.
+func (entry *Entry) ReconciledOnIndexSeries() (doc.OnIndexSeries, doc.ReconciledOnIndexSeriesCleanupFn, bool) {


Maybe instead of a func + bool, we just return a func and caller checks if it is non-nil?

Or always return a func and caller always calls close (and something that func is just noop)

The bool is just so we can know if we need to increment the reconciled metric. The caller always calls cleanup which is a noop if there's nothing to reconcile.

It is a bit of a Go anti-pattern to check for nilability instead of just returning a bool as the last arg (since most folks will know based on Go APIs to check a bool if it's returned, where as they might not know whether to check if something is nil since it's not a common pattern).

rallen090 · 2021-11-09T14:49:31Z

src/dbnode/storage/index/block.go

@@ -543,15 +545,20 @@ func (b *block) queryWithSpan(
 		// Ensure that the block contains any of the relevant time segments for the query range.
 		doc := iter.Current()


This whole range-check code block has gotten pretty big so maybe split to another func. Then we can also defer / make more readable the cleanup func?

Good call. Refactored this a bit. PTAL

rallen090 · 2021-11-09T14:55:46Z

src/dbnode/storage/index/block.go

@@ -543,15 +545,20 @@ func (b *block) queryWithSpan(
 		// Ensure that the block contains any of the relevant time segments for the query range.
 		doc := iter.Current()
 		if md, ok := doc.Metadata(); ok && md.OnIndexSeries != nil {


I wonder if it'd be worth, for preventing mistaken calls directly to metadata.OnIndexSeries in the future, having a separate interface that Entry implements called something like OnIndexSeriesReference that just has a single func for GetOnIndexSeries (i.e. ReconciledOnIndexSeries). And then we have Metadata have that so a caller is forced to always go through this reconciled API instead of referencing the field directly

Do we always want to reconcile though? I'm not sure that's true from looking at other users of metadata.OnIndexSeries. For example, when we call NeedsIndexGarbageCollected it seems like it specifically doesn't reconcile because it's not worth the extra cost:

// NeedsIndexGarbageCollected checks if the entry is eligible to be garbage collected // from the index. Otherwise returns false. func (entry *Entry) NeedsIndexGarbageCollected() bool { // This is a cheaper check that loading the entry from the shard again // which makes it cheaper to run frequently. // It may not be as accurate, but it's fine for an approximation since // only a single series in a segment needs to return true to trigger an // index segment to be garbage collected. if entry.insertTime.Load() == 0 { return false // Not inserted, does not need garbage collection. } // Check that a write is not potentially pending and the series is empty. return entry.ReaderWriterCount() == 0 && entry.Series.IsEmpty() }

If we change the interface here, we're always going to reconcile when insertTime is 0.

Hm yeah good point that we intentionally don't want to reconcile in other cases.

rallen090 · 2021-11-09T14:56:00Z

src/dbnode/storage/index/block.go

@@ -543,15 +545,20 @@ func (b *block) queryWithSpan(
 		// Ensure that the block contains any of the relevant time segments for the query range.
 		doc := iter.Current()
 		if md, ok := doc.Metadata(); ok && md.OnIndexSeries != nil {
+			onIndexSeries, cleanup, reconciled := md.OnIndexSeries.ReconciledOnIndexSeries()
+			if reconciled {
+				b.metrics.reconciledEntryOnQuery.Inc(1)


Good call on the metric

… state In db nodes, state is kept for cheap lookups to determine if a series is indexed for a specific block start. This state exists on the document in the index and in the shard object. On concurrent writes to the same time series, it's possible for these two sources to get out of sync. Basically, the state associated with the document on the index is a different instance than the one that gets placed in the shard's lookup map. This can result in issues where we don't accurately track that a series in indexed for a specific block start as only the entry in the lookup map will get updated in the future. To resolve this, we introduce reconciliation methods when querying these objects. If it's noticed on read that we're querying index state that never got inserted into the shard lookup map, then we query the shard for the appropriate entry and use the results on that object.

robskillington · 2021-11-09T16:35:54Z

src/m3ninx/doc/types.go

+
+// ReconciledOnIndexSeriesCleanupFn is a function for performing cleanup when
+// ReconciledOnIndexSeries is called.
+type ReconciledOnIndexSeriesCleanupFn func()


nit: This could just be CleanupFn given it could be used for other returns/etc since it's just a func()?

To be even cleaner you could return a resource.SimpleCloser.
https://pkg.go.dev/github.com/m3db/[email protected]/src/x/resource#SimpleCloser

e.g.

import ( "github.com/m3db/m3/src/x/resource" ) // then definition ReconciledOnIndexSeries() (OnIndexSeries, resource.SimpleCloser, bool) // then when returning e.g. cast the func() to a resource.SimplerCloserFunc which implements resource.SimpleCloser return e, resource.SimpleCloserFn(func() { e.DecrementReaderWriterCount() }), true // then using the results entry, closer, ok := entry.ReconciledOnIndexSeries() defer closer.Close()

Ah, I like this

robskillington

LGTM other than minor nits to either address or not asynchronously

robskillington · 2021-11-09T16:37:58Z

src/dbnode/storage/index/block.go

+	onIndexSeries, cleanup, reconciled := md.OnIndexSeries.ReconciledOnIndexSeries()
+	if reconciled {
+		b.metrics.reconciledEntryOnQuery.Inc(1)
+	}


nit: Maybe increment a metric when it's not reconciled too? That way we can tell the magnitude of entries that are reconciled vs not-reconciled (i.e. is 1k/sec big or small number? hard to know if we don't know the total).

robskillington · 2021-11-09T16:40:16Z

src/dbnode/storage/entry.go

+// is an error during retrieval, simply returns the current entry. Additionally,
+// returns a cleanup function to run once finished using the reconciled entry and
+// a boolean value indicating whether the result came from reconciliation or not.
+func (entry *Entry) ReconciledOnIndexSeries() (doc.OnIndexSeries, doc.ReconciledOnIndexSeriesCleanupFn, bool) {


It is a bit of a Go anti-pattern to check for nilability instead of just returning a bool as the last arg (since most folks will know based on Go APIs to check a bool if it's returned, where as they might not know whether to check if something is nil since it's not a common pattern).

nbroyles requested a review from robskillington November 5, 2021 19:47

nbroyles force-pushed the nb/mac-cpu branch from 4164ad1 to d997e2f Compare November 6, 2021 19:31

Base automatically changed from nb/mac-cpu to master November 6, 2021 19:50

nbroyles force-pushed the nb/read-query-fix branch 2 times, most recently from 072a3c2 to b8e3286 Compare November 6, 2021 20:49

nbroyles commented Nov 6, 2021

View reviewed changes

robskillington reviewed Nov 6, 2021

View reviewed changes

src/dbnode/storage/entry.go Outdated Show resolved Hide resolved

robskillington reviewed Nov 6, 2021

View reviewed changes

src/dbnode/storage/entry.go Outdated Show resolved Hide resolved

robskillington reviewed Nov 6, 2021

View reviewed changes

src/dbnode/storage/index/block.go Outdated Show resolved Hide resolved

nbroyles force-pushed the nb/read-query-fix branch from b8e3286 to a07684d Compare November 8, 2021 01:47

rallen090 reviewed Nov 9, 2021

View reviewed changes

rallen090 approved these changes Nov 9, 2021

View reviewed changes

nbroyles force-pushed the nb/read-query-fix branch from a07684d to dd56923 Compare November 9, 2021 15:50

robskillington reviewed Nov 9, 2021

View reviewed changes

robskillington approved these changes Nov 9, 2021

View reviewed changes

nbroyles force-pushed the nb/read-query-fix branch from dd56923 to a5d486f Compare November 9, 2021 17:45

PR feedback

9fdd964

nbroyles force-pushed the nb/read-query-fix branch from a5d486f to 9fdd964 Compare November 9, 2021 18:34

nbroyles merged commit f03c059 into master Nov 9, 2021

nbroyles deleted the nb/read-query-fix branch November 9, 2021 18:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[dbnode] Gracefully handle reads including documents with stale index state #3905

[dbnode] Gracefully handle reads including documents with stale index state #3905

nbroyles commented Nov 5, 2021

nbroyles Nov 6, 2021

robskillington Nov 6, 2021

nbroyles Nov 7, 2021

rallen090 Nov 9, 2021

robskillington Nov 6, 2021 •

edited

Loading

codecov bot commented Nov 8, 2021 •

edited

Loading

rallen090 Nov 9, 2021

nbroyles Nov 9, 2021

rallen090 Nov 9, 2021

rallen090 Nov 9, 2021

rallen090 Nov 9, 2021

nbroyles Nov 9, 2021

robskillington Nov 9, 2021

rallen090 Nov 9, 2021

nbroyles Nov 9, 2021

rallen090 Nov 9, 2021

nbroyles Nov 9, 2021

rallen090 Nov 9, 2021

rallen090 Nov 9, 2021

robskillington Nov 9, 2021 •

edited

Loading

nbroyles Nov 9, 2021

robskillington left a comment

robskillington Nov 9, 2021

robskillington Nov 9, 2021

		@@ -543,15 +545,20 @@ func (b *block) queryWithSpan(
		// Ensure that the block contains any of the relevant time segments for the query range.
		doc := iter.Current()

[dbnode] Gracefully handle reads including documents with stale index state #3905

[dbnode] Gracefully handle reads including documents with stale index state #3905

Conversation

nbroyles commented Nov 5, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robskillington Nov 6, 2021 • edited Loading

Choose a reason for hiding this comment

codecov bot commented Nov 8, 2021 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robskillington Nov 9, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robskillington left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robskillington Nov 6, 2021 •

edited

Loading

codecov bot commented Nov 8, 2021 •

edited

Loading

robskillington Nov 9, 2021 •

edited

Loading