Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dbnode] Gracefully handle reads including documents with stale index state #3905

Merged
merged 2 commits into from
Nov 9, 2021

Conversation

nbroyles
Copy link
Collaborator

@nbroyles nbroyles commented Nov 5, 2021

What this PR does / why we need it:

In db nodes, state is kept for cheap lookups to determine if a series is
indexed for a specific block start. This state exists on the document in the
index and in the shard object. On concurrent writes to the same time series,
it's possible for these two sources to get out of sync. Basically, the
state associated with the document on the index is a different instance
than the one that gets placed in the shard's lookup map. This can
result in issues where we don't accurately track that a series in
indexed for a specific block start as only the entry in the lookup
map will get updated in the future.

To resolve this, we introduce reconciliation methods when querying these
objects. If it's noticed on read that we're querying index state that
never got inserted into the shard lookup map, then we query the shard
for the appropriate entry and use the results on that object.

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:


Does this PR require updating code package or user-facing documentation?:


Base automatically changed from nb/mac-cpu to master November 6, 2021 19:50
@nbroyles nbroyles force-pushed the nb/read-query-fix branch 2 times, most recently from 072a3c2 to b8e3286 Compare November 6, 2021 20:49
// Attempts to retrieve the most recent index entry from the shard if this entry
// was never inserted there. If there is an error, returns the value for this entry.
// Additionally, returns a boolean to denote if result came from reconciliation.
ReconciledIndexedForBlockStart(blockStart xtime.UnixNano) (bool, bool)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, this API works, but it'd be much friendlier, I think, if we could just return the storage.Entry from the lookup map as an OnIndexSeries here. The problem is we'd need to expose methods on this interface for incrementing and decrementing the reader+writer count which ends up feeling equally unfriendly and more error prone.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True yeah.

Although things we could do to make that nicer:

type CleanupFn func()

ReconciledOnIndexSeries() (OnIndexSeries, CleanupFn)

Then to call it:

onIndexSeries, cleanup := entry.ReconciledOnIndexSeries()
defer cleanup()

// whatever with onIndexSeries

Or something which could be more allocation intensive is passing a function/lambda and doing the lookup and the decrement/etc afterwards explicitly (which might need to be dynamically allocated):

WithReconciledOnIndexSeries(callback func (OnIndexSeries))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the cleanup option since that's a convention we follow in a few different places. Going to rework this PR to do that since I think it'll allow us to get rid of specific Reconciled... functions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just saw this comment now - but my vote for making this less error prone for the consumer of metadata.OnIndexSeries is to instead have a separate interface that just provides a method for properly retrieving OnIndexSeries so then whoever has Metadata is forced to go through this reconciliation path when consuming

require.NoError(t, err)
}

func generateTestSetup(t *testing.T) TestSetup {
Copy link
Collaborator

@robskillington robskillington Nov 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Would be nice if we were able to namespace these method names. I guess if other tests want to use the same kind of thing they can in the future pass in their own overrides and we can move this to a shared place. e.g. func generateTestSetup(t *testing.T, customOpts func(storage.Options) storage.Options) { ... }.

@codecov
Copy link

codecov bot commented Nov 8, 2021

Codecov Report

Merging #3905 (9fdd964) into master (f79f7e6) will increase coverage by 0.0%.
The diff coverage is 95.4%.

Impacted file tree graph

@@          Coverage Diff           @@
##           master   #3905   +/-   ##
======================================
  Coverage    56.6%   56.6%           
======================================
  Files         553     553           
  Lines       63275   63296   +21     
======================================
+ Hits        35815   35863   +48     
+ Misses      24256   24238   -18     
+ Partials     3204    3195    -9     
Flag Coverage Δ
aggregator 62.1% <ø> (-0.2%) ⬇️
cluster ∅ <ø> (∅)
collector 58.4% <ø> (ø)
dbnode 60.4% <95.4%> (+0.1%) ⬆️
m3em 46.4% <ø> (ø)
metrics 19.7% <ø> (ø)
msg 74.4% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f79f7e6...9fdd964. Read the comment docs.

StartInclusive: xtime.Now(),
EndExclusive: xtime.Now(),
StartInclusive: queryTime,
EndExclusive: queryTime.Add(time.Nanosecond),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this have an additional nanosecond?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that the query will actually yield results. If StartInclusive and EndExclusive are the exact same value then nothing will be returned. This was happening implicitly by calling xtime.Now twice in the previous form.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see so that double Now call was functionally doing the same thing. Kinda scary lol but agreed w/ making this more explicit.

// is an error during retrieval, simply returns the current entry. Additionally,
// returns a cleanup function to run once finished using the reconciled entry and
// a boolean value indicating whether the result came from reconciliation or not.
func (entry *Entry) ReconciledOnIndexSeries() (doc.OnIndexSeries, doc.ReconciledOnIndexSeriesCleanupFn, bool) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead of a func + bool, we just return a func and caller checks if it is non-nil?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or always return a func and caller always calls close (and something that func is just noop)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bool is just so we can know if we need to increment the reconciled metric. The caller always calls cleanup which is a noop if there's nothing to reconcile.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit of a Go anti-pattern to check for nilability instead of just returning a bool as the last arg (since most folks will know based on Go APIs to check a bool if it's returned, where as they might not know whether to check if something is nil since it's not a common pattern).

@@ -543,15 +545,20 @@ func (b *block) queryWithSpan(
// Ensure that the block contains any of the relevant time segments for the query range.
doc := iter.Current()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole range-check code block has gotten pretty big so maybe split to another func. Then we can also defer / make more readable the cleanup func?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. Refactored this a bit. PTAL

@@ -543,15 +545,20 @@ func (b *block) queryWithSpan(
// Ensure that the block contains any of the relevant time segments for the query range.
doc := iter.Current()
if md, ok := doc.Metadata(); ok && md.OnIndexSeries != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it'd be worth, for preventing mistaken calls directly to metadata.OnIndexSeries in the future, having a separate interface that Entry implements called something like OnIndexSeriesReference that just has a single func for GetOnIndexSeries (i.e. ReconciledOnIndexSeries). And then we have Metadata have that so a caller is forced to always go through this reconciled API instead of referencing the field directly

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we always want to reconcile though? I'm not sure that's true from looking at other users of metadata.OnIndexSeries. For example, when we call NeedsIndexGarbageCollected it seems like it specifically doesn't reconcile because it's not worth the extra cost:

// NeedsIndexGarbageCollected checks if the entry is eligible to be garbage collected
// from the index. Otherwise returns false.
func (entry *Entry) NeedsIndexGarbageCollected() bool {
	// This is a cheaper check that loading the entry from the shard again
	// which makes it cheaper to run frequently.
	// It may not be as accurate, but it's fine for an approximation since
	// only a single series in a segment needs to return true to trigger an
	// index segment to be garbage collected.
	if entry.insertTime.Load() == 0 {
		return false // Not inserted, does not need garbage collection.
	}
	// Check that a write is not potentially pending and the series is empty.
	return entry.ReaderWriterCount() == 0 && entry.Series.IsEmpty()
}

If we change the interface here, we're always going to reconcile when insertTime is 0.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm yeah good point that we intentionally don't want to reconcile in other cases.

@@ -543,15 +545,20 @@ func (b *block) queryWithSpan(
// Ensure that the block contains any of the relevant time segments for the query range.
doc := iter.Current()
if md, ok := doc.Metadata(); ok && md.OnIndexSeries != nil {
onIndexSeries, cleanup, reconciled := md.OnIndexSeries.ReconciledOnIndexSeries()
if reconciled {
b.metrics.reconciledEntryOnQuery.Inc(1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call on the metric

… state

In db nodes, state is kept for cheap lookups to determine if a series is
indexed for a specific block start. This state exists on the document in the
index and in the shard object. On concurrent writes to the same time series,
it's possible for these two sources to get out of sync. Basically, the
state associated with the document on the index is a different instance
than the one that gets placed in the shard's lookup map. This can
result in issues where we don't accurately track that a series in
indexed for a specific block start as only the entry in the lookup
map will get updated in the future.

To resolve this, we introduce reconciliation methods when querying these
objects. If it's noticed on read that we're querying index state that
never got inserted into the shard lookup map, then we query the shard
for the appropriate entry and use the results on that object.

// ReconciledOnIndexSeriesCleanupFn is a function for performing cleanup when
// ReconciledOnIndexSeries is called.
type ReconciledOnIndexSeriesCleanupFn func()
Copy link
Collaborator

@robskillington robskillington Nov 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This could just be CleanupFn given it could be used for other returns/etc since it's just a func()?

To be even cleaner you could return a resource.SimpleCloser.
https://pkg.go.dev/github.com/m3db/[email protected]/src/x/resource#SimpleCloser

e.g.

import (
  "github.com/m3db/m3/src/x/resource"
)

// then definition
ReconciledOnIndexSeries() (OnIndexSeries, resource.SimpleCloser, bool)

// then when returning e.g. cast the func() to a resource.SimplerCloserFunc which implements resource.SimpleCloser
	return e, resource.SimpleCloserFn(func() {
		e.DecrementReaderWriterCount()
	}), true

// then using the results
entry, closer, ok := entry.ReconciledOnIndexSeries()
defer closer.Close()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I like this

Copy link
Collaborator

@robskillington robskillington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than minor nits to either address or not asynchronously

onIndexSeries, cleanup, reconciled := md.OnIndexSeries.ReconciledOnIndexSeries()
if reconciled {
b.metrics.reconciledEntryOnQuery.Inc(1)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe increment a metric when it's not reconciled too? That way we can tell the magnitude of entries that are reconciled vs not-reconciled (i.e. is 1k/sec big or small number? hard to know if we don't know the total).

// is an error during retrieval, simply returns the current entry. Additionally,
// returns a cleanup function to run once finished using the reconciled entry and
// a boolean value indicating whether the result came from reconciliation or not.
func (entry *Entry) ReconciledOnIndexSeries() (doc.OnIndexSeries, doc.ReconciledOnIndexSeriesCleanupFn, bool) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit of a Go anti-pattern to check for nilability instead of just returning a bool as the last arg (since most folks will know based on Go APIs to check a bool if it's returned, where as they might not know whether to check if something is nil since it's not a common pattern).

@nbroyles nbroyles merged commit f03c059 into master Nov 9, 2021
@nbroyles nbroyles deleted the nb/read-query-fix branch November 9, 2021 18:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants