[index] Aggregating results on storage side #1463

arnikola · 2019-03-15T17:47:20Z

~~TODO:~~

~~Test coverage~~
~~Rebase on Use a single index Results when querying across blocks #1474~~

arnikola · 2019-03-15T17:53:15Z

src/dbnode/generated-source-files.mk

+	genny-map-storage-namespace-metadata        \
+	genny-map-storage-repair                    \
+	genny-map-storage-index-results             \
+	genny-map-storage-index-search-results      \


oops obviously doesn't exist

prateek · 2019-03-15T18:06:10Z

src/dbnode/generated-source-files.mk

+# Map generation rule for storage/index/AggregationResultsMap
+.PHONY: genny-map-storage-index-aggregation-results
+genny-map-storage-index-aggregation-results: install-m3x-repo
+	cd $(m3x_package_path) && make hashmap-gen                \


s/hashmap-gen/idhashmap-gen

It’s the target for genny maps with ident.ID keys, so then you can get rid of the key_type specifier below.

prateek · 2019-03-18T17:40:22Z

src/dbnode/storage/types.go

+		ctx context.Context,
+		query index.Query,
+		opts index.QueryOptions,
+	) (index.QueryResults, error)


need to update the return value

prateek · 2019-03-18T17:40:26Z

src/dbnode/storage/types.go

+		ctx context.Context,
+		query index.Query,
+		opts index.QueryOptions,
+	) (index.QueryResults, error)


src/dbnode/storage/types.go

prateek · 2019-03-18T17:41:54Z

src/dbnode/storage/index/types.go

@@ -65,6 +65,15 @@ type QueryOptions struct {
 	StartInclusive time.Time
 	EndExclusive   time.Time
 	Limit          int
+
+	// Optional param to filter aggregate values.
+	TermFilter *AggregateValuesMap


break into separate struct - i.e. AggregateQueryOptions v QueryOptions should be separate
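
A minimal sketch of the suggested split, for illustration only. `AggregateQueryOptions` is the name proposed in the comment; embedding `QueryOptions` and typing `TermFilter` as `[][]byte` are assumptions made here, not what the PR necessarily ended up with.

```go
package main

import (
	"fmt"
	"time"
)

// QueryOptions mirrors the fields visible in the diff above.
type QueryOptions struct {
	StartInclusive time.Time
	EndExclusive   time.Time
	Limit          int
}

// AggregateQueryOptions is the separate struct suggested in the review.
// Embedding QueryOptions is an assumption made for this sketch.
type AggregateQueryOptions struct {
	QueryOptions
	// TermFilter optionally restricts which terms are aggregated.
	// The [][]byte type is illustrative only.
	TermFilter [][]byte
}

func main() {
	now := time.Now()
	opts := AggregateQueryOptions{
		QueryOptions: QueryOptions{
			StartInclusive: now.Add(-time.Hour),
			EndExclusive:   now,
			Limit:          10,
		},
		TermFilter: [][]byte{[]byte("city")},
	}
	// Shared fields stay reachable through the embedded struct.
	fmt.Println(opts.Limit, len(opts.TermFilter))
}
```

With this shape, regular queries never see aggregate-only fields, and aggregate call sites can still read the shared time range and limit.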

prateek · 2019-03-18T17:42:38Z

src/dbnode/generated-source-files.mk

@@ -157,20 +157,6 @@ genny-map-storage-index-results: install-m3x-repo
 	# Rename generated map file
 	mv -f $(m3db_package_path)/src/dbnode/storage/index/map_gen.go $(m3db_package_path)/src/dbnode/storage/index/results_map_gen.go

-# Map generation rule for storage/index/AggregationValuesMap
-.PHONY: genny-map-storage-index-aggregation-values


why delete this?

I think this one was github weirdness, right? Or still a valid comment?

prateek · 2019-03-18T17:47:10Z

src/dbnode/generated-source-files.mk

+# Map generation rule for dependent generated maps which are built on top of a
+# generated map
+.PHONY: genny-map-dependent-all
+genny-map-dependent-all:                      \


don't think you need to break it up like this. i.e. you can just have

genny-map-all: ... genny-map-storage-index-aggregation-results

genny-map-storage-index-aggregation-results: genny-map-whichever-one-this-depends-on
	...

prateek · 2019-03-18T17:54:41Z

src/dbnode/storage/index/aggregate_results.go

+	errUnableToAddAggregateResultMissingID = errors.New("no id for result")
+)
+
+type aggregatedResults struct {


need tests for this struct verifying a few different things:

aggregation results

copying/non-copying behaviour

finalizing/non-finalizing behaviour

Yeah, this is missing tests in general at the moment; looking to verify structure of the feature in general before writing them up

prateek · 2019-03-18T17:55:25Z

src/dbnode/storage/index/aggregate_results.go

+	document doc.Document,
+	opts QueryOptions,
+) error {
+	// TODO: is this neccessary to check for document correctness?


why do this?

I guess there's my answer :p

Was going to check with someone more versed in the code than I to see if this represents an error case

prateek · 2019-03-18T17:55:51Z

src/dbnode/storage/index/aggregate_results.go

+	return nil
+}
+
+func (r *aggregatedResults) AggregateDocument(


why have AggregateDocument and AddIDAndValues separate?

This matches the flow of the regular Query case. When a block is first matched, results are added through the AggregateDocument path. When the results from multiple blocks are merged, one is selected as the merging AggregateResult, and the rest are iterated over, calling AddIDAndValues on the selected one to merge them in.

hm i think you can do AggregateDocument or AddAggregatedValues (doesn't need ID in the second method)
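
The two paths discussed here can be sketched roughly as follows. This is a simplified stand-in that uses plain Go maps and strings; the `aggregated`, `document` and `field` types and all method names are hypothetical and do not reflect the generated map types used in the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// field and document are simplified stand-ins for the index document types.
type field struct{ Name, Value string }
type document struct{ Fields []field }

// aggregated maps a tag name to the set of values seen for it.
type aggregated map[string]map[string]struct{}

// addDocument is the per-block path: every field of a matched document
// contributes one (name, value) pair.
func (a aggregated) addDocument(d document) {
	for _, f := range d.Fields {
		a.addValues(f.Name, []string{f.Value})
	}
}

// addValues is the merge path: values already aggregated for a name
// elsewhere are folded into this result.
func (a aggregated) addValues(name string, values []string) {
	set, ok := a[name]
	if !ok {
		set = make(map[string]struct{})
		a[name] = set
	}
	for _, v := range values {
		set[v] = struct{}{}
	}
}

// merge folds every entry of other into a.
func (a aggregated) merge(other aggregated) {
	for name, set := range other {
		values := make([]string, 0, len(set))
		for v := range set {
			values = append(values, v)
		}
		a.addValues(name, values)
	}
}

// sortedValues returns the values recorded for name in sorted order.
func (a aggregated) sortedValues(name string) []string {
	out := make([]string, 0, len(a[name]))
	for v := range a[name] {
		out = append(out, v)
	}
	sort.Strings(out)
	return out
}

func main() {
	block0, block1 := aggregated{}, aggregated{}
	block0.addDocument(document{Fields: []field{{"city", "nyc"}}})
	block1.addDocument(document{Fields: []field{{"city", "sf"}, {"env", "prod"}}})

	// One result is chosen as the merge target; the rest are folded in.
	block0.merge(block1)
	fmt.Println(block0.sortedValues("city"), block0.sortedValues("env"))
}
```

In this sketch the document path is built on top of the values path, so the merge only needs the second method.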

prateek · 2019-03-18T17:58:57Z

src/dbnode/storage/index/aggregate_results.go

+	}
+
+	// if this term hasn't been seen, ensure it should be included in output.
+	if !opts.TermFilter.Contains(termID) {


hm should this be the first thing you do in the function?

So I went back and forth on this; if it's already been added to the map, that indicates it's a valid value and there's no reason to check against the term filter

I guess this should be arranged by what's more likely; failing the filter lookup, or succeeding (the current way is biased towards succeeding)

Mainly suggesting to do it early cause the flow made more sense, early terminating and all that. Don't think it makes a difference for perf really

prateek · 2019-03-18T18:00:01Z

src/dbnode/storage/index/types.go

+	// including returning it to a backing pool.
+	Finalize()
+
+	// NoFinalize marks the AggregateResults such that a subsequent call to


do you need this method?

Likely not; will remove

prateek · 2019-03-18T18:00:09Z

src/dbnode/storage/index/types.go

+	// Reset resets the AggregateResults object to initial state.
+	Reset(nsID ident.ID)
+
+	// Finalize releases any resources held by the AggregateResults object,


same here, is this needed?

Likely not; will remove

prateek · 2019-03-18T18:01:17Z

src/dbnode/storage/index/types.go

+}
+
+// AggregateValues is a collection of values for an aggregation query.
+type AggregateValues interface {


is this just a wrapper on a code-gen'd type, if so - can we use the struct where needed instead?

Good call, I'll give it a shot, hopefully the code gen will still be happy with it

prateek · 2019-03-18T18:01:35Z

src/dbnode/storage/index/types.go

@@ -169,6 +271,13 @@ type Block interface {
 		results Results,
 	) (exhaustive bool, err error)

+	// AggregateQuery resolves the given query into aggregated tags.
+	AggregateQuery(


same nit about opts type

prateek · 2019-03-18T19:23:16Z

src/dbnode/storage/index.go

+			// possible this block may get closed if it slides out of retention, in
+			// that case those results are no longer considered valid and outside of
+			// retention regardless, so this is a non-issue.
+			err = nil


can you just return early here?

Yeah, seems to be the case

prateek · 2019-03-18T19:23:39Z

src/dbnode/storage/index.go

+		wg       sync.WaitGroup
+
+		// Results contains all concurrent mutable state below.
+		results = struct {


can you extract this into a separate struct and reuse it across both places

Sure, had something like that originally while trying to jam it into a combined function :)
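
As a rough illustration of the extraction, the shared mutable state could live in a named struct that both query paths reuse. The names `asyncQueryState` and `queryBlocks` are hypothetical, and blocks are represented as strings to keep the sketch self-contained.

```go
package main

import (
	"fmt"
	"sync"
)

// asyncQueryState is a hypothetical name for the mutable state shared by
// the goroutines that query each block.
type asyncQueryState struct {
	sync.Mutex
	multiErr   []error
	exhaustive bool
}

// record merges the outcome of one block query under the lock.
func (s *asyncQueryState) record(exhaustive bool, err error) {
	s.Lock()
	defer s.Unlock()
	if err != nil {
		s.multiErr = append(s.multiErr, err)
		return
	}
	s.exhaustive = s.exhaustive && exhaustive
}

// queryBlocks runs queryFn for each block concurrently and returns the
// combined state once every goroutine has finished.
func queryBlocks(blocks []string, queryFn func(block string) (bool, error)) *asyncQueryState {
	state := &asyncQueryState{exhaustive: true}
	var wg sync.WaitGroup
	for _, b := range blocks {
		wg.Add(1)
		go func(b string) {
			defer wg.Done()
			state.record(queryFn(b))
		}(b)
	}
	wg.Wait()
	return state
}

func main() {
	state := queryBlocks([]string{"b0", "b1"}, func(block string) (bool, error) {
		// Pretend block b1 hit its limit and is therefore not exhaustive.
		return block != "b1", nil
	})
	fmt.Println(state.exhaustive, len(state.multiErr))
}
```

Passing the per-block work as a function is one way a regular query and an aggregate query could share the same fan-out code while differing only in what they do with each block.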

prateek · 2019-03-18T19:25:49Z

src/dbnode/storage/index.go

+			mergedResult = true
+			for _, entry := range aggregateResults.Map().Iter() {
+				// Append to merged results.
+				id, tags := entry.Key(), entry.Value()


why is the ID needed in aggregation results?

Probably a bad name; id here represents unique tag names

prateek · 2019-03-18T19:26:51Z

src/dbnode/storage/index.go

@@ -1436,3 +1440,230 @@ func (shards dbShards) IDs() []uint32 {
 	}
 	return ids
 }
+
+func (i *nsIndex) AggregateQuery(


nit: refactor this so that Query() and AggregateQuery() share the structural components of the method.

Was trying that but it got a little messy, will take another stab at it

robskillington · 2019-03-26T12:14:30Z

src/dbnode/storage/index/aggregate_results.go

+	valueBytes.IncRef()
+	valueBytes.AppendAll(value)
+	valueBytes.DecRef()
+	valueID := r.idPool.BinaryID(valueBytes)


Sorry, I realize you can actually use a helper function on the ID pool to make this code a little simpler:

valueID := r.idPool.Clone(ident.BytesID(value))

robskillington · 2019-03-26T14:49:19Z

src/dbnode/storage/index/aggregate_values.go

+
+	// NB: fine to overwrite the values here.
+	v.valuesMap.Set(bytesID, struct{}{})
+	return nil


Can just condense this (since .Set(...) takes an ident.ID and will copy it) into:

if len(value.Bytes()) == 0 {
	return errUnableToAddValueMissingID
}
// NB: fine to overwrite the values here.
v.valuesMap.Set(value, struct{}{})
return nil

Yeah already have sorry, latest push was halfway consolidated :p

prateek · 2019-03-26T16:43:38Z

src/dbnode/storage/namespace.go

@@ -598,24 +598,49 @@ func (n *dbNamespace) QueryIDs(
 	ctx context.Context,
 	query index.Query,
 	opts index.QueryOptions,
-) (index.QueryResults, error) {
+) (index.QueryResult, error) {


+1 for rename

prateek · 2019-03-26T16:44:10Z

src/dbnode/storage/namespace.go

+	}
+
+	res, err := n.reverseIndex.AggregateQuery(ctx, query, opts, aggResultOpts)
+	n.metrics.queryIDs.ReportSuccessOrError(err, n.nowFn().Sub(callStart))


nit: add a metric for aggregateQuery (similar to queryIDs)

(and reuse in all calls in this function)

Nice one 👍
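
A small sketch of what a dedicated metric might look like. `methodMetrics` and `namespaceMetrics` are hypothetical stand-ins written for this example; only the `ReportSuccessOrError(err, duration)` call shape is taken from the diff above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errIndexClosed is an example error used only by this sketch.
var errIndexClosed = errors.New("index closed")

// methodMetrics is a hypothetical stand-in for a per-method metric; it only
// counts outcomes and remembers the last latency.
type methodMetrics struct {
	successes, failures int
	lastLatency         time.Duration
}

// ReportSuccessOrError records one call outcome with its duration.
func (m *methodMetrics) ReportSuccessOrError(err error, d time.Duration) {
	m.lastLatency = d
	if err != nil {
		m.failures++
		return
	}
	m.successes++
}

// namespaceMetrics keeps a separate metric for each query method.
type namespaceMetrics struct {
	queryIDs       methodMetrics
	aggregateQuery methodMetrics
}

// aggregateQuery times run and reports to the aggregate-specific metric on
// every return path.
func aggregateQuery(m *namespaceMetrics, run func() error) error {
	callStart := time.Now()
	err := run()
	m.aggregateQuery.ReportSuccessOrError(err, time.Since(callStart))
	return err
}

func main() {
	var m namespaceMetrics
	_ = aggregateQuery(&m, func() error { return nil })
	_ = aggregateQuery(&m, func() error { return errIndexClosed })
	fmt.Println(m.aggregateQuery.successes, m.aggregateQuery.failures, m.queryIDs.successes)
}
```

Keeping the two metrics apart means aggregate latency and error rates do not get mixed into the `queryIDs` numbers.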

prateek · 2019-03-26T16:47:00Z

src/dbnode/storage/index_block_test.go

@@ -566,26 +566,41 @@ func TestNamespaceIndexBlockQuery(t *testing.T) {
 		StartInclusive: t0,
 		EndExclusive:   now.Add(time.Minute),
 	}
-	b0.EXPECT().Query(gomock.Any(), q, qOpts, gomock.Any()).Return(true, nil)
+	aggOpts := index.AggregateResultsOptions{}


nit: could you decouple the tests for the Query and Aggregate. One simple way - copy paste the test you have twice, and make it only do the mocks/calls for Query in one, and Aggregate in the other. That way each is independent of the other. Makes maintenance easier in the long run

Yeah, for sure. I didn't want to go ahead and c/p larger tests so figured would be fine to package them; will refactor to use a test setup method

Tried with a setup method and it looked really bad; did the test twice instead

prateek · 2019-03-26T17:04:35Z

src/dbnode/storage/index/aggregate_values.go

+	return v.valuesMap.Len()
+}
+
+func (v *AggregateValues) reset() {


more a question: is it intentional these methods are un-exported?

Yeah, wanted to restrict these methods to this package so callers don't accidentally close the AggregateValues

prateek · 2019-03-26T17:09:33Z

src/dbnode/storage/index/aggregate_results.go

+) error {
+	for _, field := range document.Fields {
+		if err := r.addFieldWithLock(field.Name, field.Value); err != nil {
+			return err


nit: would be useful to bundle with document here too, something like:
return fmt.Errorf("unable to add document [%+v]: %v", document, err)
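
For context, a self-contained sketch of the suggested wrapping. The `document` and `field` types and the `addField` helper are simplified, hypothetical stand-ins; only the format string comes from the comment above.

```go
package main

import (
	"errors"
	"fmt"
)

// field and document are simplified stand-ins using strings so that the
// %+v output stays readable.
type field struct{ Name, Value string }
type document struct {
	ID     string
	Fields []field
}

var errMissingValue = errors.New("field has no value")

// addField is a hypothetical helper that rejects empty values.
func addField(name, value string) error {
	if len(value) == 0 {
		return errMissingValue
	}
	return nil
}

// addDocument bundles the offending document into any field-level error.
func addDocument(d document) error {
	for _, f := range d.Fields {
		if err := addField(f.Name, f.Value); err != nil {
			return fmt.Errorf("unable to add document [%+v]: %v", d, err)
		}
	}
	return nil
}

func main() {
	err := addDocument(document{
		ID:     "doc-1",
		Fields: []field{{Name: "city", Value: ""}},
	})
	fmt.Println(err)
}
```

The extra context costs one format call on the error path and makes it clear which document triggered the failure.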

prateek · 2019-03-26T17:12:49Z

src/dbnode/storage/index/aggregate_results.go

+
+	// if a term filter is provided, ensure this field matches the filter,
+	// otherwise ignore it.
+	if len(r.aggregateOpts.TermFilter) > 0 {


nit: mind making this a function on the type? something like

type AggregateResultsOptions struct {
	...
	TermFilter [][]byte
}

type AggregateTermFilter [][]byte

func (a AggregateTermFilter) Filter(term []byte) bool {
	if len(a) == 0 {
		return false
	}
	for ...
	return true
}

SGTM, guess then we can use it elsewhere :)
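
One possible completion of that outline, as a sketch. The loop body is elided in the suggestion, so the semantics here are an assumption: `Filter` returns true when the term should be dropped, and an empty filter drops nothing.

```go
package main

import (
	"bytes"
	"fmt"
)

// AggregateTermFilter lists the terms an aggregate query should keep.
type AggregateTermFilter [][]byte

// Filter reports whether term should be filtered out. This reading of the
// return value is an assumption; the review comment leaves the loop elided.
func (f AggregateTermFilter) Filter(term []byte) bool {
	if len(f) == 0 {
		// No filter configured: nothing is filtered out.
		return false
	}
	for _, allowed := range f {
		if bytes.Equal(term, allowed) {
			return false
		}
	}
	return true
}

func main() {
	filter := AggregateTermFilter{[]byte("city"), []byte("env")}
	for _, term := range []string{"city", "host"} {
		fmt.Printf("%s filtered out: %v\n", term, filter.Filter([]byte(term)))
	}
}
```

A linear scan is reasonable when the filter holds only a handful of terms; a larger filter might justify a map or sorted slice instead.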

prateek · 2019-03-26T17:13:58Z

src/dbnode/storage/index/aggregate_results.go

+		return fmt.Errorf(missingDocumentFields, "value")
+	}
+
+	// NB: can cast the []byte -> ident.ID to avoid an alloc


nit: shouldn't this be below the filtering code?

as in, termID isn't used until after the filtering. would be better to group along with the other vars there.

Ah yeah good one; the filter function previously required an ident.ID so had to define this earlier, but missed changing it after the refactor

prateek · 2019-03-26T17:18:45Z

src/dbnode/storage/index/aggregate_results.go

+func (r *aggregatedResults) Finalize() {
+	r.Lock()
+
+	r.aggregateOpts = AggregateResultsOptions{}


hm could you reuse r.Reset(nil, AggregateResultsOptions{}) here? looks like they're both doing the same thing except the pool return

Hm, seems aggressive for now, we could potentially consolidate this later to unblock find endpoint working?

Sorry, not sure I follow. I was suggesting changing the implementation of Finalize to the following:

func (r *aggregatedResults) Finalize() {
	r.Reset(nil, AggregateResultsOptions{})
	if r.pool == nil {
		return
	}
	r.pool.Put(r)
}

Hey, so this method is a little different in that it needs to call finalize on the AggregatedValues map; I tried refactoring it so they'd both call a common function with a flag to determine if it should finalize or reset, but it ended up being a bit awkward so preferred this approach

talked offline: made sense to merge the behaviours
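
A runnable sketch of the merged behaviour, under simplifying assumptions: the pool, options and results types are minimal hypothetical stand-ins, the namespace ID is a plain string, and locking and finalizing of nested values are left out.

```go
package main

import "fmt"

// options is a stand-in for the aggregate results options.
type options struct{ TermFilter [][]byte }

// resultsPool is a hypothetical minimal pool; the real pool type is not
// shown in this thread.
type resultsPool struct{ free []*aggregatedResults }

// Put returns a results object to the pool for reuse.
func (p *resultsPool) Put(r *aggregatedResults) { p.free = append(p.free, r) }

// aggregatedResults is a simplified stand-in for the struct in the diff.
type aggregatedResults struct {
	nsID   string
	opts   options
	values map[string]struct{}
	pool   *resultsPool
}

// Reset clears accumulated state and installs the new namespace and options.
func (r *aggregatedResults) Reset(nsID string, opts options) {
	r.nsID = nsID
	r.opts = opts
	for k := range r.values {
		delete(r.values, k)
	}
}

// Finalize reuses Reset and then returns the object to its pool, if any.
func (r *aggregatedResults) Finalize() {
	r.Reset("", options{})
	if r.pool == nil {
		return
	}
	r.pool.Put(r)
}

func main() {
	pool := &resultsPool{}
	r := &aggregatedResults{
		nsID:   "metrics",
		values: map[string]struct{}{"city": {}},
		pool:   pool,
	}
	r.Finalize()
	fmt.Println(len(r.values), len(pool.free))
}
```

Routing `Finalize` through `Reset` keeps the clearing logic in one place, so a field added later only needs to be reset once.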

robskillington · 2019-03-26T17:41:47Z

src/dbnode/server/server.go

@@ -1202,8 +1202,11 @@ func withEncodingAndPoolingOptions(
 	postingsListOpts := poolOptions(policy.PostingsListPool, scope.SubScope("postingslist-pool"))
 	postingsList := postings.NewPool(postingsListOpts, roaring.NewPostingsList)

-	resultsPool := index.NewResultsPool(
-		poolOptions(policy.IndexResultsPool, scope.SubScope("index-results-pool")))
+	// Need to actually set pools


nit: Remove this now that the pools are set?

robskillington · 2019-03-26T17:48:42Z

src/dbnode/storage/index/aggregate_values_pool.go

@@ -0,0 +1,47 @@
+// Copyright (c) 2018 Uber Technologies, Inc.


robskillington · 2019-03-26T17:48:54Z

src/dbnode/storage/index/aggregate_results_pool.go

@@ -0,0 +1,47 @@
+// Copyright (c) 2018 Uber Technologies, Inc.


prateek · 2019-03-26T21:33:38Z

src/dbnode/storage/index/aggregate_results_test.go

+	contains(t, expected, res.Map())
+}
+
+func toFilter(strs ...string) AggregateTermFilter {


prateek · 2019-03-26T21:34:08Z

src/dbnode/storage/index/aggregate_results_test.go

+	assert.True(t, aggVals.Map().Contains(ident.StringID("biz")))
+}
+
+func contains(t *testing.T, ex map[string][]string, ac *AggregateResultsMap) {


nit: assertContains() would be clearer

prateek · 2019-03-26T21:34:53Z

src/dbnode/storage/index/aggregate_results_test.go

+
+	found := false
+
+	// our genny generated maps don't provide access to MapEntry directly,


+1, v nice test.

prateek · 2019-03-26T22:37:19Z

src/dbnode/storage/index/types.go

+// BaseResults is a collection of basic results for a generic query, it is
+// synchronized when access to the results set is used as documented by the
+// methods.
+type BaseResults interface {


prateek

LGTM

robskillington

LGTM

arnikola commented Mar 15, 2019

prateek reviewed Mar 15, 2019

arnikola changed the title ~~[wip] initial interfaces for aggregate results on storage side~~ [index] Aggregating results on storage side Mar 18, 2019

arnikola marked this pull request as ready for review March 18, 2019 17:30

prateek reviewed Mar 18, 2019

prateek reviewed Mar 18, 2019

arnikola mentioned this pull request Mar 21, 2019

Poor performance for tag completion endpoints #1453

Closed

6 tasks

arnikola added 7 commits March 22, 2019 17:44

[wip] initial interfaces for aggregate results on storage side

d99c840

wip

aa47361

wip2

9e0bd6b

wip

69b86f0

Initial version of AggregateResult return types

d3f1398

WIP: plumbing

8a83700

Plumbing aggregate query up to database layer

07f9534

robskillington reviewed Mar 26, 2019

PR update, test fix

a9eca18

robskillington reviewed Mar 26, 2019

arnikola and others added 3 commits March 26, 2019 11:31

Add tests to aggregate query

ca24dff

Fix to big test

79dc84c

Merge branch 'master' into arnikola/index_tag_search

84ee78a

prateek reviewed Mar 26, 2019

Refactor filter to be a [][]byte instead of a map

9ac4d6d

prateek reviewed Mar 26, 2019

robskillington reviewed Mar 26, 2019

arnikola added 2 commits March 26, 2019 16:45

PR response

759e6df

Pr response

14f37d6

prateek reviewed Mar 26, 2019

prateek approved these changes Mar 26, 2019

robskillington approved these changes Mar 26, 2019

Merge branch 'master' into arnikola/index_tag_search

b005a84

arnikola merged commit 5e4d4f6 into master Mar 26, 2019

arnikola deleted the arnikola/index_tag_search branch March 26, 2019 23:28
