-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[index] Aggregating results on storage side #1463
Conversation
src/dbnode/generated-source-files.mk
Outdated
genny-map-storage-namespace-metadata \ | ||
genny-map-storage-repair \ | ||
genny-map-storage-index-results \ | ||
genny-map-storage-index-search-results \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops obviously doesn't exist
src/dbnode/generated-source-files.mk
Outdated
# Map generation rule for storage/index/AggregationResultsMap | ||
.PHONY: genny-map-storage-index-aggregation-results | ||
genny-map-storage-index-aggregation-results: install-m3x-repo | ||
cd $(m3x_package_path) && make hashmap-gen \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/hashmap-gen/idhashmap-gen
It’s the target for genny maps with ident.ID keys, so then you can get rid of the key_type specifier below.
src/dbnode/storage/types.go
Outdated
ctx context.Context, | ||
query index.Query, | ||
opts index.QueryOptions, | ||
) (index.QueryResults, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to update the return value
src/dbnode/storage/types.go
Outdated
ctx context.Context, | ||
query index.Query, | ||
opts index.QueryOptions, | ||
) (index.QueryResults, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here too
src/dbnode/storage/index/types.go
Outdated
@@ -65,6 +65,15 @@ type QueryOptions struct { | |||
StartInclusive time.Time | |||
EndExclusive time.Time | |||
Limit int | |||
|
|||
// Optional param to filter aggregate values. | |||
TermFilter *AggregateValuesMap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
break into separate struct - i.e. AggregateQueryOptions v QueryOptions should be separate
src/dbnode/generated-source-files.mk
Outdated
@@ -157,20 +157,6 @@ genny-map-storage-index-results: install-m3x-repo | |||
# Rename generated map file | |||
mv -f $(m3db_package_path)/src/dbnode/storage/index/map_gen.go $(m3db_package_path)/src/dbnode/storage/index/results_map_gen.go | |||
|
|||
# Map generation rule for storage/index/AggregationValuesMap | |||
.PHONY: genny-map-storage-index-aggregation-values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why delete this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this one was github weirdness, right? Or still a valid comment?
src/dbnode/generated-source-files.mk
Outdated
# Map generation rule for dependent generated maps which are built on top of a | ||
# generated map | ||
.PHONY: genny-map-dependent-all | ||
genny-map-dependent-all: \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't think you need to break it up like this. i.e. you can just have
genny-map-all:
...
genny-map-storage-index-aggregation-results
genny-map-storage-index-aggregation-results: genny-map-whichever-one-this-depends-on
...
errUnableToAddAggregateResultMissingID = errors.New("no id for result") | ||
) | ||
|
||
type aggregatedResults struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need tests for this struct verifying a few different things:
- aggregation results
- copying/non-copying behaviour
- finalizing/non-finalizing behaviour
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is missing tests in general at the moment; looking to verify structure of the feature in general before writing them up
document doc.Document, | ||
opts QueryOptions, | ||
) error { | ||
// TODO: is this neccessary to check for document correctness? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess there's my answer :p
Was going to check with someone more versed in the code than I to see if this represents an error case
return nil | ||
} | ||
|
||
func (r *aggregatedResults) AggregateDocument( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why have AggregateDocument
and AddIDAndValues
separate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to match the flow of the regular Query
case; when it initially matches the block, it will be added through the AggregateDocument
path, then when the results from multiple blocks are being merged, one is selected as the merging AggregateResult
, and the rest are iterated through and call AddIDAndValues
on that one to merge the results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm i think you can do AggregateDocument
or AddAggregatedValues
(doesn't need ID in the second method)
} | ||
|
||
// if this term hasn't been seen, ensure it should be included in output. | ||
if !opts.TermFilter.Contains(termID) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm should this be the first thing you do in the function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I went back and forth on this; if it's already been added to the map, that indicates it's a valid value and there's no reason to check against the term filter
I guess this should be arranged by what's more likely; failing the filter lookup, or succeeding (the current way is biased towards succeeding)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly suggesting to do it early cause the flow made more sense, early terminating and all that. Don't think it makes a difference for perf really
src/dbnode/storage/index/types.go
Outdated
// including returning it to a backing pool. | ||
Finalize() | ||
|
||
// NoFinalize marks the AggregateResults such that a subsequent call to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you need this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely not; will remove
src/dbnode/storage/index/types.go
Outdated
// Reset resets the AggregateResults object to initial state. | ||
Reset(nsID ident.ID) | ||
|
||
// Finalize releases any resources held by the AggregateResults object, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely not; will remove
src/dbnode/storage/index/types.go
Outdated
} | ||
|
||
// AggregateValues is a collection of values for an aggregation query. | ||
type AggregateValues interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this just a wrapper on a code-gen'd type, if so - can we use the struct where needed instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, I'll give it a shot, hopefully the code gen will still be happy with it
src/dbnode/storage/index/types.go
Outdated
@@ -169,6 +271,13 @@ type Block interface { | |||
results Results, | |||
) (exhaustive bool, err error) | |||
|
|||
// AggregateQuery resolves the given query into aggregated tags. | |||
AggregateQuery( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same nit about opts type
src/dbnode/storage/index.go
Outdated
// possible this block may get closed if it slides out of retention, in | ||
// that case those results are no longer considered valid and outside of | ||
// retention regardless, so this is a non-issue. | ||
err = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you just return early here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, seems to be the case
src/dbnode/storage/index.go
Outdated
wg sync.WaitGroup | ||
|
||
// Results contains all concurrent mutable state below. | ||
results = struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you extract this into a separate struct and reuse it across both places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, had something like that originally while trying to jam it into a combined function :)
src/dbnode/storage/index.go
Outdated
mergedResult = true | ||
for _, entry := range aggregateResults.Map().Iter() { | ||
// Append to merged results. | ||
id, tags := entry.Key(), entry.Value() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is the ID needed in aggregation results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a bad name; id
here represents unique tag names
src/dbnode/storage/index.go
Outdated
@@ -1436,3 +1440,230 @@ func (shards dbShards) IDs() []uint32 { | |||
} | |||
return ids | |||
} | |||
|
|||
func (i *nsIndex) AggregateQuery( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: refactor this so that Query()
and AggregateQuery()
share the structural components of the method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was trying that but it got a little messy, will take another stab at it
valueBytes.IncRef() | ||
valueBytes.AppendAll(value) | ||
valueBytes.DecRef() | ||
valueID := r.idPool.BinaryID(valueBytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I realize you can actually use a helper function on the ID pool to make this code a little simpler:
valueID := r.idPool.Clone(ident.BytesID(value))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice 👍
|
||
// NB: fine to overwrite the values here. | ||
v.valuesMap.Set(bytesID, struct{}{}) | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can just condense this (since .Set(...)
takes an ident.ID and will copy it) into:
if len(value.Bytes()) == 0 {
return errUnableToAddValueMissingID
}
// NB: fine to overwrite the values here.
v.valuesMap.Set(value, struct{}{})
return nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah already have sorry, latest push was halfway consolidated :p
@@ -598,24 +598,49 @@ func (n *dbNamespace) QueryIDs( | |||
ctx context.Context, | |||
query index.Query, | |||
opts index.QueryOptions, | |||
) (index.QueryResults, error) { | |||
) (index.QueryResult, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for rename
src/dbnode/storage/namespace.go
Outdated
} | ||
|
||
res, err := n.reverseIndex.AggregateQuery(ctx, query, opts, aggResultOpts) | ||
n.metrics.queryIDs.ReportSuccessOrError(err, n.nowFn().Sub(callStart)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add a metric for aggregateQuery
(similar to queryIDs
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and reuse in all calls in this function)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one 👍
@@ -566,26 +566,41 @@ func TestNamespaceIndexBlockQuery(t *testing.T) { | |||
StartInclusive: t0, | |||
EndExclusive: now.Add(time.Minute), | |||
} | |||
b0.EXPECT().Query(gomock.Any(), q, qOpts, gomock.Any()).Return(true, nil) | |||
aggOpts := index.AggregateResultsOptions{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could you decouple the tests for the Query and Aggregate. One simple way - copy paste the test you have twice, and make it only do the mocks/calls for Query in one, and Aggregate in the other. That way each is independent of the other. Makes maintenance easier in the long run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, for sure. I didn't want to go ahead and c/p larger tests so figured would be fine to package them; will refactor to use a test setup method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried with a setup method and it looked really bad; did the test twice instead
return v.valuesMap.Len() | ||
} | ||
|
||
func (v *AggregateValues) reset() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more a question: is it intentional these methods are un-exported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, wanted to restrict these methods to this package so callers don't accidentally close the AggregateValues
) error { | ||
for _, field := range document.Fields { | ||
if err := r.addFieldWithLock(field.Name, field.Value); err != nil { | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would be useful to bundle with document here too, something like:
return fmt.Errorf("unable to add document [%+v]: %v", document, err)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
|
||
// if a term filter is provided, ensure this field matches the filter, | ||
// otherwise ignore it. | ||
if len(r.aggregateOpts.TermFilter) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: mind making this a function on the type? something like
type AggregateResultsOptions struct {
...
TermFilter [][]byte
}
type AggregateTermFilter [][]byte
func (a AggregateTermFilter) Filter(term []byte) bool {
if len(a) == 0 {
return false
}
for ...
return true
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM, guess then we can use it elsewhere :)
return fmt.Errorf(missingDocumentFields, "value") | ||
} | ||
|
||
// NB: can cast the []byte -> ident.ID to avoid an alloc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: shouldn't this below the filtering code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as in, termID isn't used until after the filtering. would be better to group along with the other vars there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah good one; the filter function previously required a ident.ID
so had to define this earlier, but missed changing it after refactor
func (r *aggregatedResults) Finalize() { | ||
r.Lock() | ||
|
||
r.aggregateOpts = AggregateResultsOptions{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm could you reuse r.Reset(nil, AggregateResultsOptions{})
here? looks like they're both doing the same thing except the pool return
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, seems aggressive for now, we could potentially consolidate this later to unblock find endpoint working?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, not sure I follow. I was suggesting changing the implementation of Finalize to the following:
func (r *aggregatedResults) Finalize() {
r.Reset(nil, AggregateResultsOptions{})
if r.pool == nil {
return
}
r.pool.Put(r)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, so this method is a little different in that it needs to call finalize
on the AggregatedValues
map; I tried refactoring it so they'd both call a common function with a flag to determine if it should finalize or reset, but it ended up being a bit awkward so preferred this approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
talked offline: made sense to merge the behaviours
src/dbnode/server/server.go
Outdated
@@ -1202,8 +1202,11 @@ func withEncodingAndPoolingOptions( | |||
postingsListOpts := poolOptions(policy.PostingsListPool, scope.SubScope("postingslist-pool")) | |||
postingsList := postings.NewPool(postingsListOpts, roaring.NewPostingsList) | |||
|
|||
resultsPool := index.NewResultsPool( | |||
poolOptions(policy.IndexResultsPool, scope.SubScope("index-results-pool"))) | |||
// Need to actually set pools |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Remove this now that the pools are set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call
@@ -0,0 +1,47 @@ | |||
// Copyright (c) 2018 Uber Technologies, Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: 2019.
@@ -0,0 +1,47 @@ | |||
// Copyright (c) 2018 Uber Technologies, Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: 2019.
contains(t, expected, res.Map()) | ||
} | ||
|
||
func toFilter(strs ...string) AggregateTermFilter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
assert.True(t, aggVals.Map().Contains(ident.StringID("biz"))) | ||
} | ||
|
||
func contains(t *testing.T, ex map[string][]string, ac *AggregateResultsMap) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: assertContains()
would be clearer
|
||
found := false | ||
|
||
// our genny generated maps don't provide access to MapEntry directly, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, v nice test.
// BaseResults is a collection of basic results for a generic query, it is | ||
// synchronized when access to the results set is used as documented by the | ||
// methods. | ||
type BaseResults interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
TODO:Test coverageRebase on Use a single index Results when querying across blocks #1474