
Refactor FetchTagged to return an Iterator of results #3141

Merged: 5 commits merged into master from rhall-fetch-results-iterator on Jan 31, 2021

Conversation

@ryanhall07 ryanhall07 (Collaborator) commented Jan 29, 2021

This is the first step in several refactorings to limit how many series blocks can be loaded at once. This will prevent large queries from overwhelming the system and give all queries fairer access to system resources.

This first refactoring creates the interface callers use to iterate through series blocks one at a time. The series blocks are still all loaded at once; that will be fixed in a future PR with another Iterator.
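To make the intended shape concrete, here is a minimal sketch of the consumption pattern such an iterator enables; the interface and type names are illustrative stand-ins, not the exact ones added in this PR.

package fetchexample

import "context"

// SeriesResult stands in for the per-series payload (ID, tags, block readers).
type SeriesResult struct{}

// FetchTaggedResultsIter sketches the iterator shape: callers pull one series
// at a time instead of receiving every series block up front.
type FetchTaggedResultsIter interface {
  // Next advances the iterator, returning false when it is exhausted or an
  // error occurred.
  Next(ctx context.Context) bool
  // Current returns the result for the series the iterator is positioned on.
  Current() SeriesResult
  // Err returns the first error encountered while iterating, if any.
  Err() error
}

// consume shows the intended calling pattern: one series at a time.
func consume(ctx context.Context, iter FetchTaggedResultsIter) error {
  for iter.Next(ctx) {
    _ = iter.Current() // encode and stream this series' blocks
  }
  return iter.Err()
}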

@ryanhall07 ryanhall07 force-pushed the rhall-fetch-results-iterator branch from a4a111f to 3175ecd on January 29, 2021 at 23:38
@ryanhall07 ryanhall07 force-pushed the rhall-fetch-results-iterator branch from be04580 to 2633aec on January 30, 2021 at 01:01
Elements: make([]*rpc.FetchTaggedIDResult_, 0, results.Size()),
}
nsIDBytes := ns.Bytes()
tagEncoder := s.pools.tagEncoder.Get()
ryanhall07 (Collaborator, Author):

@robskillington does this make sense for getting a tag encoder?

robskillington (Collaborator):

Yeah, this is the best way to do it, and return it at the end of the request.

}
tagBytes := make([]byte, len(encodedTags.Bytes()))
copy(tagBytes, encodedTags.Bytes())
ryanhall07 (Collaborator, Author):

@robskillington I had to copy the bytes because the same tag encoder is used for the entire request.

if err != nil {
return nil, err
}
ctx.RegisterCloser(xresource.SimpleCloserFn(func() {
ryanhall07 (Collaborator, Author):

@robskillington does this do what I think it does, i.e. call the completion function when the RPC is closed?

robskillington (Collaborator):

Yeah this should work just fine 👍
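For reference, the pattern being confirmed here, checking an encoder out of the pool once per request and returning it when the request context closes, looks roughly like the sketch below. The types are local stand-ins for m3's request context, xresource.SimpleCloserFn, and the tag encoder pool; in particular, Put as the name of the pool's return method is an assumption rather than the exact m3 API.

package fetchexample

// SimpleCloserFn mirrors xresource.SimpleCloserFn from the diff: a plain func
// adapted into a Closer so it can be registered on the request context.
type SimpleCloserFn func()

func (f SimpleCloserFn) Close() { f() }

// Closer is what the request context accepts via RegisterCloser.
type Closer interface{ Close() }

// RequestContext stands in for the m3 request context; every registered
// closer is invoked once the RPC has completed.
type RequestContext struct{ closers []Closer }

func (c *RequestContext) RegisterCloser(cl Closer) {
  c.closers = append(c.closers, cl)
}

// TagEncoder and TagEncoderPool are stand-ins for the pooled encoder types.
type TagEncoder interface{}

type TagEncoderPool interface {
  Get() TagEncoder
  Put(TagEncoder)
}

// fetchTagged sketches the lifecycle: one encoder is checked out for the
// whole request and only returned to the pool when the RPC is closed.
func fetchTagged(ctx *RequestContext, pool TagEncoderPool) TagEncoder {
  enc := pool.Get()
  ctx.RegisterCloser(SimpleCloserFn(func() {
    pool.Put(enc) // runs when the request context closes
  }))
  return enc
}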

var encodedDataResults [][][]xio.BlockReader
if fetchData {
encodedDataResults = make([][][]xio.BlockReader, results.Size())
return newFetchTaggedResultsIter(&fetchTaggedResultsIterOpts{
robskillington (Collaborator):

Any reason to pass fetchTaggedResultsIterOpts as a pointer?

I wouldn't worry about passing these values along on the stack, considering they're already on the stack in the current function.

robskillington (Collaborator):

Also, if you passed each of these one by one in the method call, gocritic wouldn't complain... Either way I wouldn't worry about growing the stack here; as a general rule of thumb it's much cheaper to grow the stack than to heap allocate.

ryanhall07 (Collaborator, Author):

I was just avoiding the gocritic linter. So just add //nolint:gocritic?
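A minimal sketch of that resolution, assuming the options struct is passed by value and gocritic's hugeParam check is silenced with a nolint directive; the struct fields and constructor shown are illustrative, not the exact ones in the PR.

package fetchexample

// fetchTaggedResultsIterOpts is illustrative; the real struct in the PR
// carries different per-request values.
type fetchTaggedResultsIterOpts struct {
  fetchData bool
  // ... other values copied from the enclosing function's stack
}

type fetchTaggedResultsIter struct {
  opts fetchTaggedResultsIterOpts
}

// newFetchTaggedResultsIter takes the options by value; gocritic's hugeParam
// check may flag a large struct passed by value, hence the nolint directive.
//nolint:gocritic
func newFetchTaggedResultsIter(opts fetchTaggedResultsIterOpts) *fetchTaggedResultsIter {
  return &fetchTaggedResultsIter{opts: opts}
}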

Comment on lines 820 to 825
// HasNextIDIter returns true if there is another series ID to process.
HasNextIDIter() bool

// NextIDIter returns an iterator to process the results for a series ID.
// HasNextIDIter must be called before each call of this method.
NextIDIter(ctx context.Context) (IDIter, error)
@robskillington robskillington (Collaborator) commented Jan 30, 2021:

nit: If you check everywhere else in the codebase, our iterators usually use the naming:

Next() bool // or NextIDIter() for this current example
Current() MyNextResult // or CurrentIDIter() for this current example

Perhaps for consistency it's better to use Next..() and Current..() instead of HasNext..() and Next..()?

ryanhall07 (Collaborator, Author):

you can leave java, but java can't leave you
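For context, the rename being suggested reshapes the HasNextIDIter/NextIDIter pair quoted above into the Next/Current style used elsewhere in the codebase; the sketch below is illustrative, not the exact signatures that were merged.

package fetchexample

import "context"

// IDIter stands in for the per-series iterator each ID resolves to.
type IDIter interface{}

// idsIter sketches the suggested naming: an advance method that reports
// availability, an accessor for the current element, and a separate Err.
type idsIter interface {
  // NextIDIter advances to the next series ID, returning false when there
  // are no more IDs to process or an error occurred.
  NextIDIter(ctx context.Context) bool
  // CurrentIDIter returns the iterator for the series ID most recently
  // advanced to by NextIDIter.
  CurrentIDIter() IDIter
  // Err returns the first error encountered during iteration.
  Err() error
}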

Comment on lines 836 to 841
// HasNextSegments returns true if there are more Segments to process.
HasNextSegments() bool

// NextSegments returns the next Segments.
// HasNextSegments must be called before each call of this method.
NextSegments(ctx context.Context) (*rpc.Segments, error)
robskillington (Collaborator):

nit: Same as above about HasNext/Next vs. the rest of the codebase's Next/Current.

Comment on lines 867 to 868
tagBytes := make([]byte, len(encodedTags.Bytes()))
copy(tagBytes, encodedTags.Bytes())
@robskillington robskillington (Collaborator) commented Jan 30, 2021:

This is equivalent in speed, by the way, and with compiler optimizations it most often comes out better (@arnikola will attest to the many discussions and benchmarks we've walked through) with the cleaner version:

tagBytes := append(make([]byte, 0, len(encodedTags.Bytes())), encodedTags.Bytes()...)
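For clarity, the two equivalent forms under discussion side by side; this is just an illustration of copying the encoded tag bytes, with src standing in for encodedTags.Bytes().

package fetchexample

// copyTagBytes shows the two equivalent ways of copying the encoded tag
// bytes. Both allocate exactly once, sized to the source slice.
func copyTagBytes(src []byte) (viaCopy, viaAppend []byte) {
  // Form in the diff: allocate a full-length slice, then copy into it.
  viaCopy = make([]byte, len(src))
  copy(viaCopy, src)

  // Suggested form: allocate capacity only, then append the source bytes.
  viaAppend = append(make([]byte, 0, len(src)), src...)
  return viaCopy, viaAppend
}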

// TODO(rhall): don't request all series blocks at once.
if i.idx == 0 && i.fetchData {
for _, idResult := range i.idResults {
id := ident.BytesID(idResult.queryResult.Key())
robskillington (Collaborator):

Can you copy the old comment to here? Starts with:

// NB(r): Use a bytes ID here so that this ID doesn't need to be...

@robskillington robskillington (Collaborator) left a comment:

LGTM other than minor nits

Comment on lines 961 to 963
iter.idResults = append(iter.idResults, &idResult{
queryResult: &entry,
})
@robskillington robskillington (Collaborator) commented Jan 30, 2021:

Can we make both idResult and index.ResultsMapEntry not use pointers here?

Both will cause these structs to be individually heap allocated (due to escape analysis being unsure of the lifetimes of these two structs).

Using by-value, non-pointer types will ensure that (1) idResult in the idResults slice can be just part of the slice's allocation, and (2) index.ResultsMapEntry won't need to be heap allocated, since a memcpy can move the struct from the results map to this value.
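A minimal sketch of the allocation difference being described; idResult here is a placeholder struct, not the one in the PR.

package fetchexample

// idResult is a placeholder for the per-series result struct.
type idResult struct {
  key string
}

// appendByPointer forces each idResult to be heap allocated individually:
// escape analysis cannot prove the pointee's lifetime, so every element is
// its own allocation and the slice only stores pointers.
func appendByPointer(results []*idResult, key string) []*idResult {
  return append(results, &idResult{key: key})
}

// appendByValue stores the struct inline in the slice's backing array: the
// element is copied into the slice and shares its single allocation.
func appendByValue(results []idResult, key string) []idResult {
  return append(results, idResult{key: key})
}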

@ryanhall07 ryanhall07 marked this pull request as ready for review January 30, 2021 22:34
@ryanhall07 ryanhall07 (Collaborator, Author) commented:
@robskillington addressed comments in 40e2322

@codecov codecov bot commented Jan 30, 2021

Codecov Report

Merging #3141 (635c973) into master (bef2564) will increase coverage by 0.0%.
The diff coverage is 90.4%.


@@           Coverage Diff           @@
##           master    #3141   +/-   ##
=======================================
  Coverage    72.2%    72.2%           
=======================================
  Files        1084     1084           
  Lines      100236   100279   +43     
=======================================
+ Hits        72428    72497   +69     
+ Misses      22755    22739   -16     
+ Partials     5053     5043   -10     
Flag         Coverage Δ
aggregator   75.9% <ø> (+0.1%) ⬆️
cluster      84.8% <ø> (ø)
collector    84.3% <ø> (ø)
dbnode       78.7% <90.4%> (+<0.1%) ⬆️
m3em         74.4% <ø> (ø)
m3ninx       73.1% <ø> (-0.1%) ⬇️
metrics      20.0% <ø> (ø)
msg          74.1% <ø> (+0.2%) ⬆️
query        67.2% <ø> (ø)
x            80.3% <ø> (ø)

Flags with carried forward coverage won't be shown.


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bef2564...635c973.

Comment on lines 737 to 741
if err != nil {
s.metrics.fetchTagged.ReportError(s.nowFn().Sub(callStart))
} else {
s.metrics.fetchTagged.ReportSuccess(s.nowFn().Sub(callStart))
}
robskillington (Collaborator):

You can use .ReportSuccessOrError(err, s.nowFn().Sub(callStart)) here and it will do the nil check on the error for you.

ryanhall07 (Collaborator, Author):

Nice, I just had the code from before.
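The suggested change, sketched against the lines quoted above; methodMetrics is a stand-in interface, while ReportSuccessOrError is the helper named by the reviewer.

package fetchexample

import "time"

// methodMetrics is a stand-in exposing the reporting helpers referenced in
// the review comment.
type methodMetrics interface {
  ReportSuccess(d time.Duration)
  ReportError(d time.Duration)
  ReportSuccessOrError(err error, d time.Duration)
}

// reportBefore is the original shape: an explicit nil check on the error.
func reportBefore(m methodMetrics, err error, callStart, now time.Time) {
  if err != nil {
    m.ReportError(now.Sub(callStart))
  } else {
    m.ReportSuccess(now.Sub(callStart))
  }
}

// reportAfter uses the helper, which performs the nil check internally.
func reportAfter(m methodMetrics, err error, callStart, now time.Time) {
  m.ReportSuccessOrError(err, now.Sub(callStart))
}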

Comment on lines 756 to 757
for iter.Next(ctx) {
if iter.Err() != nil {
robskillington (Collaborator):

Hm, usually we have Next(...) return false if there's an iterator error, then we just check if err := iter.Err(); err != nil after the iterator has finished.

i.e.

for iter.Next() { // return false if no more or an error

}
if err := iter.Err(); err != nil {
  return nil, err
}

Comment on lines 761 to 762
tagBytes := make([]byte, 0)
tagBytes, err = cur.WriteTags(tagBytes)
@robskillington robskillington (Collaborator) commented Jan 31, 2021:

Typically, if you're not reusing a byte slice being passed into a method that takes a dst []byte to write to, we just pass nil. That way it's not allocated before calling the function (i.e. it reduces two allocs to just one alloc inside WriteTags(...)).

e.g.

tagBytes, err := cur.WriteTags(nil)
// ...

Comment on lines 773 to 774
for segIter.Next(ctx) {
if segIter.Err() != nil {
robskillington (Collaborator):

Same here: I would opt to more consistently check for the error afterwards and have the .Next(...) call return false if there is an error, so that the loop breaks.

for segIter.Next(ctx) {
  // inner
}
if err := segIter.Err(); err != nil {
  return nil, err
}

id := ident.BytesID(result.queryResult.Key())
result.blockReaders, i.err = i.db.ReadEncoded(ctx, i.nsID, id, i.startInclusive, i.endExclusive)
if i.err != nil {
return true
robskillington (Collaborator):

Yeah, to be consistent with our other iterators I would make an error return false to break the for loop, then allow the caller to check the error after the for loop has broken.

This is consistent with our other iterators (the easiest way to find them is to search for for iter.Next() { or for it.Next() {, which should surface quite a few results).
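A minimal sketch of that convention, with placeholder fields: Next records the error and returns false so the loop breaks, and the caller inspects Err once iteration has finished.

package fetchexample

// resultsIter sketches the error-handling convention described above.
type resultsIter struct {
  idx     int
  results []string
  err     error
}

func (i *resultsIter) Next() bool {
  if i.err != nil || i.idx >= len(i.results) {
    return false
  }
  if err := i.loadResult(i.results[i.idx]); err != nil {
    i.err = err // store the error and break the loop rather than returning true
    return false
  }
  i.idx++
  return true
}

// Err returns the first error encountered; callers check it after the loop.
func (i *resultsIter) Err() error { return i.err }

// loadResult is a placeholder for per-result work such as db.ReadEncoded.
func (i *resultsIter) loadResult(id string) error {
  _ = id
  return nil
}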

if err != nil { // This is an invariant, should never happen
return nil, tterrors.NewInternalError(err)
}
result = append(result, encodedTags.Bytes()...)
@robskillington robskillington (Collaborator) commented Jan 31, 2021:

nit: A slightly more defensive programming approach here is to slice the result into result[:0], so that if the caller accidentally passed a full buffer for reuse that hadn't been resized, it is overwritten with the length reset while still reusing the allocated capacity. Also very nitty: we usually use the name dst for a byte slice that is to be written into.

e.g.

func (i *IDResult) WriteTags(dst []byte) ([]byte, error) {
  // .... other
  dst = append(dst[:0], encodedTags.Bytes()...)
  return dst, nil
}

type fetchTaggedResultsIter struct {
queryResults map[index.ResultsMapHash]index.ResultsMapEntry
robskillington (Collaborator):

Q: Why not take the *index.ResultsMap itself here? Not that this is wrong, it just breaks the abstraction a little bit, since our map type wraps the underlying map itself (in case we ever wanted to change the way the .Iter() method works for the map type).

@ryanhall07 ryanhall07 merged commit 25fbe60 into master Jan 31, 2021
@ryanhall07 ryanhall07 deleted the rhall-fetch-results-iterator branch January 31, 2021 22:51
soundvibe added a commit that referenced this pull request Feb 1, 2021
* master:
  Refactor FetchTagged to return an Iterator of results (#3141)
soundvibe added a commit that referenced this pull request Feb 1, 2021
* master:
  [dtest] endpoint to fetch tagged (#3138)
  Refactor FetchTagged to return an Iterator of results (#3141)
  [dbnode] Add aggregate term limit regression test (#3135)
  [DOCS] Adding Prometheus steps to quickstart (#3043)
  [dbnode] Revert AggregateQuery changes (#3133)
  Fix TestSessionFetchIDs flaky test (#3132)
  [dbnode] Alter multi-segments builder to order by size before processing (#3128)
  [dbnode] Emit aggregate usage metrics (#3123)
  [dbnode] Add Shard.OpenStreamingReader method (#3119)
SokolAndrey pushed a commit to SokolAndrey/m3 that referenced this pull request Feb 2, 2021