Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Returning bytesRead as part of searchResult #1752

Merged
merged 17 commits into from
Nov 11, 2022

Conversation

Thejas-bhat
Copy link
Member

@Thejas-bhat Thejas-bhat commented Nov 4, 2022

Refactoring the bytesRead accounting to now be accounted at a per query level. The searchResult returned by SearchInContext now has an extra field called bytesRead which helps to track the cost associated with the query.

The implementation to achieve this basically involves passing a context.Context which has a callback that is used to track or update the bytes read associated with some operation. This change of adding an extra parameter context to the searcher is an invasive one, and hence involves updating a lot of existing files to accommodate the same. However, the good thing is that the adding of context to searcher is in a way a future proof change, which hopefully might be useful to do any more useful things down at the searcher level and relay them back to the indexReader level.

- bug fixes with respect to bytes read tracking in
  collector, tfr and docValueReader
- fixing boolean searcher's BytesRead() API
  with numbers after the bug fixes.
  with a SetBytesRead() API. Currently the only valid
  implmentations is with disjunct and conjunct searchers.
- introducing BytesRead for newSnapshotIndexFieldDict.
  This is necessary for fuzzy, regex like queries.
- updating the tests with the changes
- updating rest of interface dependencies with
  no-op APIs in their implementations
  regexp queries in the seacher
- minor code refactor with respect to tfr
  the concept of context right at the searcher
  level. this will essentially help in bytesRead
  tracking as well.
  and reporting the same as part of search result.
- and yes, it works :D
- removed the BytesRead and SetBytesRead API from searcher
- fixed a race condition around the newSnapshotIndexFieldDict
- using the callbacks to return the bytes read value with respect
  to fieldDict in regexp, fuzzy etc type of queries, where there is
  a index.FieldDict creation involved (which loads the termDictionary
  for that field)
- updated the tests reflecting the bug fixes.
- removing the BytesRead API for index.TermFieldreader
@Thejas-bhat Thejas-bhat marked this pull request as ready for review November 4, 2022 16:32
Copy link
Member

@abhinavdangeti abhinavdangeti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly nits, I'll make another pass over this soon.
Meanwhile you should get some perf results with these changes.

storedFieldsBytes uint64
}

func (d *Document) SetStoredFieldsBytes(val uint64) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop this method you can updated storedFieldsBytes by simply exporting it (like CompositeFields above).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

go.mod Outdated
@@ -40,3 +40,17 @@ require (
github.com/spf13/pflag v1.0.5 // indirect
golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a // indirect
)

replace github.com/blevesearch/zapx/v15 => ../zapx
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove all these replaces.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah just got pushed to the review as part of "git add .". Removing them.

@@ -430,13 +432,17 @@ func (i *IndexSnapshot) Document(id string) (rv index.Document, err error) {
segmentIndex, localDocNum := i.segmentIndexAndLocalDocNumFromGlobal(docNum)

rvd := document.NewDocument(id)
prevBytesRead := i.segment[segmentIndex].segment.BytesRead()
var totalStoredBytes int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove variable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// track uncompressed stored fields bytes as part of IO stats.
// However, ideally we'd need to track the compressed on-disk value
// Keeping that TODO for now until we have a cleaner way.
totalStoredBytes += len(val)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rvd.StoredFieldsBytes += len(val)

Separate question - shouldn't you be doing this only when the field is actually "stored"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This callback is called only for the stored fields. The VisitDocument API ends up calling the VisitStoredFields in the zap layer which loads the bytes pertaining to the stored fields, uncompresses them and iterates over each of these fields and applies this callback.

"reflect"

"github.com/blevesearch/bleve/v2/document"
index "github.com/blevesearch/bleve_index_api"
"github.com/blevesearch/upsidedown_store_api"
store "github.com/blevesearch/upsidedown_store_api"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes seems unrelated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This popped up because of the vscode extension I believe, removing it.

@@ -22,7 +22,7 @@ import (

"github.com/blevesearch/bleve/v2/size"
index "github.com/blevesearch/bleve_index_api"
"github.com/blevesearch/upsidedown_store_api"
store "github.com/blevesearch/upsidedown_store_api"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This popped up because of the vscode extension I believe, removing it.

d.storedFieldsBytes = val
}

func (d *Document) GetStoredFieldsBytes() uint64 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method can go away too, with storedFieldsBytes exported.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Get type of API might be necessary since the struct is hidden behind the Document interface.

@abhinavdangeti abhinavdangeti added this to the v2.3.6 milestone Nov 8, 2022
@abhinavdangeti
Copy link
Member

@Thejas-bhat

Copy link
Member

@abhinavdangeti abhinavdangeti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, couple minor comments.

@@ -37,6 +38,7 @@ import (
// re usable, threadsafe levenshtein builders
var lb1, lb2 *lev.LevenshteinAutomatonBuilder

type diskStatsReporter segment.DiskStatsReporter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this step necessary, can't we just type assert to segment.DiskStatsReporter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah at a couple of places, there are library name and variable name conflicts, for eg. indexSnapshot.segment.segment .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah variable and import conflict. Should update the variable name then (it looks quite confusing :)). I'll make that change separately.

index_impl.go Outdated
// 2. the docvalues portion (accounted in collector) and the retrieval
// of stored fields bytes (by LoadAndHighlightFields)
var totalBytesRead uint64
SendBytesRead := func(bytesRead uint64) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this closure need to be exported? Would the compiler complain on sendBytesRead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aah, my bad.

@@ -37,6 +38,7 @@ import (
// re usable, threadsafe levenshtein builders
var lb1, lb2 *lev.LevenshteinAutomatonBuilder

type diskStatsReporter segment.DiskStatsReporter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah variable and import conflict. Should update the variable name then (it looks quite confusing :)). I'll make that change separately.

@abhinavdangeti abhinavdangeti merged commit 1643dc3 into blevesearch:master Nov 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants