-
Notifications
You must be signed in to change notification settings - Fork 689
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Returning bytesRead as part of searchResult #1752
Returning bytesRead as part of searchResult #1752
Conversation
- bug fixes with respect to bytes read tracking in collector, tfr and docValueReader - fixing boolean searcher's BytesRead() API
with numbers after the bug fixes.
with a SetBytesRead() API. Currently the only valid implmentations is with disjunct and conjunct searchers. - introducing BytesRead for newSnapshotIndexFieldDict. This is necessary for fuzzy, regex like queries. - updating the tests with the changes - updating rest of interface dependencies with no-op APIs in their implementations
regexp queries in the seacher - minor code refactor with respect to tfr
the concept of context right at the searcher level. this will essentially help in bytesRead tracking as well.
and reporting the same as part of search result. - and yes, it works :D - removed the BytesRead and SetBytesRead API from searcher
- fixed a race condition around the newSnapshotIndexFieldDict - using the callbacks to return the bytes read value with respect to fieldDict in regexp, fuzzy etc type of queries, where there is a index.FieldDict creation involved (which loads the termDictionary for that field) - updated the tests reflecting the bug fixes. - removing the BytesRead API for index.TermFieldreader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly nits, I'll make another pass over this soon.
Meanwhile you should get some perf results with these changes.
document/document.go
Outdated
storedFieldsBytes uint64 | ||
} | ||
|
||
func (d *Document) SetStoredFieldsBytes(val uint64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop this method you can updated storedFieldsBytes by simply exporting it (like CompositeFields above).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
go.mod
Outdated
@@ -40,3 +40,17 @@ require ( | |||
github.com/spf13/pflag v1.0.5 // indirect | |||
golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a // indirect | |||
) | |||
|
|||
replace github.com/blevesearch/zapx/v15 => ../zapx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove all these replaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah just got pushed to the review as part of "git add .". Removing them.
index/scorch/snapshot_index.go
Outdated
@@ -430,13 +432,17 @@ func (i *IndexSnapshot) Document(id string) (rv index.Document, err error) { | |||
segmentIndex, localDocNum := i.segmentIndexAndLocalDocNumFromGlobal(docNum) | |||
|
|||
rvd := document.NewDocument(id) | |||
prevBytesRead := i.segment[segmentIndex].segment.BytesRead() | |||
var totalStoredBytes int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
index/scorch/snapshot_index.go
Outdated
// track uncompressed stored fields bytes as part of IO stats. | ||
// However, ideally we'd need to track the compressed on-disk value | ||
// Keeping that TODO for now until we have a cleaner way. | ||
totalStoredBytes += len(val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rvd.StoredFieldsBytes += len(val)
Separate question - shouldn't you be doing this only when the field is actually "stored"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This callback is called only for the stored fields. The VisitDocument API ends up calling the VisitStoredFields in the zap layer which loads the bytes pertaining to the stored fields, uncompresses them and iterates over each of these fields and applies this callback.
index/upsidedown/index_reader.go
Outdated
"reflect" | ||
|
||
"github.com/blevesearch/bleve/v2/document" | ||
index "github.com/blevesearch/bleve_index_api" | ||
"github.com/blevesearch/upsidedown_store_api" | ||
store "github.com/blevesearch/upsidedown_store_api" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes seems unrelated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This popped up because of the vscode extension I believe, removing it.
index/upsidedown/reader.go
Outdated
@@ -22,7 +22,7 @@ import ( | |||
|
|||
"github.com/blevesearch/bleve/v2/size" | |||
index "github.com/blevesearch/bleve_index_api" | |||
"github.com/blevesearch/upsidedown_store_api" | |||
store "github.com/blevesearch/upsidedown_store_api" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This popped up because of the vscode extension I believe, removing it.
document/document.go
Outdated
d.storedFieldsBytes = val | ||
} | ||
|
||
func (d *Document) GetStoredFieldsBytes() uint64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method can go away too, with storedFieldsBytes exported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Get type of API might be necessary since the struct is hidden behind the Document interface.
611130d
to
6d4cf3f
Compare
- removing bytesRead from search.DocumentMatch
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, couple minor comments.
@@ -37,6 +38,7 @@ import ( | |||
// re usable, threadsafe levenshtein builders | |||
var lb1, lb2 *lev.LevenshteinAutomatonBuilder | |||
|
|||
type diskStatsReporter segment.DiskStatsReporter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this step necessary, can't we just type assert to segment.DiskStatsReporter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah at a couple of places, there are library name and variable name conflicts, for eg. indexSnapshot.segment.segment
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah variable and import conflict. Should update the variable name then (it looks quite confusing :)). I'll make that change separately.
index_impl.go
Outdated
// 2. the docvalues portion (accounted in collector) and the retrieval | ||
// of stored fields bytes (by LoadAndHighlightFields) | ||
var totalBytesRead uint64 | ||
SendBytesRead := func(bytesRead uint64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this closure need to be exported? Would the compiler complain on sendBytesRead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aah, my bad.
@@ -37,6 +38,7 @@ import ( | |||
// re usable, threadsafe levenshtein builders | |||
var lb1, lb2 *lev.LevenshteinAutomatonBuilder | |||
|
|||
type diskStatsReporter segment.DiskStatsReporter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah variable and import conflict. Should update the variable name then (it looks quite confusing :)). I'll make that change separately.
Refactoring the bytesRead accounting to now be accounted at a per query level. The searchResult returned by SearchInContext now has an extra field called bytesRead which helps to track the cost associated with the query.
The implementation to achieve this basically involves passing a context.Context which has a callback that is used to track or update the bytes read associated with some operation. This change of adding an extra parameter context to the searcher is an invasive one, and hence involves updating a lot of existing files to accommodate the same. However, the good thing is that the adding of context to searcher is in a way a future proof change, which hopefully might be useful to do any more useful things down at the searcher level and relay them back to the indexReader level.