Fix postings list cache causing incorrect results #1461
Codecov Report
@@ Coverage Diff @@
## master #1461 +/- ##
=========================================
+ Coverage 48.2% 66.9% +18.6%
=========================================
Files 736 741 +5
Lines 62963 64770 +1807
=========================================
+ Hits 30384 43355 +12971
+ Misses 29787 18499 -11288
- Partials 2792 2916 +124
Codecov Report
@@ Coverage Diff @@
## master #1461 +/- ##
======================================
Coverage 70.9% 70.9%
======================================
Files 841 841
Lines 72006 72006
======================================
Hits 51108 51108
Misses 17557 17557
Partials 3341 3341
// PostingsListCacheQuery represents a query that we want to cache the result of
// for a given segment. Note that it include the field in the segment that the
// query was executed on, as well as the pattern of the query itself.
type PostingsListCacheQuery struct {
Hm, I think you need to include the query/pattern type here, i.e. a term query of "abc:1*" is not the same as a regexp query of "abc:1*".
Could you ensure the tests catch this too?
This is handled. They go through separate APIs and end up with separate keys in the LRU map. See postings_list_cache_lru.go key struct
Fair enough. Mind including pattern here anyway? A query isn’t completely defined without it so this struct feels incomplete.
I’d have to change the APIs though and then you have a potentially invalid iota coming from the caller
Are you just opposing the name? We could call it TermPattern or FieldPattern
It’s not the name as much as it is the struct itself. Like without the pattern included, the struct is an incomplete description of a query.
How would you feel about migrating to using search.Query here? The thought is if we add new types of queries we’d still get this part for free.
search.Query is an interface :/ I don't think that makes a lot of sense.
I don't like the struct either, I'd prefer to keep all three fields as arguments to the methods in the postings list cache. I only did it this way because you suggested a struct in our previous discussion
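To make the concern in this thread concrete, here is a minimal sketch of a cache key that carries the pattern type next to the field and pattern. It is not the key struct from postings_list_cache_lru.go; the names cacheKey, newCacheKey, PatternTypeTerm and PatternTypeRegexp are assumptions made for illustration. It only shows that two keys with the same field and pattern but different pattern types compare as unequal, so they would occupy separate map entries.

```go
package main

import "fmt"

// PatternType says how a pattern should be interpreted.
// The constant names below are hypothetical, chosen for this sketch.
type PatternType int

const (
	PatternTypeTerm PatternType = iota
	PatternTypeRegexp
)

// cacheKey is a hypothetical, comparable key: all fields are plain values,
// so it can be used directly as a Go map key.
type cacheKey struct {
	field       string
	pattern     string
	patternType PatternType
}

func newCacheKey(field, pattern string, t PatternType) cacheKey {
	return cacheKey{field: field, pattern: pattern, patternType: t}
}

// keysDiffer reports whether a term query and a regexp query with the
// same field and pattern produce distinct keys.
func keysDiffer() bool {
	term := newCacheKey("abc", "1*", PatternTypeTerm)
	re := newCacheKey("abc", "1*", PatternTypeRegexp)
	return term != re
}

func main() {
	fmt.Println(keysDiffer()) // prints "true"
}
```

Because the pattern type is set inside the cache's own methods in this sketch, a caller cannot hand in an invalid iota value, which is the trade-off raised above.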
looks like there's a failure case that still needs to be addressed.
Looks pretty good to me in general, but take that with a grain of salt since not exactly familiar with all the implications here yet
@@ -63,6 +63,14 @@ type PostingsListCache struct {
	metrics *postingsListCacheMetrics
}

// PostingsListCacheQuery represents a query that we want to cache the result of
// for a given segment. Note that it include the field in the segment that the
nit: include -> includes
	patternType PatternType,
) (postings.List, bool) {
	newKey := newKey(query, patternType)
Won't you create this here for no reason if uuidArray is not present in items? Looks like in Add you always have to generate newKey so it's ok there, but it might be unnecessary both here and in Remove.
yeah but it's a stack alloc so I think it doesn't matter much
no alloc > stack alloc ;)
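For readers following this thread, a minimal sketch of the suggested reordering: look up the segment first and build the key only when the segment has entries. This is not the LRU implementation from the PR; segmentCache, its methods, and the keysBuilt counter are hypothetical and exist only to make the ordering observable.

```go
package main

import "fmt"

// key is a hypothetical cache key for this sketch.
type key struct {
	field   string
	pattern string
}

// segmentCache maps segment UUID -> key -> cached document IDs.
// keysBuilt counts key constructions so the ordering can be checked.
type segmentCache struct {
	items     map[string]map[key][]int
	keysBuilt int
}

func newSegmentCache() *segmentCache {
	return &segmentCache{items: map[string]map[key][]int{}}
}

func (c *segmentCache) newKey(field, pattern string) key {
	c.keysBuilt++
	return key{field: field, pattern: pattern}
}

// add always needs the key, so it builds it up front.
func (c *segmentCache) add(uuid, field, pattern string, ids []int) {
	k := c.newKey(field, pattern)
	if c.items[uuid] == nil {
		c.items[uuid] = map[key][]int{}
	}
	c.items[uuid][k] = ids
}

// get checks for the segment first and builds the key only if needed.
func (c *segmentCache) get(uuid, field, pattern string) ([]int, bool) {
	perSegment, ok := c.items[uuid]
	if !ok {
		return nil, false // unknown segment: no key was built
	}
	ids, ok := perSegment[c.newKey(field, pattern)]
	return ids, ok
}

func main() {
	c := newSegmentCache()
	_, ok := c.get("seg-unknown", "city", "1.*")
	fmt.Println(ok, c.keysBuilt) // prints "false 0"
}
```

As noted in the thread, the key is likely a cheap stack value either way, so the gain from this ordering is probably small.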
Looks good to me as long as Prateek is happy :)
LGTM pending test change
Force-pushed from 9663fd2 to 3cfcf50
Force-pushed from 3cfcf50 to fad0239
The postings list cache was broken in that the key did not include the field that the query was executed against. This meant that, for a given segment UUID, the results of a query against one field could be mixed up with the results of the same query against a different field. This would lead to incorrect results being returned when the index block was queried.
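A minimal sketch of the collision, assuming a plain Go map in place of the real LRU. The type names keyWithoutField and keyWithField, the segment UUID "seg-1", and the field names "city" and "zip" are all made up for illustration. The same pattern is cached for one field and then looked up for another.

```go
package main

import "fmt"

// keyWithoutField mimics the buggy key: the field is missing.
type keyWithoutField struct {
	segmentUUID string
	pattern     string
}

// keyWithField adds the field to the key.
type keyWithField struct {
	segmentUUID string
	field       string
	pattern     string
}

// demoCollision caches a result for pattern "1.*" on field "city", then
// looks up the same pattern on field "zip" in both cache layouts.
func demoCollision() (buggyHit, fixedHit bool) {
	buggy := map[keyWithoutField][]int{}
	fixed := map[keyWithField][]int{}

	// Result for field "city": documents 1 and 2.
	buggy[keyWithoutField{segmentUUID: "seg-1", pattern: "1.*"}] = []int{1, 2}
	fixed[keyWithField{segmentUUID: "seg-1", field: "city", pattern: "1.*"}] = []int{1, 2}

	// Lookup for field "zip": the buggy key cannot tell the fields apart.
	_, buggyHit = buggy[keyWithoutField{segmentUUID: "seg-1", pattern: "1.*"}]
	_, fixedHit = fixed[keyWithField{segmentUUID: "seg-1", field: "zip", pattern: "1.*"}]
	return buggyHit, fixedHit
}

func main() {
	buggyHit, fixedHit := demoCollision()
	fmt.Println(buggyHit, fixedHit) // prints "true false"
}
```

The field-less key returns the "city" result for the "zip" lookup, while the key that includes the field correctly misses.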
This PR adds the field to the key in the postings list cache to resolve the issue. In addition, it introduces a property test that generates two index blocks, one with the postings list cache enabled and one without, then executes hundreds of different queries against both blocks and checks that they return exactly the same result in all cases.