-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding serialization for filter field in KnnQueryBuilder #564
Adding serialization for filter field in KnnQueryBuilder #564
Conversation
Codecov Report
@@ Coverage Diff @@
## feature/efficient-filtering #564 +/- ##
=================================================================
+ Coverage 83.72% 84.10% +0.38%
- Complexity 1035 1049 +14
=================================================================
Files 148 149 +1
Lines 4282 4316 +34
Branches 379 384 +5
=================================================================
+ Hits 3585 3630 +45
+ Misses 520 509 -11
Partials 177 177
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
@@ -181,6 +184,12 @@ protected void doWriteTo(StreamOutput out) throws IOException { | |||
out.writeString(fieldName); | |||
out.writeFloatArray(vector); | |||
out.writeInt(k); | |||
if (filter != null) { | |||
out.writeBoolean(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this boolean required? can we somehow skip it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use writeOptionalNamedWriteable? I think it abstracts this logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks Jack, will update to writeOptionalNamedWriteable
@@ -109,6 +109,9 @@ public KNNQueryBuilder(StreamInput in) throws IOException { | |||
fieldName = in.readString(); | |||
vector = in.readFloatArray(); | |||
k = in.readInt(); | |||
if (in.readBoolean()) { | |||
filter = in.readNamedWriteable(QueryBuilder.class); | |||
} | |||
} catch (IOException ex) { | |||
throw new RuntimeException("[KNN] Unable to create KNNQueryBuilder: " + ex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we fix this error also. This exception will eat up the stack trace. I know its not part of your change.
throw new RuntimeException("[KNN] Unable to create KNNQueryBuilder: " + ex); | |
throw new RuntimeException("[KNN] Unable to create KNNQueryBuilder ", ex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
|
||
final KNNQueryBuilder knnQueryBuilder = new KNNQueryBuilder(FIELD_NAME, queryVector, K); | ||
|
||
try (BytesStreamOutput output = new BytesStreamOutput()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we extract this block as a private method? I see the only difference is checking if getFilter return null or expected filter which can be handled by parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me think on proper refactoring, it should be possible. I'm adding one more similar block so code duplication is definitely there
Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
bf7469c
to
c385555
Compare
Signed-off-by: Martin Gaievski <[email protected]>
195f025
to
536ed11
Compare
} | ||
} | ||
|
||
private void assertSerialization(final QueryBuilder deserializedQuery, boolean assertFilterIsNull) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TERM_QUERY is decided by KNNQueryBuilder but this method does not get the value from the KNNQueryBuilder.
Also I see there is a room for further optimization.
public void testSerialization() throws Exception {
testSerialization(Version.CURRENT, null);
testSerialization(Version.CURRENT, TERM_QUERY);
testSerialization(Version.V_2_3_0, null);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
ImmutableOpenMap<String, DiscoveryNode> clusterDiscoveryNodes = ImmutableOpenMap.of(); | ||
log.debug("Reading cluster min version"); | ||
try { | ||
clusterDiscoveryNodes = this.clusterService.state().getNodes().getNodes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this method call cheap? Otherwise, calling this method every time when we serialize/deserialize query might lead to performance degradation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be cheap, state is cached locally and updated asynchronously if nodes are joining/leaving cluster. I want to run perf tests on multi-node cluster once this is merged to feature branch and compare latencies with previous solution.
Signed-off-by: Martin Gaievski <[email protected]>
255edae
to
af6ad48
Compare
Signed-off-by: Martin Gaievski <[email protected]>
af6ad48
to
c314b11
Compare
1729748
into
opensearch-project:feature/efficient-filtering
* Adding serialization/deserialization for filter field in Lucene knn query Signed-off-by: Martin Gaievski <[email protected]>
…project#564) * Adding serialization/deserialization for filter field in Lucene knn query Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Martin Gaievski [email protected]
Description
Adding serialization for filter in Knn query. Without the change multi-node cluster may be in inconsistent state, meaning some shards will execute knn query without the filter and search result will be irrelevant.
Special case for rolling and restart upgrades. Let's say we do have existing cluster on 2.3 (we call it old) and we want to upgrade to 3.0 (we call it new). Nodes need to handle following cases for success run:
The easiest way to satisfy all criteria is to only write and read filter field in the last case of new->new. In all other cases input stream will be corrupted and logic after knn query will fail. To address this we introducing the cluster min deployed version concept that is set as a lowest out of all versions deployed in the cluster.
One more added check is for manual k-NN search query, it minimal cluster version is below one that supports filtering feature (3.0.0 for this branch), query will be rejected with 4xx Bad request error. This is to address cluster upgrades, we need to avoid accepting queries with filter until upgrade is completed. Otherwise only some of the cluster nodes will use filter and this affects recall/relevance of the result.
Issues Resolved
#376
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.