Merge efficient filtering from feature branch #588

Merged
merged 8 commits into from
Oct 25, 2022

martin-gaievski
Member

@martin-gaievski martin-gaievski commented Oct 21, 2022

Description

Merge the efficient filtering feature branch into main now that we have sign-off from the PM team.
PRs included in this change:

Issues Resolved

#376

Check List

  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

* Add initial support for filtering 

Signed-off-by: Martin Gaievski <[email protected]>
* Adding serialization/deserialization for filter field in Lucene knn query

Signed-off-by: Martin Gaievski <[email protected]>
* Simplify min cluster version lookup

Signed-off-by: Martin Gaievski <[email protected]>
* Refactor codec related classes, create KNNCodecVersion abstraction

Signed-off-by: Martin Gaievski <[email protected]>
@martin-gaievski martin-gaievski added the Enhancements and backport 2.x labels Oct 21, 2022
@codecov-commenter

codecov-commenter commented Oct 21, 2022

Codecov Report

Merging #588 (d015c35) into main (6d77882) will increase coverage by 0.60%.
The diff coverage is 84.51%.

@@             Coverage Diff              @@
##               main     #588      +/-   ##
============================================
+ Coverage     83.91%   84.51%   +0.60%     
- Complexity     1033     1054      +21     
============================================
  Files           148      149       +1     
  Lines          4233     4301      +68     
  Branches        373      382       +9     
============================================
+ Hits           3552     3635      +83     
+ Misses          507      489      -18     
- Partials        174      177       +3     
| Impacted Files | Coverage Δ |
|---|---|
| ...ec/KNN920Codec/KNN920PerFieldKnnVectorsFormat.java | 50.00% <33.33%> (+46.15%) ⬆️ |
| .../knn/index/codec/BasePerFieldKnnVectorsFormat.java | 58.33% <58.33%> (ø) |
| ...ec/KNN940Codec/KNN940PerFieldKnnVectorsFormat.java | 75.00% <66.66%> (+16.66%) ⬆️ |
| ...rg/opensearch/knn/index/codec/KNNCodecVersion.java | 80.76% <80.76%> (ø) |
| ...rg/opensearch/knn/index/query/KNNQueryBuilder.java | 84.21% <90.47%> (+13.41%) ⬆️ |
| ...rg/opensearch/knn/index/query/KNNQueryFactory.java | 90.00% <91.66%> (+4.28%) ⬆️ |
| .../java/org/opensearch/knn/index/KNNClusterUtil.java | 100.00% <100.00%> (ø) |
| ...earch/knn/index/codec/KNN910Codec/KNN910Codec.java | 100.00% <100.00%> (ø) |
| ...earch/knn/index/codec/KNN920Codec/KNN920Codec.java | 90.90% <100.00%> (+0.90%) ⬆️ |
| ...earch/knn/index/codec/KNN940Codec/KNN940Codec.java | 100.00% <100.00%> (ø) |

... and 10 more

@martin-gaievski martin-gaievski changed the title from "Adding efficient filtering feature" to "Merge efficient filtering from feature branch" Oct 21, 2022
@martin-gaievski martin-gaievski marked this pull request as ready for review October 24, 2022 16:00
@martin-gaievski martin-gaievski requested a review from a team October 24, 2022 16:00
return this.clusterService.state().getNodes().getMinNodeVersion();
} catch (Exception exception) {
log.error(
String.format("Failed to get cluster minimum node version, returning current node version %s instead.", Version.CURRENT),
Member

What was the reasoning here to return Current as opposed to propagating the error up?

Member Author

My aim was to keep the system running. The only reason I can imagine for the this.clusterService.state().getNodes().getMinNodeVersion() call to fail is a network/transport issue that stops one or more nodes from submitting their version info to the cluster state. Since we're not changing system state in the knn query workflow, and we run this check for all knn queries, even those without a filter field, I think it's better to assume the current version than to fail.
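The fallback discussed in this thread can be sketched in isolation roughly as follows. This is a minimal illustration, not the plugin's code: `MinVersionFallback`, `minNodeVersionOrCurrent`, and the use of plain strings for versions are all hypothetical stand-ins, and the `Supplier` takes the place of the `clusterService` lookup.

```java
import java.util.function.Supplier;

// Hypothetical helper for illustration; the names and signature are not from the plugin.
final class MinVersionFallback {
    private MinVersionFallback() {}

    /**
     * Returns the version produced by the cluster lookup, or the supplied
     * current version if the lookup throws.
     */
    static String minNodeVersionOrCurrent(Supplier<String> clusterLookup, String currentVersion) {
        try {
            return clusterLookup.get();
        } catch (RuntimeException exception) {
            // The query path does not change cluster state, so degrade instead of failing the request.
            System.err.println(
                String.format("Failed to get cluster minimum node version, returning current node version %s instead.", currentVersion)
            );
            return currentVersion;
        }
    }

    public static void main(String[] args) {
        // Lookup succeeds: the cluster-wide minimum is used.
        System.out.println(minNodeVersionOrCurrent(() -> "2.3.0", "2.4.0"));
        // Lookup fails: the current node version is used instead.
        System.out.println(minNodeVersionOrCurrent(() -> {
            throw new IllegalStateException("cluster state unavailable");
        }, "2.4.0"));
    }
}
```

The trade-off is the one raised in the review: a failed lookup is logged rather than propagated, so a node may briefly assume a newer cluster than actually exists.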

}
}

protected String createKnnIndexMappingWithLuceneField(final String fieldName, int dimension) throws IOException {
Member

Member Author

good catch, let me switch to existing method

validateSearchKNNIndexFailed(testIndex, new KNNQueryBuilder(TEST_FIELD, queryVector, K, TERM_QUERY), K);
break;
case MIXED:
validateSearchKNNIndexFailed(testIndex, new KNNQueryBuilder(TEST_FIELD, queryVector, K, TERM_QUERY), K);
Member

What happens when we start upgrading from 2.4 to 2.5 or 3.x?

Member Author

We'll need to disable this test for higher versions, similar to what we do for some other ITs. It will work for cases where the previous version doesn't have filtering and the next one does.

request.addParameter("search_type", "query_then_fetch");
request.setJsonEntity(Strings.toString(builder));

expectThrows(ResponseException.class, () -> client().performRequest(request));
Member

What if this fails with another unrelated exception? i.e. are we sure this will work in a valid case?

Member Author

It's possible to get the same exception type for a different cause. Let me add one more assert on the error message; I think that should be specific to our case.
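A framework-free sketch of the stricter check described here: assert on the exception type and then on a fragment of its message, so an unrelated failure of the same type does not pass. Everything below is hypothetical (the `ThrowingAction` interface, the helper names, the use of `IllegalStateException` in place of `ResponseException`, and the message text, which is a placeholder rather than the plugin's real error).

```java
// Hypothetical helpers for illustration; not the test framework's or the plugin's API.
final class ExpectedFailureCheck {
    private ExpectedFailureCheck() {}

    // Minimal functional interface so the sketch needs no test-framework dependency.
    interface ThrowingAction {
        void run() throws Exception;
    }

    /** Runs the action and returns the thrown exception if it has the expected type. */
    static <T extends Exception> T expectThrows(Class<T> type, ThrowingAction action) {
        try {
            action.run();
        } catch (Exception e) {
            if (type.isInstance(e)) {
                return type.cast(e);
            }
            throw new AssertionError("Unexpected exception type: " + e.getClass().getName(), e);
        }
        throw new AssertionError("Expected " + type.getSimpleName() + " but nothing was thrown");
    }

    /** Checks the type first, then requires the message to contain the expected fragment. */
    static void assertFailsWithMessage(ThrowingAction action, String expectedFragment) {
        IllegalStateException e = expectThrows(IllegalStateException.class, action);
        String message = e.getMessage();
        if (message == null || !message.contains(expectedFragment)) {
            throw new AssertionError("Exception thrown for a different cause: " + message);
        }
    }

    public static void main(String[] args) {
        // Placeholder message text; the real error string would come from the plugin.
        assertFailsWithMessage(() -> {
            throw new IllegalStateException("filter is not supported on this cluster");
        }, "filter is not supported");
        System.out.println("type and message both matched");
    }
}
```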

*/
@NoArgsConstructor(access = AccessLevel.PRIVATE)
@Log4j2
public class KNNClusterContext {
Member

Seems to me this is a utility or a helper. Context objects from what I have seen in OpenSearch usually have state associated with them. Is there a better name for this class? Maybe like KNNClusterUtility?

Member Author

@martin-gaievski martin-gaievski Oct 24, 2022

Agree, we can change/extend it if we have more functionality or state in the future. I like KNNClusterUtility; one small doubt is that the existing k-nn codebase uses the "Util" suffix, so I'll go with KNNClusterUtil.

@martin-gaievski martin-gaievski force-pushed the feature/efficient-filtering branch from 2608ef5 to 60a27bf Compare October 24, 2022 19:03
@@ -101,8 +112,11 @@ public KNNQueryBuilder(StreamInput in) throws IOException {
fieldName = in.readString();
vector = in.readFloatArray();
k = in.readInt();
if (isClusterOnOrAfterMinRequiredVersion()) {
Member

Is there a comment somewhere describing what we are doing here?

Member Author

Nope, there isn't one, let me add it
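For readers following this thread, the version gate in the hunk above can be sketched as a standalone round trip. This is a simplified illustration under stated assumptions: it uses `java.io` data streams instead of `StreamInput`, represents the filter as a plain string, and passes the gate in as a boolean where the real code calls `isClusterOnOrAfterMinRequiredVersion()`. The class `GatedKnnQuery` and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical simplified query; not the plugin's KNNQueryBuilder.
final class GatedKnnQuery {
    final String fieldName;
    final float[] vector;
    final int k;
    final String filter; // null when no filter was supplied

    GatedKnnQuery(String fieldName, float[] vector, int k, String filter) {
        this.fieldName = fieldName;
        this.vector = vector;
        this.k = k;
        this.filter = filter;
    }

    void writeTo(DataOutputStream out, boolean clusterSupportsFilter) throws IOException {
        out.writeUTF(fieldName);
        out.writeInt(vector.length);
        for (float value : vector) {
            out.writeFloat(value);
        }
        out.writeInt(k);
        // Only emit the optional field when every node is able to read it.
        if (clusterSupportsFilter) {
            out.writeBoolean(filter != null);
            if (filter != null) {
                out.writeUTF(filter);
            }
        }
    }

    static GatedKnnQuery readFrom(DataInputStream in, boolean clusterSupportsFilter) throws IOException {
        String fieldName = in.readUTF();
        float[] vector = new float[in.readInt()];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = in.readFloat();
        }
        int k = in.readInt();
        String filter = null;
        // Mirror the writer: older senders never wrote the filter, so do not try to read it.
        if (clusterSupportsFilter && in.readBoolean()) {
            filter = in.readUTF();
        }
        return new GatedKnnQuery(fieldName, vector, k, filter);
    }

    /** Serializes and deserializes with the same gate value on both sides. */
    static GatedKnnQuery roundTrip(GatedKnnQuery query, boolean clusterSupportsFilter) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            query.writeTo(new DataOutputStream(bytes), clusterSupportsFilter);
            return readFrom(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())), clusterSupportsFilter);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The reader and writer must apply the same gate; if one side writes the optional field and the other skips it, every subsequent read on the stream is misaligned.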

Signed-off-by: Martin Gaievski <[email protected]>
@martin-gaievski martin-gaievski merged commit f332ccb into main Oct 25, 2022
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 25, 2022
* Adding efficient filtering

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit f332ccb)
martin-gaievski pushed a commit that referenced this pull request Oct 25, 2022
* Merge efficient filtering from feature branch (#588)

* Adding efficient filtering

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit f332ccb)
@heemin32 heemin32 added 2.4.0 v2.4.0 'Issues and PRs related to version v2.4.0' and removed 2.4.0 labels Nov 2, 2022
Labels
backport 2.x Enhancements Increases software capabilities beyond original client specifications v2.4.0 'Issues and PRs related to version v2.4.0'
5 participants