Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor KNNCodec to use new extension point #319

Merged
merged 12 commits into from
Mar 18, 2022

Conversation

jmazanec15
Copy link
Member

Description

This is a fairly large refactor of our current KNNCodec system.

First, it deletes unused code. For instance, we had KNN84Codec and KNN86Codec that we were not using. The reason to keep them had been potential assistance with backwards compatibility. However, given that we do not use them now, I think it makes sense to just delete.

Second, the plugin is refactored to implement getCustomCodecServiceFactory. This is a new extension point in OpenSearch and will allow multiple EnginePlugins to be used on the same index. For instance, the CCR plugin and k-NN. This allows us to delete the EngineFactory code. Along with this, I moved KNNCodecService to the codec folder for consistency.

Third, the KNN87Codec was refactored to implement the FilterCodec. The KNN87Codec uses the delegate pattern to override the DocValuesFormat. The rest of the formats are delegated to a delegate. This is what the FilterCodec is intended to be used for. Along with this, I extended the delegate reach further so that the KNN87Codec could implement the pattern to its fullest potential.

Fourth, I added several unit tests for the individual components of the KNN87Codec as well as a util class. This will allow us to cover to a fuller extent the test coverage around these components. I have not yet added unit test coverage for merging or for the KNNCodecUtil. I will do this in the future.

Note - I expect checkstyle to fail. My plan is to run ./gradlew spotlessApply after I get review so as to not distract reviewers from the substantive changes.

Issues Resolved

#146 #147

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Initial commit for plugin to return CodecServiceFactory as opposed to
CodecService. This will allow the plugin to make decisions based on
Mapper Service.

Signed-off-by: John Mazanec <[email protected]>
Refactors the KNN87Codec to implement FilterCodec. This allows the codec
to automatically/flexibly delegate operations it does not override to an
arbitrary Codec. Additionally cleans up some code around the Codec

Signed-off-by: John Mazanec <[email protected]>
Adds unit tests that map to each codec component. Did not add tests for
merging and codec utils. This can be undertaken later. Adds a utils
folder for sharing testing functionality between codec tests. Cleans up
a few minor details around codec source code.

Signed-off-by: John Mazanec <[email protected]>
@jmazanec15 jmazanec15 added Refactoring Improve the design, structure, and implementation while preserving its functionality v2.0.0 labels Mar 11, 2022
@jmazanec15 jmazanec15 requested a review from a team March 11, 2022 19:04

if (field.attributes().containsKey(MODEL_ID)) {
if (field.attributes().containsKey(MODEL_ID)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need null check for attributes before calling contains?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*/
@Override
public Codec codec(String name) {
return new KNN87Codec(super.codec(name));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need some sort of factory here that will allow to select exact k-NN Codec implementation. I'm thinking of a use case for 2.0/lucene 9, we'll need to use Lucene90 codec for it, and probably will create new kNN wrapper class, something like KNN90Codec. Although it might be too much for this PR, we can do it in next one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure of a case where we will want to build a Codec for an older version. For instance, when we upgrade to Lucene90, I dont think we will want to allow the option to build KNN87Codec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see, so essentially we change it with every next Lucene version but keep older codecs for bwc. Seems that this method (codec(String name)) is the factory method itself. Do we have our abstraction for knn codec apart from lucene abstract Codec class? It may give us more flexibility with testing, although I'm not sure it's that easy to pass our interface to the OpenSearch through plugin related service classes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so we keep older Codecs so that older indices can be read by old Codecs via SPI.

BWC is also why we need to version our Codecs (KNN87Codec). This is so that older indices can be read with old codecs.

I think this is something that could be investigated more in a future review.

Comment on lines 1 to 3
org.opensearch.knn.index.codec.KNN80Codec.KNN80Codec
org.opensearch.knn.index.codec.KNN84Codec.KNN84Codec
org.opensearch.knn.index.codec.KNN86Codec.KNN86Codec
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we do not use these codecs in any older opendistro version? If these codecs were used to create segments in some older version then assuming they upgrade to latest OpenSearch version, it might fail as they do not find respective codec for reading.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I think this could have problems for SPI.

Let me add back these Codecs back.

@jmazanec15 jmazanec15 force-pushed the codec-service-refactor branch 3 times, most recently from 11085ec to 8e99dd4 Compare March 17, 2022 16:31
@jmazanec15 jmazanec15 force-pushed the codec-service-refactor branch from 8e99dd4 to 0845b3d Compare March 17, 2022 16:32
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
@jmazanec15 jmazanec15 force-pushed the codec-service-refactor branch from df7ae05 to 7f5bf23 Compare March 17, 2022 16:46
Signed-off-by: John Mazanec <[email protected]>
@jmazanec15
Copy link
Member Author

@martin-gaievski @vamshin Failure related to spotless check. Please review again - Ill fix spotless after I get approval

Comment on lines 50 to 52
// If the delegate is an instanceof Lucene87, we can get a field specific doc values format
// delegate
if (delegate instanceof Lucene87Codec) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we not doing this for every code >= Lucene87? Wondering why this should be only specific to Lucene87

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so Codec does not have the method getDocValuesFormatForField. Lucene8XCodec's do, however. So, I felt like it made sense to map this to the LuceneCodec version our KNN codec version maps to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can pull this to the separate DocValuesFormat class and inject to knn codec during construction time. We can also make getDocValuesFormatForField part of our codec facade abstraction, this should help to decrease coupling between our and lucene classes. These ideas are more for next refactoring.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martin-gaievski what would this look like?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually was able to remove this code. The reason beingthat the delegate could have the per field functionality and we dont need to do anything to get it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice. My initial idea was to create a facade interface for knn codec and then inject it to KNNCodecService using DI similar to spring or guice. I may try to play with it while working on integration with opensearch 2.0/lucene 9.1, we need a codec based on lucene 9.1

@jmazanec15 jmazanec15 requested a review from vamshin March 17, 2022 19:17
Copy link
Member

@martin-gaievski martin-gaievski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Signed-off-by: John Mazanec <[email protected]>
vamshin
vamshin previously approved these changes Mar 17, 2022
Copy link
Member

@vamshin vamshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks

Signed-off-by: John Mazanec <[email protected]>
@codecov-commenter
Copy link

codecov-commenter commented Mar 17, 2022

Codecov Report

Merging #319 (47747e4) into main (bb28393) will increase coverage by 0.24%.
The diff coverage is 91.13%.

@@             Coverage Diff              @@
##               main     #319      +/-   ##
============================================
+ Coverage     83.49%   83.74%   +0.24%     
+ Complexity      889      885       -4     
============================================
  Files           127      126       -1     
  Lines          3829     3801      -28     
  Branches        361      360       -1     
============================================
- Hits           3197     3183      -14     
+ Misses          470      457      -13     
+ Partials        162      161       -1     
Impacted Files Coverage Δ
...n/index/codec/KNN80Codec/KNN80BinaryDocValues.java 100.00% <ø> (+28.57%) ⬆️
...n/index/codec/KNN80Codec/KNN80DocValuesReader.java 75.00% <ø> (ø)
...earch/knn/index/codec/util/BinaryDocValuesSub.java 71.42% <ø> (ø)
...index/codec/KNN80Codec/KNN80DocValuesConsumer.java 88.23% <87.27%> (+6.23%) ⬆️
...nn/index/codec/KNN80Codec/KNN80CompoundFormat.java 100.00% <100.00%> (ø)
...n/index/codec/KNN80Codec/KNN80DocValuesFormat.java 100.00% <100.00%> (+16.66%) ⬆️
...nsearch/knn/index/codec/KNN87Codec/KNN87Codec.java 100.00% <100.00%> (+4.00%) ⬆️
...rg/opensearch/knn/index/codec/KNNCodecService.java 100.00% <100.00%> (ø)
...main/java/org/opensearch/knn/plugin/KNNPlugin.java 100.00% <100.00%> (ø)
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb28393...47747e4. Read the comment docs.

Copy link
Member

@vamshin vamshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks

@jmazanec15 jmazanec15 merged commit d531b3c into opensearch-project:main Mar 18, 2022
martin-gaievski pushed a commit to martin-gaievski/k-NN that referenced this pull request Mar 30, 2022
Refactor plugin to return CodecServiceFactory as opposed to
CodecService. This will allow the plugin to make decisions based on
Mapper Service.

Refactors the KNN87Codec to implement FilterCodec. This allows the codec
to automatically/flexibly delegate operations it does not override to an
arbitrary Codec. Additionally cleans up some code around the Codec

Adds unit tests that map to each codec component. Did not add tests for
merging and codec utils. This can be undertaken later. Adds a utils
folder for sharing testing functionality between codec tests. Cleans up
a few minor details around codec source code.

Signed-off-by: John Mazanec <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Refactoring Improve the design, structure, and implementation while preserving its functionality v2.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants