Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manually add footer to engine files #327

Merged
merged 3 commits into from
Mar 21, 2022

Conversation

jmazanec15
Copy link
Member

Description

Manually adds Lucene footer to engine files to prevent an unnecessary copy from one file to another.

A few things had to be changed to get this to work. First, before creating the enginefile, an IndexOutput had to be created and then closed so that the TrackingDirectoryWrapper would be able to track the file.

Next, the footer magic number and crc algorithm index (0) had to be appended to the end of the file. More details on how Lucene does this can be found here.

Lastly, the checksum gets computed and added to the end of the file.

No new tests were added because all of our existing codec test cases check the footer validity. Also, our integration tests make sure that the change does not break anything.

Issues Resolved

#326

Check List

  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Manually adds Lucene footer to engine files to prevent an unnecessary
copy from one file to another.

Signed-off-by: John Mazanec <[email protected]>
@jmazanec15 jmazanec15 added the Enhancements Increases software capabilities beyond original client specifications label Mar 18, 2022
@jmazanec15 jmazanec15 requested a review from a team March 18, 2022 23:58
@martin-gaievski
Copy link
Member

could you please share reference to test that check that checksum integrity?

@codecov-commenter
Copy link

Codecov Report

Merging #327 (d023f10) into main (c63bef5) will increase coverage by 0.05%.
The diff coverage is 90.90%.

@@             Coverage Diff              @@
##               main     #327      +/-   ##
============================================
+ Coverage     83.74%   83.79%   +0.05%     
- Complexity      885      888       +3     
============================================
  Files           126      126              
  Lines          3801     3807       +6     
  Branches        360      361       +1     
============================================
+ Hits           3183     3190       +7     
+ Misses          457      455       -2     
- Partials        161      162       +1     
Impacted Files Coverage Δ
...index/codec/KNN80Codec/KNN80DocValuesConsumer.java 89.81% <90.90%> (+1.57%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c63bef5...d023f10. Read the comment docs.

@jmazanec15
Copy link
Member Author

long value = checksumIndexInput.getChecksum();
checksumIndexInput.close();

if ((value & 0xFFFFFFFF00000000L) != 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we use a constant here? also it would make code more readable if condition will be in a separate method with name that describes the idea of the check, e.g. invalidChecksum()

Signed-off-by: John Mazanec <[email protected]>
private boolean isChecksumValid(long value) {
// Check pulled from
// https://github.com/apache/lucene/blob/branch_9_0/lucene/core/src/java/org/apache/lucene/codecs/CodecUtil.java#L630-L632
return (value & 0xFFFFFFFF00000000L) != 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for pulling this into a separate method. Can we also use constant instead of direct magic number?

Signed-off-by: John Mazanec <[email protected]>
Copy link
Member

@martin-gaievski martin-gaievski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks!

Copy link
Member

@vamshin vamshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible that writing Footer logic could diverge from Lucene incase they wish to change in future which is dangerous. Given we have test cases to catch the checksum validations part of the releases and since this change helps improve indexing latencies especially with forcemerges, I am approving these changes.

Thanks for the improvements!

@jmazanec15 jmazanec15 merged commit 87bc388 into opensearch-project:main Mar 21, 2022
martin-gaievski pushed a commit to martin-gaievski/k-NN that referenced this pull request Mar 30, 2022
Manually adds Lucene footer to engine files to prevent an unnecessary
copy from one file to another.

Signed-off-by: John Mazanec <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancements Increases software capabilities beyond original client specifications v2.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants