Manually add footer to engine files #327
Manually adds Lucene footer to engine files to prevent an unnecessary copy from one file to another. Signed-off-by: John Mazanec <[email protected]>
Could you please share a reference to the test that checks the checksum integrity?
Codecov Report
@@             Coverage Diff              @@
##               main     #327      +/-   ##
============================================
+ Coverage     83.74%   83.79%   +0.05%
- Complexity      885      888       +3
============================================
  Files           126      126
  Lines          3801     3807       +6
  Branches        360      361       +1
============================================
+ Hits           3183     3190       +7
+ Misses          457      455       -2
- Partials        161      162       +1
Continue to review full report at Codecov.
@martin-gaievski The check is here: https://github.com/opensearch-project/k-NN/blob/main/src/test/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesConsumerTests.java#L185 A couple more of those tests have the same footer check at the bottom as well.
long value = checksumIndexInput.getChecksum();
checksumIndexInput.close();

if ((value & 0xFFFFFFFF00000000L) != 0) {
nit: Can we use a constant here? It would also make the code more readable if the condition were in a separate method whose name describes the idea of the check, e.g. invalidChecksum().
Signed-off-by: John Mazanec <[email protected]>
private boolean isChecksumValid(long value) {
    // Check pulled from
    // https://github.com/apache/lucene/blob/branch_9_0/lucene/core/src/java/org/apache/lucene/codecs/CodecUtil.java#L630-L632
    return (value & 0xFFFFFFFF00000000L) != 0;
Thank you for pulling this into a separate method. Can we also use a constant instead of the magic number?
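For illustration, the constant-based check requested here might look roughly like the sketch below. The class, constant, and method names are hypothetical and are not taken from the PR; the only thing carried over is the mask itself, which tests whether any of the upper 32 bits of the long are set (a CRC-32 value never sets them).

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper class; the names below are illustrative assumptions, not the PR's code.
final class ChecksumSanity {
    // Upper 32 bits of a long. A CRC-32 value fits in the lower 32 bits, so none of these may be set.
    static final long CRC32_UPPER_BITS_MASK = 0xFFFFFFFF00000000L;

    // Returns true when the value cannot be a CRC-32 checksum because upper bits are set.
    static boolean isChecksumInvalid(long value) {
        return (value & CRC32_UPPER_BITS_MASK) != 0;
    }

    public static void main(String[] args) {
        CRC32 crc = new CRC32();
        crc.update("engine file bytes".getBytes(StandardCharsets.UTF_8));
        // java.util.zip.CRC32 always returns a value in the unsigned 32-bit range.
        System.out.println(isChecksumInvalid(crc.getValue())); // false
        // A value with bit 32 set is outside that range.
        System.out.println(isChecksumInvalid(1L << 32)); // true
    }
}
```

Naming the method after the failure condition keeps a call site such as "if invalid, throw" reading in the same direction as the bit test.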
Signed-off-by: John Mazanec <[email protected]>
lgtm, thanks!
It is possible that our footer-writing logic could diverge from Lucene's if they change theirs in the future, which is dangerous. Given that we have test cases that catch checksum validation failures as part of releases, and since this change helps improve indexing latency, especially with force merges, I am approving these changes.
Thanks for the improvements!
Description
Manually adds Lucene footer to engine files to prevent an unnecessary copy from one file to another.
A few things had to be changed to get this to work. First, before creating the engine file, an IndexOutput had to be created and then closed so that the TrackingDirectoryWrapper would be able to track the file.
Next, the footer magic number and the CRC algorithm ID (0) had to be appended to the end of the file. More details on how Lucene does this can be found here.
Lastly, the checksum gets computed and added to the end of the file.
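The footer steps above can be sketched in plain Java as follows. This is an in-memory illustration only, not the PR's implementation: it assumes the Lucene 9 footer layout (big-endian footer magic, big-endian algorithm ID 0, then a big-endian CRC-32 over every preceding byte including the magic and the ID), and the class and method names are hypothetical. The real change works against the engine file on disk and the Lucene directory.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical in-memory sketch of appending and checking a Lucene-style footer.
final class FooterSketch {
    // Lucene's CodecUtil defines CODEC_MAGIC as 0x3fd76c17 and FOOTER_MAGIC as its bitwise complement.
    static final int FOOTER_MAGIC = ~0x3fd76c17;
    static final int ALGORITHM_ID = 0;    // 0 denotes CRC-32
    static final int FOOTER_LENGTH = 16;  // int magic + int algorithm ID + long checksum

    // Returns the engine bytes followed by magic, algorithm ID, and checksum.
    static byte[] appendFooter(byte[] engineBytes) {
        // ByteBuffer is big-endian by default.
        byte[] prefix = ByteBuffer.allocate(8).putInt(FOOTER_MAGIC).putInt(ALGORITHM_ID).array();

        // The checksum covers the file contents plus the magic and algorithm ID.
        CRC32 crc = new CRC32();
        crc.update(engineBytes);
        crc.update(prefix);
        long checksum = crc.getValue();
        if ((checksum & 0xFFFFFFFF00000000L) != 0) {
            throw new IllegalStateException("Illegal CRC-32 checksum: " + checksum);
        }

        return ByteBuffer.allocate(engineBytes.length + FOOTER_LENGTH)
            .put(engineBytes)
            .put(prefix)
            .putLong(checksum)
            .array();
    }

    // Recomputes the checksum and compares it with the stored footer.
    static boolean hasValidFooter(byte[] file) {
        if (file.length < FOOTER_LENGTH) {
            return false;
        }
        ByteBuffer footer = ByteBuffer.wrap(file, file.length - FOOTER_LENGTH, FOOTER_LENGTH);
        if (footer.getInt() != FOOTER_MAGIC || footer.getInt() != ALGORITHM_ID) {
            return false;
        }
        CRC32 crc = new CRC32();
        crc.update(file, 0, file.length - Long.BYTES); // everything except the stored checksum
        return footer.getLong() == crc.getValue();
    }

    public static void main(String[] args) {
        byte[] withFooter = appendFooter("engine file bytes".getBytes(StandardCharsets.UTF_8));
        System.out.println(hasValidFooter(withFooter)); // true
    }
}
```

Because the footer is appended to the file the engine already wrote, no second file is needed, which is the copy this PR avoids.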
No new tests were added because all of our existing codec test cases check the footer validity. Also, our integration tests make sure that the change does not break anything.
Issues Resolved
#326
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.