-
Notifications
You must be signed in to change notification settings - Fork 486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ORC-991: Fix the bug of encrypted column read crash #905
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for making a PR, @guiyanakuang .
cc @omalley
@@ -279,12 +280,13 @@ private void buildEncodings(OrcProto.StripeFooter footer, | |||
private long handleStream(long offset, | |||
boolean[] columnInclude, | |||
OrcProto.Stream stream, | |||
StreamName.Area area, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add area
into the function description, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in cb829a1. But the javadoc generation failed. It seems to be caused by network instability.
|
||
import static org.junit.jupiter.api.Assertions.assertEquals; | ||
|
||
public class TestEncryption { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding this.
@dongjoon-hyun Thanks for the review. I added an additional fix: Decimal64TreeWriter data stream are not bound to encryption |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you so much, @guiyanakuang .
Merged to main/1.7/1.6.
### What changes were proposed in this pull request? 1. RowIndex is never marked as encrypted. The StreamName constructor adds encryption to make rowIndex encrypted when needed. TreeWriterBase.java ```java public void writeStripe(int requiredIndexEntries) throws IOException { ..... - context.writeIndex(new StreamName(id, OrcProto.Stream.Kind.ROW_INDEX), rowIndex); + context.writeIndex(new StreamName(id, OrcProto.Stream.Kind.ROW_INDEX, encryption), rowIndex); ..... } ``` 2. findStreams in StripePlanner.java total offset order and write inconsistency. finalizeStripe In PhysicalFsWriter.java > 1. write the unencrypted index streams > 2. write the encrypted index streams > 3. write the unencrypted data streams > 4. write the encrypted data streams ```java @OverRide public void finalizeStripe(OrcProto.StripeFooter.Builder footerBuilder, OrcProto.StripeInformation.Builder dirEntry ) throws IOException { ..... // write the unencrypted index streams unencrypted.writeStreams(StreamName.Area.INDEX, rawWriter); // write the encrypted index streams for (VariantTracker variant: variants.values()) { variant.writeStreams(StreamName.Area.INDEX, rawWriter); } // write the unencrypted data streams unencrypted.writeStreams(StreamName.Area.DATA, rawWriter); // write out the encrypted data streams for (VariantTracker variant: variants.values()) { variant.writeStreams(StreamName.Area.DATA, rawWriter); } ..... } ``` findStreams in StripePlanner.java > 1. total offset the unencrypted index/data streams > 2. total offset encrypted index/data streams ```java private void findStreams(long streamStart, OrcProto.StripeFooter footer, boolean[] columnInclude) throws IOException { long currentOffset = streamStart; Arrays.fill(bloomFilterKinds, null); for(OrcProto.Stream stream: footer.getStreamsList()) { currentOffset += handleStream(currentOffset, columnInclude, stream, null); } // Add the encrypted streams that we are using for(ReaderEncryptionVariant variant: encryption.getVariants()) { int variantId = variant.getVariantId(); OrcProto.StripeEncryptionVariant stripeVariant = footer.getEncryption(variantId); for(OrcProto.Stream stream: stripeVariant.getStreamsList()) { currentOffset += handleStream(currentOffset, columnInclude, stream, variant); } } } ``` Causes misalignment of data reading. This pr ensures that the read offset is consistent with the write. 3. Fix Decimal64TreeWriter stream not binding encryption. ### Why are the changes needed? Fix the bug of encrypted column read crash. ### How was this patch tested? Added unit test for reading encrypted columns. (cherry picked from commit 792c3f8) Signed-off-by: Dongjoon Hyun <[email protected]>
For branch-1.6, I changed the test suite into old JUnit style during backporting. |
1. RowIndex is never marked as encrypted. The StreamName constructor adds encryption to make rowIndex encrypted when needed. TreeWriterBase.java ```java public void writeStripe(int requiredIndexEntries) throws IOException { ..... - context.writeIndex(new StreamName(id, OrcProto.Stream.Kind.ROW_INDEX), rowIndex); + context.writeIndex(new StreamName(id, OrcProto.Stream.Kind.ROW_INDEX, encryption), rowIndex); ..... } ``` 2. findStreams in StripePlanner.java total offset order and write inconsistency. finalizeStripe In PhysicalFsWriter.java > 1. write the unencrypted index streams > 2. write the encrypted index streams > 3. write the unencrypted data streams > 4. write the encrypted data streams ```java @OverRide public void finalizeStripe(OrcProto.StripeFooter.Builder footerBuilder, OrcProto.StripeInformation.Builder dirEntry ) throws IOException { ..... // write the unencrypted index streams unencrypted.writeStreams(StreamName.Area.INDEX, rawWriter); // write the encrypted index streams for (VariantTracker variant: variants.values()) { variant.writeStreams(StreamName.Area.INDEX, rawWriter); } // write the unencrypted data streams unencrypted.writeStreams(StreamName.Area.DATA, rawWriter); // write out the encrypted data streams for (VariantTracker variant: variants.values()) { variant.writeStreams(StreamName.Area.DATA, rawWriter); } ..... } ``` findStreams in StripePlanner.java > 1. total offset the unencrypted index/data streams > 2. total offset encrypted index/data streams ```java private void findStreams(long streamStart, OrcProto.StripeFooter footer, boolean[] columnInclude) throws IOException { long currentOffset = streamStart; Arrays.fill(bloomFilterKinds, null); for(OrcProto.Stream stream: footer.getStreamsList()) { currentOffset += handleStream(currentOffset, columnInclude, stream, null); } // Add the encrypted streams that we are using for(ReaderEncryptionVariant variant: encryption.getVariants()) { int variantId = variant.getVariantId(); OrcProto.StripeEncryptionVariant stripeVariant = footer.getEncryption(variantId); for(OrcProto.Stream stream: stripeVariant.getStreamsList()) { currentOffset += handleStream(currentOffset, columnInclude, stream, variant); } } } ``` Causes misalignment of data reading. This pr ensures that the read offset is consistent with the write. 3. Fix Decimal64TreeWriter stream not binding encryption. Fix the bug of encrypted column read crash. Added unit test for reading encrypted columns. (cherry picked from commit 792c3f8) Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit a3de6da) Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
TreeWriterBase.java
finalizeStripe In PhysicalFsWriter.java
findStreams in StripePlanner.java
Causes misalignment of data reading.
This pr ensures that the read offset is consistent with the write.
Why are the changes needed?
Fix the bug of encrypted column read crash.
How was this patch tested?
Added unit test for reading encrypted columns.