Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ORC-991: Fix the bug of encrypted column read crash #905

Merged
merged 5 commits into from
Sep 12, 2021

Conversation

guiyanakuang
Copy link
Member

@guiyanakuang guiyanakuang commented Sep 11, 2021

What changes were proposed in this pull request?

  1. RowIndex is never marked as encrypted. The StreamName constructor adds encryption to make rowIndex encrypted when needed.

TreeWriterBase.java

  public void writeStripe(int requiredIndexEntries) throws IOException {
      .....
-    context.writeIndex(new StreamName(id, OrcProto.Stream.Kind.ROW_INDEX), rowIndex);
+    context.writeIndex(new StreamName(id, OrcProto.Stream.Kind.ROW_INDEX, encryption), rowIndex);
      .....
  }
  1. findStreams in StripePlanner.java total offset order and write inconsistency.
    finalizeStripe In PhysicalFsWriter.java
  1. write the unencrypted index streams
  2. write the encrypted index streams
  3. write the unencrypted data streams
  4. write the encrypted data streams
  @Override
  public void finalizeStripe(OrcProto.StripeFooter.Builder footerBuilder,
                             OrcProto.StripeInformation.Builder dirEntry
                             ) throws IOException {
    .....
    // write the unencrypted index streams
    unencrypted.writeStreams(StreamName.Area.INDEX, rawWriter);
    // write the encrypted index streams
    for (VariantTracker variant: variants.values()) {
      variant.writeStreams(StreamName.Area.INDEX, rawWriter);
    }

    // write the unencrypted data streams
    unencrypted.writeStreams(StreamName.Area.DATA, rawWriter);
    // write out the encrypted data streams
    for (VariantTracker variant: variants.values()) {
      variant.writeStreams(StreamName.Area.DATA, rawWriter);
    }
    .....
  }

findStreams in StripePlanner.java

  1. total offset the unencrypted index/data streams
  2. total offset encrypted index/data streams
  private void findStreams(long streamStart,
                           OrcProto.StripeFooter footer,
                           boolean[] columnInclude) throws IOException {
    long currentOffset = streamStart;
    Arrays.fill(bloomFilterKinds, null);
    for(OrcProto.Stream stream: footer.getStreamsList()) {
      currentOffset += handleStream(currentOffset, columnInclude, stream, null);
    }

    // Add the encrypted streams that we are using
    for(ReaderEncryptionVariant variant: encryption.getVariants()) {
      int variantId = variant.getVariantId();
      OrcProto.StripeEncryptionVariant stripeVariant =
          footer.getEncryption(variantId);
      for(OrcProto.Stream stream: stripeVariant.getStreamsList()) {
        currentOffset += handleStream(currentOffset, columnInclude, stream, variant);
      }
    }
  }

Causes misalignment of data reading.
This pr ensures that the read offset is consistent with the write.

  1. Fix Decimal64TreeWriter stream not binding encryption.

Why are the changes needed?

Fix the bug of encrypted column read crash.

How was this patch tested?

Added unit test for reading encrypted columns.

@github-actions github-actions bot added the JAVA label Sep 11, 2021
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making a PR, @guiyanakuang .

cc @omalley

@@ -279,12 +280,13 @@ private void buildEncodings(OrcProto.StripeFooter footer,
private long handleStream(long offset,
boolean[] columnInclude,
OrcProto.Stream stream,
StreamName.Area area,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add area into the function description, too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in cb829a1. But the javadoc generation failed. It seems to be caused by network instability.


import static org.junit.jupiter.api.Assertions.assertEquals;

public class TestEncryption {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding this.

@guiyanakuang
Copy link
Member Author

@dongjoon-hyun Thanks for the review.

I added an additional fix: Decimal64TreeWriter data stream are not bound to encryption

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you so much, @guiyanakuang .
Merged to main/1.7/1.6.

@dongjoon-hyun dongjoon-hyun merged commit 792c3f8 into apache:main Sep 12, 2021
dongjoon-hyun pushed a commit that referenced this pull request Sep 12, 2021
### What changes were proposed in this pull request?

1. RowIndex is never marked as encrypted.  The StreamName constructor adds encryption to make rowIndex encrypted when needed.

TreeWriterBase.java
```java
  public void writeStripe(int requiredIndexEntries) throws IOException {
      .....
-    context.writeIndex(new StreamName(id, OrcProto.Stream.Kind.ROW_INDEX), rowIndex);
+    context.writeIndex(new StreamName(id, OrcProto.Stream.Kind.ROW_INDEX, encryption), rowIndex);
      .....
  }
```

2.  findStreams in StripePlanner.java total offset order and write inconsistency.
finalizeStripe In PhysicalFsWriter.java
> 1. write the unencrypted index streams
> 2. write the encrypted index streams
> 3. write the unencrypted data streams
> 4. write the encrypted data streams
```java
  @OverRide
  public void finalizeStripe(OrcProto.StripeFooter.Builder footerBuilder,
                             OrcProto.StripeInformation.Builder dirEntry
                             ) throws IOException {
    .....
    // write the unencrypted index streams
    unencrypted.writeStreams(StreamName.Area.INDEX, rawWriter);
    // write the encrypted index streams
    for (VariantTracker variant: variants.values()) {
      variant.writeStreams(StreamName.Area.INDEX, rawWriter);
    }

    // write the unencrypted data streams
    unencrypted.writeStreams(StreamName.Area.DATA, rawWriter);
    // write out the encrypted data streams
    for (VariantTracker variant: variants.values()) {
      variant.writeStreams(StreamName.Area.DATA, rawWriter);
    }
    .....
  }
```

findStreams in StripePlanner.java
> 1. total offset the unencrypted index/data streams
> 2. total offset encrypted index/data streams

```java
  private void findStreams(long streamStart,
                           OrcProto.StripeFooter footer,
                           boolean[] columnInclude) throws IOException {
    long currentOffset = streamStart;
    Arrays.fill(bloomFilterKinds, null);
    for(OrcProto.Stream stream: footer.getStreamsList()) {
      currentOffset += handleStream(currentOffset, columnInclude, stream, null);
    }

    // Add the encrypted streams that we are using
    for(ReaderEncryptionVariant variant: encryption.getVariants()) {
      int variantId = variant.getVariantId();
      OrcProto.StripeEncryptionVariant stripeVariant =
          footer.getEncryption(variantId);
      for(OrcProto.Stream stream: stripeVariant.getStreamsList()) {
        currentOffset += handleStream(currentOffset, columnInclude, stream, variant);
      }
    }
  }
```

Causes misalignment of data reading.
This pr ensures that the read offset is consistent with the write.

3. Fix Decimal64TreeWriter stream not binding encryption.

### Why are the changes needed?

Fix the bug of encrypted column read crash.

### How was this patch tested?

Added unit test for reading encrypted columns.

(cherry picked from commit 792c3f8)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Copy link
Member

For branch-1.6, I changed the test suite into old JUnit style during backporting.

dongjoon-hyun pushed a commit that referenced this pull request Sep 12, 2021
1. RowIndex is never marked as encrypted.  The StreamName constructor adds encryption to make rowIndex encrypted when needed.

TreeWriterBase.java
```java
  public void writeStripe(int requiredIndexEntries) throws IOException {
      .....
-    context.writeIndex(new StreamName(id, OrcProto.Stream.Kind.ROW_INDEX), rowIndex);
+    context.writeIndex(new StreamName(id, OrcProto.Stream.Kind.ROW_INDEX, encryption), rowIndex);
      .....
  }
```

2.  findStreams in StripePlanner.java total offset order and write inconsistency.
finalizeStripe In PhysicalFsWriter.java
> 1. write the unencrypted index streams
> 2. write the encrypted index streams
> 3. write the unencrypted data streams
> 4. write the encrypted data streams
```java
  @OverRide
  public void finalizeStripe(OrcProto.StripeFooter.Builder footerBuilder,
                             OrcProto.StripeInformation.Builder dirEntry
                             ) throws IOException {
    .....
    // write the unencrypted index streams
    unencrypted.writeStreams(StreamName.Area.INDEX, rawWriter);
    // write the encrypted index streams
    for (VariantTracker variant: variants.values()) {
      variant.writeStreams(StreamName.Area.INDEX, rawWriter);
    }

    // write the unencrypted data streams
    unencrypted.writeStreams(StreamName.Area.DATA, rawWriter);
    // write out the encrypted data streams
    for (VariantTracker variant: variants.values()) {
      variant.writeStreams(StreamName.Area.DATA, rawWriter);
    }
    .....
  }
```

findStreams in StripePlanner.java
> 1. total offset the unencrypted index/data streams
> 2. total offset encrypted index/data streams

```java
  private void findStreams(long streamStart,
                           OrcProto.StripeFooter footer,
                           boolean[] columnInclude) throws IOException {
    long currentOffset = streamStart;
    Arrays.fill(bloomFilterKinds, null);
    for(OrcProto.Stream stream: footer.getStreamsList()) {
      currentOffset += handleStream(currentOffset, columnInclude, stream, null);
    }

    // Add the encrypted streams that we are using
    for(ReaderEncryptionVariant variant: encryption.getVariants()) {
      int variantId = variant.getVariantId();
      OrcProto.StripeEncryptionVariant stripeVariant =
          footer.getEncryption(variantId);
      for(OrcProto.Stream stream: stripeVariant.getStreamsList()) {
        currentOffset += handleStream(currentOffset, columnInclude, stream, variant);
      }
    }
  }
```

Causes misalignment of data reading.
This pr ensures that the read offset is consistent with the write.

3. Fix Decimal64TreeWriter stream not binding encryption.

Fix the bug of encrypted column read crash.

Added unit test for reading encrypted columns.

(cherry picked from commit 792c3f8)
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit a3de6da)
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants