PARQUET-2050: Expose repetition & definition level from ColumnIO #908

Merged May 19, 2021 (3 commits)

sunchao (Member) commented May 14, 2021

Make sure you have checked all steps below.

Jira

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and classes in the PR contain Javadoc that explains what they do

sunchao (Member, Author) commented May 14, 2021

For the rationale, please check the JIRA: https://issues.apache.org/jira/browse/PARQUET-2050. I'm not sure where I should put it in the PR description.

sunchao (Member, Author) commented May 14, 2021

@shangxinli @ggershinsky @gszadovszky could you review this? Thanks!

shangxinli (Contributor) commented

@gszadovszky Is there any concern from you?

gszadovszky (Contributor) left a comment

I am fine with making these methods public if our users need them. Since they are public, though, it would be great to have proper Javadoc comments for them.

sunchao (Member, Author) commented May 17, 2021

Thanks for the review, @gszadovszky. Added.

shangxinli (Contributor) commented

LGTM. Once the check failures are fixed, we can merge.

sunchao (Member, Author) commented May 18, 2021

Thanks, @shangxinli and @gszadovszky. Do you know how to see the details of the check failures? I opened the link and it just shows "This check failed".

sunchao (Member, Author) commented May 18, 2021

Let me trigger the CI again to find out why.

gszadovszky (Contributor) commented

I don't know why we had these failures, but they were not specific to this PR. As the checks are passing now, I'm merging this PR.

@gszadovszky gszadovszky merged commit 10794e6 into apache:master May 19, 2021
@sunchao sunchao deleted the PARQUET-2050 branch May 19, 2021 19:24
elikkatz added a commit to TheWeatherCompany/parquet-mr that referenced this pull request Jun 2, 2021
* 'master' of https://github.com/apache/parquet-mr: (222 commits)
  PARQUET-2052: Integer overflow when writing huge binary using dictionary encoding (apache#910)
  PARQUET-2041: Add zstd to `parquet.compression` description of ParquetOutputFormat Javadoc (apache#899)
  PARQUET-2050: Expose repetition & definition level from ColumnIO (apache#908)
  PARQUET-1761: Lower Logging Level in ParquetOutputFormat (apache#745)
  PARQUET-2046: Upgrade Apache POM to 23 (apache#904)
  PARQUET-2048: Deprecate BaseRecordReader (apache#906)
  PARQUET-1922: Deprecate IOExceptionUtils (apache#825)
  PARQUET-2037: Write INT96 with parquet-avro (apache#901)
  PARQUET-2044: Enable ZSTD buffer pool by default (apache#903)
  PARQUET-2038: Upgrade Jackson version used in parquet encryption. (apache#898)
  Revert "[WIP] Refactor GroupReadSupport to unuse deprecated api (apache#894)"
  PARQUET-2027: Fix calculating directory offset for merge (apache#896)
  [WIP] Refactor GroupReadSupport to unuse deprecated api (apache#894)
  PARQUET-2030: Expose page size row check configurations to ParquetWriter.Builder (apache#895)
  PARQUET-2031: Upgrade to parquet-format 2.9.0 (apache#897)
  PARQUET-1448: Review of ParquetFileReader (apache#892)
  PARQUET-2020: Remove deprecated modules (apache#888)
  PARQUET-2025: Update Snappy version to 1.1.8.3 (apache#893)
  PARQUET-2022: ZstdDecompressorStream should close `zstdInputStream` (apache#889)
  PARQUET-1982: Random access to row groups in ParquetFileReader (apache#871)
  ...

# Conflicts:
#	parquet-column/src/main/java/org/apache/parquet/example/data/simple/SimpleGroup.java
#	parquet-hadoop/pom.xml
#	parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
#	parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java