Try removing hadoop-common dependencies from parquet #5517

Closed
malhotrashivam opened this issue May 22, 2024 · 2 comments

malhotrashivam commented May 22, 2024

The latest parquet-mr release, 1.14.0 (see PARQUET-1822), adds a number of wrapper classes that should allow users to use parquet-hadoop without depending on hadoop-common.
We should use these new wrappers to remove the dependency from our code. Note that parquet-hadoop itself might still use hadoop-common internally.

Found during #5469

malhotrashivam added the "feature request" and "triage" labels May 22, 2024
malhotrashivam added this to the 3. Triage milestone May 22, 2024
malhotrashivam self-assigned this May 22, 2024
malhotrashivam commented

Some notes:
Important PR: https://github.com/apache/parquet-mr/pull/1141/files#diff-b044ae9879a94e2b8a49d6e6911ea5498ef162df1373cc049ded6256980a7248

One interesting class they have added is org.apache.parquet.conf.PlainParquetConfiguration, which can replace org.apache.hadoop.conf.Configuration.

Some other interesting classes:

  • org.apache.parquet.hadoop.CodecFactory which can potentially replace the usage of org.apache.hadoop.io.compress.CompressionCodecFactory.
  • org.apache.parquet.hadoop.CodecFactory.HeapBytesCompressor which can replace org.apache.hadoop.io.compress.Compressor.
  • org.apache.parquet.hadoop.CodecFactory.HeapBytesDecompressor which can replace org.apache.hadoop.io.compress.Decompressor.
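
To make the wrapper idea concrete, here is a rough JDK-only sketch of what a configuration interface layer looks like. Every name in it (ConfWrapperSketch, SimpleConf, MapConf, pageSize, the "example.page.size" key) is a hypothetical stand-in made up for illustration; it is not the parquet or Hadoop API, only the general pattern of coding against a small interface so that the Hadoop-backed implementation becomes optional.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, JDK-only illustration of a configuration "interface layer".
// None of these types are the real parquet or Hadoop classes.
public class ConfWrapperSketch {

    /** Minimal key/value configuration contract that caller code depends on. */
    interface SimpleConf {
        void set(String name, String value);

        String get(String name, String defaultValue);

        int getInt(String name, int defaultValue);
    }

    /** Map-backed implementation that needs nothing beyond the JDK. */
    static final class MapConf implements SimpleConf {
        private final Map<String, String> values = new HashMap<>();

        @Override
        public void set(String name, String value) {
            values.put(name, value);
        }

        @Override
        public String get(String name, String defaultValue) {
            return values.getOrDefault(name, defaultValue);
        }

        @Override
        public int getInt(String name, int defaultValue) {
            String raw = values.get(name);
            // Fall back to the default when the key was never set.
            return raw == null ? defaultValue : Integer.parseInt(raw.trim());
        }
    }

    /** Caller code sees only the interface, so the backing class can be swapped. */
    static int pageSize(SimpleConf conf) {
        return conf.getInt("example.page.size", 1 << 20);
    }

    public static void main(String[] args) {
        SimpleConf conf = new MapConf();
        System.out.println(pageSize(conf)); // default value
        conf.set("example.page.size", "65536");
        System.out.println(pageSize(conf)); // overridden value
    }
}
```

In our code the equivalent change would be to pass the parquet-provided configuration type around wherever we currently pass a Hadoop Configuration, assuming the parquet-hadoop entry points we call accept it.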
