Upgraded version for parquet-hadoop #5469
@@ -289,7 +289,7 @@ class Classpaths {

     static void inheritParquetHadoop(Project p, String configName = JavaPlugin.IMPLEMENTATION_CONFIGURATION_NAME) {
         Configuration config = p.configurations.getByName(configName)
-        addDependency(config, 'org.apache.parquet', 'parquet-hadoop', '1.13.1')
+        addDependency(config, 'org.apache.parquet', 'parquet-hadoop', '1.14.0')
We should investigate what this item in the release notes means for us:
PARQUET-1822 - Parquet without Hadoop dependencies
It would be quite nice if we no longer needed some of the Hadoop deps.
I looked at the linked changes, and this seems to be the main commit: https://github.com/apache/parquet-mr/pull/1141/files#diff-b044ae9879a94e2b8a49d6e6911ea5498ef162df1373cc049ded6256980a7248
One interesting class they have added is org.apache.parquet.conf.PlainParquetConfiguration, to replace org.apache.hadoop.conf.Configuration.
Some other interesting classes that I noticed are:
- org.apache.parquet.hadoop.CodecFactory, which can potentially replace the usage of org.apache.hadoop.io.compress.CompressionCodecFactory
- org.apache.parquet.hadoop.CodecFactory.HeapBytesCompressor, which can replace org.apache.hadoop.io.compress.Compressor
- org.apache.parquet.hadoop.CodecFactory.HeapBytesDecompressor, which can replace org.apache.hadoop.io.compress.Decompressor
I think there is enough here for us to get rid of our hadoop-common dependencies. We would still need parquet-hadoop, though, and hadoop-common would then be used only internally by it.
Do you think that would be beneficial?
I think so, yes.
Sounds good. Is it okay if I make a separate issue for this?
This would require some dedicated effort for the change as well as for the benchmarking.
Created this issue: #5517
Please create an issue for the follow-up on https://issues.apache.org/jira/browse/PARQUET-1822 and link it here.
Upgrading to the latest parquet-mr release: https://github.com/apache/parquet-mr/blob/master/CHANGES.md#version-1140