-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update iceberg to 0.13.1 #11032
Update iceberg to 0.13.1 #11032
Conversation
@@ -104,7 +104,8 @@ public Metrics getMetrics() | |||
private static Metrics computeMetrics(MetricsConfig metricsConfig, Schema icebergSchema, ColumnMetadata<OrcType> orcColumns, long fileRowCount, Optional<ColumnMetadata<ColumnStatistics>> columnStatistics) | |||
{ | |||
if (columnStatistics.isEmpty()) { | |||
return new Metrics(fileRowCount, null, null, null, null, null); | |||
// TODO add implementation of Map<Integer, Long> nanValueCounts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make it a Note, not TODO. CBO has no use for NaN counts today
@@ -46,7 +46,7 @@ public void testRoundTrip() | |||
Map<Integer, ByteBuffer> lowerBounds = ImmutableMap.of(13, ByteBuffer.wrap(new byte[] {0, 8, 9})); | |||
Map<Integer, ByteBuffer> upperBounds = ImmutableMap.of(17, ByteBuffer.wrap(new byte[] {5, 4, 0})); | |||
|
|||
Metrics expected = new Metrics(recordCount, columnSizes, valueCounts, nullValueCounts, lowerBounds, upperBounds); | |||
Metrics expected = new Metrics(recordCount, columnSizes, valueCounts, nullValueCounts, null, lowerBounds, upperBounds); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
empty map instead?
@homar are you aware of any changes impacting users that flows from this upgrade? |
I am not. Main reason for this update is that it should allow us to implement row level deletes and updates for orc files. |
60df744
to
c8dcd1e
Compare
@@ -40,6 +41,7 @@ public MetricsWrapper( | |||
columnSizes, | |||
valueCounts, | |||
nullValueCounts, | |||
ImmutableMap.of(), // Trino CBO doesn't utilise nanValueCounts at all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are Iceberg metrics that are part of the Iceberg specification, and thus we should be writing them. The purpose of MetricsWrapper
is to allow JSON serialization of Iceberg Metrics
, so we should pass the value through here. The Parquet writer already collects NaN counts, and the ORC writer should be updated to collect them in the future.
Saying "Trino doesn't use them so we don't care" is hostile.
@@ -104,7 +104,8 @@ public Metrics getMetrics() | |||
private static Metrics computeMetrics(MetricsConfig metricsConfig, Schema icebergSchema, ColumnMetadata<OrcType> orcColumns, long fileRowCount, Optional<ColumnMetadata<ColumnStatistics>> columnStatistics) | |||
{ | |||
if (columnStatistics.isEmpty()) { | |||
return new Metrics(fileRowCount, null, null, null, null, null); | |||
// Trino CBO doesn't utilise nanValueCounts at all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove comment
@@ -153,6 +154,7 @@ private static Metrics computeMetrics(MetricsConfig metricsConfig, Schema iceber | |||
null, // TODO: Add column size accounting to ORC column writers | |||
valueCounts.isEmpty() ? null : valueCounts, | |||
nullCounts.isEmpty() ? null : nullCounts, | |||
null, // Trino CBO doesn't utilise nanValueCounts at all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace comment with
// TODO: Add nanValueCounts to ORC writer
44e5b7c
to
922ea65
Compare
@electrum could you please take another look ? |
One of the changes in 0.13 was to deprecate the Can you add an appropriate test to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments, otherwise looks good
@@ -237,7 +237,7 @@ public void testBasic() | |||
|
|||
// Check NaN value count | |||
// TODO: add more checks after NaN info is collected | |||
assertNull(datafile.getNanValueCounts()); | |||
assertEquals(datafile.getNanValueCounts(), null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert
@@ -328,7 +328,7 @@ public void testWithNaNs() | |||
datafile.getValueCounts().values().forEach(valueCount -> assertEquals(valueCount, (Long) 3L)); | |||
|
|||
// TODO: add more checks after NaN info is collected | |||
assertNull(datafile.getNanValueCounts()); | |||
assertEquals(datafile.getNanValueCounts(), null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert
@@ -46,7 +46,7 @@ public void testRoundTrip() | |||
Map<Integer, ByteBuffer> lowerBounds = ImmutableMap.of(13, ByteBuffer.wrap(new byte[] {0, 8, 9})); | |||
Map<Integer, ByteBuffer> upperBounds = ImmutableMap.of(17, ByteBuffer.wrap(new byte[] {5, 4, 0})); | |||
|
|||
Metrics expected = new Metrics(recordCount, columnSizes, valueCounts, nullValueCounts, lowerBounds, upperBounds); | |||
Metrics expected = new Metrics(recordCount, columnSizes, valueCounts, nullValueCounts, ImmutableMap.of(), lowerBounds, upperBounds); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add values so we test the serialization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also test NaN counts for Parquet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any tests for metrics for Parquet format. Nothing like TestIcebergOrcMetricsCollection
exists for parquet. I will add a ticket to create such a test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case that this PR gets merged only after #11062 please do remove the Iceberg related code for deleting table metadata files in |
922ea65
to
33685cd
Compare
33685cd
to
d032b67
Compare
Description
Updates apache iceberg version to 0.13.1
General information
Related issues, pull requests, and links
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(x) No release notes entries required.
( ) Release notes entries required with the following suggested text: