Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update iceberg to 0.13.1 #11032

Merged
merged 2 commits into from
Feb 24, 2022
Merged

Update iceberg to 0.13.1 #11032

merged 2 commits into from
Feb 24, 2022

Conversation

homar
Copy link
Member

@homar homar commented Feb 14, 2022

Description

Updates apache iceberg version to 0.13.1

General information

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`5678`)

@cla-bot cla-bot bot added the cla-signed label Feb 14, 2022
@@ -104,7 +104,8 @@ public Metrics getMetrics()
private static Metrics computeMetrics(MetricsConfig metricsConfig, Schema icebergSchema, ColumnMetadata<OrcType> orcColumns, long fileRowCount, Optional<ColumnMetadata<ColumnStatistics>> columnStatistics)
{
if (columnStatistics.isEmpty()) {
return new Metrics(fileRowCount, null, null, null, null, null);
// TODO add implementation of Map<Integer, Long> nanValueCounts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it a Note, not TODO. CBO has no use for NaN counts today

@@ -46,7 +46,7 @@ public void testRoundTrip()
Map<Integer, ByteBuffer> lowerBounds = ImmutableMap.of(13, ByteBuffer.wrap(new byte[] {0, 8, 9}));
Map<Integer, ByteBuffer> upperBounds = ImmutableMap.of(17, ByteBuffer.wrap(new byte[] {5, 4, 0}));

Metrics expected = new Metrics(recordCount, columnSizes, valueCounts, nullValueCounts, lowerBounds, upperBounds);
Metrics expected = new Metrics(recordCount, columnSizes, valueCounts, nullValueCounts, null, lowerBounds, upperBounds);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty map instead?

@findepi
Copy link
Member

findepi commented Feb 14, 2022

@homar are you aware of any changes impacting users that flows from this upgrade?

@homar
Copy link
Member Author

homar commented Feb 14, 2022

@homar are you aware of any changes impacting users that flows from this upgrade?

I am not. Main reason for this update is that it should allow us to implement row level deletes and updates for orc files.

@homar homar force-pushed the homar/update_iceberg branch 2 times, most recently from 60df744 to c8dcd1e Compare February 15, 2022 09:39
@@ -40,6 +41,7 @@ public MetricsWrapper(
columnSizes,
valueCounts,
nullValueCounts,
ImmutableMap.of(), // Trino CBO doesn't utilise nanValueCounts at all
Copy link
Member

@electrum electrum Feb 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are Iceberg metrics that are part of the Iceberg specification, and thus we should be writing them. The purpose of MetricsWrapper is to allow JSON serialization of Iceberg Metrics, so we should pass the value through here. The Parquet writer already collects NaN counts, and the ORC writer should be updated to collect them in the future.

Saying "Trino doesn't use them so we don't care" is hostile.

@@ -104,7 +104,8 @@ public Metrics getMetrics()
private static Metrics computeMetrics(MetricsConfig metricsConfig, Schema icebergSchema, ColumnMetadata<OrcType> orcColumns, long fileRowCount, Optional<ColumnMetadata<ColumnStatistics>> columnStatistics)
{
if (columnStatistics.isEmpty()) {
return new Metrics(fileRowCount, null, null, null, null, null);
// Trino CBO doesn't utilise nanValueCounts at all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove comment

@@ -153,6 +154,7 @@ private static Metrics computeMetrics(MetricsConfig metricsConfig, Schema iceber
null, // TODO: Add column size accounting to ORC column writers
valueCounts.isEmpty() ? null : valueCounts,
nullCounts.isEmpty() ? null : nullCounts,
null, // Trino CBO doesn't utilise nanValueCounts at all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace comment with

// TODO: Add nanValueCounts to ORC writer

@homar homar changed the title Update iceberg to 0.13.0 Update iceberg to 0.13.1 Feb 16, 2022
@homar homar force-pushed the homar/update_iceberg branch 2 times, most recently from 44e5b7c to 922ea65 Compare February 16, 2022 14:21
@homar
Copy link
Member Author

homar commented Feb 18, 2022

@electrum could you please take another look ?

@alexjo2144
Copy link
Member

One of the changes in 0.13 was to deprecate the write.folder-storage.path and write.object-storage.path properties, replacing them with write.data.path
apache/iceberg#3094

Can you add an appropriate test to TestIcebergSparkCompatibility similar to the existing one for the deprecated properties?
https://github.com/trinodb/trino/blob/master/testing/trino-product-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkCompatibility.java#L822

Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, otherwise looks good

@@ -237,7 +237,7 @@ public void testBasic()

// Check NaN value count
// TODO: add more checks after NaN info is collected
assertNull(datafile.getNanValueCounts());
assertEquals(datafile.getNanValueCounts(), null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert

@@ -328,7 +328,7 @@ public void testWithNaNs()
datafile.getValueCounts().values().forEach(valueCount -> assertEquals(valueCount, (Long) 3L));

// TODO: add more checks after NaN info is collected
assertNull(datafile.getNanValueCounts());
assertEquals(datafile.getNanValueCounts(), null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert

@@ -46,7 +46,7 @@ public void testRoundTrip()
Map<Integer, ByteBuffer> lowerBounds = ImmutableMap.of(13, ByteBuffer.wrap(new byte[] {0, 8, 9}));
Map<Integer, ByteBuffer> upperBounds = ImmutableMap.of(17, ByteBuffer.wrap(new byte[] {5, 4, 0}));

Metrics expected = new Metrics(recordCount, columnSizes, valueCounts, nullValueCounts, lowerBounds, upperBounds);
Metrics expected = new Metrics(recordCount, columnSizes, valueCounts, nullValueCounts, ImmutableMap.of(), lowerBounds, upperBounds);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add values so we test the serialization

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also test NaN counts for Parquet

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any tests for metrics for Parquet format. Nothing like TestIcebergOrcMetricsCollection exists for parquet. I will add a ticket to create such a test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findinpath
Copy link
Contributor

In case that this PR gets merged only after #11062 please do remove the Iceberg related code for deleting table metadata files in io.trino.plugin.iceberg.TrinoHiveCatalog#dropTable

@homar homar force-pushed the homar/update_iceberg branch from 922ea65 to 33685cd Compare February 24, 2022 06:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

6 participants