Accelerate Iceberg when reading partition columns only #19303

Merged
merged 1 commit into from
Oct 16, 2023

findepi
Member

@findepi findepi commented Oct 6, 2023

Description

Avoid data file I/O when

  • reading only partitioning columns
  • running count(*) queries on Iceberg tables (with no GROUP BY, or when grouping only by partitioning columns)
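
To illustrate the condition described above, here is a minimal sketch of the eligibility check. It is not the connector's actual code: the class `PartitionOnlyRead`, the method `canSkipDataFiles`, and the column names are hypothetical, and columns are modeled as plain strings. The idea is that a scan can be answered from metadata when every projected column is a partitioning column; a plain count(*) projects no columns at all, so it qualifies trivially.

```java
import java.util.Set;

// Hypothetical illustration only; not part of Trino's Iceberg connector.
class PartitionOnlyRead
{
    // A scan needs no data file I/O when every projected column
    // is a partitioning column. count(*) projects no columns,
    // so an empty projection always qualifies.
    static boolean canSkipDataFiles(Set<String> projectedColumns, Set<String> partitionColumns)
    {
        return partitionColumns.containsAll(projectedColumns);
    }

    public static void main(String[] args)
    {
        // Assumed example table partitioned by "ds" and "region"
        Set<String> partitionColumns = Set.of("ds", "region");

        // SELECT count(*) FROM t
        System.out.println(canSkipDataFiles(Set.of(), partitionColumns)); // true
        // SELECT ds, count(*) FROM t GROUP BY ds
        System.out.println(canSkipDataFiles(Set.of("ds"), partitionColumns)); // true
        // SELECT ds, amount FROM t ("amount" is a regular data column)
        System.out.println(canSkipDataFiles(Set.of("ds", "amount"), partitionColumns)); // false
    }
}
```

A real implementation would have to consider more than the projection, for example predicates on non-partition columns and delete files, which this sketch deliberately leaves out.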

@findepi findepi requested review from ebyhr and alexjo2144 October 6, 2023 21:27
@cla-bot cla-bot bot added the cla-signed label Oct 6, 2023
@findepi findepi force-pushed the findepi/iceberg-count-only branch from 7f1a960 to 7a10c4b Compare October 6, 2023 21:28
@findepi findepi changed the title Findepi/iceberg count only Process count(*) on Iceberg without opening data files Oct 6, 2023
@findepi
Member Author

findepi commented Oct 6, 2023

cc @osscm

@@ -275,6 +273,7 @@ public void testReadWholePartition()
.addCopies(new FileOperation(METADATA_JSON, INPUT_FILE_NEW_STREAM), 1)
Contributor

@findinpath findinpath Oct 6, 2023

Please add a test for count(*) over the whole table.

Add a test involving a filter on a non-partition column of a partitioned table.

It is also important to check the file system accesses for min(C)/max(C).

Also add a test for a table that has delete files.

@findepi findepi changed the title Process count(*) on Iceberg without opening data files Accelerate Iceberg when reading partition columns only Oct 7, 2023
@findepi findepi force-pushed the findepi/iceberg-count-only branch from 7a10c4b to b587f51 Compare October 9, 2023 11:48
Manifests contain trustworthy information about record count, so it can
be used to answer the count(*) queries.
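
As a rough sketch of the idea in this commit message: each data file entry in a manifest carries a record count, so count(*) can be computed by summing those counts, optionally grouped by partition. The code below is an assumption-laden illustration, not the connector's code: `ManifestCountSketch`, `DataFileEntry`, and the partition strings are hypothetical, and the sketch assumes no delete files apply to the listed data files (with row-level deletes, the per-file record count would overstate the number of live rows).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Trino's Iceberg connector.
class ManifestCountSketch
{
    // Stand-in for one data file entry in a manifest:
    // the file's partition value and its record count.
    static final class DataFileEntry
    {
        final String partition;
        final long recordCount;

        DataFileEntry(String partition, long recordCount)
        {
            this.partition = partition;
            this.recordCount = recordCount;
        }
    }

    // count(*) with no GROUP BY: sum the record counts of all files.
    static long countAll(List<DataFileEntry> entries)
    {
        long total = 0;
        for (DataFileEntry entry : entries) {
            total += entry.recordCount;
        }
        return total;
    }

    // count(*) grouped by partition: sum record counts per partition value.
    static Map<String, Long> countByPartition(List<DataFileEntry> entries)
    {
        Map<String, Long> counts = new TreeMap<>();
        for (DataFileEntry entry : entries) {
            counts.merge(entry.partition, entry.recordCount, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args)
    {
        // Made-up entries for demonstration
        List<DataFileEntry> entries = List.of(
                new DataFileEntry("ds=2023-10-01", 10),
                new DataFileEntry("ds=2023-10-01", 20),
                new DataFileEntry("ds=2023-10-02", 30));

        System.out.println(countAll(entries)); // 60
        System.out.println(countByPartition(entries)); // {ds=2023-10-01=30, ds=2023-10-02=30}
    }
}
```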
@findepi findepi force-pushed the findepi/iceberg-count-only branch from b587f51 to b756564 Compare October 16, 2023 09:48
@findepi findepi merged commit c2a30f8 into trinodb:master Oct 16, 2023
43 checks passed
@findepi findepi deleted the findepi/iceberg-count-only branch October 16, 2023 12:07
@github-actions github-actions bot added this to the 430 milestone Oct 16, 2023
Labels
cla-signed iceberg Iceberg connector
3 participants