Test Iceberg cost-based plans with small files on TPC-DS #16645

Merged

krvikash
Contributor

@krvikash krvikash commented Mar 21, 2023

Add tests for Iceberg cost-based plans on TPC-DS with small files. The tests run against unpartitioned tables stored as small Parquet files. The total size of the metadata files added here is 2.4 MB.

The TPC-DS tables were generated with the session property iceberg.target_max_file_size = '50MB'.

Data location of each table, with its data file count, total size, and average file size:

============== NDV: [iceberg-50MB-files-tpcds-sf1000-PARQUET] ============== iceberg.target_max_file_size = '50MB'; ==============
01. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/call_center/data], fileCount: [1], totalContentSize: [10.2 kB], averageFileSize: [10.2 kB]
02. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/catalog_page/data], fileCount: [1], totalContentSize: [932.3 kB], averageFileSize: [932.3 kB]
03. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/catalog_returns/data], fileCount: [183], totalContentSize: [8.4 GB], averageFileSize: [45.9 MB]
04. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/catalog_sales/data], fileCount: [1715], totalContentSize: [83.8 GB], averageFileSize: [48.9 MB]
05. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/customer/data], fileCount: [12], totalContentSize: [447.1 MB], averageFileSize: [37.3 MB]
06. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/customer_address/data], fileCount: [3], totalContentSize: [71.4 MB], averageFileSize: [23.8 MB]
07. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/customer_demographics/data], fileCount: [1], totalContentSize: [2.1 MB], averageFileSize: [2.1 MB]
08. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/date_dim/data], fileCount: [1], totalContentSize: [965.4 kB], averageFileSize: [965.4 kB]
09. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/household_demographics/data], fileCount: [1], totalContentSize: [9.4 kB], averageFileSize: [9.4 kB]
10. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/income_band/data], fileCount: [1], totalContentSize: [743 B], averageFileSize: [743 B]
11. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/inventory/data], fileCount: [46], totalContentSize: [1.9 GB], averageFileSize: [40.6 MB]
12. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/item/data], fileCount: [1], totalContentSize: [17.5 MB], averageFileSize: [17.5 MB]
13. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/promotion/data], fileCount: [1], totalContentSize: [46.4 kB], averageFileSize: [46.4 kB]
14. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/reason/data], fileCount: [1], totalContentSize: [1.3 kB], averageFileSize: [1.3 kB]
15. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/ship_mode/data], fileCount: [1], totalContentSize: [1.7 kB], averageFileSize: [1.7 kB]
16. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/store/data], fileCount: [1], totalContentSize: [63.7 kB], averageFileSize: [63.7 kB]
17. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/store_returns/data], fileCount: [261], totalContentSize: [12.6 GB], averageFileSize: [48.2 MB]
18. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/store_sales/data], fileCount: [2086], totalContentSize: [104.6 GB], averageFileSize: [50.1 MB]
19. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/time_dim/data], fileCount: [1], totalContentSize: [405.4 kB], averageFileSize: [405.4 kB]
20. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/warehouse/data], fileCount: [1], totalContentSize: [3.8 kB], averageFileSize: [3.8 kB]
21. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/web_page/data], fileCount: [1], totalContentSize: [36.7 kB], averageFileSize: [36.7 kB]
22. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/web_returns/data], fileCount: [94], totalContentSize: [4.1 GB], averageFileSize: [43.9 MB]
23. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/web_sales/data], fileCount: [875], totalContentSize: [38.9 GB], averageFileSize: [44.5 MB]
24. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/web_site/data], fileCount: [1], totalContentSize: [10.9 kB], averageFileSize: [10.9 kB]
============== Total Size: [254.8 GB] ==============
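
The per-table figures above can be cross-checked with a short script. The following is a minimal sketch, not part of this PR: it assumes the report uses decimal (SI) units, and the names `parse_report`, `summarize`, and `SAMPLE` are hypothetical helpers introduced only for illustration. The sample contains three lines copied from the listing.

```python
import re

# Assumption: the report prints sizes in decimal (SI) units.
UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9}

# Captures the table name, file count, and total size from one report line.
LINE_RE = re.compile(
    r"/(?P<table>\w+)/data\], fileCount: \[(?P<count>\d+)\], "
    r"totalContentSize: \[(?P<total>[\d.]+) (?P<unit>B|kB|MB|GB)\]"
)

# Three lines copied from the listing above.
SAMPLE = """
10. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/income_band/data], fileCount: [1], totalContentSize: [743 B], averageFileSize: [743 B]
18. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/store_sales/data], fileCount: [2086], totalContentSize: [104.6 GB], averageFileSize: [50.1 MB]
23. Location: [s3://starburst-benchmarks-data/iceberg-50MB-files-tpcds-sf1000-PARQUET/web_sales/data], fileCount: [875], totalContentSize: [38.9 GB], averageFileSize: [44.5 MB]
"""


def parse_report(text):
    """Return {table: (file_count, total_bytes)} for every report line found."""
    tables = {}
    for match in LINE_RE.finditer(text):
        size = float(match["total"]) * UNITS[match["unit"]]
        tables[match["table"]] = (int(match["count"]), size)
    return tables


def summarize(tables):
    """Return (total file count, total size in bytes) across all tables."""
    file_count = sum(count for count, _ in tables.values())
    total_bytes = sum(size for _, size in tables.values())
    return file_count, total_bytes


if __name__ == "__main__":
    parsed = parse_report(SAMPLE)
    for table, (count, size) in sorted(parsed.items()):
        # Average file size per table, in MB, for comparison with the report.
        print(f"{table}: {count} files, avg {size / count / 1e6:.1f} MB")
    files, size = summarize(parsed)
    print(f"total: {files} files, {size / 1e9:.1f} GB")
```

Running the same parser over the full 24-line listing gives a way to confirm that the reported averages stay at or near the 50 MB target and that the per-table sizes add up to the reported total.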

Test against unpartitioned small Parquet files
@cla-bot cla-bot bot added the cla-signed label Mar 21, 2023
@krvikash krvikash self-assigned this Mar 21, 2023
@krvikash krvikash added the no-release-notes label Mar 21, 2023
@findepi
Member

findepi commented Mar 21, 2023

Please add in PR description what is the total file size being added here.

@krvikash
Contributor Author

> Please add in PR description what is the total file size being added here.

Updated.

@findepi findepi merged commit 8eda7a5 into trinodb:master Mar 22, 2023
@github-actions github-actions bot added this to the 411 milestone Mar 23, 2023
@krvikash krvikash deleted the iceberg-small-files-cost-based-plan branch March 23, 2023 05:53