exploit iceberg row-count metadata #74

peterboncz · 2024-10-09T12:15:44Z

Examine the row counts in the manifest and count how many rows are in the (existing or added) data files and deletion files, respectively.

Use these two counts to pass the new named_parameter "explicit_cardinality" into the generated parquet_scans.
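The counting step could be sketched roughly as follows. This is an illustration in Python, not the extension's C++ code: `ManifestEntry` and `count_rows` are hypothetical names, and the entry is reduced to three fields. The numeric codes for `status` and `content` follow the Iceberg table spec. How the resulting counts are handed to each generated parquet_scan is not shown here.

```python
from dataclasses import dataclass

# Codes from the Iceberg table spec (manifest entry `status`, data file `content`).
EXISTING, ADDED, DELETED = 0, 1, 2
DATA, POSITION_DELETES, EQUALITY_DELETES = 0, 1, 2


@dataclass
class ManifestEntry:
    """Hypothetical, simplified stand-in for one manifest entry."""
    status: int        # EXISTING, ADDED or DELETED
    content: int       # DATA, POSITION_DELETES or EQUALITY_DELETES
    record_count: int  # number of rows in the file


def count_rows(entries):
    """Return (data_rows, delete_rows) over existing or added files only."""
    data_rows = 0
    delete_rows = 0
    for entry in entries:
        if entry.status == DELETED:
            continue  # the file is no longer part of the snapshot
        if entry.content == DATA:
            data_rows += entry.record_count
        else:
            delete_rows += entry.record_count
    return data_rows, delete_rows


entries = [
    ManifestEntry(ADDED, DATA, 100),
    ManifestEntry(EXISTING, DATA, 50),
    ManifestEntry(DELETED, DATA, 999),            # skipped
    ManifestEntry(ADDED, POSITION_DELETES, 7),
]
data_rows, delete_rows = count_rows(entries)
print(data_rows, delete_rows)  # 150 7
```

Both counts come from metadata alone, so no parquet file has to be opened to obtain them.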

Note that because iceberg passes an explicit schema into the parquet_scan, it does not open any file during bind. As a result, the parquet_scans it generates previously had no cardinality information during query optimization.

That can cause rather bad query plans. This PR (pb/explicit-iceberg-cardinality) fixes that. Note that this PR needs the DuckDB PR pb/explicit-parquet-cardinality that adds the "explicit_cardinality" named_parameter to parquet_scan.
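To illustrate why a missing cardinality can hurt, here is a toy model in Python of one optimizer decision: which side of a hash join to build the hash table on. It is not DuckDB's optimizer. The function `pick_build_side` and the fallback estimate for an unknown cardinality are assumptions made up for this sketch.

```python
def pick_build_side(left_rows, right_rows, default_estimate=1):
    """Toy rule: build the hash table on the side estimated to be smaller.

    `None` means the scan reported no cardinality; the fallback value
    `default_estimate` is an assumption of this sketch.
    """
    left = default_estimate if left_rows is None else left_rows
    right = default_estimate if right_rows is None else right_rows
    return "left" if left <= right else "right"


# Left: a large iceberg scan, right: a small dimension table.
without_cardinality = pick_build_side(None, 1_000)
with_cardinality = pick_build_side(10_000_000, 1_000)
print(without_cardinality, with_cardinality)  # left right
```

In this toy setting the scan with an unknown row count is treated as tiny, so the hash table is built on what is actually the large side. Supplying the row count from the manifest flips the choice.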

peterboncz · 2024-10-09T13:20:31Z

The CI is expected to fail (unknown named_parameter 'explicit_cardinality') until duckdb/duckdb#14292 is merged.

- add test that checks that there are cardinalities in the generated parquet_scans

samansmink

lgtm, thanks!

peter added 2 commits October 9, 2024 14:09

std::move

bc2b45c

peterboncz marked this pull request as draft October 11, 2024 17:59

peterboncz marked this pull request as ready for review October 11, 2024 17:59

peterboncz marked this pull request as draft October 11, 2024 18:00

peterboncz marked this pull request as ready for review October 11, 2024 18:00

bogus commit to trigger CI

136480d

peterboncz marked this pull request as draft October 14, 2024 12:29

peterboncz marked this pull request as ready for review October 14, 2024 12:29

peter added 3 commits October 14, 2024 14:30

bugus commit to trigger CI

30a42d9

Merge branch 'pb/explicit-iceberg-cardinality' into HEAD

ac75f00

- correct test name

826397d

- add test that checks that there are cardinalities in the generated parquet_scans

samansmink approved these changes Oct 15, 2024

samansmink merged commit d62d91d into duckdb:main Oct 15, 2024
16 checks passed

mike-luabase pushed a commit to definite-app/duckdb_iceberg that referenced this pull request Oct 27, 2024

Merge pull request duckdb#74 from motherduckdb/pb/explicit-iceberg-ca…

0de4979

…rdinality exploit iceberg row-count metadata
