Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incorrect partition pushdown when no partitionSpec exists in Iceberg #21253

Conversation

jinyangli34
Copy link
Contributor

Description

canEnforceColumnConstraintInSpecs may incorrectly return True when no partitionSpec match found. This is because stream.allMatch always returns true on empty stream.

Additional context and related issues

May create incorrect query plan:
without data, plan treats a as partition column.

trino:default> explain select * FROM iceberg.tmp."jl_iceberg_0325" where a = 'a';
                                            Query Plan
---------------------------------------------------------------------------------------------------
 Trino version: 439.0
 Fragment 0 [SOURCE]
     Output layout: [a, ds]
     Output partitioning: SINGLE []
     Output[columnNames = [a, ds]]
     │   Layout: [a:varchar, ds:varchar]
     │   Estimates: {rows: 0 (0B), cpu: 0, memory: 0B, network: 0B}
     └─ TableScan[table = iceberg:tmp.jl_iceberg_0325$dataFOR VERSION AS OF 6043893226421131063]
            Layout: [a:varchar, ds:varchar]
            Estimates: {rows: 0 (0B), cpu: 0, memory: 0B, network: 0B}
            a := 1:a:varchar
                :: [[a]]
            ds := 2:ds:varchar

Issue disappear after a write:

trino:default> explain select * FROM iceberg.tmp."jl_iceberg_0325" where a = 'a';
                                                                Query Plan
------------------------------------------------------------------------------------------------------------------------------------------
 Trino version: 439.0
 Fragment 0 [SOURCE]
     Output layout: [a, ds]
     Output partitioning: SINGLE []
     Output[columnNames = [a, ds]]
     │   Layout: [a:varchar, ds:varchar]
     │   Estimates: {rows: 1 (230B), cpu: 0, memory: 0B, network: 0B}
     └─ ScanFilter[table = iceberg:tmp.jl_iceberg_0325$dataFOR VERSION AS OF 637562949208903568, filterPredicate = ("a" = VARCHAR 'a')]
            Layout: [a:varchar, ds:varchar]
            Estimates: {rows: 1 (230B), cpu: 230, memory: 0B, network: 0B}/{rows: 1 (230B), cpu: 230, memory: 0B, network: 0B}
            a := 1:a:varchar
            ds := 2:ds:varchar

Release notes

(X) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

Copy link

cla-bot bot commented Mar 25, 2024

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: jinyang_li.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@github-actions github-actions bot added the iceberg Iceberg connector label Mar 26, 2024
Copy link
Contributor

@findinpath findinpath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jinyangli34 for spotting the issue and contributing the fix ❤️

@findinpath findinpath added the bug Something isn't working label Mar 26, 2024
@jinyangli34 jinyangli34 force-pushed the jinyang-stream_allmatch_iceberg_enforce_column_constraint branch from d8785fa to 4ab6766 Compare March 26, 2024 17:59
@cla-bot cla-bot bot added the cla-signed label Mar 26, 2024
@findinpath
Copy link
Contributor

findinpath commented Mar 26, 2024

One task was interrupted.
Rerun the tests

git commit --allow-empty  -m "empty" && git push origin HEAD

Also keep the commit message up to 80 chars per line.

The constraint check was previously misbehaving when dealing with empty tables
due to empty_stream.allMatch(...) always returns True.
@jinyangli34 jinyangli34 force-pushed the jinyang-stream_allmatch_iceberg_enforce_column_constraint branch from 4ab6766 to 27060c3 Compare March 26, 2024 22:07
@raunaqmorarka raunaqmorarka merged commit 03b2c72 into trinodb:master Mar 27, 2024
42 checks passed
@github-actions github-actions bot added this to the 444 milestone Mar 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cla-signed iceberg Iceberg connector
Development

Successfully merging this pull request may close these issues.

3 participants