Add config to reject Iceberg queries without partition pruning #20118
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
The previous commit cannot block every problematic query, such as the one below.
So I added further options that force filtering on specific partition fields.
|
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
@cla-bot check |
The cla-bot has been summoned, and re-checked this pull request! |
@raunaqmorarka you merged a related PR; could you take a look? Also, do we really want to go down this path of adding more and more policies for query shape? |
For the provided example of
Partition pruning should take place in Iceberg using the code at trino/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplitSource.java (line 464 in 9730845), as long as log_ts is a partitioning column.
Can you verify that this code is not performing pruning in your case, and check why it's not working? |
@mosabua @raunaqmorarka We currently deploy Trino based on version On the vanilla version
I have observed that partition pruning is used only if
Honestly, I didn't check the root cause of why it isn't working, but it looks like the reverse case of #12925, which was fixed by #13567 (by @findepi). |
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time. |
Description
We adopted the new catalog option `iceberg.query-partition-filter-required`, introduced by #17263 (since v430). However, we realized that this option cannot block every full-scan query; some edge cases slip through.
We migrated some tables from Hive format to Iceberg format. Trino users in our company usually query the new Iceberg tables by porting their existing queries against the legacy Hive tables (which have a string-typed date partition field, e.g. `log_dt="2023-12-14"`) and simply substituting `cast(date(log_ts) as varchar)`, like below.
Although queries like the above end up attempting a full scan of the table, they pass the validation check of `iceberg.query-partition-filter-required=true`. I found that the validation logic accepts a query as long as the partitioning field merely appears among the constraint columns of the query plan.
How about adding a stricter option, `iceberg.query-partition-pruning-required`, to block those edge cases and ensure partition pruning in the query plan?
I ran some queries to check that the new option works, but honestly, I cannot be sure there are no side effects.
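To make the distinction above concrete, here is a minimal sketch of a lenient check (the partition column merely appears in the filter) next to a stricter one (the filter can actually be used for pruning). Everything in it is hypothetical: `PartitionFilterChecks`, `Conjunct`, and both method names are illustrative and are not Trino classes or the code in this PR. It also assumes, purely for illustration, that a predicate wrapping the column in a function cannot be turned into a value range on that column.

```java
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model; these are not Trino's classes or its actual check.
final class PartitionFilterChecks {
    /** One conjunct of a WHERE clause. */
    static final class Conjunct {
        final Set<String> referencedColumns;
        // Assumption: true only for a plain comparison on a bare column,
        // e.g. log_ts >= TIMESTAMP '2023-12-14 00:00:00'.
        final boolean directColumnComparison;

        Conjunct(Set<String> referencedColumns, boolean directColumnComparison) {
            this.referencedColumns = referencedColumns;
            this.directColumnComparison = directColumnComparison;
        }

        boolean touchesAny(Set<String> columns) {
            return referencedColumns.stream().anyMatch(columns::contains);
        }
    }

    /** Lenient: passes when any conjunct merely mentions a partition column. */
    static boolean mentionsPartitionColumn(List<Conjunct> filter, Set<String> partitionColumns) {
        return filter.stream().anyMatch(c -> c.touchesAny(partitionColumns));
    }

    /** Strict: passes only when a directly comparable conjunct targets a partition column. */
    static boolean allowsPartitionPruning(List<Conjunct> filter, Set<String> partitionColumns) {
        return filter.stream()
                .anyMatch(c -> c.directColumnComparison && c.touchesAny(partitionColumns));
    }
}
```

In this model, a filter such as `cast(date(log_ts) as varchar) = '2023-12-14'` would be represented as `new Conjunct(Set.of("log_ts"), false)`: it passes the lenient check but fails the strict one, which is the gap the proposed option is meant to close.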
@zhangminglei, could you review this minor update to your nice contribution?
This looks like the reverse case of #12925, which was fixed by #13567 (by @findepi).
If there is a more elegant way to cover these edge cases, please fix it or point me to the alternative solution.
Related PR