Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optional check for query partition filter for Delta #18345

Conversation

marcinsbd
Copy link
Contributor

@marcinsbd marcinsbd commented Jul 19, 2023

Description

For Delta partitioned tables, we should reject the table scan produced by the planner when the query does not have partition field.

Additional context and related issues

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# DeltaLake
* Add option to enforce that a filter on a partition key be present in the query. This can be enabled by setting the 
  ``delta.query-partition-filter-required`` config property or the ``query_partition_filter_required`` session property 
  to ``true``.

@cla-bot cla-bot bot added the cla-signed label Jul 19, 2023
@github-actions github-actions bot added tests:hive delta-lake Delta Lake connector labels Jul 19, 2023
@marcinsbd marcinsbd force-pushed the enforce_filter_on_partitioned_columnn_in_deltalake branch from 4dfc3f3 to 5eb28fd Compare July 19, 2023 13:51
@marcinsbd marcinsbd force-pushed the enforce_filter_on_partitioned_columnn_in_deltalake branch 2 times, most recently from 998119d to 914337f Compare July 21, 2023 08:36
@marcinsbd marcinsbd marked this pull request as draft July 21, 2023 09:19
@ebyhr
Copy link
Member

ebyhr commented Jul 21, 2023

cc: @findepi @findinpath

@marcinsbd marcinsbd force-pushed the enforce_filter_on_partitioned_columnn_in_deltalake branch 3 times, most recently from 2d099ed to 4d08f8a Compare July 24, 2023 11:01
@marcinsbd marcinsbd marked this pull request as ready for review July 25, 2023 08:23
@marcinsbd marcinsbd force-pushed the enforce_filter_on_partitioned_columnn_in_deltalake branch from 4d08f8a to 5a11a61 Compare July 26, 2023 09:46
@marcinsbd marcinsbd force-pushed the enforce_filter_on_partitioned_columnn_in_deltalake branch from 4d21724 to 9d33b9a Compare August 1, 2023 12:49
@marcinsbd marcinsbd requested a review from findinpath August 1, 2023 12:51
Copy link
Member

@Praveen2112 Praveen2112 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

% changes

List<String> partitionColumns = deltaLakeTableHandle.getMetadataEntry().getCanonicalPartitionColumns();
if (!partitionColumns.isEmpty()) {
if (deltaLakeTableHandle.getAnalyzeHandle().isPresent()) {
throw new TrinoException(QUERY_REJECTED, "The session with enabled property delta.query-partition-filter-required does not work with ANALYZE statement.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we re-word the error as Session with delta.query-partition-filter-required enabled does not work with ANALYZE statement

WDYT @findinpath

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delta.query-partition-filter-required is not a session option, but rather a configuration option.

cc @marcinsbd

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nevertheless, @Praveen2112 , @findinpath , is the exception message OK? if not pls provide how it's supposed be formulated

Copy link
Contributor

@findinpath findinpath Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My take on the exception message:

Suggested change
throw new TrinoException(QUERY_REJECTED, "The session with enabled property delta.query-partition-filter-required does not work with ANALYZE statement.");
throw new TrinoException(QUERY_REJECTED, "ANALYZE statement can't be performed on partitioned tables because filtering is required on at least one partition. The partition filtering check can be however disabled with the catalog session property 'query_partition_filter_required'.");

Consider that the analysts interacting with Trino don't necessarily have access to the configuration properties (but they do have access to the session properties).

@marcinsbd marcinsbd force-pushed the enforce_filter_on_partitioned_columnn_in_deltalake branch from 9d33b9a to 64bc8fc Compare August 1, 2023 17:58
@marcinsbd marcinsbd requested a review from Praveen2112 August 1, 2023 17:59
Copy link
Contributor

@findinpath findinpath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % comments

@marcinsbd marcinsbd force-pushed the enforce_filter_on_partitioned_columnn_in_deltalake branch from 64bc8fc to 49a4cf1 Compare August 1, 2023 21:41
@marcinsbd marcinsbd force-pushed the enforce_filter_on_partitioned_columnn_in_deltalake branch from 49a4cf1 to 12c6518 Compare August 2, 2023 00:00
@marcinsbd marcinsbd requested review from ebyhr and findinpath August 2, 2023 00:02
@marcinsbd marcinsbd force-pushed the enforce_filter_on_partitioned_columnn_in_deltalake branch from 12c6518 to d0e750b Compare August 2, 2023 08:08
@marcinsbd marcinsbd requested a review from findinpath August 2, 2023 08:08
@marcinsbd marcinsbd force-pushed the enforce_filter_on_partitioned_columnn_in_deltalake branch from d0e750b to 5b2bb16 Compare August 2, 2023 11:01
@Praveen2112 Praveen2112 merged commit 1ba82f8 into trinodb:master Aug 2, 2023
@github-actions github-actions bot added this to the 423 milestone Aug 2, 2023
@Praveen2112
Copy link
Member

Thanks for working on this.

@mosabua
Copy link
Member

mosabua commented Aug 2, 2023

Does this really not need a release notes entry?

@Praveen2112
Copy link
Member

@mosabua Updated the release notes

if (deltaLakeTableHandle.getAnalyzeHandle().isPresent()) {
throw new TrinoException(
QUERY_REJECTED,
"ANALYZE statement can not be performed on partitioned tables because filtering is required on at least one partition. However, the partition filtering check can be disabled with the catalog session property 'query_partition_filter_required'.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a code comment that Delta ANAlyze currently does not support specificying partitioning predicates (unlike eg Hive's)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

8 participants