Handle repeated predicate pushdown into Hive connector #984

Closed
wants to merge 3 commits into from

Conversation

martint
Member

@martint martint commented Jun 13, 2019

The previous implementation considered only the first attempt
to push a filter down into the Hive connector.
As a result, for a query like the one below, the partition filter
above the bottommost filter was ignored:

SELECT * FROM (
    SELECT * FROM t WHERE a in (1, 2)
) u
WHERE u.pk = 'b';
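
To make the difference concrete, here is a minimal Python sketch of the behavior described above. It is a toy model and not Presto code: the function names `prune_first_only` and `prune_repeated`, the `PARTITIONS` list, and the representation of a pushdown as an optional set of allowed `pk` values are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Set

# Hypothetical partition key values of table t, for illustration only.
PARTITIONS = ["a", "b", "c"]

# A pushdown attempt is modeled as the set of pk values it allows,
# or None when the filter says nothing about the partition key.
Constraint = Optional[Set[str]]


def prune_first_only(pushdowns: Sequence[Constraint]) -> List[str]:
    """Toy model of the old behavior: only the first pushdown is honored."""
    first = pushdowns[0] if pushdowns else None
    return [p for p in PARTITIONS if first is None or p in first]


def prune_repeated(pushdowns: Sequence[Constraint]) -> List[str]:
    """Toy model of the fixed behavior: every pushdown narrows the result."""
    allowed = set(PARTITIONS)
    for constraint in pushdowns:
        if constraint is not None:
            allowed &= constraint
    return [p for p in PARTITIONS if p in allowed]


# The inner filter `a IN (1, 2)` does not constrain pk; the outer one is pk = 'b'.
pushdowns = [None, {"b"}]
print(prune_first_only(pushdowns))  # ['a', 'b', 'c']
print(prune_repeated(pushdowns))    # ['b']
```

In this model the first-only variant scans every partition because the bottommost filter carries no partition constraint, while the repeated variant keeps only partition `b`.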

@cla-bot cla-bot bot added the cla-signed label Jun 13, 2019
@findepi
Member

findepi commented Jun 13, 2019

@martint, make sure this change doesn't increase the number of times we query the Metastore for partitions.
Each call can be costly (taking seconds, or sometimes longer).
You can easily track this with #946.

Also, make sure that queries like

SELECT * FROM (
    SELECT * FROM t WHERE pk > 0
) u
WHERE u.pk = 'b';

do not hit the hive.max-partitions-per-scan limit.
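
A rough Python sketch of this concern follows. It is a toy model only: how the Hive connector actually enforces the limit is not shown in this thread, and the names `prune_eager_check`, `prune_deferred_check`, and `check_limit`, the limit value, and the integer partition values are all hypothetical. It assumes a limit check that could run either after every pushdown or once after all of them.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for hive.max-partitions-per-scan.
MAX_PARTITIONS_PER_SCAN = 2
# Hypothetical partition key values, for illustration only.
PARTITIONS = [0, 1, 2, 3, 4]

Predicate = Callable[[int], bool]


def check_limit(partitions: List[int]) -> None:
    """Fail if more partitions remain than the configured limit allows."""
    if len(partitions) > MAX_PARTITIONS_PER_SCAN:
        raise RuntimeError(f"too many partitions: {len(partitions)}")


def prune_eager_check(predicates: Sequence[Predicate]) -> List[int]:
    """Checks the limit after every pushdown; a loose inner filter can fail."""
    remaining = list(PARTITIONS)
    for predicate in predicates:
        remaining = [p for p in remaining if predicate(p)]
        check_limit(remaining)
    return remaining


def prune_deferred_check(predicates: Sequence[Predicate]) -> List[int]:
    """Checks the limit once, after all pushed-down filters have been applied."""
    remaining = list(PARTITIONS)
    for predicate in predicates:
        remaining = [p for p in remaining if predicate(p)]
    check_limit(remaining)
    return remaining


# Loose inner filter, selective outer filter (both hypothetical).
predicates = [lambda pk: pk > 0, lambda pk: pk == 3]
print(prune_deferred_check(predicates))  # [3]
try:
    prune_eager_check(predicates)
except RuntimeError as error:
    print("eager check failed:", error)
```

Under these assumptions the eager variant fails on the inner filter alone, even though the combined filters select a single partition.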

@martint martint added the WIP label Jun 13, 2019
@martint martint force-pushed the hive-pruning branch 3 times, most recently from b3d70fe to 7615b47 Compare June 14, 2019 01:29
@martint martint removed the WIP label Jun 14, 2019
martint added 2 commits June 14, 2019 00:12
Execute the rules after all the predicate pushdowns, decorrelations, and
other simplifications have run, to avoid calling into the connectors
multiple times where possible (these calls can be expensive for some
connectors, such as Hive).

NONE means no rows would be produced, which is incorrect.

@electrum electrum closed this Jun 14, 2019
@electrum
Member

Merged as 74b7a79
