Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove redundant table scan predicate during predicate pushdown #1516

Merged
merged 7 commits into from
Oct 2, 2019

Conversation

sopel39
Copy link
Member

@sopel39 sopel39 commented Sep 13, 2019

depends on: #1515

@sopel39 sopel39 requested a review from martint September 13, 2019 12:38
@cla-bot cla-bot bot added the cla-signed label Sep 13, 2019

// table scans with none domain should be converted to ValuesNode
checkState(node.getEnforcedConstraint().getDomains().isPresent());
Map<ColumnHandle, Domain> enforcedColumnDomains = node.getEnforcedConstraint().getDomains().get();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a TODO similar to the one in EffectivePredicateExtractor: // TODO: replace with metadata.getTableProperties() when table layouts are fully removed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm planning to restore using of metadata.getTableProperties() in predicate pushdown.
Previously, Metadata#getTableProperties was called in PushPredicateIntoTableScan#pushFilterIntoTableScan and people didn't complain.
So I think that predate pushdown should use enforced predicate when called before PushPredicateIntoTableScan optimization and Metadata#getTableProperties after PushPredicateIntoTableScan optimization

if (predicateDomain.getDomains().isPresent()) {
Map<ColumnHandle, Domain> predicateColumnDomains = predicateDomain.getDomains().get();

// table scans with none domain should be converted to ValuesNode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But there's no guarantee this will happen before this optimizer runs, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think only PushPredicateIntoTableScan#pushFilterIntoTableScan can narrow down enforced predicate (and it replaces table scan with values when predicate is none). Is this correct?

@@ -1329,9 +1341,58 @@ public PlanNode visitSample(SampleNode node, RewriteContext<Expression> context)
public PlanNode visitTableScan(TableScanNode node, RewriteContext<Expression> context)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we doing this in PredicatePushDown? It seems like something we could just add as a separate rule that matches Filter(TableScan) and simplifies the filter based on the "guaranteed predicate from the table scan"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's separate rule, then we need to fire such rule always after PredicatePushDown.
However, it should be easier to test all edge cases.

@sopel39 sopel39 force-pushed the ks/remove_redundant_pred branch from ea7fb8d to bb2b3bb Compare September 25, 2019 13:28
@sopel39
Copy link
Member Author

sopel39 commented Sep 25, 2019

ac @martint

@sopel39 sopel39 force-pushed the ks/remove_redundant_pred branch 2 times, most recently from eaa32a1 to e21563a Compare September 25, 2019 21:35
Copy link
Member

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments. Also, I'd squash "Extract PushPredicateIntoTableScan#createResultingPredicate" and "Expose PushPredicateIntoTableScan#arePlansSame" into "Remove redundant table scan predicate during predicate pushdown", since they are in support of that change and don't really "stand out" on their own.

context.getSymbolAllocator().getTypes(),
context.getIdAllocator());

if (arePlansSame(filterNode, tableScan, rewritten)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems overkill. You could just check whether the new filter is the same as the one in filterNode.

Map<ColumnHandle, Domain> enforcedColumnDomains = node.getEnforcedConstraint().getDomains().get();

ImmutableMap.Builder<ColumnHandle, Domain> unenforcedColumnDomains = ImmutableMap.builder();
for (ColumnHandle columnHandle : predicateColumnDomains.keySet()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterate over the entrySet(), since it's immediately calling .get(...) to get the associated value.

Map<ColumnHandle, Domain> enforcedColumnDomains = node.getEnforcedConstraint().getDomains().get();

ImmutableMap.Builder<ColumnHandle, Domain> unenforcedColumnDomains = ImmutableMap.builder();
for (ColumnHandle columnHandle : predicateColumnDomains.keySet()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm misreading this, if the table scan domain for the column is ALL, this will do:

intersect(ALL, f) -> f
ALL.contains(f) -> true

Which will result in f being dropped.

I think what you want is intersect(F, P) == P? => drop F (where F is the filter, P is the table scan predicate). See below:

IMG_7915

Putting it all together:

for (Map.Entry<ColumnHandle, Domain> entry : predicateColumnDomains.entrySet()) {
    ColumnHandle column = entry.getKey();
    Domain filter = entry.getValue();
    Domain columnPredicate = enforcedColumnDomains.getOrDefault(column, Domain.all(filter.getType()));

    if (!filter.intersect(columnPredicate).equals(columnPredicate)) {
        unenforcedColumnDomains.put(column, filter);
    }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed there was a bug there. I've adjusted the code and added test.

In your snipped you don't intersect filter with enforcedColumnDomain.

@sopel39 sopel39 force-pushed the ks/remove_redundant_pred branch from e21563a to 32dcec6 Compare September 30, 2019 11:16
Copy link
Member Author

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ac & added some tests

Map<ColumnHandle, Domain> enforcedColumnDomains = node.getEnforcedConstraint().getDomains().get();

ImmutableMap.Builder<ColumnHandle, Domain> unenforcedColumnDomains = ImmutableMap.builder();
for (ColumnHandle columnHandle : predicateColumnDomains.keySet()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed there was a bug there. I've adjusted the code and added test.

In your snipped you don't intersect filter with enforcedColumnDomain.

@sopel39
Copy link
Member Author

sopel39 commented Sep 30, 2019

Also, I'd squash "Extract PushPredicateIntoTableScan#createResultingPredicate"

I usually try to keep mechanical changes separate from logic changes so that reviewer has easier job tracking complicated changes

@sopel39 sopel39 merged commit 6e3f789 into trinodb:master Oct 2, 2019
@sopel39 sopel39 deleted the ks/remove_redundant_pred branch October 2, 2019 15:26
@sopel39 sopel39 mentioned this pull request Oct 2, 2019
6 tasks
@martint martint added this to the 320 milestone Oct 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants