Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Estimate cost of inequality comparision filters #11518

Merged
merged 1 commit into from
Mar 31, 2022

Conversation

raunaqmorarka
Copy link
Member

@raunaqmorarka raunaqmorarka commented Mar 16, 2022

On top of #11469

Description

When there is some overlap between the ranges of the
operands of an inequality expression, we approximate
it's cost assuming uniform distribution of values
within each range and that all pairs of distinct values
from both ranges exist.

Is this change a fix, improvement, new feature, refactoring, or other?

Improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

CBO

How would you describe this change to a non-technical end user or system administrator?

Improves query plans in the presence of inequality expressions

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Section
* Improve query plans in the presence of inequality expressions. ({issue}`11518`)

@raunaqmorarka
Copy link
Member Author

Copy link
Member

@lukasz-stec lukasz-stec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@sopel39
Copy link
Member

sopel39 commented Mar 21, 2022

There are some regressions in partitioned tpcds (some lesser regressions in unpart tpcds). Why it's the case?

@raunaqmorarka
Copy link
Member Author

There are some regressions in partitioned tpcds (some lesser regressions in unpart tpcds). Why it's the case?

The only actual regression is in tpch/q04 partitioned (which we already knew about, it's due to bad NDV estimate), the rest is benchmark flakiness.

// left is always lesser than right
if (leftExpressionRange.getHigh() < rightExpressionRange.getLow()) {
PlanNodeStatsEstimate.Builder estimate = PlanNodeStatsEstimate.buildFrom(inputStatistics);
leftExpressionSymbol.ifPresent(symbol -> estimate.addSymbolStatistics(symbol, leftExpressionStatistics));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can it be empty?

@sopel39
Copy link
Member

sopel39 commented Mar 21, 2022

The only actual regression is in tpch/q04 partitioned (which we already knew about, it's due to bad NDV estimate), the rest is benchmark flakiness.

What about tpcds/q93, q95, 97, 78? Did plans change for these?

@raunaqmorarka
Copy link
Member Author

The only actual regression is in tpch/q04 partitioned (which we already knew about, it's due to bad NDV estimate), the rest is benchmark flakiness.

What about tpcds/q93, q95, 97, 78? Did plans change for these?

The inequality estimation code path is executed only for q72 in tpcds and there was no change to plan there.

@sopel39
Copy link
Member

sopel39 commented Mar 22, 2022

The inequality estimation code path is executed only for q72 in tpcds and there was no change to plan there.

You mean partitioned q72 plan?

@raunaqmorarka
Copy link
Member Author

Updated benchmarks
Inequality estimation 2 sf1000 orc partitioned.pdf
Inequality estimation 2 sf1000 orc unpartitioned.pdf

Ignore results from all the TPCDS queries other than q72 as the inequality estimation code path is encountered only in that TPCDS query.

@raunaqmorarka raunaqmorarka force-pushed the exp-ineq branch 2 times, most recently from 81aa639 to 1d2e45f Compare March 23, 2022 15:07
Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comments

Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comments

@raunaqmorarka raunaqmorarka force-pushed the exp-ineq branch 3 times, most recently from 3851aad to 194059d Compare March 29, 2022 04:19
@raunaqmorarka raunaqmorarka requested a review from sopel39 March 29, 2022 07:07
When there is some overlap between the ranges of the
operands of an inequality expression, we approximate
it's cost assuming uniform distribution of values
within each range and that all pairs of distinct values
from both ranges exist.
Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mind automation (if it fails)

@sopel39 sopel39 merged commit fbe6079 into trinodb:master Mar 31, 2022
@sopel39 sopel39 mentioned this pull request Mar 31, 2022
@raunaqmorarka raunaqmorarka deleted the exp-ineq branch March 31, 2022 12:09
@github-actions github-actions bot added this to the 376 milestone Mar 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants