feat: filter null value before join #16722

Merged
merged 8 commits into from
Nov 6, 2024

xudong963
Member

@xudong963 xudong963 commented Oct 29, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Many columns in TPC-DS contain a large share of NULL values.

The share of NULL values in each non-primary-key column ranges from 4 to 100 percent, depending on the column. Most columns have 4 percent NULL values. The important rec_end_date columns have 50 percent NULL values.

When such columns are used as join keys of an inner join, rows whose key is NULL can never match. We can therefore generate `IS NOT NULL` filters on the join keys and push them down to the scans, which reduces both I/O cost and network cost.
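To illustrate why the rewrite is safe, here is a minimal Python sketch. It is not the Databend implementation. The names `inner_join`, `filter_not_null` and `not_null_selectivity` are hypothetical helpers invented for this example. It models SQL NULL as `None` and shows that dropping NULL join keys before an inner join leaves the result unchanged while shrinking both inputs.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[int], str]  # (join key, payload); None models SQL NULL


def inner_join(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """Hash inner join on the key; NULL keys never match, as in SQL equality."""
    table = {}
    for key, payload in right:
        if key is not None:
            table.setdefault(key, []).append(payload)
    out = []
    for key, payload in left:
        if key is not None:
            for right_payload in table.get(key, []):
                out.append((key, payload, right_payload))
    return out


def filter_not_null(rows: List[Row]) -> List[Row]:
    """Hypothetical stand-in for an `IS NOT NULL` filter pushed down to the scan."""
    return [row for row in rows if row[0] is not None]


def not_null_selectivity(null_count: int, row_count: int) -> float:
    """Fraction of rows that survive the filter, given a NULL count from stats."""
    return (row_count - null_count) / row_count


left = [(1, "a"), (None, "b"), (2, "c"), (None, "d")]
right = [(1, "x"), (None, "y"), (3, "z")]

baseline = inner_join(left, right)

# With the filter applied first, fewer rows reach the join.
pruned_left = filter_not_null(left)
pruned_right = filter_not_null(right)
optimized = inner_join(pruned_left, pruned_right)
```

In this toy input, half of the left rows are NULL on the key, which mirrors the 50 percent NULL ratio mentioned above for the rec_end_date columns. The filter halves the rows that would otherwise be read and shuffled to the join.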

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@xudong963 xudong963 marked this pull request as draft October 29, 2024 09:01
@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Oct 29, 2024
@xudong963 xudong963 force-pushed the filter_nulls_before_join branch 4 times, most recently from 7d92cf5 to e2a2f47 Compare October 31, 2024 03:08
@TCeason
Collaborator

TCeason commented Oct 31, 2024

#16739 fix ci err

@xudong963 xudong963 marked this pull request as ready for review October 31, 2024 12:23
@xudong963 xudong963 force-pushed the filter_nulls_before_join branch 2 times, most recently from e988d34 to c50e8c0 Compare November 5, 2024 04:03
@xudong963 xudong963 force-pushed the filter_nulls_before_join branch 4 times, most recently from 510630c to edae83e Compare November 5, 2024 05:45
@Dousir9 Dousir9 added this pull request to the merge queue Nov 6, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 6, 2024
@xudong963 xudong963 added this pull request to the merge queue Nov 6, 2024
Merged via the queue into databendlabs:main with commit 4f987ba Nov 6, 2024
74 checks passed
@xudong963 xudong963 deleted the filter_nulls_before_join branch November 6, 2024 06:28
Dousir9 pushed a commit to dantengsky/fuse-query that referenced this pull request Nov 27, 2024
* feat: filter null value before join

* fix lint

* add annotations and process possible crash

* dedup filters and fix tests (also need to fix native explain test)

* fix test

* support semi join

* fix test for semi join

* adjust threshold and enable only distribution
dantengsky added a commit that referenced this pull request Nov 27, 2024
)

* feat: implement `is_not_null` selectivity based on null count in stats (#16730)

* feat: implement is_not_null selectivity based on null count in stats

* fix test

* chore(planner): improve cardinality estimation (#16938)

* chore(planner): improve cardinality estimation

* chore(planner): improve histogram cardinality estimation

* chore(planner): improve join cardinality

* chore(test): update sqllogictest

* chore(test): update sqllogictest

* chore(code): refine code

* chore(test): update sqllogictest

* chore(test): test ci tpch

* chore(code): fix typos

* chore(test): remove accurate_his test

* chore(test): fix sqllogictest

* chore(query): fix sub overflow

* chore(planner): refine scan histogram

* chore(test): update sqllogictest

* chore(test): update sqllogictest

* ci: fix flaky test  (#16945)

* ci: fix flaky test #16935

* ci: update error format of Bendsql.

* feat: filter null value before join (#16722)

* feat: filter null value before join

* fix lint

* add annotations and process possible crash

* dedup filters and fix tests (also need to fix native explain test)

* fix test

* support semi join

* fix test for semi join

* adjust threshold and enable only distribution

* chore(planner): resolve conflicts

* fix(query): support subquery in pivot (#16631)

* fix(query): support subquery in pivot

* add pivot and unpivot sqllogictests, fix unit-test

* code format

* chore(code): resolve conflicts

* chore(test): update sqllogictest

* chore(test): update sqllogictest

* Revert "ci: fix flaky test  (#16945)"

This reverts commit efcbac3.

* chore: add extra bracket for `and` and  `or` to make explain clear (#16494)

* fix: add extra bracket for and or

* add task test

* chore(test): update sqllogictest

* Revert "Revert "ci: fix flaky test  (#16945)""

This reverts commit 49ea151.

* fix(query): forbid explain explain statement (#16654)

fix(query): forbid explain explain statement

* fix(ci): flaky test (#16933)

* flaky test

* fix

* fix test

* chore(code): resolve conflicts

* chore(test): update test

---------

Co-authored-by: xudong.w <[email protected]>
Co-authored-by: Jk Xu <[email protected]>
Co-authored-by: Yang Xiufeng <[email protected]>
Co-authored-by: Liu Zhenlong <[email protected]>
Co-authored-by: Dousir9 <[email protected]>
Co-authored-by: TCeason <[email protected]>
Co-authored-by: zhya <[email protected]>