Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix](nereids)Solve the problem of pruning wrong partitions in multi-column partition pruning #43332

Conversation

feiniaofeiafei
Copy link
Contributor

@feiniaofeiafei feiniaofeiafei commented Nov 6, 2024

What problem does this PR solve?

For example, with a partition defined as PARTITION BY RANGE (a, dt) [(0, '2024-01-01 00:00:00'), (10, '2024-01-10 00:00:00')). With the predicate WHERE a = 0 AND date_trunc(dt, 'day') <= '2024-01-10 00:00:00', partition pruning will expand the partition ranges to:
a = 0, dt in ['2024-01-01 00:00:00', +∞)
a = 1, dt in (-∞, +∞)
a = 2, dt in (-∞, +∞)
...
a = 10, dt in (-∞, '2024-01-10 00:00:00')
Each of these eleven ranges will be evaluated against the predicate. If all evaluations return False, the partition can be pruned.
During the evaluation of the first range (a = 0, dt in ['2024-01-01 00:00:00', +∞)), the range of date_trunc(dt, 'day') is calculated as ['2024-01-01', +∞) and stored in rangeMap. However, subsequent evaluations (e.g., for a = 2, dt in (-∞, +∞)) reuse this range ['2024-01-01', +∞), which is incorrect. For a = 2, the correct range should be (-∞, +∞) for date_trunc(dt, 'day').
Due to this incorrect reuse, the range for a = 2, dt in (-∞, +∞) will incorrectly evaluate to False, causing improper pruning of the partition.
The correct approach is to place rangeMap within the context, so that a new rangeMap is constructed for each evaluation.

Issue Number: close #xxx

Related PR: introduced by #38849

Problem Summary:

Check List (For Committer)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No colde files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.
  • Release note

    None

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@feiniaofeiafei
Copy link
Contributor Author

run buildall

1 similar comment
@feiniaofeiafei
Copy link
Contributor Author

run buildall

morrySnow
morrySnow previously approved these changes Nov 7, 2024
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 7, 2024
Copy link
Contributor

github-actions bot commented Nov 7, 2024

PR approved by at least one committer and no changes requested.

Copy link
Contributor

github-actions bot commented Nov 7, 2024

PR approved by anyone and no changes requested.

@feiniaofeiafei
Copy link
Contributor Author

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Nov 7, 2024
@morrySnow morrySnow added the p0_b label Nov 7, 2024
@morrySnow
Copy link
Contributor

run p0

1 similar comment
@feiniaofeiafei
Copy link
Contributor Author

run p0

@feiniaofeiafei feiniaofeiafei force-pushed the fix_multi_column_partition_prune_with_func branch from 85f09cd to cf9dd4c Compare November 11, 2024 02:14
@feiniaofeiafei
Copy link
Contributor Author

run buildall

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 11, 2024
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@morrySnow morrySnow merged commit 1294bbb into apache:master Nov 12, 2024
27 of 28 checks passed
github-actions bot pushed a commit that referenced this pull request Nov 12, 2024
…column partition pruning (#43332)

For example, with a partition defined as PARTITION BY RANGE (a, dt)
[(0, '2024-01-01 00:00:00'), (10, '2024-01-10 00:00:00')).
With the predicate:
WHERE a = 0 AND date_trunc(dt, 'day') <= '2024-01-10 00:00:00',

partition pruning will expand the partition ranges to:

a = 0, dt in ['2024-01-01 00:00:00', +∞)
a = 1, dt in (-∞, +∞)
a = 2, dt in (-∞, +∞)
...
a = 10, dt in (-∞, '2024-01-10 00:00:00')

Each of these eleven ranges will be evaluated against the predicate. If
all evaluations return False, the partition can be pruned.
During the evaluation of the first range
(a = 0, dt in ['2024-01-01 00:00:00', +∞)),
the range of date_trunc(dt, 'day') is calculated as
['2024-01-01', +∞) and stored in rangeMap.

However, subsequent evaluations (e.g., for a = 2, dt in (-∞, +∞)
 reuse this range ['2024-01-01', +∞),
which is incorrect. For a = 2, the correct range should be
(-∞, +∞) for date_trunc(dt, 'day').

Due to this incorrect reuse, the range for a = 2, dt in (-∞, +∞) will
incorrectly evaluate to False, causing improper pruning of the
partition.
The correct approach is to place rangeMap within the context, so that a
new rangeMap is constructed for each evaluation.
github-actions bot pushed a commit that referenced this pull request Nov 12, 2024
…column partition pruning (#43332)

For example, with a partition defined as PARTITION BY RANGE (a, dt)
[(0, '2024-01-01 00:00:00'), (10, '2024-01-10 00:00:00')).
With the predicate:
WHERE a = 0 AND date_trunc(dt, 'day') <= '2024-01-10 00:00:00',

partition pruning will expand the partition ranges to:

a = 0, dt in ['2024-01-01 00:00:00', +∞)
a = 1, dt in (-∞, +∞)
a = 2, dt in (-∞, +∞)
...
a = 10, dt in (-∞, '2024-01-10 00:00:00')

Each of these eleven ranges will be evaluated against the predicate. If
all evaluations return False, the partition can be pruned.
During the evaluation of the first range
(a = 0, dt in ['2024-01-01 00:00:00', +∞)),
the range of date_trunc(dt, 'day') is calculated as
['2024-01-01', +∞) and stored in rangeMap.

However, subsequent evaluations (e.g., for a = 2, dt in (-∞, +∞)
 reuse this range ['2024-01-01', +∞),
which is incorrect. For a = 2, the correct range should be
(-∞, +∞) for date_trunc(dt, 'day').

Due to this incorrect reuse, the range for a = 2, dt in (-∞, +∞) will
incorrectly evaluate to False, causing improper pruning of the
partition.
The correct approach is to place rangeMap within the context, so that a
new rangeMap is constructed for each evaluation.
feiniaofeiafei added a commit to feiniaofeiafei/doris that referenced this pull request Nov 12, 2024
…column partition pruning (apache#43332)

For example, with a partition defined as PARTITION BY RANGE (a, dt)
[(0, '2024-01-01 00:00:00'), (10, '2024-01-10 00:00:00')).
With the predicate:
WHERE a = 0 AND date_trunc(dt, 'day') <= '2024-01-10 00:00:00',

partition pruning will expand the partition ranges to:

a = 0, dt in ['2024-01-01 00:00:00', +∞)
a = 1, dt in (-∞, +∞)
a = 2, dt in (-∞, +∞)
...
a = 10, dt in (-∞, '2024-01-10 00:00:00')

Each of these eleven ranges will be evaluated against the predicate. If
all evaluations return False, the partition can be pruned.
During the evaluation of the first range
(a = 0, dt in ['2024-01-01 00:00:00', +∞)),
the range of date_trunc(dt, 'day') is calculated as
['2024-01-01', +∞) and stored in rangeMap.

However, subsequent evaluations (e.g., for a = 2, dt in (-∞, +∞)
 reuse this range ['2024-01-01', +∞),
which is incorrect. For a = 2, the correct range should be
(-∞, +∞) for date_trunc(dt, 'day').

Due to this incorrect reuse, the range for a = 2, dt in (-∞, +∞) will
incorrectly evaluate to False, causing improper pruning of the
partition.
The correct approach is to place rangeMap within the context, so that a
new rangeMap is constructed for each evaluation.
feiniaofeiafei added a commit to feiniaofeiafei/doris that referenced this pull request Nov 12, 2024
…column partition pruning (apache#43332)

For example, with a partition defined as PARTITION BY RANGE (a, dt)
[(0, '2024-01-01 00:00:00'), (10, '2024-01-10 00:00:00')).
With the predicate:
WHERE a = 0 AND date_trunc(dt, 'day') <= '2024-01-10 00:00:00',

partition pruning will expand the partition ranges to:

a = 0, dt in ['2024-01-01 00:00:00', +∞)
a = 1, dt in (-∞, +∞)
a = 2, dt in (-∞, +∞)
...
a = 10, dt in (-∞, '2024-01-10 00:00:00')

Each of these eleven ranges will be evaluated against the predicate. If
all evaluations return False, the partition can be pruned.
During the evaluation of the first range
(a = 0, dt in ['2024-01-01 00:00:00', +∞)),
the range of date_trunc(dt, 'day') is calculated as
['2024-01-01', +∞) and stored in rangeMap.

However, subsequent evaluations (e.g., for a = 2, dt in (-∞, +∞)
 reuse this range ['2024-01-01', +∞),
which is incorrect. For a = 2, the correct range should be
(-∞, +∞) for date_trunc(dt, 'day').

Due to this incorrect reuse, the range for a = 2, dt in (-∞, +∞) will
incorrectly evaluate to False, causing improper pruning of the
partition.
The correct approach is to place rangeMap within the context, so that a
new rangeMap is constructed for each evaluation.
zzzxl1993 pushed a commit to zzzxl1993/doris that referenced this pull request Nov 12, 2024
…column partition pruning (apache#43332)

For example, with a partition defined as PARTITION BY RANGE (a, dt)
[(0, '2024-01-01 00:00:00'), (10, '2024-01-10 00:00:00')).
With the predicate:
WHERE a = 0 AND date_trunc(dt, 'day') <= '2024-01-10 00:00:00',

partition pruning will expand the partition ranges to:

a = 0, dt in ['2024-01-01 00:00:00', +∞)
a = 1, dt in (-∞, +∞)
a = 2, dt in (-∞, +∞)
...
a = 10, dt in (-∞, '2024-01-10 00:00:00')

Each of these eleven ranges will be evaluated against the predicate. If
all evaluations return False, the partition can be pruned.
During the evaluation of the first range
(a = 0, dt in ['2024-01-01 00:00:00', +∞)),
the range of date_trunc(dt, 'day') is calculated as
['2024-01-01', +∞) and stored in rangeMap.

However, subsequent evaluations (e.g., for a = 2, dt in (-∞, +∞)
 reuse this range ['2024-01-01', +∞),
which is incorrect. For a = 2, the correct range should be
(-∞, +∞) for date_trunc(dt, 'day').

Due to this incorrect reuse, the range for a = 2, dt in (-∞, +∞) will
incorrectly evaluate to False, causing improper pruning of the
partition.
The correct approach is to place rangeMap within the context, so that a
new rangeMap is constructed for each evaluation.
morrySnow pushed a commit that referenced this pull request Nov 12, 2024
…-column partition pruning (#43332) (#43664)

cherry-pick from master #43332
yiguolei pushed a commit that referenced this pull request Nov 12, 2024
…ns in multi-column partition pruning (#43658)

Cherry-picked from #43332

Co-authored-by: feiniaofeiafei <[email protected]>
py023 pushed a commit to py023/doris that referenced this pull request Nov 13, 2024
…column partition pruning (apache#43332)

For example, with a partition defined as PARTITION BY RANGE (a, dt)
[(0, '2024-01-01 00:00:00'), (10, '2024-01-10 00:00:00')).
With the predicate:
WHERE a = 0 AND date_trunc(dt, 'day') <= '2024-01-10 00:00:00',

partition pruning will expand the partition ranges to:

a = 0, dt in ['2024-01-01 00:00:00', +∞)
a = 1, dt in (-∞, +∞)
a = 2, dt in (-∞, +∞)
...
a = 10, dt in (-∞, '2024-01-10 00:00:00')

Each of these eleven ranges will be evaluated against the predicate. If
all evaluations return False, the partition can be pruned.
During the evaluation of the first range
(a = 0, dt in ['2024-01-01 00:00:00', +∞)),
the range of date_trunc(dt, 'day') is calculated as
['2024-01-01', +∞) and stored in rangeMap.

However, subsequent evaluations (e.g., for a = 2, dt in (-∞, +∞)
 reuse this range ['2024-01-01', +∞),
which is incorrect. For a = 2, the correct range should be
(-∞, +∞) for date_trunc(dt, 'day').

Due to this incorrect reuse, the range for a = 2, dt in (-∞, +∞) will
incorrectly evaluate to False, causing improper pruning of the
partition.
The correct approach is to place rangeMap within the context, so that a
new rangeMap is constructed for each evaluation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.8-merged dev/3.0.3-merged p0_b reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants