[fix](nereids)Solve the problem of pruning wrong partitions in multi-column partition pruning #43332
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
run buildall
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
run buildall
run p0
run p0
85f09cd to cf9dd4c
run buildall
PR approved by at least one committer and no changes requested.
…column partition pruning (#43332) For example, with a partition defined as PARTITION BY RANGE (a, dt) [(0, '2024-01-01 00:00:00'), (10, '2024-01-10 00:00:00')). With the predicate: WHERE a = 0 AND date_trunc(dt, 'day') <= '2024-01-10 00:00:00', partition pruning will expand the partition ranges to: a = 0, dt in ['2024-01-01 00:00:00', +∞) a = 1, dt in (-∞, +∞) a = 2, dt in (-∞, +∞) ... a = 10, dt in (-∞, '2024-01-10 00:00:00') Each of these eleven ranges will be evaluated against the predicate. If all evaluations return False, the partition can be pruned. During the evaluation of the first range (a = 0, dt in ['2024-01-01 00:00:00', +∞)), the range of date_trunc(dt, 'day') is calculated as ['2024-01-01', +∞) and stored in rangeMap. However, subsequent evaluations (e.g., for a = 2, dt in (-∞, +∞)) reuse this range ['2024-01-01', +∞), which is incorrect. For a = 2, the correct range should be (-∞, +∞) for date_trunc(dt, 'day'). Due to this incorrect reuse, the range for a = 2, dt in (-∞, +∞) will incorrectly evaluate to False, causing improper pruning of the partition. The correct approach is to place rangeMap within the context, so that a new rangeMap is constructed for each evaluation.
…ns in multi-column partition pruning (#43658) Cherry-picked from #43332 Co-authored-by: feiniaofeiafei <[email protected]>
What problem does this PR solve?
Consider a partition defined with PARTITION BY RANGE (a, dt) and bounds [(0, '2024-01-01 00:00:00'), (10, '2024-01-10 00:00:00')). With the predicate WHERE a = 0 AND date_trunc(dt, 'day') <= '2024-01-10 00:00:00', partition pruning expands the partition into the following per-column ranges:
a = 0, dt in ['2024-01-01 00:00:00', +∞)
a = 1, dt in (-∞, +∞)
a = 2, dt in (-∞, +∞)
...
a = 10, dt in (-∞, '2024-01-10 00:00:00')
Each of these eleven ranges will be evaluated against the predicate. If all evaluations return False, the partition can be pruned.
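The expansion listed above can be sketched in Python. This is only an illustration of the idea, not the Doris implementation: the function name expand_partition and the tuple layout are hypothetical, the first partition column is assumed to be an integer, and None stands for an unbounded endpoint.

```python
from typing import List, Optional, Tuple

# (value of a, lower bound of dt or None for -inf, upper bound of dt or None for +inf)
ColumnRange = Tuple[int, Optional[str], Optional[str]]


def expand_partition(lower: Tuple[int, str], upper: Tuple[int, str]) -> List[ColumnRange]:
    """Expand a two-column range partition [lower, upper) into one range per value of a.

    Hypothetical helper: the dt lower bound only applies to the smallest a,
    and the dt upper bound only applies to the largest a.
    """
    lo_a, lo_dt = lower
    hi_a, hi_dt = upper
    ranges: List[ColumnRange] = []
    for a in range(lo_a, hi_a + 1):
        dt_lo = lo_dt if a == lo_a else None  # unbounded below unless a is the first value
        dt_hi = hi_dt if a == hi_a else None  # unbounded above unless a is the last value
        ranges.append((a, dt_lo, dt_hi))
    return ranges


ranges = expand_partition((0, "2024-01-01 00:00:00"), (10, "2024-01-10 00:00:00"))
for r in ranges:
    print(r)
```

The partition can be pruned only if the predicate evaluates to False on every one of the returned ranges.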
During the evaluation of the first range (a = 0, dt in ['2024-01-01 00:00:00', +∞)), the range of date_trunc(dt, 'day') is calculated as ['2024-01-01', +∞) and stored in rangeMap. However, subsequent evaluations (e.g., for a = 2, dt in (-∞, +∞)) reuse this cached range ['2024-01-01', +∞), which is incorrect. For a = 2, the correct range of date_trunc(dt, 'day') is (-∞, +∞).
Because of this stale reuse, the range a = 2, dt in (-∞, +∞) can incorrectly evaluate to False, so the partition is pruned when it should not be.
The fix is to place rangeMap within the evaluation context, so that a new rangeMap is constructed for each evaluation.
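The difference between a shared rangeMap and a per-evaluation rangeMap can be sketched as follows. This is a simplified Python model under stated assumptions, not the actual Nereids code: the class and function names are hypothetical, ranges are (lower, upper) pairs with None for an unbounded endpoint, open versus closed endpoints are ignored, and truncating to a day is modeled by keeping the date part of the string.

```python
from typing import Dict, Optional, Tuple

Range = Tuple[Optional[str], Optional[str]]  # None means unbounded

KEY = "date_trunc(dt, 'day')"


def trunc_day_range(dt_range: Range) -> Range:
    """Derive a range for date_trunc(dt, 'day') from the range of dt (date part only)."""
    lo, hi = dt_range
    return (lo[:10] if lo else None, hi[:10] if hi else None)


class SharedCacheEvaluator:
    """Buggy variant: a single range_map outlives each evaluation."""

    def __init__(self) -> None:
        self.range_map: Dict[str, Range] = {}

    def date_trunc_range(self, dt_range: Range) -> Range:
        # The cached entry is returned even when dt_range has changed.
        if KEY not in self.range_map:
            self.range_map[KEY] = trunc_day_range(dt_range)
        return self.range_map[KEY]


class EvalContext:
    """Fixed variant: each evaluation owns its dt range and a fresh range_map."""

    def __init__(self, dt_range: Range) -> None:
        self.dt_range = dt_range
        self.range_map: Dict[str, Range] = {}


def date_trunc_range(ctx: EvalContext) -> Range:
    if KEY not in ctx.range_map:
        ctx.range_map[KEY] = trunc_day_range(ctx.dt_range)
    return ctx.range_map[KEY]


first_dt: Range = ("2024-01-01 00:00:00", None)  # a = 0
third_dt: Range = (None, None)                   # a = 2

shared = SharedCacheEvaluator()
shared.date_trunc_range(first_dt)
stale = shared.date_trunc_range(third_dt)        # reuses the entry computed for a = 0

fresh = date_trunc_range(EvalContext(third_dt))  # computed from the a = 2 range
print(stale, fresh)
```

Tying the cache to the context keeps caching within one evaluation while ruling out reuse across partition ranges.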
Issue Number: close #xxx
Related PR: introduced by #38849
Problem Summary:
Check List (For Committer)
Test
Behavior changed:
Does this need documentation?
Release note
None
Check List (For Reviewer who merge this PR)