[UT][BugFix] fix PullUpScanPredicateRule (backport #53740) #53838

Closed

mergify bot commented Dec 11, 2024

Why I'm doing:

What I'm doing:

Fixes https://github.com/StarRocks/StarRocksTest/issues/8896

Fix two bugs in PullUpScanPredicateRule:

  1. Limit is not handled correctly.

When there is a limit on the ScanOperator, the limit needs to be moved from the ScanOperator to the FilterOperator.

  2. Semi-structured data is not handled correctly, which causes the subfield column pruning optimization to fail.

For this problem, we cannot simply give up extracting the related expressions from scan predicates; otherwise we would lose many opportunities to reuse expressions.
My solution: after extracting the reserved predicate into the FilterOperator, we also collect the expressions that can be used for subfield column pruning, add them to the scan projection, and replace them with column refs in the final predicate.
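
For the first fix (the limit), here is a minimal sketch of why the limit has to travel with the predicate. It is not StarRocks code: the class `LimitPullUpSketch` and its method names are hypothetical, and Java streams stand in for the scan and filter operators. The assumption is that a scan carrying both a predicate and a limit filters first and limits second, so leaving the limit on the scan after the predicate is pulled up would cut rows off too early.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class LimitPullUpSketch {
    // Original plan shape: the scan evaluates its predicate, then applies its limit.
    public static List<Integer> scanWithPredicateAndLimit(
            List<Integer> rows, Predicate<Integer> predicate, long limit) {
        return rows.stream().filter(predicate).limit(limit).collect(Collectors.toList());
    }

    // Buggy pull-up: the predicate moved to a filter above, but the limit stayed on the scan,
    // so rows are cut off before the predicate sees them.
    public static List<Integer> limitLeftOnScan(
            List<Integer> rows, Predicate<Integer> predicate, long limit) {
        List<Integer> scanned = rows.stream().limit(limit).collect(Collectors.toList());
        return scanned.stream().filter(predicate).collect(Collectors.toList());
    }

    // Fixed pull-up: the scan has no limit; the filter applies the predicate and then the limit.
    public static List<Integer> limitMovedToFilter(
            List<Integer> rows, Predicate<Integer> predicate, long limit) {
        List<Integer> scanned = rows; // the scan returns every row
        return scanned.stream().filter(predicate).limit(limit).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, 2, 3, 4, 5, 6);
        Predicate<Integer> even = v -> v % 2 == 0;
        System.out.println(scanWithPredicateAndLimit(rows, even, 2)); // [2, 4]
        System.out.println(limitLeftOnScan(rows, even, 2));           // [2]
        System.out.println(limitMovedToFilter(rows, even, 2));        // [2, 4]
    }
}
```

Only the third variant returns the same rows as the original scan, which matches the intent of moving the limit to the FilterOperator.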

Take the following query as an example of the second problem: before the fix, the whole JSON column has to be read, because the JSON columns themselves are in the Project node.

mysql> desc t1;
+----------------------+------+------+-------+---------+-------+
| Field                | Type | Null | Key   | Default | Extra |
+----------------------+------+------+-------+---------+-------+
| k1                   | int  | YES  | true  | NULL    |       |
| no_match_flat_json   | json | YES  | false | NULL    |       |
| one_layer_flat_json  | json | YES  | false | NULL    |       |
| many_layer_flat_json | json | YES  | false | NULL    |       |
+----------------------+------+------+-------+---------+-------+
4 rows in set (0.00 sec)
mysql> explain select k1 from t1 where no_match_flat_json->'$.k9.k0.k3' = one_layer_flat_json->'$.k5';
+---------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                |
+---------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                                                               |
|  OUTPUT EXPRS:1: k1                                                                                           |
|   PARTITION: UNPARTITIONED                                                                                    |
|                                                                                                               |
|   RESULT SINK                                                                                                 |
|                                                                                                               |
|   3:EXCHANGE                                                                                                  |
|                                                                                                               |
| PLAN FRAGMENT 1                                                                                               |
|  OUTPUT EXPRS:                                                                                                |
|   PARTITION: RANDOM                                                                                           |
|                                                                                                               |
|   STREAM DATA SINK                                                                                            |
|     EXCHANGE ID: 03                                                                                           |
|     UNPARTITIONED                                                                                             |
|                                                                                                               |
|   2:SELECT                                                                                                    |
|   |  predicates: json_query(2: no_match_flat_json, '$.k9.k0.k3') = json_query(3: one_layer_flat_json, '$.k5') |
|   |                                                                                                           |
|   1:Project                                                                                                   |
|   |  <slot 1> : 1: k1                                                                                         |
|   |  <slot 2> : 2: no_match_flat_json                                                                         |
|   |  <slot 3> : 3: one_layer_flat_json                                                                        |
|   |                                                                                                           |
|   0:OlapScanNode                                                                                              |
|      TABLE: t1                                                                                                |
|      PREAGGREGATION: ON                                                                                       |
|      partitions=1/4                                                                                           |
|      rollup: t1                                                                                               |
|      tabletRatio=2/2                                                                                          |
|      tabletList=48051,48053                                                                                   |
|      cardinality=7                                                                                            |
|      avgRowSize=2052.0                                                                                        |
+---------------------------------------------------------------------------------------------------------------+
33 rows in set (0.01 sec)

After the fix, only the json_query(xx) expressions are in the Project node, so the whole column no longer needs to be read.

mysql> explain verbose select k1 from t1 where no_match_flat_json->'$.k9.k0.k3' = one_layer_flat_json->'$.k5';
+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                           |
+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| RESOURCE GROUP: default_wg                                                                                                                               |
|                                                                                                                                                          |
| PLAN COST                                                                                                                                                |
|   CPU: 20500.0                                                                                                                                           |
|   Memory: 0.0                                                                                                                                            |
|                                                                                                                                                          |
| PLAN FRAGMENT 0(F01)                                                                                                                                     |
|   Fragment Cost: 0.0                                                                                                                                     |
|   Output Exprs:1: k1                                                                                                                                     |
|   Input Partition: UNPARTITIONED                                                                                                                         |
|   RESULT SINK                                                                                                                                            |
|                                                                                                                                                          |
|   3:EXCHANGE                                                                                                                                             |
|      cardinality: 5                                                                                                                                      |
|                                                                                                                                                          |
| PLAN FRAGMENT 1(F00)                                                                                                                                     |
|   Fragment Cost: 10250.0                                                                                                                                 |
|                                                                                                                                                          |
|   Input Partition: RANDOM                                                                                                                                |
|   OutPut Partition: UNPARTITIONED                                                                                                                        |
|   OutPut Exchange Id: 03                                                                                                                                 |
|                                                                                                                                                          |
|   2:SELECT                                                                                                                                               |
|   |  predicates: 5: json_query = 6: json_query                                                                                                           |
|   |  cardinality: 5                                                                                                                                      |
|   |                                                                                                                                                      |
|   1:Project                                                                                                                                              |
|   |  output columns:                                                                                                                                     |
|   |  1 <-> [1: k1, INT, true]                                                                                                                            |
|   |  5 <-> json_query[([2: no_match_flat_json, JSON, true], '$.k9.k0.k3'); args: JSON,VARCHAR; result: JSON; args nullable: true; result nullable: true] |
|   |  6 <-> json_query[([3: one_layer_flat_json, JSON, true], '$.k5'); args: JSON,VARCHAR; result: JSON; args nullable: true; result nullable: true]      |
|   |  cardinality: 5                                                                                                                                      |
|   |                                                                                                                                                      |
|   0:OlapScanNode                                                                                                                                         |
|      table: t1, rollup: t1                                                                                                                               |
|      preAggregation: on                                                                                                                                  |
|      partitionsRatio=1/4, tabletsRatio=2/2                                                                                                               |
|      tabletList=48051,48053                                                                                                                              |
|      actualRows=7, avgRowSize=2054.0                                                                                                                     |
|      ColumnAccessPath: [/no_match_flat_json/k9/k0/k3(json), /one_layer_flat_json/k5(json)]                                                               |
|      cardinality: 5                                                                                                                                      |
+----------------------------------------------------------------------------------------------------------------------------------------------------------+
41 rows in set (0.01 sec)
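
The difference between the two plans can be sketched with a toy expression tree. This is an illustration only, not the PullUpScanPredicateRule implementation: `SubfieldProjectionSketch`, `Expr`, and `rewrite` are hypothetical names, the predicate is written as `eq(...)`, and the generated column names `c1`, `c2` stand in for the column refs 5 and 6 in the plan. The sketch collects each `json_query(...)` call into a projection map and replaces it in the predicate with a reference to the projected column, reusing the same column when an identical expression appears twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SubfieldProjectionSketch {
    /** Toy expression node: a name plus arguments; no arguments means a column ref or literal. */
    public static final class Expr {
        final String name;
        final List<Expr> args;

        public Expr(String name, Expr... args) {
            this.name = name;
            this.args = Arrays.asList(args);
        }

        @Override
        public String toString() {
            if (args.isEmpty()) {
                return name;
            }
            List<String> parts = new ArrayList<>();
            for (Expr arg : args) {
                parts.add(arg.toString());
            }
            return name + "(" + String.join(", ", parts) + ")";
        }
    }

    /** Moves json_query calls into the projection and returns the predicate over column refs. */
    public static Expr rewrite(Expr expr, Map<String, Expr> projection) {
        if (expr.name.equals("json_query") && !expr.args.isEmpty()) {
            String text = expr.toString();
            // Reuse an existing projected column for an identical expression.
            for (Map.Entry<String, Expr> entry : projection.entrySet()) {
                if (entry.getValue().toString().equals(text)) {
                    return new Expr(entry.getKey());
                }
            }
            String column = "c" + (projection.size() + 1);
            projection.put(column, expr);
            return new Expr(column);
        }
        // Any other node is rebuilt with its arguments rewritten recursively.
        Expr[] rewrittenArgs = new Expr[expr.args.size()];
        for (int i = 0; i < rewrittenArgs.length; i++) {
            rewrittenArgs[i] = rewrite(expr.args.get(i), projection);
        }
        return new Expr(expr.name, rewrittenArgs);
    }

    public static void main(String[] args) {
        Expr predicate = new Expr("eq",
                new Expr("json_query", new Expr("no_match_flat_json"), new Expr("'$.k9.k0.k3'")),
                new Expr("json_query", new Expr("one_layer_flat_json"), new Expr("'$.k5'")));
        Map<String, Expr> projection = new LinkedHashMap<>();
        Expr rewritten = rewrite(predicate, projection);
        System.out.println(rewritten); // eq(c1, c2)
        for (Map.Entry<String, Expr> entry : projection.entrySet()) {
            System.out.println(entry.getKey() + " <-> " + entry.getValue());
        }
    }
}
```

With the `json_query` calls living in the projection, the scan only needs the paths those calls touch, which is what the ColumnAccessPath line in the second plan reports.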

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Signed-off-by: silverbullet233 <[email protected]>
(cherry picked from commit 8eff033)

# Conflicts:
#	fe/fe-core/src/main/java/com/starrocks/sql/optimizer/Optimizer.java
#	fe/fe-core/src/main/java/com/starrocks/sql/optimizer/rewrite/TableScanPredicateExtractor.java
#	fe/fe-core/src/main/java/com/starrocks/sql/optimizer/rule/transformation/PullUpScanPredicateRule.java
#	fe/fe-core/src/test/java/com/starrocks/sql/optimizer/ScanPredicateExprReuseTest.java

mergify bot commented Dec 11, 2024

Cherry-pick of 8eff033 has failed:

On branch mergify/bp/branch-3.3/pr-53740
Your branch is up to date with 'origin/branch-3.3'.

You are currently cherry-picking commit 8eff0335b3.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	new file:   test/sql/test_expr_reuese/R/test_scan_predicate_expr_reuse
	new file:   test/sql/test_expr_reuese/T/test_scan_predicate_expr_reuse

Unmerged paths:
  (use "git add/rm <file>..." as appropriate to mark resolution)
	both modified:   fe/fe-core/src/main/java/com/starrocks/sql/optimizer/Optimizer.java
	deleted by us:   fe/fe-core/src/main/java/com/starrocks/sql/optimizer/rewrite/TableScanPredicateExtractor.java
	deleted by us:   fe/fe-core/src/main/java/com/starrocks/sql/optimizer/rule/transformation/PullUpScanPredicateRule.java
	deleted by us:   fe/fe-core/src/test/java/com/starrocks/sql/optimizer/ScanPredicateExprReuseTest.java

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally


mergify bot commented Dec 11, 2024

@mergify[bot]: Backport conflict, please resolve the conflict and resubmit the PR
