[Optimizer] Eliminate the distinct #2045

Closed
jackwener opened this issue Mar 21, 2022 · 5 comments
Labels
enhancement New feature or request

Comments

@jackwener
Member

jackwener commented Mar 21, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

explain verbose select max(distinct(c1)) from test;

+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type                                             | plan                                                                                                                                        |
+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+
| initial_logical_plan                                  | Projection: #MAX(DISTINCT test.c1)                                                                                                          |
|                                                       |   Aggregate: groupBy=[[]], aggr=[[MAX(DISTINCT #test.c1)]]                                                                                  |
|                                                       |     TableScan: test projection=None                                                                                                         |
| logical_plan after simplify_expressions               | SAME TEXT AS ABOVE                                                                                                                          |
| logical_plan after eliminate_filter                   | SAME TEXT AS ABOVE                                                                                                                          |
| logical_plan after common_sub_expression_eliminate    | SAME TEXT AS ABOVE                                                                                                                          |
| logical_plan after eliminate_limit                    | SAME TEXT AS ABOVE                                                                                                                          |
| logical_plan after projection_push_down               | Projection: #MAX(DISTINCT test.c1)                                                                                                          |
|                                                       |   Aggregate: groupBy=[[]], aggr=[[MAX(DISTINCT #test.c1)]]                                                                                  |
|                                                       |     TableScan: test projection=Some([0])                                                                                                    |
| logical_plan after filter_push_down                   | SAME TEXT AS ABOVE                                                                                                                          |
| logical_plan after limit_push_down                    | SAME TEXT AS ABOVE                                                                                                                          |
| logical_plan after SingleDistinctAggregationToGroupBy | Projection: #MAX(DISTINCT test.c1)                                                                                                          |
|                                                       |   Projection: #MAX(alias1) AS MAX(DISTINCT test.c1)                                                                                         |
|                                                       |     Aggregate: groupBy=[[]], aggr=[[MAX(#alias1)]]                                                                                          |
|                                                       |       Aggregate: groupBy=[[#test.c1 AS alias1]], aggr=[[]]                                                                                  |
|                                                       |         TableScan: test projection=Some([0]) 

Describe the solution you'd like
I think MAX/MIN don't need this rewrite. For them the DISTINCT can simply be eliminated, because duplicate values never change a maximum or minimum.
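
To illustrate why only some aggregates qualify, here is a minimal sketch in plain Python (not DataFusion code). The `aggregate` helper and the sample column `c1` are hypothetical and exist only for this illustration.

```python
# Toy check, not DataFusion code: which aggregates are unaffected by DISTINCT?
def aggregate(func, values, distinct=False):
    """Apply func to values, de-duplicating first when distinct is True."""
    if distinct:
        values = list(dict.fromkeys(values))  # drop duplicates, keep order
    return func(values)

c1 = [3, 1, 3, 2, 1]  # made-up sample data containing duplicates

# MAX and MIN ignore duplicates, so DISTINCT is redundant for them.
max_same = aggregate(max, c1, distinct=True) == aggregate(max, c1)
min_same = aggregate(min, c1, distinct=True) == aggregate(min, c1)

# SUM and COUNT are sensitive to duplicates, so DISTINCT must be kept.
sum_same = aggregate(sum, c1, distinct=True) == aggregate(sum, c1)
count_same = aggregate(len, c1, distinct=True) == aggregate(len, c1)

print(max_same, min_same, sum_same, count_same)
```

On this sample, the MAX and MIN comparisons agree while the SUM and COUNT comparisons do not, which is why the elimination would have to be limited to duplicate-insensitive aggregates.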

Describe alternatives you've considered
The SingleDistinctAggregationToGroupBy rule makes the plan more complex.
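
As a rough sanity check, the three plan shapes can be compared as SQL. This sketch uses SQLite from the Python standard library purely as a stand-in engine (an assumption for illustration, it is not DataFusion), with made-up rows in `test`. The nested `GROUP BY` query mirrors the shape the rule produces in the plan above.

```python
import sqlite3

# SQLite is only a stand-in engine here; the table contents are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (c1 INTEGER)")
conn.executemany("INSERT INTO test VALUES (?)", [(3,), (1,), (3,), (2,), (1,)])

queries = {
    # The original query from this issue.
    "distinct": "SELECT max(DISTINCT c1) FROM test",
    # Shape produced by the rule: group by c1 first, then aggregate.
    "group_by_rewrite": (
        "SELECT max(alias1) FROM "
        "(SELECT c1 AS alias1 FROM test GROUP BY c1)"
    ),
    # Shape proposed here: just drop the DISTINCT.
    "plain": "SELECT max(c1) FROM test",
}

results = {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}
conn.close()
print(results)
```

All three queries return the same value on this data, but the plain form needs only one aggregation step.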

Additional context
None

@jackwener jackwener added the enhancement New feature or request label Mar 21, 2022
@jackwener
Member Author

I am investigating this issue. If you have any information about it, please share it.

@jiangzhx
Contributor

@jackwener check issue #1315.

@ic4y contributed this PR.

@jiangzhx
Contributor

jiangzhx commented Mar 21, 2022

Talked with @jackwener and @ic4y.
It looks like the max and min functions should not use this optimizer rule.
We should add more test cases.

@jackwener
Member Author

While rewriting the plan to eliminate the distinct, I ran into some problems I can't figure out yet. I need to learn more and then try again.

@jackwener
Member Author

jackwener commented Mar 23, 2022

Now the target is to rewrite from

| initial_logical_plan | Projection: #Max(DISTINCT test.c1)                     
|                      |   Aggregate: groupBy=[[]], aggr=[[Max(DISTINCT #test.c1)]]                      
|                      |     TableScan: test projection=None

to

| initial_logical_plan | Projection: #Max(DISTINCT test.c1)
|                      |   Projection: #Max(#test.c1) AS Max(DISTINCT test.c1)                   
|                      |     Aggregate: groupBy=[[]], aggr=[[Max(#test.c1)]]                
|                      |       TableScan: test projection=None

instead of

| logical_plan  | Projection: #Max(DISTINCT test.c1)          
|               |   Projection: #Max(alias1) AS Max(DISTINCT test.c1)     
|               |     Aggregate: groupBy=[[]], aggr=[[Max(#alias1)]]        
|               |       Aggregate: groupBy=[[#test.c1 AS alias1]], aggr=[[]]                        
|               |         TableScan: test projection=Some([0])
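
The targeted rewrite could be sketched roughly as follows. This is a toy model in Python, not DataFusion's optimizer API: `AggExpr`, `DUPLICATE_INSENSITIVE`, and `eliminate_distinct` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, replace

# Hypothetical stand-in for an aggregate expression; not a DataFusion type.
@dataclass(frozen=True)
class AggExpr:
    func: str               # e.g. "MAX", "MIN", "COUNT"
    column: str             # e.g. "test.c1"
    distinct: bool = False

    def __str__(self):
        prefix = "DISTINCT " if self.distinct else ""
        return f"{self.func}({prefix}#{self.column})"

# Aggregates whose result does not depend on duplicate inputs.
DUPLICATE_INSENSITIVE = {"MAX", "MIN"}

def eliminate_distinct(expr):
    """Drop DISTINCT when the aggregate ignores duplicates; otherwise keep it."""
    if expr.distinct and expr.func.upper() in DUPLICATE_INSENSITIVE:
        return replace(expr, distinct=False)
    return expr

before = AggExpr("MAX", "test.c1", distinct=True)
after = eliminate_distinct(before)
untouched = eliminate_distinct(AggExpr("COUNT", "test.c1", distinct=True))

print(before, "->", after)
print(untouched)
```

In this sketch only the aggregate expression changes, so the plan would keep a single Aggregate node instead of gaining a second group-by Aggregate. Expressions such as COUNT(DISTINCT ...) are left alone.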

@jackwener jackwener changed the title from "SingleDistinctAggregationToGroupBy rule cause some strange change" to "[Optimizer] Eliminate the distinct" Apr 17, 2022