Implement the rest of Set Operators: INTERSECT, EXCEPT, etc #1082
Comments
Please assign it to me, thanks! @houqp |
First of all, I'll rewrite queries such as SELECT a1 FROM t1 INTERSECT SELECT a2 FROM t2;
to
SELECT DISTINCT a1 FROM t1 LEFT SEMI JOIN t2 ON a1 IS NOT DISTINCT FROM a2
What do you think? @houqp @alamb @Dandandan If it's ok, I think I should implement |
I am not sure. Trying it out with postgres:
alamb=# select * from foo;
x
---
1
2
(3 rows)
alamb=# select * from bar;
x
---
2
(2 rows)
alamb=# select * from foo intersect select * from bar;
x
---
2
(2 rows)
And if
alamb=# select * from foo;
x
---
1
2
(3 rows)
alamb=# select * from bar;
x
---
2
(1 row)
alamb=# select * from foo intersect select * from bar;
x
---
2
(1 row)
Which I believe is the same semantics as a semi-join using an equality predicate (though now that I type this, perhaps since |
Yes.
postgres=# select * from bar;
a | b
---+---
1 | 2
| 3
3 | 4
(3 rows)
postgres=# select * from foo;
a | b
---+---
1 | 2
| 3
(2 rows)
-- intersect
select a from foo intersect select a from bar;
/**
a
---
1
(2 rows)
**/
-- equivalent transformation ? (x) since NULL = NULL is not true
select distinct a from foo where a in (select a from bar);
/**
a
---
1
(1 row)
**/
psql doesn't seem to have direct |
Creating / using |
I'm still surveying set operators in cockroachdb. |
We also don't have the semi / anti join SQL syntax, but we support semi and anti join in a I didn't have a look at using select distinct + semi join for I think supporting the |
I remember other systems that basically had special equality checking in semi joins to handle the null stuff -- in fact when using semi join for |
It looks like PostgreSQL has some explicit operator (
> explain select * from demo intersect select * from demo;
HashSetOp Intersect (cost=0.00..30.40 rows=160 width=458)
  -> Append (cost=0.00..28.00 rows=320 width=458)
        -> Subquery Scan on "*SELECT* 1" (cost=0.00..13.20 rows=160 width=458)
              -> Seq Scan on demo (cost=0.00..11.60 rows=160 width=454)
        -> Subquery Scan on "*SELECT* 2" (cost=0.00..13.20 rows=160 width=458)
              -> Seq Scan on demo demo_1 (cost=0.00..11.60 rows=160 width=454) |
Yes, I have seen the pg code which uses the
|
I can see the potential benefit of implementing a special physical operator for INTERSECT / EXCEPT: it might be faster than the general-purpose join operator. That being said, using a general-purpose SEMI join as you have suggested, @xudong963, would definitely be my preference to keep the DF code base simpler, unless there is some compelling performance measurement / need for a faster implementation. |
@alamb Me too. I also saw the implementation in TiDB, which just transforms it into a SEMI join.
func (b *PlanBuilder) buildIntersect(ctx context.Context, selects []ast.Node) (LogicalPlan, *ast.SetOprType, error) {
	var leftPlan LogicalPlan
	var err error
	var afterSetOperator *ast.SetOprType
	switch x := selects[0].(type) {
	case *ast.SelectStmt:
		afterSetOperator = x.AfterSetOperator
		leftPlan, err = b.buildSelect(ctx, x)
	case *ast.SetOprSelectList:
		afterSetOperator = x.AfterSetOperator
		leftPlan, err = b.buildSetOpr(ctx, &ast.SetOprStmt{SelectList: x})
	}
	if err != nil {
		return nil, nil, err
	}
	if len(selects) == 1 {
		return leftPlan, afterSetOperator, nil
	}
	columnNums := leftPlan.Schema().Len()
	for i := 1; i < len(selects); i++ {
		var rightPlan LogicalPlan
		switch x := selects[i].(type) {
		case *ast.SelectStmt:
			if *x.AfterSetOperator == ast.IntersectAll {
				// TODO: support intersect all
				return nil, nil, errors.Errorf("TiDB do not support intersect all")
			}
			rightPlan, err = b.buildSelect(ctx, x)
		case *ast.SetOprSelectList:
			if *x.AfterSetOperator == ast.IntersectAll {
				// TODO: support intersect all
				return nil, nil, errors.Errorf("TiDB do not support intersect all")
			}
			rightPlan, err = b.buildSetOpr(ctx, &ast.SetOprStmt{SelectList: x})
		}
		if err != nil {
			return nil, nil, err
		}
		if rightPlan.Schema().Len() != columnNums {
			return nil, nil, ErrWrongNumberOfColumnsInSelect.GenWithStackByArgs()
		}
		leftPlan, err = b.buildSemiJoinForSetOperator(leftPlan, rightPlan, SemiJoin)
		if err != nil {
			return nil, nil, err
		}
	}
	return leftPlan, afterSetOperator, nil
}
A nice weekend is coming, so it's time for me to concentrate on DataFusion. PR about the |
@Dandandan has made #1117 to add |
This issue needs to be reopened; there are other things to do, such as |
Reopening, as the GitHub API got a little too excited |
All related PRs are finished; once they are merged, the issue can be closed. Thanks again for your help! @alamb @Dandandan @houqp |