-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimizer does not propagate col is not null
for join
#8587
Comments
(I've updated the description. My test case was slightly wrong, and I misunderstood why it was not working. It looks like the bug is different from MySQL, and the issue is constant propagation. A = B and B IS NOT NULL, should infer A IS NOT NULL in an inner join). |
a simple case to reproduce this issue: TiDB(root@127.0.0.1:test) > create table t1(a bigint, b bigint);
Query OK, 0 rows affected (0.01 sec)
TiDB(root@127.0.0.1:test) > create table t2(a bigint, b bigint);
Query OK, 0 rows affected (0.02 sec)
TiDB(root@127.0.0.1:test) > desc select * from t1 inner join t2 on t1.a=t2.a and t1.a is not null;
+-------------------------+----------+------+--------------------------------------------------------------------+
| id | count | task | operator info |
+-------------------------+----------+------+--------------------------------------------------------------------+
| HashRightJoin_7 | 12487.50 | root | inner join, inner:TableReader_10, equal:[eq(test.t1.a, test.t2.a)] |
| ├─TableReader_10 | 9990.00 | root | data:Selection_9 |
| │ └─Selection_9 | 9990.00 | cop | not(isnull(test.t1.a)) |
| │ └─TableScan_8 | 10000.00 | cop | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo |
| └─TableReader_12 | 10000.00 | root | data:TableScan_11 |
| └─TableScan_11 | 10000.00 | cop | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo |
+-------------------------+----------+------+--------------------------------------------------------------------+
6 rows in set (0.00 sec) As you can see, the filter |
I will take a look. |
RCA: in order to not derive mistaken filter We have 2 alternatives to solve this:
Pros of first alternative: we can remove I prefer second alternative, because it applies to outer join also. |
The first approach sounds more-similar to MySQL. In |
If we want to apply first approach to outer join, we need this adjustment: only derive t1 left join t2 on t1.a = t2.a logically equals to: t1 left join t2 on t1.a = t2.a and t2.a is not null Then only one problem left for first approach:
For example: t1 join t2 on t1.a = t2.a and t1.a is not null or this star shape join: select * from t1, t2, t3, t4, t5 where t1.a = t2.a and t1.a = t3.a and t1.a = t4.a and t1.a = t5.a it is because we don't have general expression simplification or de-duplication now. |
Seems https://github.com/pingcap/tidb/blob/master/expression/constraint_propagation.go can be used for de-duplication, I will give it a try and see whether it works. |
In Spark, each plan has a member variable named |
Sounds interesting, could you post the code link here so we can have a deeper look and discussing? |
col is not null
for join
Bug Report
Please answer these questions before submitting your issue. Thanks!
(Edit: Updated description; it looks like the bug is constant propagation, not consideration of attached conditions.)
Similar problem to: https://bugs.mysql.com/bug.php?id=93491
An inner join on a.b_id = b.id semantically means that neither side can be NULL. The optimizer seems to understand this during execution:
i.e. the range starting at
-inf
shows nulls are not valid.But it doesn't look like the
b.id IS NOT NULL
is propagated back toa.b_id is NOT NULL
. If I do this, I get a much better plan and a change fromMergeJoin
toIndexJoin
:(Answered above.) Full test case:
tidb-server -V
or runselect tidb_version();
on TiDB)?The text was updated successfully, but these errors were encountered: