Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InList Expr: expr and list values must can be converted to a same data type #2759

Closed
Tracked by #2755
liukun4515 opened this issue Jun 22, 2022 · 3 comments
Closed
Tracked by #2755
Labels
bug Something isn't working

Comments

@liukun4515
Copy link
Contributor

liukun4515 commented Jun 22, 2022

Describe the bug
In current inList implemention, we don't consider the data type of the list value.

For example:

I create a table

❯ \d food
+---------------+--------------+------------+-------------+-----------------+-------------+
| table_catalog | table_schema | table_name | column_name | data_type       | is_nullable |
+---------------+--------------+------------+-------------+-----------------+-------------+
| datafusion    | public       | food       | a           | Decimal(10, 5)  | NO          |
| datafusion    | public       | food       | b           | Decimal(20, 15) | NO          |
| datafusion    | public       | food       | c           | Boolean         | NO          |
+---------------+--------------+------------+-------------+-----------------+-------------+

with data

❯  select * from food;
+---------+-------------------+-------+
| a       | b                 | c     |
+---------+-------------------+-------+
| 0.00001 | 0.000000000001000 | true  |
| 0.00002 | 0.000000000002000 | false |
| 0.00002 | 0.000000000002000 | false |
| 0.00003 | 0.000000000003000 | true  |
| 0.00003 | 0.000000000003000 | true  |
| 0.00003 | 0.000000000003000 | true  |
| 0.00004 | 0.000000000004000 | false |
| 0.00004 | 0.000000000004000 | false |
| 0.00004 | 0.000000000004000 | false |
| 0.00004 | 0.000000000004000 | false |
| 0.00005 | 0.000000000005000 | true  |
| 0.00005 | 0.000000000005000 | true  |
| 0.00005 | 0.000000000005000 | true  |
| 0.00005 | 0.000000000005000 | true  |
| 0.00005 | 0.000000000005000 | true  |
+---------+-------------------+-------+

using the in filter to select data

❯  select * from food where c in (true,123);
+---------+-------------------+------+
| a       | b                 | c    |
+---------+-------------------+------+
| 0.00001 | 0.000000000001000 | true |
| 0.00003 | 0.000000000003000 | true |
| 0.00003 | 0.000000000003000 | true |
| 0.00003 | 0.000000000003000 | true |
| 0.00005 | 0.000000000005000 | true |
| 0.00005 | 0.000000000005000 | true |
| 0.00005 | 0.000000000005000 | true |
| 0.00005 | 0.000000000005000 | true |
| 0.00005 | 0.000000000005000 | true |
+---------+-------------------+------+

The column of c is bool data type, we can apply the compare operations to diff data type.

The behavior of other database, such as spark

spark-sql> desc t3;
c1                      int

we will get the error message

spark-sql> select * from t3 where c1 in(1,23,false);
Error in query: cannot resolve '(spark_catalog.default.t3.c1 IN (1, 23, false))' due to data type mismatch: Arguments must be same type but were: int != boolean; line 1 pos 26;
'Project [*]
+- 'Filter c1#358 IN (1,23,false)
   +- SubqueryAlias spark_catalog.default.t3
      +- HiveTableRelation [`default`.`t3`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [c1#358], Partition Cols: []]

To Reproduce
Steps to reproduce the behavior:

Expected behavior
A clear and concise description of what you expected to happen.

Additional context
Add any other context about the problem here.

@liukun4515 liukun4515 added the bug Something isn't working label Jun 22, 2022
@liukun4515
Copy link
Contributor Author

liukun4515 commented Jun 22, 2022

In the datafusion, we can convert NULL value of the NULL type to any other NULL value of data type.
For example: Convert NULL(NULL) to INT32(NULL) or FLOAT(NULL);

❯ EXPLAIN select cast(NULL as FLOAT);
+---------------+------------------------------------------------------+
| plan_type     | plan                                                 |
+---------------+------------------------------------------------------+
| logical_plan  | Projection: Float32(NULL) AS CAST(NULL AS Float32)   |
|               |   EmptyRelation                                      |
| physical_plan | ProjectionExec: expr=[NULL as CAST(NULL AS Float32)] |
|               |   EmptyExec: produce_one_row=true                    |
|               |                                                      |
+---------------+------------------------------------------------------+
2 rows in set. Query took 0.002 seconds.
❯ EXPLAIN select cast(NULL as INT);
+---------------+----------------------------------------------------+
| plan_type     | plan                                               |
+---------------+----------------------------------------------------+
| logical_plan  | Projection: Int32(NULL) AS CAST(NULL AS Int32)     |
|               |   EmptyRelation                                    |
| physical_plan | ProjectionExec: expr=[NULL as CAST(NULL AS Int32)] |
|               |   EmptyExec: produce_one_row=true                  |
|               |                                                    |
+---------------+----------------------------------------------------+

@liukun4515
Copy link
Contributor Author

liukun4515 commented Jun 22, 2022

If we want to use the NULL in the list expr. The data type of NULL is NULL

@liukun4515 liukun4515 changed the title InList Expr: expr and list values must can be convert to a same data type InList Expr: expr and list values must can be converted to a same data type Jun 22, 2022
@liukun4515
Copy link
Contributor Author

In the spark, we can't compare decimal with boolean, but can compare decimal with Null.

spark-sql> DESC T1;
c1                      decimal(10,3)

spark-sql>  select * from t1 where c1 in(NULL);
Time taken: 0.082 seconds
spark-sql>  select * from t1 where c1 in(CAST(NULL AS BOOLEAN));
Error in query: cannot resolve '(spark_catalog.default.t1.c1 IN (CAST(NULL AS BOOLEAN)))' due to data type mismatch: Arguments must be same type but were: decimal(10,3) != boolean; line 1 pos 27;
'Project [*]
+- 'Filter c1#656 IN (cast(null as boolean))
   +- SubqueryAlias spark_catalog.default.t1
      +- HiveTableRelation [`default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [c1#656], Partition Cols: []]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant