Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent type coercion rules with comparison expressions #2890

Closed
andygrove opened this issue Jul 12, 2022 · 4 comments · Fixed by #2901
Closed

Inconsistent type coercion rules with comparison expressions #2890

andygrove opened this issue Jul 12, 2022 · 4 comments · Fixed by #2901
Labels
bug Something isn't working

Comments

@andygrove
Copy link
Member

Describe the bug
I have a table with an int32 and a string column.

❯ select c3, c4 from data limit 2;
+------------+----------+
| c3         | c4       |
+------------+----------+
|            |          |
| -987603476 | H)n_zVMR |
+------------+----------+

I can compare them with = and != and an implicit cast gets added, casting the int to a string.

❯ select c3, c4 from data where c3 = c4 limit 2;
0 rows in set. Query took 0.016 seconds.

❯ select c3, c4 from data where c3 != c4 limit 2;
+------------+----------+
| c3         | c4       |
+------------+----------+
| -987603476 | H)n_zVMR |
| 854785627  | /\t+h*@D |
+------------+----------+

The explain for the = query shows CAST(c3@0 AS Utf8) = c4@1.

Note that Spark would cast the string to an int and not the int to a string, so I'm not sure if we are doing the right thing here.

If I use other comparison expressions I get an error instead.

❯ select c3, c4 from data where c3 < c4 limit 2;
Plan("'Int32 < Utf8' can't be evaluated because there isn't a common type to coerce the types to")

This seems inconsistent.

To Reproduce
See above.

Expected behavior

  • All comparison expressions should use the same casts
  • We should use int as the common type here, not string?

Additional context
None

@andygrove andygrove added the bug Something isn't working label Jul 12, 2022
@andygrove
Copy link
Member Author

andygrove commented Jul 12, 2022

Postgres does not add implicit casts in any of these cases:

andy=# select * from type_test where c0 = c1;
ERROR:  operator does not exist: integer = character varying

@andygrove
Copy link
Member Author

@alamb DataFusion is inconsistent with itself as to when it adds implicit conversions or not. I think it would be better to not have implicit conversions at all and keep everything explicit, and this seems to be Postgres' approach as well (and the opposite of Spark). What is your opinion on this?

@alamb
Copy link
Contributor

alamb commented Jul 13, 2022

@andygrove

I think implict type conversions are needed to do many arithmetic operations (e.g. adding a float to a integer) and that postgres does add implicit type conversions. For example:

alamb=# select cast(1 as smallint) + cast(2 as bigint);
 ?column? 
----------
        3
(1 row)
alamb=# select cast(1 as float) + cast(2 as bigint);
 ?column? 
----------
        3
(1 row)

Postgres also seems to add conversions from integers to string for certain operations

alamb=# create table foo(i int, c varchar);
CREATE TABLE
alamb=# insert into foo values (1, 'foo');
INSERT 0 1
alamb=# insert into foo values (2, 'bar');
INSERT 0 1
alamb=# select i || c from foo;
 ?column? 
----------
 1foo
 2bar
(2 rows)

However interestingly it does not seem to try and coerce constants:

ERROR:  invalid input syntax for type integer: "foo"
LINE 1: select 1 = 'foo';
                   ^
alamb=# 

I suspect postgres doesn't the challenges of equality between strings and numeric values (e.g. you can't always do it losselessly) -- perhaps we should remove the automatic string coercion for equality?

I think the code is
https://github.com/apache/arrow-datafusion/blob/6e0bb8476d783c1caaf6bf011487c92ae9352f78/datafusion/expr/src/binary_rule.rs#L167

@alamb
Copy link
Contributor

alamb commented Jul 13, 2022

But in general I agree that the type coercion in DataFusion could be improved and is likely not self consistent

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants