signum function incompatible with Postgres and Apache Spark #11557

Closed
andygrove opened this issue Jul 19, 2024 · 4 comments · Fixed by #11580
Labels
bug Something isn't working good first issue Good for newcomers help wanted Extra attention is needed

Comments

@andygrove
Member

andygrove commented Jul 19, 2024

Describe the bug

In Postgres and Apache Spark, signum returns -1 for negative integers, 0 for zero, and 1 for positive integers.

DataFusion uses Rust's signum function, which behaves differently: it returns 1.0 for zero.

❯ select signum(-1), signum(0), signum(1);
+-------------------+------------------+------------------+
| signum(Int64(-1)) | signum(Int64(0)) | signum(Int64(1)) |
+-------------------+------------------+------------------+
| -1.0              | 1.0              | 1.0              |
+-------------------+------------------+------------------+

For floating-point inputs, Apache Spark returns -1 for -0.0 and 1 for +0.0. I have not researched what Postgres does in this case.

@andygrove andygrove added the bug Something isn't working label Jul 19, 2024
@andygrove andygrove added help wanted Extra attention is needed good first issue Good for newcomers labels Jul 19, 2024
@goldmedal
Contributor

take

@goldmedal goldmedal removed their assignment Jul 20, 2024
@goldmedal
Contributor

I will solve another issue first. I'll take it back if no one else works on it after I finish.

@getChan
Contributor

getChan commented Jul 20, 2024

take

@getChan
Contributor

getChan commented Jul 21, 2024

Comparison of the signum function between DataFusion SQL and Rust

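| input | datafusion | rust | spark | postgresql |
|-------|------------|------|-------|------------|
| -1 | -1 | -1 (-1_i32) | -1 | -1 |
| 0 | 1 | 0 (0_i32) | 0 | 0 |
| +1 | 1 | 1 (1_i32) | 1 | 1 |
| -0.0 | -1 | -1 (-0.0_f32) | 0 | 0 |
| +0.0 | 1 | 1 (0.0_f32) | 0 | 0 |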

In Rust, the definition of the signum function for floating point types is as follows:

Returns a number that represents the sign of self.

  • 1.0 if the number is positive, +0.0 or INFINITY
  • -1.0 if the number is negative, -0.0 or NEG_INFINITY
  • NaN if the number is NaN
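To make the difference between the two standard-library variants concrete, here is a small sketch. The helper names `integer_signs` and `float_zero_signs` are made up for illustration; only the `signum` methods come from Rust's standard library.

```rust
/// Signs of -1, 0, and 1 using the integer `signum` from the standard library.
/// The integer version distinguishes zero from positive values.
fn integer_signs() -> [i32; 3] {
    [(-1_i32).signum(), 0_i32.signum(), 1_i32.signum()]
}

/// Signs of -0.0 and +0.0 using the floating-point `signum`.
/// The float version follows the sign bit, so neither zero maps to 0.0.
fn float_zero_signs() -> [f32; 2] {
    [(-0.0_f32).signum(), 0.0_f32.signum()]
}
```

Calling `integer_signs()` gives `[-1, 0, 1]`, while `float_zero_signs()` gives `[-1.0, 1.0]`, which matches the rust column of the table above.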

My guess is that the signum implementation in DataFusion only accepts floating-point types as input. Consequently, when it receives the integer 0, it appears to convert it to Rust's 0.0 and output 1.

ref) https://github.com/apache/datafusion/blob/main/datafusion/functions/src/macros.rs#L225-L254
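The guess above can be reproduced in isolation. This is only a sketch of the suspected behavior, not DataFusion's actual code: `signum_via_f64` is a hypothetical helper that assumes the integer argument is cast to `f64` before the float `signum` is applied.

```rust
/// Hypothetical stand-in for the suspected code path: the integer input is
/// cast to f64 first, then the floating-point `signum` is applied.
fn signum_via_f64(x: i64) -> f64 {
    // 0_i64 becomes +0.0, and the float signum of +0.0 is 1.0.
    (x as f64).signum()
}
```

Under that assumption, `signum_via_f64(0)` returns `1.0`, the same value DataFusion printed for `signum(0)` in the original report.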

implementation

  • I will implement a separate signum function without using the unary function macro.
  • The output will match that of Spark and PostgreSQL.
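A minimal sketch of what such a function could look like at the scalar level, assuming the target is the spark and postgresql columns of the comparison table (0 for both -0.0 and +0.0). The name `signum_sql` is hypothetical, and this is not the code merged in #11580. Matching the Spark floating-point behavior described in the original report (-1 for -0.0) would need different zero handling.

```rust
/// Hypothetical scalar kernel: returns 0.0 for any zero, otherwise defers to
/// the standard-library floating-point `signum`.
fn signum_sql(x: f64) -> f64 {
    if x == 0.0 {
        // `-0.0 == 0.0` is true in IEEE 754 comparison, so both zeros land here.
        0.0
    } else {
        // NaN compares unequal to 0.0 and stays NaN via `f64::signum`.
        x.signum()
    }
}
```

With this check in place, `signum_sql(0.0)` and `signum_sql(-0.0)` both return `0.0`, while non-zero inputs keep the usual -1.0 or 1.0 result.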

If you have any other opinions, please let me know.
