Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix : signum function bug when 0.0 input #11580

Merged
merged 5 commits into from
Jul 24, 2024

Conversation

getChan
Copy link
Contributor

@getChan getChan commented Jul 21, 2024

Which issue does this PR close?

Closes #11557

What changes are included in this PR?

  1. add signum function unit test
  2. move signum_order function from monotonicity.rs to signum.rs
  3. signum function implementation by new code. not make_udf_function! macro

Are these changes tested?

  1. Add some unit tests.
  2. datafusion-cli test
> select signum(-1), signum(1), signum(0), signum(-0.0), signum(0.0), signum(1.0), signum(-1.0)
;
+-------------------+------------------+------------------+---------------------+--------------------+--------------------+---------------------+
| signum(Int64(-1)) | signum(Int64(1)) | signum(Int64(0)) | signum(Float64(-0)) | signum(Float64(0)) | signum(Float64(1)) | signum(Float64(-1)) |
+-------------------+------------------+------------------+---------------------+--------------------+--------------------+---------------------+
| -1.0              | 1.0              | 0.0              | 0.0                 | 0.0                | 1.0                | -1.0                |
+-------------------+------------------+------------------+---------------------+--------------------+--------------------+---------------------+

Are there any user-facing changes?

no

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jul 21, 2024
Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for a Postgres-compatible implementation of signum.

For spark compatibility we would need a small change to handle the -0.0 case to return -1. We could either implement a copy of this code in the Comet project, or add a flag to the constructor of this function to control behavior for the -0.0 special case.

@Throne3d
Copy link

Throne3d commented Jul 23, 2024

Here are the values I get for Spark 3.5.1:

data = ["-1", "1", "0", "-0.0", "0.0", "1.0", "-1.0"]
df = spark.createDataFrame([(datum,) for datum in data], "col1 string")
df.collect()
# [Row(col1='-1'), Row(col1='1'), Row(col1='0'), Row(col1='-0.0'), Row(col1='0.0'), Row(col1='1.0'), Row(col1='-1.0')]

pprint.pprint(df.selectExpr("col1", "signum(cast(col1 as float)) as result").collect())
# [Row(col1='-1', result=-1.0),
#  Row(col1='1', result=1.0),
#  Row(col1='0', result=0.0),
#  Row(col1='-0.0', result=-0.0),
#  Row(col1='0.0', result=0.0),
#  Row(col1='1.0', result=1.0),
#  Row(col1='-1.0', result=-1.0)]

So it seems like -0.0 in Spark should specifically return -0.0 in the signum function, not -1, right?

Literals seem to be treated as Decimal instead, which doesn't make the distinction between negative and positive zero, so those don't see the same behavior - I'd guess that's what caused the specific results listed for Spark in #11557 (comment):

spark.sql("""SELECT -1, 1, 0, -0.0, 0.0, 1.0, -1.0""").collect()
# [Row(-1=-1, 1=1, 0=0, 0.0=Decimal('0.0'), 0.0=Decimal('0.0'), 1.0=Decimal('1.0'), -1.0=Decimal('-1.0'))]
spark.sql("""SELECT signum(-1), signum(1), signum(0), signum(-0.0), signum(0.0), signum(1.0), signum(-1.0)""").collect()
# [Row(SIGNUM(-1)=-1.0, SIGNUM(1)=1.0, SIGNUM(0)=0.0, SIGNUM(0.0)=0.0, SIGNUM(0.0)=0.0, SIGNUM(1.0)=1.0, SIGNUM(-1.0)=-1.0)]

For Postgres 16.3, this behavior looks like what I see for both decimal and float though!

postgres=# WITH tab(datum) AS (VALUES ('-1'), ('1'), ('0'), ('-0.0'), ('0.0'), ('1.0'), ('-1.0'))
SELECT *, cast(datum as decimal) decimal, cast(datum as float) float, sign(cast(datum as decimal)) sign_decimal, sign(cast(datum as float)) sign_float FROM tab;
 datum | decimal | float | sign_decimal | sign_float
-------+---------+-------+--------------+------------
 -1    |      -1 |    -1 |           -1 |         -1
 1     |       1 |     1 |            1 |          1
 0     |       0 |     0 |            0 |          0
 -0.0  |     0.0 |    -0 |            0 |          0
 0.0   |     0.0 |     0 |            0 |          0
 1.0   |     1.0 |     1 |            1 |          1
 -1.0  |    -1.0 |    -1 |           -1 |         -1
(7 rows)

@alamb alamb merged commit 8945462 into apache:main Jul 24, 2024
24 checks passed
@alamb
Copy link
Contributor

alamb commented Jul 24, 2024

Thanks @getChan @andygrove and @Throne3d

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

signum function incompatible with Postgres and Apache Spark
4 participants