Fix : `signum` function bug when `0.0` input #11580

getChan · 2024-07-21T10:39:43Z

Which issue does this PR close?

What changes are included in this PR?

Add unit tests for the `signum` function
Move the `signum_order` function from monotonicity.rs to signum.rs
Implement `signum` with dedicated code instead of the `make_udf_function!` macro

Are these changes tested?

Added some unit tests.
Manual check with datafusion-cli:

> select signum(-1), signum(1), signum(0), signum(-0.0), signum(0.0), signum(1.0), signum(-1.0);
+-------------------+------------------+------------------+---------------------+--------------------+--------------------+---------------------+
| signum(Int64(-1)) | signum(Int64(1)) | signum(Int64(0)) | signum(Float64(-0)) | signum(Float64(0)) | signum(Float64(1)) | signum(Float64(-1)) |
+-------------------+------------------+------------------+---------------------+--------------------+--------------------+---------------------+
| -1.0              | 1.0              | 0.0              | 0.0                 | 0.0                | 1.0                | -1.0                |
+-------------------+------------------+------------------+---------------------+--------------------+--------------------+---------------------+

Are there any user-facing changes?

no

andygrove

LGTM for a Postgres-compatible implementation of signum.

For Spark compatibility we would need a small change to handle the -0.0 case to return -1. We could either implement a copy of this code in the Comet project, or add a flag to the constructor of this function to control the behavior for the -0.0 special case.
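
To make the flag idea concrete, here is a minimal sketch in Python rather than the Rust used by the PR. The parameter name `negative_zero_result` is hypothetical and is not part of DataFusion or Comet; it only illustrates letting the caller decide what a -0.0 input maps to. Propagating NaN is also an assumption of this sketch.

```python
import math


def signum(x: float, negative_zero_result: float = 0.0) -> float:
    """Return the sign of x as a float.

    `negative_zero_result` is a hypothetical knob (not a real DataFusion
    option): it is the value returned when the input is -0.0.
    """
    if math.isnan(x):
        return x  # assumption: NaN propagates unchanged
    if x == 0.0:
        # -0.0 == 0.0 is True, so inspect the sign bit to tell them apart.
        if math.copysign(1.0, x) < 0:
            return negative_zero_result
        return 0.0
    # Non-zero input: +1.0 or -1.0 depending on the sign.
    return math.copysign(1.0, x)


inputs = [-1.0, 1.0, 0.0, -0.0, 1.5, -2.5]
# Default: zero in, zero out, as in the datafusion-cli output above.
print([signum(v) for v in inputs])
# Alternative: the caller chooses a different result for -0.0.
print([signum(v, negative_zero_result=-0.0) for v in inputs])
```

A single parameter keeps one implementation for both behaviors; the alternative mentioned above, a separate copy in Comet, avoids widening this function's constructor.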

Throne3d · 2024-07-23T04:01:31Z

Here are the values I get for Spark 3.5.1:

data = ["-1", "1", "0", "-0.0", "0.0", "1.0", "-1.0"]
df = spark.createDataFrame([(datum,) for datum in data], "col1 string")
df.collect()
# [Row(col1='-1'), Row(col1='1'), Row(col1='0'), Row(col1='-0.0'), Row(col1='0.0'), Row(col1='1.0'), Row(col1='-1.0')]

pprint.pprint(df.selectExpr("col1", "signum(cast(col1 as float)) as result").collect())
# [Row(col1='-1', result=-1.0),
#  Row(col1='1', result=1.0),
#  Row(col1='0', result=0.0),
#  Row(col1='-0.0', result=-0.0),
#  Row(col1='0.0', result=0.0),
#  Row(col1='1.0', result=1.0),
#  Row(col1='-1.0', result=-1.0)]

So it seems like -0.0 in Spark should specifically return -0.0 in the signum function, not -1, right?
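
One caveat when testing for this: with IEEE 754 floats, -0.0 and 0.0 compare equal, so a plain equality assertion cannot tell the two results apart. A small sketch of how a test could check the sign explicitly (the helper names `is_negative_zero` and `same_float` are made up for illustration):

```python
import math


def is_negative_zero(x: float) -> bool:
    """True only for IEEE 754 negative zero."""
    return x == 0.0 and math.copysign(1.0, x) == -1.0


def same_float(a: float, b: float) -> bool:
    """Equality that also distinguishes -0.0 from 0.0 (NaN is not handled)."""
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)


# Plain equality treats both zeros as the same value.
print(-0.0 == 0.0)
# Checking the sign bit separates them.
print(is_negative_zero(-0.0), is_negative_zero(0.0))
print(same_float(-0.0, 0.0))
```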

Literals seem to be treated as Decimal instead, which doesn't distinguish between negative and positive zero, so they don't show the same behavior. I'd guess that's what caused the specific results listed for Spark in #11557 (comment):

spark.sql("""SELECT -1, 1, 0, -0.0, 0.0, 1.0, -1.0""").collect()
# [Row(-1=-1, 1=1, 0=0, 0.0=Decimal('0.0'), 0.0=Decimal('0.0'), 1.0=Decimal('1.0'), -1.0=Decimal('-1.0'))]
spark.sql("""SELECT signum(-1), signum(1), signum(0), signum(-0.0), signum(0.0), signum(1.0), signum(-1.0)""").collect()
# [Row(SIGNUM(-1)=-1.0, SIGNUM(1)=1.0, SIGNUM(0)=0.0, SIGNUM(0.0)=0.0, SIGNUM(0.0)=0.0, SIGNUM(1.0)=1.0, SIGNUM(-1.0)=-1.0)]

For Postgres 16.3, though, the behavior in this PR matches what I see for both decimal and float:

postgres=# WITH tab(datum) AS (VALUES ('-1'), ('1'), ('0'), ('-0.0'), ('0.0'), ('1.0'), ('-1.0'))
SELECT *, cast(datum as decimal) decimal, cast(datum as float) float, sign(cast(datum as decimal)) sign_decimal, sign(cast(datum as float)) sign_float FROM tab;
 datum | decimal | float | sign_decimal | sign_float
-------+---------+-------+--------------+------------
 -1    |      -1 |    -1 |           -1 |         -1
 1     |       1 |     1 |            1 |          1
 0     |       0 |     0 |            0 |          0
 -0.0  |     0.0 |    -0 |            0 |          0
 0.0   |     0.0 |     0 |            0 |          0
 1.0   |     1.0 |     1 |            1 |          1
 -1.0  |    -1.0 |    -1 |           -1 |         -1
(7 rows)

alamb · 2024-07-24T13:11:11Z

Thanks @getChan, @andygrove, and @Throne3d

getChan added 5 commits July 21, 2024 17:23

add signum unit test

6fc9ca3

fix: signum function implementation - input zero output zero

37d0065

fix: run cargo fmt

cec3863

fix: not specified return type is float64

8a18fbd

fix: sqllogictest

768f7ff

github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jul 21, 2024

andygrove approved these changes Jul 22, 2024

andygrove mentioned this pull request Jul 22, 2024

signum(0) returns incorrect result apache/datafusion-comet#664

Open

alamb merged commit 8945462 into apache:main Jul 24, 2024
24 checks passed
