Fix: signum function bug when 0.0 input #11580

Conversation
LGTM for a Postgres-compatible implementation of signum.
For Spark compatibility we would need a small change to handle the -0.0 case to return -1. We could either implement a copy of this code in the Comet project, or add a flag to the constructor of this function to control behavior for the -0.0 special case.
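To make the suggestion concrete, here is a minimal sketch of what such a constructor flag could look like. The struct and method names are hypothetical (not the actual DataFusion API), and the value returned on the Spark branch depends on the -0.0 semantics discussed below.

// Hypothetical sketch of a flag-controlled signum; names are illustrative only.
#[derive(Debug)]
struct SignumFunc {
    /// When true, apply Spark's handling of -0.0; otherwise Postgres-style.
    spark_compatible: bool,
}

impl SignumFunc {
    fn new(spark_compatible: bool) -> Self {
        Self { spark_compatible }
    }

    fn apply(&self, x: f64) -> f64 {
        // x == 0.0 is true for both +0.0 and -0.0 under IEEE 754 comparison.
        if x == 0.0 {
            if self.spark_compatible && x.is_sign_negative() {
                // Spark-specific result for -0.0 goes here; the Spark output
                // below shows what that value should actually be.
                x
            } else {
                0.0 // Postgres: the sign of any zero is 0
            }
        } else {
            x.signum()
        }
    }
}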
Here are the values I get for Spark 3.5.1:

data = ["-1", "1", "0", "-0.0", "0.0", "1.0", "-1.0"]
df = spark.createDataFrame([(datum,) for datum in data], "col1 string")
df.collect()
# [Row(col1='-1'), Row(col1='1'), Row(col1='0'), Row(col1='-0.0'), Row(col1='0.0'), Row(col1='1.0'), Row(col1='-1.0')]
import pprint
pprint.pprint(df.selectExpr("col1", "signum(cast(col1 as float)) as result").collect())
# [Row(col1='-1', result=-1.0),
# Row(col1='1', result=1.0),
# Row(col1='0', result=0.0),
# Row(col1='-0.0', result=-0.0),
# Row(col1='0.0', result=0.0),
# Row(col1='1.0', result=1.0),
# Row(col1='-1.0', result=-1.0)]

So it seems like -0.0 in Spark should specifically return -0.0 in the signum function, not -1, right?

Literals seem to be treated as Decimal instead, which doesn't make the distinction between negative and positive zero, so those don't see the same behavior - I'd guess that's what caused the specific results listed for Spark in #11557 (comment):

spark.sql("""SELECT -1, 1, 0, -0.0, 0.0, 1.0, -1.0""").collect()
# [Row(-1=-1, 1=1, 0=0, 0.0=Decimal('0.0'), 0.0=Decimal('0.0'), 1.0=Decimal('1.0'), -1.0=Decimal('-1.0'))]
spark.sql("""SELECT signum(-1), signum(1), signum(0), signum(-0.0), signum(0.0), signum(1.0), signum(-1.0)""").collect()
# [Row(SIGNUM(-1)=-1.0, SIGNUM(1)=1.0, SIGNUM(0)=0.0, SIGNUM(0.0)=0.0, SIGNUM(0.0)=0.0, SIGNUM(1.0)=1.0, SIGNUM(-1.0)=-1.0)]

For Postgres 16.3, this behavior looks like what I see for both decimal and float casts:

postgres=# WITH tab(datum) AS (VALUES ('-1'), ('1'), ('0'), ('-0.0'), ('0.0'), ('1.0'), ('-1.0'))
SELECT *, cast(datum as decimal) decimal, cast(datum as float) float, sign(cast(datum as decimal)) sign_decimal, sign(cast(datum as float)) sign_float FROM tab;
datum | decimal | float | sign_decimal | sign_float
-------+---------+-------+--------------+------------
-1 | -1 | -1 | -1 | -1
1 | 1 | 1 | 1 | 1
0 | 0 | 0 | 0 | 0
-0.0 | 0.0 | -0 | 0 | 0
0.0 | 0.0 | 0 | 0 | 0
1.0 | 1.0 | 1 | 1 | 1
-1.0 | -1.0 | -1 | -1 | -1
(7 rows)
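As a quick plain-Rust comparison (not part of this PR), the sign_float column above differs from what f64::signum alone would produce for the zero rows, which is the gap the explicit zero handling needs to close:

fn main() {
    // Same inputs as the Postgres table above. Rust's f64::signum maps
    // +0.0 to 1.0 and -0.0 to -1.0, while Postgres sign() returns 0 for both.
    for x in [-1.0_f64, 1.0, 0.0, -0.0, 0.0, 1.0, -1.0] {
        println!("f64::signum({x}) = {}", x.signum());
    }
}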
Thanks @getChan @andygrove and @Throne3d
Which issue does this PR close?
Closes #11557
What changes are included in this PR?

- Move the signum_order function from monotonicity.rs to signum.rs
- Use the make_udf_function! macro

Are these changes tested?

Are there any user-facing changes?

No
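For reference, a minimal self-contained sketch of the Postgres-compatible zero handling discussed in the review, written in plain Rust rather than against the Arrow/DataFusion APIs; read it as an illustration of the intended semantics, not the exact code in this PR:

fn postgres_signum(x: f64) -> f64 {
    if x == 0.0 {
        // Covers both +0.0 and -0.0: Postgres sign() returns 0 for either.
        0.0
    } else {
        // f64::signum returns 1.0 / -1.0 for nonzero values (including
        // infinities) and NaN for NaN input.
        x.signum()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn zero_inputs_return_zero() {
        assert_eq!(postgres_signum(0.0), 0.0);
        assert_eq!(postgres_signum(-0.0), 0.0);
        assert_eq!(postgres_signum(2.5), 1.0);
        assert_eq!(postgres_signum(-2.5), -1.0);
        assert!(postgres_signum(f64::NAN).is_nan());
    }
}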