
[BUG] Handle Decimal128 computation with overflow of Remainder on Spark 3.4 #8330

Closed
NVnavkumar opened this issue May 19, 2023 · 0 comments · Fixed by #8414
Labels: bug (Something isn't working), Spark 3.4+ (Spark 3.4+ issues)

NVnavkumar (Collaborator) commented:

Describe the bug
When computing the remainder for decimal types, the existing algorithm can only handle the same Decimal128 overflow cases that previous versions of Spark could handle: if the operands cannot be upcast to a common type because of precision limits, the plugin either throws an exception (ANSI mode) or returns null. Spark 3.4, however, now computes this remainder, because it falls back to Java's arbitrary-precision BigDecimal and rounds the result from there.
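As a rough illustration (not the plugin's implementation), arbitrary-precision decimal arithmetic, analogous to java.math.BigDecimal on the Spark side, can produce a finite remainder even though aligning the two operands to a common scale needs more digits than the 38 a Decimal128 can hold:

from decimal import Decimal, getcontext

# Give ourselves more working digits than Decimal128's 38-digit limit.
getcontext().prec = 50

a = Decimal('5776949384953805890688943467625198736')  # declared DecimalType(38, 0)
b = Decimal('-67337920196996830.354487679299')        # declared DecimalType(27, 7)

# The remainder itself is small (its magnitude is below |b|), so it is
# representable; only the intermediate common representation overflows.
print(a % b)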

Steps/Code to reproduce bug
Try to compute the remainder between two large Decimal128 values:

PySpark example:

from decimal import Decimal
from pyspark.sql.types import StructType, StructField, DecimalType

data = [[Decimal('5776949384953805890688943467625198736'), Decimal('-67337920196996830.354487679299')]]
schema = StructType([
    StructField("a", DecimalType(38, 0), True),
    StructField("b", DecimalType(27, 7), True)])
df = spark.createDataFrame(data=data, schema=schema)

out = df.selectExpr("a", "b", r"a % b").collect()
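As the description notes, on versions that cannot compute this remainder the null result becomes an overflow error under ANSI mode; that path can be exercised by enabling ANSI SQL mode before running the same query:

spark.conf.set("spark.sql.ansi.enabled", "true")
df.selectExpr("a", "b", r"a % b").collect()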

Expected behavior
When running on Spark 3.4, this should return the correct value for a % b (on earlier versions of Spark, the result is null).
