-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better float/double cases for casting tests #1781
Better float/double cases for casting tests #1781
Conversation
Signed-off-by: sperlingxx <[email protected]>
build |
build |
@@ -425,6 +383,14 @@ class CastOpSuite extends GpuExpressionTestSuite { | |||
col("c1").cast(FloatType)) | |||
} | |||
|
|||
ignore("overflow double strings") { | |||
testSparkResultsAreEqual("overflow double strings", badDoubleStringsDf, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
badDoubleStringsDf
is not as comprehensive as I would like it to be. The values in it are big enough to force an overflow, but there are still a number of corner cases where according to rapidsai/cudf#5225 are not being covered. The patch to fix it rapidsai/cudf#7410 is not in and not working 100% either in these same overflow cases. I am happy to have these go in as it, but I would like to see a follow on issue to cover the corner cases that are missing, and have a comment in badDoubleStringsDf
pointing to it. Would be good to look at the float one too and make sure it is also not suffering from the same issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @revans2, I tried with many other similar (slightly overflow) cases. And I found GPU can produce correct results for almost all of them, except some special cases included in badDoubleStringsDf
.
For float type, I failed to find any mismatching case.
For double type, I only found numbers within range [1.79769313486231581E308, 1.797693134862316E308) produce inconsistent results with spark.
build |
build |
@razajafri you filed the original bug about casting from string to double. Could you take a look at this PR too? |
@@ -1131,14 +1131,16 @@ trait SparkQueryCompareTestSuite extends FunSuite with Arm { | |||
"9.8e5").toDF("c0") | |||
} | |||
|
|||
def badFloatStringsDf(session: SparkSession): DataFrame = { | |||
def invalidFloatStringsDf(session: SparkSession): DataFrame = { | |||
import session.sqlContext.implicits._ | |||
Seq(("A", "null"), ("1.3", "43.54")).toDF("c0", "c1") | |||
} | |||
|
|||
def badDoubleStringsDf(session: SparkSession): DataFrame = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for consistency may be this should be renamed to invalidDoubleStringsDf
Just one nit otherwise LGTM |
* enhance float/double cases for casting tests Signed-off-by: sperlingxx <[email protected]> * continue * code clean * code clean * fix typo * fix typo * some updates * fix typo
* enhance float/double cases for casting tests Signed-off-by: sperlingxx <[email protected]> * continue * code clean * code clean * fix typo * fix typo * some updates * fix typo
This pull request is to address issue #1764. And current PR also includes some code clean work .