[BUG] unix_timestamp on GPU returns invalid data for bad input #1098

Closed
andygrove opened this issue Nov 11, 2020 · 1 comment · Fixed by #1113
Labels: bug

@andygrove
Contributor

Describe the bug

The unix_timestamp function should either return null values or throw an exception for values that cannot be parsed given the specified format (depending on the value of spark.sql.legacy.timeParserPolicy).

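However, on the GPU the function currently returns incorrect data: strings that do not match the format produce large negative timestamps instead of null, and 1999/12/31 is accepted even though it does not match yyyy-MM-dd.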

Steps/Code to reproduce bug

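import org.apache.spark.sql.functions.{col, unix_timestamp}

// Only "1999-12-31" matches the yyyy-MM-dd format used below.
testSparkResultsAreEqual("Test unix_timestamp mix of valid and invalid inputs",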
  spark => {
    import spark.implicits._
    Seq(
      "31/12/1999",
      "31-12-1999",
      "12/31/1999",
      "12-31-1999",
      "1999-12-31",
      "1999/12/31",
    ).toDF("c0")
  },
  conf()) {
  df => df.withColumn("c1", unix_timestamp(col("c0"), "yyyy-MM-dd"))
}
Running on the GPU and on the CPU did not match 
CPU: ArrayBuffer([31/12/1999,null], [31-12-1999,null], [12/31/1999,null], [12-31-1999,null], [1999-12-31,946598400], [1999/12/31,null])

GPU: ArrayBuffer([31/12/1999,-61206796800], [31-12-1999,-61206796800], [12/31/1999,-61806412800], [12-31-1999,-61806412800], [1999-12-31,946598400], [1999/12/31,946598400])

Expected behavior
Behavior should match Spark on CPU.

Environment details (please complete the following information)
N/A

Additional context
N/A

andygrove added the "bug" and "? - Needs Triage" labels on Nov 11, 2020
andygrove added this to the Nov 9 - Nov 20 milestone on Nov 11, 2020
andygrove self-assigned this on Nov 11, 2020
@andygrove
Contributor Author

cuDF now has an is_timestamp function that we could use to filter out invalid values.
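
The validate-then-parse idea can be sketched on the CPU in plain Scala. This is only an illustration of the approach, not the cuDF API or the actual fix: `ValidateThenParse`, `isValid`, and `toUnixSeconds` are hypothetical names, `java.time` stands in for the GPU parser, and UTC is assumed as the session time zone (which is consistent with the CPU value 946598400 reported above).

```scala
import java.time.{LocalDate, ZoneOffset}
import java.time.format.{DateTimeFormatter, DateTimeParseException}

object ValidateThenParse {
  // Same pattern as in the reproduction case.
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd")

  // Hypothetical stand-in for a validity check such as is_timestamp:
  // true only if the whole string matches the format.
  def isValid(s: String): Boolean =
    try { LocalDate.parse(s, fmt); true }
    catch { case _: DateTimeParseException => false }

  // Convert only the values that passed validation; everything else becomes
  // None, which plays the role of a null in the result column.
  def toUnixSeconds(s: String): Option[Long] =
    if (isValid(s)) Some(LocalDate.parse(s, fmt).atStartOfDay(ZoneOffset.UTC).toEpochSecond)
    else None

  // The six inputs from the reproduction case, mapped to optional timestamps.
  val inputs: Seq[String] = Seq(
    "31/12/1999", "31-12-1999", "12/31/1999",
    "12-31-1999", "1999-12-31", "1999/12/31")

  val results: Seq[Option[Long]] = inputs.map(toUnixSeconds)
}
```

With this approach only 1999-12-31 yields a timestamp and the other five inputs yield None, which mirrors the null-for-invalid behavior of the CPU output.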

sameerz removed the "? - Needs Triage" label on Nov 17, 2020
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023