[DOC] Misleading documentation for spark.rapids.sql.incompatibleDateFormats.enabled #2003

Closed
andygrove opened this issue Mar 23, 2021 · 0 comments · Fixed by #2086
Labels
documentation Improvements or additions to documentation

Report incorrect documentation

Location of incorrect documentation

The RapidsConf documentation for spark.rapids.sql.incompatibleDateFormats.enabled is slightly misleading.

Describe the problems or issues found in the documentation

The documentation states that:

When parsing strings as dates and timestamps in functions like unix_timestamp, 
setting this to true will force all parsing onto GPU even for formats that can 
result in incorrect results when parsing invalid inputs.

What isn't clear here is that some formats are not supported on GPU at all and will still fall back to CPU, even with this setting enabled. For example, formats that include MMM are not supported on GPU.
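
To make the distinction concrete, here is a minimal sketch of the placement decision as described in this issue. It is only an illustrative model, not the plugin's actual implementation: the type and function names (`FormatSupport`, `Target`, `DateFormatPlacement.placement`) are hypothetical, and the assumption that incompatible formats stay on CPU when the setting is false is inferred from the existing documentation text.

```scala
// Hypothetical model of the fallback decision described in this issue.
// None of these names come from the RAPIDS Accelerator source.
sealed trait FormatSupport
case object FullySupported extends FormatSupport // parses correctly on GPU
case object Incompatible extends FormatSupport   // GPU may give incorrect results for invalid inputs
case object Unsupported extends FormatSupport    // not supported on GPU, e.g. formats that include MMM

sealed trait Target
case object Gpu extends Target
case object Cpu extends Target

object DateFormatPlacement {
  // incompatibleEnabled stands in for
  // spark.rapids.sql.incompatibleDateFormats.enabled
  def placement(support: FormatSupport, incompatibleEnabled: Boolean): Target =
    support match {
      case FullySupported => Gpu
      case Incompatible   => if (incompatibleEnabled) Gpu else Cpu // assumed CPU fallback when disabled
      case Unsupported    => Cpu // the setting has no effect on these formats
    }
}
```

Under this model, a format containing MMM falls in the `Unsupported` case, so it stays on CPU whether or not the setting is enabled.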

Steps taken to verify documentation is incorrect

scala> spark.conf.set("spark.sql.session.timeZone", "UTC")
scala> spark.conf.set("spark.rapids.sql.incompatibleDateFormats.enabled", "true")
scala> val df_notsupported = Seq(("2021-Dec-25 11:11:11")).toDF("ts")
scala> df_notsupported.write.format("parquet").mode("overwrite").save("/tmp/testts_notsupported.parquet")
scala> spark.read.parquet("/tmp/testts_notsupported.parquet").createOrReplaceTempView("df_notsupported")
scala> val df = spark.sql("select to_timestamp(ts, 'yyyy-MMM-dd HH:mm:ss') from df_notsupported")
df: org.apache.spark.sql.DataFrame = [to_timestamp(ts, yyyy-MMM-dd HH:mm:ss): timestamp]

scala> df.collect
21/03/23 12:46:29 WARN GpuOverrides: 
!Exec <ProjectExec> cannot run on GPU because unsupported data types in output: TimestampType; not all expressions can be replaced
  !Expression <Alias> gettimestamp(ts#7, yyyy-MMM-dd HH:mm:ss, Some(UTC), true) AS to_timestamp(ts, yyyy-MMM-dd HH:mm:ss)#9 cannot run on GPU because expression GetTimestamp gettimestamp(ts#7, yyyy-MMM-dd HH:mm:ss, Some(UTC), true) produces an unsupported type TimestampType; expression Alias gettimestamp(ts#7, yyyy-MMM-dd HH:mm:ss, Some(UTC), true) AS to_timestamp(ts, yyyy-MMM-dd HH:mm:ss)#9 produces an unsupported type TimestampType
    !Expression <GetTimestamp> gettimestamp(ts#7, yyyy-MMM-dd HH:mm:ss, Some(UTC), true) cannot run on GPU because expression GetTimestamp gettimestamp(ts#7, yyyy-MMM-dd HH:mm:ss, Some(UTC), true) produces an unsupported type TimestampType; Failed to convert Unsupported word: MMM null
      @Expression <AttributeReference> ts#7 could run on GPU
      @Expression <Literal> yyyy-MMM-dd HH:mm:ss could run on GPU
  *Exec <FileSourceScanExec> will run on GPU

Suggested fix for documentation

When parsing strings as dates and timestamps in functions like unix_timestamp, some formats are
fully supported on GPU, some are supported but can produce incorrect results for invalid inputs,
and others are not supported at all. Setting this to true will force parsing onto GPU for all supported formats,
including those that can produce incorrect results for invalid inputs. Unsupported formats will still fall back to CPU.
andygrove added the "documentation" and "? - Needs Triage" labels on Mar 23, 2021
andygrove added this to the Mar 15 - March 26 milestone on Mar 23, 2021
andygrove self-assigned this on Mar 23, 2021
sameerz removed the "? - Needs Triage" label on Mar 23, 2021
sameerz self-assigned this on Apr 7, 2021