The `RapidsConf` documentation for `spark.rapids.sql.incompatibleDateFormats.enabled` is slightly misleading.
Describe the problems or issues found in the documentation
The documentation states that:
> When parsing strings as dates and timestamps in functions like unix_timestamp,
> setting this to true will force all parsing onto GPU even for formats that can
> result in incorrect results when parsing invalid inputs.
What isn't clear is that some formats are not supported on GPU at all and still fall back to CPU even when this setting is `true`. For example, formats that include `MMM` are not supported on GPU.
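To make the distinction concrete, here is a minimal Scala sketch that splits a pattern into its letter runs and flags the ones with no GPU implementation. The `FormatTokens` object and its `unsupportedOnGpu` set are hypothetical, not part of the plugin's API; the set contains only `MMM`, the one token the fallback log in this issue confirms (`Unsupported word: MMM`).

```scala
object FormatTokens {
  // Matches a run of the same pattern letter, e.g. "yyyy", "MMM", "dd".
  // Simplification: quoted literals such as 'T' are not handled.
  private val LetterRun = "([a-zA-Z])\\1*".r

  /** Splits a datetime pattern into its letter runs, skipping separators. */
  def tokens(pattern: String): List[String] =
    LetterRun.findAllIn(pattern).toList

  // Hypothetical set of tokens with no GPU implementation.
  // Only MMM is confirmed by the fallback log; the real list may be longer.
  val unsupportedOnGpu: Set[String] = Set("MMM")

  /** Tokens in the pattern that would force a CPU fallback under this assumption. */
  def unsupportedTokens(pattern: String): List[String] =
    tokens(pattern).filter(unsupportedOnGpu.contains)

  def main(args: Array[String]): Unit = {
    val pattern = "yyyy-MMM-dd HH:mm:ss"
    println(tokens(pattern))            // List(yyyy, MMM, dd, HH, mm, ss)
    println(unsupportedTokens(pattern)) // List(MMM)
  }
}
```

Under this model, `yyyy-MM-dd HH:mm:ss` yields no unsupported tokens, while `yyyy-MMM-dd HH:mm:ss` yields `MMM`, matching the fallback seen in the reproduction.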
Steps taken to verify documentation is incorrect
```
scala> spark.conf.set("spark.sql.session.timeZone", "UTC")
scala> spark.conf.set("spark.rapids.sql.incompatibleDateFormats.enabled", "true")
scala> val df_notsupported = Seq(("2021-Dec-25 11:11:11")).toDF("ts")
scala> df_notsupported.write.format("parquet").mode("overwrite").save("/tmp/testts_notsupported.parquet")
scala> spark.read.parquet("/tmp/testts_notsupported.parquet").createOrReplaceTempView("df_notsupported")
scala> val df = spark.sql("select to_timestamp(ts, 'yyyy-MMM-dd HH:mm:ss') from df_notsupported")
df: org.apache.spark.sql.DataFrame = [to_timestamp(ts, yyyy-MMM-dd HH:mm:ss): timestamp]
scala> df.collect
21/03/23 12:46:29 WARN GpuOverrides:
!Exec <ProjectExec> cannot run on GPU because unsupported data types in output: TimestampType; not all expressions can be replaced
  !Expression <Alias> gettimestamp(ts#7, yyyy-MMM-dd HH:mm:ss, Some(UTC), true) AS to_timestamp(ts, yyyy-MMM-dd HH:mm:ss)#9 cannot run on GPU because expression GetTimestamp gettimestamp(ts#7, yyyy-MMM-dd HH:mm:ss, Some(UTC), true) produces an unsupported type TimestampType; expression Alias gettimestamp(ts#7, yyyy-MMM-dd HH:mm:ss, Some(UTC), true) AS to_timestamp(ts, yyyy-MMM-dd HH:mm:ss)#9 produces an unsupported type TimestampType
    !Expression <GetTimestamp> gettimestamp(ts#7, yyyy-MMM-dd HH:mm:ss, Some(UTC), true) cannot run on GPU because expression GetTimestamp gettimestamp(ts#7, yyyy-MMM-dd HH:mm:ss, Some(UTC), true) produces an unsupported type TimestampType; Failed to convert Unsupported word: MMM null
      @Expression <AttributeReference> ts#7 could run on GPU
      @Expression <Literal> yyyy-MMM-dd HH:mm:ss could run on GPU
  *Exec <FileSourceScanExec> will run on GPU
```
Suggested fix for documentation
```
When parsing strings as dates and timestamps in functions like unix_timestamp, some formats are
fully supported on GPU, some are supported but can produce incorrect results for invalid inputs,
and others are not supported at all. Setting this to true forces parsing onto GPU for all supported
formats, including those that can produce incorrect results for invalid inputs. Unsupported formats
still fall back to CPU.
```
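The three categories in the suggested wording, and how the setting interacts with each, can be modeled as a small decision function. This is a hypothetical sketch of the documented semantics, not the plugin's implementation; `GpuParsingDecision`, `FormatSupport`, and `runsOnGpu` are illustrative names.

```scala
object GpuParsingDecision {
  // The three categories of date/timestamp formats described in the suggested text.
  sealed trait FormatSupport
  case object FullySupported extends FormatSupport // correct results for all inputs
  case object Incompatible extends FormatSupport   // may be incorrect for invalid inputs
  case object Unsupported extends FormatSupport    // no GPU implementation at all

  /** Whether parsing runs on GPU, given the format's category and the value of
    * spark.rapids.sql.incompatibleDateFormats.enabled (modeled as a Boolean). */
  def runsOnGpu(support: FormatSupport, incompatibleDateFormatsEnabled: Boolean): Boolean =
    support match {
      case FullySupported => true
      case Incompatible   => incompatibleDateFormatsEnabled
      case Unsupported    => false // falls back to CPU regardless of the setting
    }

  def main(args: Array[String]): Unit =
    for {
      support <- Seq(FullySupported, Incompatible, Unsupported)
      enabled <- Seq(false, true)
    } println(s"$support, enabled=$enabled -> GPU: ${runsOnGpu(support, enabled)}")
}
```

The row that matters for this issue is `Unsupported`: it never runs on GPU, whatever the value of the setting, which is exactly the case the current wording ("force all parsing onto GPU") does not cover.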