From cc3be0f87cc5057819a52e15db635fea8d55083c Mon Sep 17 00:00:00 2001
From: Sameer Raheja
Date: Tue, 20 Apr 2021 08:15:04 -0700
Subject: [PATCH] Updating documentation for data format support (#2086)

* Updating documentation for data format support

Signed-off-by: Sameer Raheja

* Update config docs to reflect compatibility doc update

Signed-off-by: Sameer Raheja
---
 docs/configs.md | 2 +-
 .../scala/com/nvidia/spark/rapids/RapidsConf.scala | 10 +++++++---
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/docs/configs.md b/docs/configs.md
index e3eb258983c..064792420f1 100644
--- a/docs/configs.md
+++ b/docs/configs.md
@@ -87,7 +87,7 @@ Name | Description | Default Value
 spark.rapids.sql.hashOptimizeSort.enabled|Whether sorts should be inserted after some hashed operations to improve output ordering. This can improve output file sizes when saving to columnar formats.|false
 spark.rapids.sql.improvedFloatOps.enabled|For some floating point operations spark uses one way to compute the value and the underlying cudf implementation can use an improved algorithm. In some cases this can result in cudf producing an answer when spark overflows. Because this is not as compatible with spark, we have it disabled by default.|false
 spark.rapids.sql.improvedTimeOps.enabled|When set to true, some operators will avoid overflowing by converting epoch days directly to seconds without first converting to microseconds|false
-spark.rapids.sql.incompatibleDateFormats.enabled|When parsing strings as dates and timestamps in functions like unix_timestamp, setting this to true will force all parsing onto GPU even for formats that can result in incorrect results when parsing invalid inputs.|false
+spark.rapids.sql.incompatibleDateFormats.enabled|When parsing strings as dates and timestamps in functions like unix_timestamp, some formats are fully supported on the GPU and some are unsupported and will fall back to the CPU. Some formats behave differently on the GPU than the CPU. Spark on the CPU interprets date formats with unsupported trailing characters as nulls, while Spark on the GPU will parse the date with invalid trailing characters. More detail can be found at [parsing strings as dates or timestamps](compatibility.md#parsing-strings-as-dates-or-timestamps).|false
 spark.rapids.sql.incompatibleOps.enabled|For operations that work, but are not 100% compatible with the Spark equivalent set if they should be enabled by default or disabled by default.|false
 spark.rapids.sql.metrics.level|GPU plans can produce a lot more metrics than CPU plans do. In very large queries this can sometimes result in going over the max result size limit for the driver. Supported values include DEBUG which will enable all metrics supported and typically only needs to be enabled when debugging the plugin. MODERATE which should output enough metrics to understand how long each part of the query is taking and how much data is going to each part of the query. ESSENTIAL which disables most metrics except those Apache Spark CPU plans will also report or their equivalents.|MODERATE
 spark.rapids.sql.python.gpu.enabled|This is an experimental feature and is likely to change in the future. Enable (true) or disable (false) support for scheduling Python Pandas UDFs with GPU resources. When enabled, pandas UDFs are assumed to share the same GPU that the RAPIDs accelerator uses and will honor the python GPU configs|false
diff --git a/sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala b/sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala
index 6ca923e699f..a7dc35dbde1 100644
--- a/sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala
+++ b/sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala
@@ -494,9 +494,13 @@ object RapidsConf {
     .createWithDefault(false)
 
   val INCOMPATIBLE_DATE_FORMATS = conf("spark.rapids.sql.incompatibleDateFormats.enabled")
-    .doc("When parsing strings as dates and timestamps in functions like unix_timestamp, " +
-      "setting this to true will force all parsing onto GPU even for formats that can " +
-      "result in incorrect results when parsing invalid inputs.")
+    .doc("When parsing strings as dates and timestamps in functions like unix_timestamp, some " +
+      "formats are fully supported on the GPU and some are unsupported and will fall back to " +
+      "the CPU. Some formats behave differently on the GPU than the CPU. Spark on the CPU " +
+      "interprets date formats with unsupported trailing characters as nulls, while Spark on " +
+      "the GPU will parse the date with invalid trailing characters. More detail can be found " +
+      "at [parsing strings as dates or timestamps]" +
+      "(compatibility.md#parsing-strings-as-dates-or-timestamps).")
     .booleanConf
     .createWithDefault(false)
 