Updating documentation for data format support (#2086)
* Updating documentation for data format support

Signed-off-by: Sameer Raheja <[email protected]>

* Update config docs to reflect compatibility doc update

Signed-off-by: Sameer Raheja <[email protected]>
sameerz authored Apr 20, 2021
1 parent 512ba88 commit cc3be0f
Showing 2 changed files with 8 additions and 4 deletions.
2 changes: 1 addition & 1 deletion docs/configs.md
@@ -87,7 +87,7 @@ Name | Description | Default Value
<a name="sql.hashOptimizeSort.enabled"></a>spark.rapids.sql.hashOptimizeSort.enabled|Whether sorts should be inserted after some hashed operations to improve output ordering. This can improve output file sizes when saving to columnar formats.|false
<a name="sql.improvedFloatOps.enabled"></a>spark.rapids.sql.improvedFloatOps.enabled|For some floating point operations spark uses one way to compute the value and the underlying cudf implementation can use an improved algorithm. In some cases this can result in cudf producing an answer when spark overflows. Because this is not as compatible with spark, we have it disabled by default.|false
<a name="sql.improvedTimeOps.enabled"></a>spark.rapids.sql.improvedTimeOps.enabled|When set to true, some operators will avoid overflowing by converting epoch days directly to seconds without first converting to microseconds|false
-<a name="sql.incompatibleDateFormats.enabled"></a>spark.rapids.sql.incompatibleDateFormats.enabled|When parsing strings as dates and timestamps in functions like unix_timestamp, setting this to true will force all parsing onto GPU even for formats that can result in incorrect results when parsing invalid inputs.|false
+<a name="sql.incompatibleDateFormats.enabled"></a>spark.rapids.sql.incompatibleDateFormats.enabled|When parsing strings as dates and timestamps in functions like unix_timestamp, some formats are fully supported on the GPU and some are unsupported and will fall back to the CPU. Some formats behave differently on the GPU than the CPU. Spark on the CPU interprets date formats with unsupported trailing characters as nulls, while Spark on the GPU will parse the date with invalid trailing characters. More detail can be found at [parsing strings as dates or timestamps](compatibility.md#parsing-strings-as-dates-or-timestamps).|false
<a name="sql.incompatibleOps.enabled"></a>spark.rapids.sql.incompatibleOps.enabled|For operations that work, but are not 100% compatible with the Spark equivalent set if they should be enabled by default or disabled by default.|false
<a name="sql.metrics.level"></a>spark.rapids.sql.metrics.level|GPU plans can produce a lot more metrics than CPU plans do. In very large queries this can sometimes result in going over the max result size limit for the driver. Supported values include DEBUG which will enable all metrics supported and typically only needs to be enabled when debugging the plugin. MODERATE which should output enough metrics to understand how long each part of the query is taking and how much data is going to each part of the query. ESSENTIAL which disables most metrics except those Apache Spark CPU plans will also report or their equivalents.|MODERATE
<a name="sql.python.gpu.enabled"></a>spark.rapids.sql.python.gpu.enabled|This is an experimental feature and is likely to change in the future. Enable (true) or disable (false) support for scheduling Python Pandas UDFs with GPU resources. When enabled, pandas UDFs are assumed to share the same GPU that the RAPIDs accelerator uses and will honor the python GPU configs|false
@@ -494,9 +494,13 @@ object RapidsConf {
.createWithDefault(false)

val INCOMPATIBLE_DATE_FORMATS = conf("spark.rapids.sql.incompatibleDateFormats.enabled")
-.doc("When parsing strings as dates and timestamps in functions like unix_timestamp, " +
-"setting this to true will force all parsing onto GPU even for formats that can " +
-"result in incorrect results when parsing invalid inputs.")
+.doc("When parsing strings as dates and timestamps in functions like unix_timestamp, some " +
+"formats are fully supported on the GPU and some are unsupported and will fall back to " +
+"the CPU. Some formats behave differently on the GPU than the CPU. Spark on the CPU " +
+"interprets date formats with unsupported trailing characters as nulls, while Spark on " +
+"the GPU will parse the date with invalid trailing characters. More detail can be found " +
+"at [parsing strings as dates or timestamps]" +
+"(compatibility.md#parsing-strings-as-dates-or-timestamps).")
.booleanConf
.createWithDefault(false)
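
For readers who want to see where this setting is applied, the following is a minimal sketch of a Spark application that turns on `spark.rapids.sql.incompatibleDateFormats.enabled` and parses one clean and one trailing-character input with `unix_timestamp`. It is not part of this commit. The object name, the sample strings, and the `yyyy-MM-dd` pattern are illustrative assumptions; which formats actually fall into the supported, unsupported, or differently-behaving groups is described in compatibility.md, and the results are not shown here because they depend on whether the expression runs on the CPU or the GPU.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, unix_timestamp}

// Hypothetical example object; the name is not from the plugin source.
object IncompatibleDateFormatsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("incompatible-date-formats-sketch")
      // Assumes the RAPIDS Accelerator jars are already on the classpath.
      .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
      // The config documented in this commit; it defaults to false.
      .config("spark.rapids.sql.incompatibleDateFormats.enabled", "true")
      .getOrCreate()

    import spark.implicits._

    // One well-formed date and one with trailing characters (illustrative data).
    val df = Seq("2021-04-20", "2021-04-20xyz").toDF("raw")

    // Per the config description, CPU and GPU may disagree on the second row.
    df.select(col("raw"), unix_timestamp(col("raw"), "yyyy-MM-dd").as("ts"))
      .show(truncate = false)

    spark.stop()
  }
}
```

Running the same query with the plugin disabled and comparing the `ts` column is one way to check whether a given format is affected in a particular environment.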
