Added in FAQ and fixed spelling (#378)

NVIDIA · Jul 16, 2020 · e408eb7
1 parent 4fde6f4

Showing 2 changed files with 21 additions and 2 deletions.
diff --git a/docs/FAQ.md b/docs/FAQ.md
@@ -0,0 +1,19 @@
+---
+layout: page
+title: Frequently Asked Questions
+nav_order: 8
+---
+# Frequently Asked Questions
+
+### Why does `explain()` show that the GPU will be used even after setting `spark.rapids.sql.enabled` to `false`?
+
+Apache Spark caches the query plan that `explain()` prints. That cache is not invalidated when a config
+changes, so `explain()` may show a plan that does not match the current config settings. This applies
+to all configs in Spark, not only RAPIDS ones. For example, if you change
+`spark.sql.autoBroadcastJoinThreshold` after running `explain()` on a `DataFrame`, calling `explain()`
+again on that `DataFrame` will still show a `SortMergeJoin`, even if the new threshold would now
+produce a `BroadcastHashJoin`. When you actually run the query with an action such as `collect`,
+`show`, or `write`, a new `DataFrame` is constructed, which causes Spark to replan the
+query. This is why `spark.rapids.sql.enabled` is still respected at execution time, even when
+`explain()` shows stale results.
+
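The caching behavior described in the FAQ entry above can be illustrated with a toy model in plain Python, with no Spark required. This is only a sketch of the idea, not Spark's implementation: `ToyDataFrame`, the `conf` dictionary, and the `GpuProject`/`Project` plan strings are hypothetical stand-ins. The point it shows is that a plan computed once and cached ignores later config changes, while an action that builds a new `DataFrame` plans again against the current config.

```python
from functools import cached_property

# Hypothetical stand-in for the session config (NOT Spark's API).
conf = {"spark.rapids.sql.enabled": "true"}


class ToyDataFrame:
    """Toy model: a DataFrame whose plan is built once, then cached."""

    def __init__(self, query: str):
        self.query = query

    @cached_property
    def plan(self) -> str:
        # The config is read only the first time the plan is built.
        on_gpu = conf["spark.rapids.sql.enabled"] == "true"
        return ("GpuProject" if on_gpu else "Project") + "(" + self.query + ")"

    def explain(self) -> str:
        # Returns the cached plan, even if the config changed since.
        return self.plan

    def collect(self) -> str:
        # Toy action: builds a fresh DataFrame, so the plan is rebuilt
        # against the current config. Returns that plan instead of rows.
        return ToyDataFrame(self.query).plan


df = ToyDataFrame("a + 1")
before = df.explain()                      # "GpuProject(a + 1)"
conf["spark.rapids.sql.enabled"] = "false"
stale = df.explain()                       # still "GpuProject(a + 1)"
executed = df.collect()                    # "Project(a + 1)"
print(before, stale, executed)
```

By the same reasoning, re-creating the `DataFrame` after changing a config (for example, by re-running the code that builds it) should make `explain()` reflect the new setting.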
diff --git a/docs/index.md b/docs/index.md
@@ -13,8 +13,8 @@ As data scientists shift from using traditional analytics to leveraging AI appli
 
 The RAPIDS Accelerator for Apache Spark combines the power of the <a href="https://github.com/rapidsai/cudf/">RAPIDS cuDF</a> library and the scale of the Spark distributed computing framework.  The RAPIDS Accelerator library also has a built-in accelerated shuffle based on <a href="https://github.com/openucx/ucx/">UCX</a> that can be configured to leverage GPU-to-GPU communication and RDMA capabilities. 
 
-## Perfomance & Cost Benefits
-Rapids Accelerator for Apache Spark reaps the benefit of GPU perfomance while saving infrastructure costs.
+## Performance & Cost Benefits
+Rapids Accelerator for Apache Spark reaps the benefit of GPU performance while saving infrastructure costs.
 ![Perf-cost](/img/Perf-cost.png)
 *ETL for FannieMae Mortgage Dataset (~200GB) as shown in our [demo](https://databricks.com/session_na20/deep-dive-into-gpu-support-in-apache-spark-3-x). Costs based on Cloud T4 GPU instance market price & V100 GPU price on Databricks Standard edition