Added in FAQ and fixed spelling (#378)
revans2 authored Jul 16, 2020
1 parent 4fde6f4 commit e408eb7
Showing 2 changed files with 21 additions and 2 deletions.
19 changes: 19 additions & 0 deletions docs/FAQ.md
@@ -0,0 +1,19 @@
---
layout: page
title: Frequently Asked Questions
nav_order: 8
---
# Frequently Asked Questions

### Why does `explain()` show that the GPU will be used even after setting `spark.rapids.sql.enabled` to `false`?

Apache Spark caches what is used to build the output of the `explain()` function. That cache has no
knowledge of configs, so it may return results that are out of date with the current config
settings. This is true of all configs in Spark. If you changed
`spark.sql.autoBroadcastJoinThreshold` after running `explain()` on a `DataFrame`, the output of
`explain()` would not change to reflect the new value: it could still show a `SortMergeJoin` even
though the new config would cause the query to be planned as a `BroadcastHashJoin`. When you
actually run the query, for example with `collect`, `show`, or `write`, a new `DataFrame` is
constructed, causing Spark to replan the query. This is why `spark.rapids.sql.enabled` is still
respected when running, even if `explain()` shows stale results.
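
The behavior described in this answer can be sketched with a toy model in plain Python. This is only an illustration under stated assumptions: it does not use Spark, and `ToyDataFrame`, `_plan`, and the plan names `GpuPlan` and `CpuPlan` are hypothetical names invented for the example, not part of Spark or the RAPIDS Accelerator. It only shows how a plan cached on first `explain()` can go stale while a freshly built object picks up the current config.

```python
class ToyDataFrame:
    """Toy stand-in for a DataFrame. NOT Spark code, only a model of the caching."""

    def __init__(self, conf):
        self._conf = conf          # shared, mutable config dictionary
        self._cached_plan = None   # filled in by the first explain() call

    def _plan(self):
        # Planning reads the config as it is at the moment planning runs.
        return "GpuPlan" if self._conf["spark.rapids.sql.enabled"] else "CpuPlan"

    def explain(self):
        # The plan is computed once and reused, so later config changes are not seen.
        if self._cached_plan is None:
            self._cached_plan = self._plan()
        return self._cached_plan

    def collect(self):
        # Models an action: a fresh object is built, so the query is planned again.
        return ToyDataFrame(self._conf).explain()


conf = {"spark.rapids.sql.enabled": True}
df = ToyDataFrame(conf)

first = df.explain()                      # planned with the plugin enabled
conf["spark.rapids.sql.enabled"] = False  # config changes after explain()
stale = df.explain()                      # still the cached plan
actual = df.collect()                     # replanned with the current config

print(first, stale, actual)  # prints: GpuPlan GpuPlan CpuPlan
```

In this model the second `explain()` reports the stale plan, while the action reflects the disabled setting, which mirrors the distinction the answer draws between `explain()` output and what actually runs.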

4 changes: 2 additions & 2 deletions docs/index.md
@@ -13,8 +13,8 @@ As data scientists shift from using traditional analytics to leveraging AI appli

The RAPIDS Accelerator for Apache Spark combines the power of the <a href="https://github.com/rapidsai/cudf/">RAPIDS cuDF</a> library and the scale of the Spark distributed computing framework. The RAPIDS Accelerator library also has a built-in accelerated shuffle based on <a href="https://github.com/openucx/ucx/">UCX</a> that can be configured to leverage GPU-to-GPU communication and RDMA capabilities.

-## Perfomance & Cost Benefits
-Rapids Accelerator for Apache Spark reaps the benefit of GPU perfomance while saving infrastructure costs.
+## Performance & Cost Benefits
+Rapids Accelerator for Apache Spark reaps the benefit of GPU performance while saving infrastructure costs.
![Perf-cost](/img/Perf-cost.png)
*ETL for FannieMae Mortgage Dataset (~200GB) as shown in our [demo](https://databricks.com/session_na20/deep-dive-into-gpu-support-in-apache-spark-3-x). Costs based on Cloud T4 GPU instance market price & V100 GPU price on Databricks Standard edition
