Update docs for Databricks 8.2 ML (NVIDIA#2631)
* Update docs for Databricks 8.2 ML

Signed-off-by: Sameer Raheja <[email protected]>

* Mention that databricks will do updates that may impact the plugin

Signed-off-by: Sameer Raheja <[email protected]>

* Point to init scripts for 7.3 and 8.2.

Signed-off-by: Sameer Raheja <[email protected]>
sameerz authored Jun 9, 2021
1 parent 0f491b9 commit 03377a2
Showing 2 changed files with 28 additions and 16 deletions.
2 changes: 1 addition & 1 deletion docs/demo/Databricks/generate-init-script.ipynb
@@ -1 +1 @@
-{"cells":[{"cell_type":"code","source":["dbutils.fs.mkdirs(\"dbfs:/databricks/init_scripts/\")\n \ndbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n#!/bin/bash\nsudo wget -O /databricks/jars/rapids-4-spark_2.12-21.06.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.06.0/rapids-4-spark_2.12-21.06.0.jar\nsudo wget -O /databricks/jars/cudf-0.19.2-cuda10-1.jar https://repo1.maven.org/maven2/ai/rapids/cudf/0.19.2/cudf-0.19.2-cuda10-1.jar\"\"\", True)"],"metadata":{},"outputs":[],"execution_count":1},{"cell_type":"code","source":["%sh\ncd ../../dbfs/databricks/init_scripts\npwd\nls -ltr\ncat init.sh"],"metadata":{},"outputs":[],"execution_count":2},{"cell_type":"code","source":[""],"metadata":{},"outputs":[],"execution_count":3}],"metadata":{"name":"generate-init-script","notebookId":2645746662301564},"nbformat":4,"nbformat_minor":0}
+{"cells":[{"cell_type":"code","source":["dbutils.fs.mkdirs(\"dbfs:/databricks/init_scripts/\")\n \ndbutils.fs.put(\"/databricks/init_scripts/init.sh\",\"\"\"\n#!/bin/bash\nsudo wget -O /databricks/jars/rapids-4-spark_2.12-21.06.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.06.0/rapids-4-spark_2.12-21.06.0.jar\nsudo wget -O /databricks/jars/cudf-21.06-cuda11.jar https://repo1.maven.org/maven2/ai/rapids/cudf/21.06.0/cudf-21.06.0-cuda11.jar\"\"\", True)"],"metadata":{},"outputs":[],"execution_count":1},{"cell_type":"code","source":["%sh\ncd ../../dbfs/databricks/init_scripts\npwd\nls -ltr\ncat init.sh"],"metadata":{},"outputs":[],"execution_count":2},{"cell_type":"code","source":[""],"metadata":{},"outputs":[],"execution_count":3}],"metadata":{"name":"generate-init-script","notebookId":2645746662301564},"nbformat":4,"nbformat_minor":0}
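Decoded from the notebook JSON above, the updated cell now writes an init script equivalent to the following (a plain-file sketch of what `dbutils.fs.put` creates at `dbfs:/databricks/init_scripts/init.sh`; the jar URLs are copied from the new notebook content):

```shell
# Plain-text equivalent of the updated init.sh written by the notebook.
# It now pulls the CUDA 11 build of cudf 21.06.0 instead of cudf 0.19.2 for CUDA 10.1.
cat > init.sh <<'EOF'
#!/bin/bash
sudo wget -O /databricks/jars/rapids-4-spark_2.12-21.06.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/21.06.0/rapids-4-spark_2.12-21.06.0.jar
sudo wget -O /databricks/jars/cudf-21.06-cuda11.jar https://repo1.maven.org/maven2/ai/rapids/cudf/21.06.0/cudf-21.06.0-cuda11.jar
EOF
cat init.sh
```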
42 changes: 27 additions & 15 deletions docs/get-started/getting-started-databricks.md
@@ -6,18 +6,18 @@ parent: Getting-Started
---

# Getting started with RAPIDS Accelerator on Databricks
-This guide will run through how to set up the RAPIDS Accelerator for Apache Spark 3.0 on Databricks.
+This guide will run through how to set up the RAPIDS Accelerator for Apache Spark 3.x on Databricks.
At the end of this guide, the reader will be able to run a sample Apache Spark application that runs
on NVIDIA GPUs on Databricks.

## Prerequisites
-* Apache Spark 3.0 running in DataBricks Runtime 7.3 ML with GPU
-  * AWS: 7.3 LTS ML (GPU, Scala 2.12, Spark 3.0.1)
-  * Azure: 7.3 LTS ML (GPU, Scala 2.12, Spark 3.0.1)
+* Apache Spark 3.x running in Databricks Runtime 7.3 ML or 8.2 ML with GPU
+  * AWS: 7.3 LTS ML (GPU, Scala 2.12, Spark 3.0.1) or 8.2 ML (GPU, Scala 2.12, Spark 3.1.1)
+  * Azure: 7.3 LTS ML (GPU, Scala 2.12, Spark 3.0.1) or 8.2 ML (GPU, Scala 2.12, Spark 3.1.1)

-[Databricks 7.3 LTS
-ML](https://docs.databricks.com/release-notes/runtime/7.3ml.html#system-environment) runs CUDA 10.1
-Update 2, and the initialization scripts will install the appropriate cudf version to match.
+Databricks may do [maintenance
+releases](https://docs.databricks.com/release-notes/runtime/maintenance-updates.html) for their
+runtimes which may impact the behavior of the plugin.

The number of GPUs per node dictates the number of Spark executors that can run in that node.

@@ -31,7 +31,7 @@ cluster meets the prerequisites above by configuring it as follows:
4. Select a worker type. On AWS, use nodes with 1 GPU each such as `p3.2xlarge` or `g4dn.xlarge`.
p2 nodes do not meet the architecture requirements (Pascal or higher) for the Spark worker
(although they can be used for the driver node). For Azure, choose GPU nodes such as
-   Standard_NC6s_v3.
+   Standard_NC6s_v3. For GCP, choose N1 or A2 instance types with GPUs.
5. Select the driver type. Generally this can be set to be the same as the worker.
6. Start the cluster.

@@ -40,11 +40,23 @@ cluster meets the prerequisites above by configuring it as follows:
We will need to create an initialization script for the cluster that installs the RAPIDS jars to the
cluster.

-1. To create the initialization script, import the initialization script notebook from the repo
-   [generate-init-script.ipynb](../demo/Databricks/generate-init-script.ipynb) to your
-   workspace. See [Managing
-   Notebooks](https://docs.databricks.com/notebooks/notebooks-manage.html#id2) on how to import a
-   notebook, then open the notebook.
+1. To create the initialization script, import the initialization script notebook from the repo to
+   your workspace. See [Managing
+   Notebooks](https://docs.databricks.com/notebooks/notebooks-manage.html#id2) for instructions on
+   how to import a notebook.
+   Select the initialization script based on the Databricks runtime
+   version:
+   - [Databricks 7.3 LTS
+     ML](https://docs.databricks.com/release-notes/runtime/7.3ml.html#system-environment) runs CUDA 10.1
+     Update 2. Users wishing to try 21.06 on Databricks 7.3 LTS ML will need to install the CUDA
+     11.0 toolkit on the cluster. This can be done with the [generate-init-script-cuda11.ipynb
+     ](../demo/Databricks/generate-init-script-cuda11.ipynb) init script, which installs both the RAPIDS
+     Spark plugin and the CUDA 11 toolkit.
+   - [Databricks 8.2
+     ML](https://docs.databricks.com/release-notes/runtime/8.2ml.html#system-environment) has CUDA 11
+     installed. In this case use
+     [generate-init-script.ipynb](../demo/Databricks/generate-init-script.ipynb) which will install
+     the RAPIDS Spark plugin.
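The runtime-to-notebook mapping described above can be sketched as a small shell helper (illustrative only, not part of the docs; the function name is an assumption):

```shell
# pick_init_script: choose the init script notebook for a Databricks runtime
# version, per the selection rules above (hypothetical helper).
pick_init_script() {
  case "$1" in
    7.3*) echo "generate-init-script-cuda11.ipynb" ;;  # DBR 7.3 LTS ML ships CUDA 10.1; the CUDA 11 toolkit must be installed
    8.2*) echo "generate-init-script.ipynb" ;;         # DBR 8.2 ML already ships CUDA 11
    *)    echo "unsupported runtime: $1" >&2; return 1 ;;
  esac
}

pick_init_script "7.3"
pick_init_script "8.2"
```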
2. Once you are in the notebook, click the “Run All” button.
3. Ensure that the newly created init.sh script is present in the output from cell 2 and that the
contents of the script are correct.
@@ -93,12 +105,12 @@ cluster.
[`spark.rapids.sql.python.gpu.enabled`](../configs.md#sql.python.gpu.enabled) to `true` to
enable GPU support for python. Add the path of the plugin jar (supposing it is placed under
`/databricks/jars/`) to the `spark.executorEnv.PYTHONPATH` option. For more details please go to
-   [**GPU Scheduling For Pandas UDF**](../additional-functionality/rapids-udfs.md#gpu-scheduling-for-pandas-udf)
+   [GPU Scheduling For Pandas UDF](../additional-functionality/rapids-udfs.md#gpu-scheduling-for-pandas-udf)

```bash
spark.rapids.sql.python.gpu.enabled true
spark.python.daemon.module rapids.daemon_databricks
-spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-0.4.1.jar:/databricks/spark/python
+spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-21.06.0.jar:/databricks/spark/python
```
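The version bump above is exactly the kind of change that can drift out of sync with the jar installed by the init script. A minimal sketch of generating the three conf lines from a single version string (an illustrative helper, not part of the plugin or the docs):

```shell
# rapids_pandas_udf_conf: emit the three Spark conf lines for Pandas UDF GPU
# support from a plugin version, keeping the jar name and PYTHONPATH in sync
# (hypothetical helper; the jar path follows the DBFS layout used above).
rapids_pandas_udf_conf() {
  jar="/databricks/jars/rapids-4-spark_2.12-$1.jar"
  echo "spark.rapids.sql.python.gpu.enabled true"
  echo "spark.python.daemon.module rapids.daemon_databricks"
  echo "spark.executorEnv.PYTHONPATH $jar:/databricks/spark/python"
}

rapids_pandas_udf_conf "21.06.0"
```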

7. Once you’ve added the Spark config, click “Confirm and Restart”.
