From 2b71126eed02085357a272be6ace39dc231c81da Mon Sep 17 00:00:00 2001
From: Liangcai Li
Date: Tue, 6 Apr 2021 11:04:01 +0800
Subject: [PATCH] Update the doc for pandas udf on databricks (#2025)

Update the doc for pandas udf on databricks

Signed-off-by: Firestarman
---
 docs/get-started/getting-started-databricks.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/docs/get-started/getting-started-databricks.md b/docs/get-started/getting-started-databricks.md
index cc8a168fe99..2bb9028639e 100644
--- a/docs/get-started/getting-started-databricks.md
+++ b/docs/get-started/getting-started-databricks.md
@@ -85,6 +85,22 @@ cluster.
 
     ![Spark Config](../img/Databricks/sparkconfig.png)
 
+    If running Pandas UDFs with GPU support from the plugin, at least the three additional
+    options below are required. The `spark.python.daemon.module` option selects the right Python
+    daemon module for Databricks. The Python runtime on Databricks requires different parameters
+    than open source Spark, so a dedicated Python daemon module, `rapids.daemon_databricks`, is
+    provided and should be specified here. Set the config
+    [`spark.rapids.sql.python.gpu.enabled`](../configs.md#sql.python.gpu.enabled) to `true` to
+    enable GPU support for Python. Add the path of the plugin jar (assuming it is placed under
+    `/databricks/jars/`) to the `spark.executorEnv.PYTHONPATH` option. For more details, see
+    [**GPU Scheduling For Pandas UDF**](../additional-functionality/rapids-udfs.md#gpu-scheduling-for-pandas-udf).
+
+    ```bash
+    spark.rapids.sql.python.gpu.enabled true
+    spark.python.daemon.module rapids.daemon_databricks
+    spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-0.4.1.jar:/databricks/spark/python
+    ```
+
 7. Once you’ve added the Spark config, click “Confirm and Restart”.
 
 8. Once the cluster comes back up, it is now enabled for GPU-accelerated Spark with RAPIDS and cuDF.
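
As an illustration of what such a cluster can then run, the sketch below shows a minimal Series-to-Series Pandas UDF submitted through PySpark. It is not part of the patch above; the SparkSession bootstrap, the `plus_one` function, and the column names are assumptions made only for this example.

```python
# Minimal sketch of a Pandas UDF that a GPU-enabled Databricks cluster can execute
# once spark.rapids.sql.python.gpu.enabled, spark.python.daemon.module, and
# spark.executorEnv.PYTHONPATH are set as described in the patch above.
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.getOrCreate()

@pandas_udf("long")
def plus_one(v: pd.Series) -> pd.Series:
    # Runs in the Python worker started by the rapids.daemon_databricks module
    # when the cluster is configured with the options shown in the patch.
    return v + 1

df = spark.range(0, 1000)
df.select(plus_one(df["id"]).alias("id_plus_one")).show(5)
```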