Add dynamic Spark configuration for Databricks #2116
Changes from all commits: 244ef65, 4631bb9, 0367a2b, 0331f3a, 1353df7, 48f136d
```diff
@@ -15,9 +15,10 @@
 # limitations under the License.
 #
 
-set -e
+set -ex
 
 LOCAL_JAR_PATH=$1
+SPARK_CONF=$2
 
 # tests
 export PATH=/databricks/conda/envs/databricks-ml-gpu/bin:/databricks/conda/condabin:$PATH
```
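The hunk above turns on command tracing (`set -ex`) and adds `SPARK_CONF` as a new second positional argument. A hypothetical invocation of the test script might look like the following; the script name, jar path, and property values are illustrative assumptions, not taken from the PR:

```bash
# Hypothetical call: the second argument is the new comma-separated
# list of dynamic Spark properties (names and values are made up).
bash test.sh /home/ubuntu/spark-rapids-jars \
    "spark.sql.shuffle.partitions=200,spark.rapids.sql.explain=ALL"
```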
```diff
@@ -38,21 +39,35 @@ CUDF_UDF_TEST_ARGS="--conf spark.python.daemon.module=rapids.daemon_databricks \
 --conf spark.rapids.python.memory.gpu.allocFraction=0.1 \
 --conf spark.rapids.python.concurrentPythonWorkers=2"
 
+## 'spark.foo=1,spark.bar=2,...' to 'export PYSP_TEST_spark_foo=1 export PYSP_TEST_spark_bar=2'
+if [ -n "$SPARK_CONF" ]; then
```
Review comment on `if [ -n "$SPARK_CONF" ]; then`: We need to translate … The scripts …
```diff
+    CONF_LIST=${SPARK_CONF//','/' '}
+    for CONF in ${CONF_LIST}; do
+        KEY=${CONF%%=*}
+        VALUE=${CONF#*=}
+        ## run_pyspark_from_build.sh requires 'export PYSP_TEST_spark_foo=1' as the spark configs
+        export PYSP_TEST_${KEY//'.'/'_'}=$VALUE
+    done
+
+    ## 'spark.foo=1,spark.bar=2,...' to '--conf spark.foo=1 --conf spark.bar=2 --conf ...'
+    SPARK_CONF="--conf ${SPARK_CONF/','/' --conf '}"
+fi
 
 TEST_TYPE="nightly"
 if [ -d "$LOCAL_JAR_PATH" ]; then
     ## Run tests with jars in the LOCAL_JAR_PATH dir downloading from the dependency repo
     LOCAL_JAR_PATH=$LOCAL_JAR_PATH bash $LOCAL_JAR_PATH/integration_tests/run_pyspark_from_build.sh --runtime_env="databricks" --test_type=$TEST_TYPE
 
     ## Run cudf-udf tests
     CUDF_UDF_TEST_ARGS="$CUDF_UDF_TEST_ARGS --conf spark.executorEnv.PYTHONPATH=`ls $LOCAL_JAR_PATH/rapids-4-spark_*.jar | grep -v 'tests.jar'`"
-    LOCAL_JAR_PATH=$LOCAL_JAR_PATH SPARK_SUBMIT_FLAGS=$CUDF_UDF_TEST_ARGS TEST_PARALLEL=1 \
+    LOCAL_JAR_PATH=$LOCAL_JAR_PATH SPARK_SUBMIT_FLAGS="$SPARK_CONF $CUDF_UDF_TEST_ARGS" TEST_PARALLEL=1 \
```
Review comment on the changed `SPARK_SUBMIT_FLAGS` line: append the dynamic spark confs here.
```diff
     bash $LOCAL_JAR_PATH/integration_tests/run_pyspark_from_build.sh --runtime_env="databricks" -m "cudf_udf" --cudf_udf --test_type=$TEST_TYPE
 else
     ## Run tests with jars building from the spark-rapids source code
     bash /home/ubuntu/spark-rapids/integration_tests/run_pyspark_from_build.sh --runtime_env="databricks" --test_type=$TEST_TYPE
 
     ## Run cudf-udf tests
     CUDF_UDF_TEST_ARGS="$CUDF_UDF_TEST_ARGS --conf spark.executorEnv.PYTHONPATH=`ls /home/ubuntu/spark-rapids/dist/target/rapids-4-spark_*.jar | grep -v 'tests.jar'`"
-    SPARK_SUBMIT_FLAGS=$CUDF_UDF_TEST_ARGS TEST_PARALLEL=1 \
+    SPARK_SUBMIT_FLAGS="$SPARK_CONF $CUDF_UDF_TEST_ARGS" TEST_PARALLEL=1 \
     bash /home/ubuntu/spark-rapids/integration_tests/run_pyspark_from_build.sh --runtime_env="databricks" -m "cudf_udf" --cudf_udf --test_type=$TEST_TYPE
 fi
```
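As a sanity check on the translation logic above, here is a minimal standalone sketch (not part of the PR) that converts a comma-separated conf string into both the `PYSP_TEST_*` environment variables expected by run_pyspark_from_build.sh and the `--conf` flag form for spark-submit; it uses the double-slash `${var//,/ --conf }` expansion so that every comma is replaced, not only the first:

```bash
#!/bin/bash
# Illustration only: translate 'spark.foo=1,spark.bar=2' into the two formats.
SPARK_CONF="spark.foo=1,spark.bar=2"

# 1) Environment-variable form: PYSP_TEST_spark_foo=1, PYSP_TEST_spark_bar=2
for CONF in ${SPARK_CONF//,/ }; do
    KEY=${CONF%%=*}      # key is everything before the first '='
    VALUE=${CONF#*=}     # value is everything after the first '='
    export PYSP_TEST_${KEY//./_}=$VALUE
done

# 2) spark-submit flag form: '--conf spark.foo=1 --conf spark.bar=2'
SUBMIT_CONF="--conf ${SPARK_CONF//,/ --conf }"

env | grep '^PYSP_TEST_'   # PYSP_TEST_spark_foo=1 and PYSP_TEST_spark_bar=2
echo "$SUBMIT_CONF"        # --conf spark.foo=1 --conf spark.bar=2
```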
Review comment: Add a comment to make the opt '-f' string format clear.