
Databricks Build Support #221

Merged — 8 commits merged on Jun 19, 2020

Conversation

tgravescs
Collaborator

This is to support building on Databricks. We will setup a jenkins job to run via CI. The general flow is:

  • CI runs against a static Databricks cluster
  • Update the Maven version to 0.1-databricks-SNAPSHOT
  • Apply the patch in jenkins/databricks/dbimports.patch to handle some classes that moved. Note this patch file is temporary until we can add better support for different distributions.
  • Tar up the spark-rapids source
  • Start the Databricks cluster and copy everything over
  • Build and run tests on the Databricks cluster
  • Tar up the source code and copy it back to the Jenkins box
  • Deploy the rapids spark databricks jar
  • Shut down the cluster
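The flow above could be driven by a small wrapper along these lines (a hedged sketch with hypothetical commands; the actual logic lives in the jenkins/databricks scripts):

```python
import subprocess

def run(cmd):
    # check=True makes any failing step abort the whole pipeline
    # instead of silently continuing to the next step.
    subprocess.run(cmd, shell=True, check=True)

def build_on_databricks():
    # Hypothetical step commands mirroring the list above.
    run("mvn versions:set -DnewVersion=0.1-databricks-SNAPSHOT")
    run("patch -p1 < jenkins/databricks/dbimports.patch")
    run("tar -zcf spark-rapids.tgz .")
    # ... start the cluster, copy the tarball over, build and run
    # tests there, copy results back, deploy the jar, then shut
    # the cluster down.
```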

Note this definitely needs more error handling in run-tests.py. Things were working (i.e., failing properly and quitting) when I ran it myself, but on the Jenkins nodes they are not, so I need to investigate that more and fix it. I will likely switch away from os.system. Note that when tests fail it does not currently fail the build. There are a few failing tests that I have filed issues for.

20:47:53 = 9 failed, 1690 passed, 41 skipped, 28 xfailed, 1 xpassed, 2 warnings in 2415.43s (0:40:15) =

I would like to do that in a followup PR. These scripts successfully passed in the dev CI environment.
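Switching from os.system to subprocess would make failures propagate to the build; a minimal sketch of the idea (hypothetical helper, not the actual run-tests.py code):

```python
import subprocess
import sys

def run_tests(cmd):
    """Run a test command and propagate its exit status.

    Unlike os.system, subprocess.run(check=True) raises
    CalledProcessError on a non-zero exit code, so a failing
    pytest run can fail the build instead of being swallowed.
    """
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as e:
        sys.exit(e.returncode)
```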

The DeserializeToObjectExec change was made because on Databricks that plan node takes an extra parameter, so I restructured the match so it does not rely on the number of parameters.
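The idea of not depending on constructor arity can be illustrated with a Python analogue (not the actual Scala code in this PR): check only for the fields you actually use instead of destructuring a fixed number of positional parameters.

```python
def can_handle_deserialize(plan):
    # Hypothetical analogue: pattern-matching a fixed arity breaks
    # when a distribution adds a parameter to the node, so instead
    # check the node type and only the attributes we rely on.
    return (type(plan).__name__ == "DeserializeToObjectExec"
            and hasattr(plan, "deserializer")
            and hasattr(plan, "child"))
```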

Note we also need to push a new Docker image and then update the job to use that image instead of building it every time. That can be done as a followup as well, once the Dockerfile is committed.

@tgravescs
Collaborator Author

build

Collaborator

@revans2 revans2 left a comment


It looks good. I would love to see a follow-on issue to add a command line option to the pyspark tests so the tests know what environment they are running in. This would let us mark tests as xfail when they fail in one environment but not others. If you think that is a good idea, I'll file it.

@tgravescs
Collaborator Author

Yeah, that is a good idea.

@jlowe jlowe added the build Related to CI / CD or cleanly building label Jun 19, 2020
@jlowe jlowe added this to the Jun 8 - Jun 19 milestone Jun 19, 2020
@tgravescs
Collaborator Author

Follow-on issue filed to improve error handling and such: #224

Contributor

@jlowe jlowe left a comment


Looks OK other than there are a number of places we need to remember to update when we move to the next version. I'm OK with filing an issue to track that as a followup.

steps {
  script {
    sshagent(credentials : ['svcngcc_pubpriv']) {
      sh "mvn versions:set -DnewVersion=0.1-databricks-SNAPSHOT && git clean -d -f"
Contributor


Should this be a Jenkins variable or somehow computed from the version set in the pom? Otherwise we have to manually remember to update it when we move versions.
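One way to avoid hardcoding the version is to read it out of the pom itself; a small sketch of a hypothetical helper (not part of this PR) that a build script could call:

```python
import xml.etree.ElementTree as ET

# Maven poms live in this XML namespace.
POM_NS = "{http://maven.apache.org/POM/4.0.0}"

def pom_version(pom_path="pom.xml"):
    # Read the project version from the pom so build scripts do
    # not hardcode "0.1-databricks-SNAPSHOT" in several places.
    root = ET.parse(pom_path).getroot()
    return root.findtext(f"{POM_NS}version")
```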

Collaborator Author


Yep, I agree; that is covered in the followup issue to make things more parameterized. I can change it here if it makes releasing easier.

Contributor


I'm OK as long as whoever is doing the release knows to update these three places. Assuming we're doing the release as a PR against master, I think we can manually check this for the first release before merging to master and fix it properly in the followup, #224.

echo "Maven mirror is $MVN_URM_MIRROR"
SERVER_ID='snapshots'
SERVER_URL='https://urm.nvidia.com:443/artifactory/sw-spark-maven-local'
FPATH=./dist/target/rapids-4-spark_2.12-0.1-databricks-SNAPSHOT.jar
Contributor


Same hardcoded version issue here.

mvn -Pdatabricks clean verify -DskipTests

# copy so we pick up new built jar
sudo cp dist/target/rapids-4-spark_2.12-*-SNAPSHOT.jar /databricks/jars/rapids-4-spark_2.12-0.1-SNAPSHOT-ci.jar
Contributor


Same hardcoded version issue here.

@tgravescs tgravescs merged commit 3c77cc0 into NVIDIA:branch-0.1 Jun 19, 2020
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Add a profile to build for Databricks

* move dependency to management section for databricks annotation

* Databricks build and deploy scripts

* Change the way we match DeserializeToObjectExec because some Spark
releases have an extra parameter to it

* Update description on jenkins file

* cleanup copyrights

* remove extra databricks exclude

* more cleanup

Co-authored-by: Thomas Graves <[email protected]>
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
Signed-off-by: spark-rapids automation <[email protected]>