Databricks Build Support #221
Conversation
It looks good. I would love to see a follow-on issue to add a command-line option to the pyspark tests so the tests know what environment they are running in. This will help us mark tests as xfail if they fail in one environment but not in others. If you think that is good I'll file it.
yeah that is a good idea.
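The follow-on idea above could look roughly like the conftest.py sketch below. This is an illustration only, not code from this PR: the option name `--test-env`, the default value, and both function names are assumptions.

```python
# Hypothetical sketch of the proposed follow-up: a pytest command-line
# option telling the pyspark tests which environment they run in, so a
# test can be marked xfail only where it is known to fail. Names here
# are illustrative, not from this PR.

def pytest_addoption(parser):
    # In conftest.py, pytest calls this hook at startup;
    # run as: pytest --test-env=databricks
    parser.addoption("--test-env", action="store", default="apache",
                     help="environment the tests are running in")

def is_databricks(config):
    # A test known to fail only on Databricks could use this with
    # pytest.mark.xfail(condition=..., reason=...)
    return config.getoption("--test-env") == "databricks"
```

A test that fails only on Databricks would then carry a conditional xfail marker instead of failing the whole CI run.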
follow-on issue filed to improve error handling and such: #224
Looks OK other than there are a number of places we need to remember to update when we move to the next version. I'm OK with filing an issue to track that as a followup.
steps {
  script {
    sshagent(credentials : ['svcngcc_pubpriv']) {
      sh "mvn versions:set -DnewVersion=0.1-databricks-SNAPSHOT && git clean -d -f"
Should this be a jenkins variable or somehow computed from the version set in the pom? Otherwise we need to manually remember to update it when we move versions.
yep, I agree, that is in the followup issue to make things more parameterized. I can change this here if it makes releasing easier.
I'm OK as long as whoever is doing the release knows to update these three places. Assuming we're doing the release as a PR against master, I think we can manually check it for this first release before merging to master and fix it in the followup #224
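One way the hardcoded `0.1-databricks-SNAPSHOT` string could be computed rather than maintained by hand is to read it from the pom. This is a sketch of the idea under discussion, not code from this PR or from #224; the function names and paths are assumptions.

```python
# Sketch only: derive the version from pom.xml instead of hardcoding it
# in the Jenkins scripts. Names here are illustrative, not from the PR.
import xml.etree.ElementTree as ET

def pom_version(pom_path="pom.xml"):
    # Maven poms declare this default XML namespace
    ns = {"m": "http://maven.apache.org/POM/4.0.0"}
    root = ET.parse(pom_path).getroot()
    return root.findtext("m:version", namespaces=ns)

def jar_path(version):
    # build the artifact path the deploy step needs from the version
    return "./dist/target/rapids-4-spark_2.12-%s.jar" % version
```

On the shell side, recent versions of the Maven help plugin can do the same with `mvn help:evaluate -Dexpression=project.version -q -DforceStdout`.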
echo "Maven mirror is $MVN_URM_MIRROR"
SERVER_ID='snapshots'
SERVER_URL='https://urm.nvidia.com:443/artifactory/sw-spark-maven-local'
FPATH=./dist/target/rapids-4-spark_2.12-0.1-databricks-SNAPSHOT.jar
Same hardcoded version issue here.
mvn -Pdatabricks clean verify -DskipTests

# copy so we pick up new built jar
sudo cp dist/target/rapids-4-spark_2.12-*-SNAPSHOT.jar /databricks/jars/rapids-4-spark_2.12-0.1-SNAPSHOT-ci.jar
Same hardcoded version issue here.
* Add a profile to build for Databricks
* Move dependency to management section for databricks annotation
* Databricks build and deploy scripts
* Change the way we match DeserializeToObjectExec because some Spark releases have an extra parameter to it
* Update description on jenkins file
* Cleanup copyrights
* Remove extra databricks exclude
* More cleanup

Co-authored-by: Thomas Graves <[email protected]>
Signed-off-by: spark-rapids automation <[email protected]>
This is to support building on Databricks. We will set up a Jenkins job to run via CI. The general flow is:
Note this definitely needs more error handling in run-tests.py. Things were working (i.e. failing properly and quitting) when I ran it myself, but on the Jenkins nodes they are not. I need to investigate that more and fix it. I will likely switch away from os.system. Note that when tests fail it does not fail the build right now. There are a few tests failing that I have filed issues for.
20:47:53 = 9 failed, 1690 passed, 41 skipped, 28 xfailed, 1 xpassed, 2 warnings in 2415.43s (0:40:15) =
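On the os.system point above: `subprocess.run` with `check=True` turns a non-zero exit code into an exception, so a failing test command would fail the CI job instead of being silently swallowed. A minimal sketch, not the actual run-tests.py change:

```python
# Sketch: replace os.system with subprocess.run(check=True) so that a
# failing command raises and the CI job fails instead of passing silently.
import subprocess

def run(cmd_args):
    # raises subprocess.CalledProcessError on a non-zero exit status
    subprocess.run(cmd_args, check=True)
```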
I would like to do that in a followup PR. These scripts successfully passed in the dev CI environment.
The DeserializeToObjectExec change was because on Databricks it has an extra parameter, so I changed the matching around a bit to not rely on the number of parameters.
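The real change is Scala pattern matching inside the plugin; this Python sketch only illustrates the idea: identify the plan node by its type rather than destructuring a fixed number of constructor parameters, since that arity differs between Spark builds.

```python
# Illustrative only (the actual plugin code is Scala): check the plan
# node's type instead of unpacking an exact number of constructor
# parameters, which varies across Spark builds.
def is_deserialize_to_object(plan_node):
    return type(plan_node).__name__ == "DeserializeToObjectExec"
```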
Note we also need to push a new Docker image and then update the job to use that image instead of building it every time. That can be done as a followup as well once the Dockerfile is committed.