[SPARK-1267][SPARK-18129] Allow PySpark to be pip installed #15659
Changes from 55 commits
@@ -0,0 +1,41 @@
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Attempts to find a proper value for SPARK_HOME. Should be included using "source" directive.

FIND_SPARK_HOME_PYTHON_SCRIPT="$(cd "`dirname "$0"`"; pwd)/find_spark_home.py"

# Short circuit if the user already has this set.
if [ ! -z "${SPARK_HOME}" ]; then
  exit 0
elif [ ! -f $FIND_SPARK_HOME_PYTHON_SCRIPT ]; then
Review comment: Actually, I'll just paste the Shellcheck warnings here. Some of these aren't super-important, but the word-splitting ones are.
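To make the word-splitting concern concrete, here is a generic illustration of the kind of issue Shellcheck flags (illustrative path only, not the project's actual fix):

#!/usr/bin/env bash
# Illustration of the word-splitting problem Shellcheck warns about (hypothetical path).
DIR="/opt/spark home"                          # a directory name containing a space

# Unquoted: the expansion splits into two words, [ sees too many arguments,
# and the test errors out ("[: too many arguments") instead of checking the file.
if [ ! -f $DIR/find_spark_home.py ]; then echo "unquoted branch"; fi

# Quoted: the path stays a single word and the test behaves as intended.
if [ ! -f "$DIR/find_spark_home.py" ]; then echo "quoted branch"; fi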
  # If we are not in the same directory as find_spark_home.py we are not pip installed so we don't
  # need to search the different Python directories for a Spark installation.
  # Note, however, that if the user has pip installed PySpark but is directly calling pyspark-shell or
  # spark-submit in another directory we want to use that version of PySpark rather than the
  # pip installed version of PySpark.
  export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
else
  # We are pip installed, use the Python script to resolve a reasonable SPARK_HOME
  # Default to standard python interpreter unless told otherwise
  if [[ -z "$PYSPARK_DRIVER_PYTHON" ]]; then
    PYSPARK_DRIVER_PYTHON="${PYSPARK_PYTHON:-"python"}"
  fi
  export SPARK_HOME=`$PYSPARK_DRIVER_PYTHON $FIND_SPARK_HOME_PYTHON_SCRIPT`
fi
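For orientation, this is roughly how the launcher scripts in the following hunks are expected to use this helper; the key point is that it is sourced, not executed, so the exported SPARK_HOME lands in the caller's environment (a sketch with a hypothetical launcher, mirroring the real scripts below):

#!/usr/bin/env bash
# Hypothetical launcher sitting next to find-spark-home in bin/.
if [ -z "${SPARK_HOME}" ]; then
  # Sourcing (per the "source" directive note above) makes the helper's
  # "export SPARK_HOME=..." visible here rather than in a throwaway subshell.
  source `dirname $0`/find-spark-home
fi
echo "Using SPARK_HOME=${SPARK_HOME}"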
@@ -23,7 +23,7 @@
#

# Figure out where Spark is installed
if [ -z "${SPARK_HOME}" ]; then
  export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
  source `dirname $0`/find-spark-home

Review comment: Let's also apply the same fix for the Shellcheck complaints here and at all other occurrences of this line.

fi

if [ -z "$SPARK_ENV_LOADED" ]; then
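The Shellcheck-friendly spelling the reviewer is asking for would presumably look like this (a sketch of the quoted form, not necessarily the exact line that was committed):

# Quote the command substitution so a Spark path containing spaces is not
# word-split, and prefer $(...) over backticks.
if [ -z "${SPARK_HOME}" ]; then
  source "$(dirname "$0")"/find-spark-home
fi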
@@ -18,7 +18,7 @@
#

if [ -z "${SPARK_HOME}" ]; then
  export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
  source `dirname $0`/find-spark-home
fi

. "${SPARK_HOME}"/bin/load-spark-env.sh

@@ -36,7 +36,7 @@ else
fi

# Find Spark jars.
if [ -f "${SPARK_HOME}/RELEASE" ]; then
if [ -d "${SPARK_HOME}/jars" ]; then
Review comment: Why did this get changed from checking for the RELEASE file?
Reply: Because both pip installed PySpark and RELEASE Spark have jars in the jars directory, it seems more reasonable to just check whether the jars directory exists directly, rather than checking for a file which indicates that the JARs directory is present.
Reply: Makes sense. This seems reasonable to me.

  SPARK_JARS_DIR="${SPARK_HOME}/jars"
else
  SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
@@ -29,7 +29,7 @@ esac
set -o posix

if [ -z "${SPARK_HOME}" ]; then
  export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
  source `dirname $0`/find-spark-home
Review comment: Many files are changed with this. Do those scripts need to find SPARK_HOME for a pip installed Spark? I assume they should be shipped with the Spark release?
Reply: So for pip installed PySpark we need to do a bit more work to find SPARK_HOME, using a common script to determine whether we need to do more hunting, rather than duplicating the logic.

fi

export _SPARK_CMD_USAGE="Usage: ./bin/spark-shell [options]"
@@ -162,14 +162,35 @@ if [[ "$1" == "package" ]]; then
  export ZINC_PORT=$ZINC_PORT
  echo "Creating distribution: $NAME ($FLAGS)"

  # Write out the NAME and VERSION to PySpark version info; we rewrite the - into a . and SNAPSHOT
Review comment: Do we want to have a version string that's slightly different from the "original", just for Python? I'm thinking about what will happen if people, for example, want to do the same for R. Having 3 slightly different ways of showing the version string seems unnecessary.
Reply: Yup we do, otherwise we can't publish on PyPI, which is the end goal.
  # to dev0 to be closer to PEP440. We use the NAME as a "local version".
  PYSPARK_VERSION=`echo "$SPARK_VERSION+$NAME" | sed -r "s/-/./" | sed -r "s/SNAPSHOT/dev0/"`
  echo "__version__='$PYSPARK_VERSION'" > python/pyspark/version.py
Review comment: nit: Looks like this may not be needed here?
Reply: We need it if the user runs setup.py sdist on their own and as part of the packaging tests during any Python change in Jenkins. If the only way to build a pip installable package was make_release then yes - but that isn't the case.
Reply: If we can, I'd like to consolidate this logic into the ...
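To make the version rewrite concrete, this is roughly what the sed pipeline above produces (illustrative values, not from a real release):

# Illustrative input/output for the PEP 440-ish rewrite above.
SPARK_VERSION="2.1.0-SNAPSHOT"
NAME="hadoop2.7"
PYSPARK_VERSION=`echo "$SPARK_VERSION+$NAME" | sed -r "s/-/./" | sed -r "s/SNAPSHOT/dev0/"`
echo "$PYSPARK_VERSION"   # prints 2.1.0.dev0+hadoop2.7 - the NAME becomes the "local version"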
  # Get maven home set by MVN
  MVN_HOME=`$MVN -version 2>&1 | grep 'Maven home' | awk '{print $NF}'`

  echo "Creating distribution"
  ./dev/make-distribution.sh --name $NAME --mvn $MVN_HOME/bin/mvn --tgz $FLAGS \
    -DzincPort=$ZINC_PORT 2>&1 > ../binary-release-$NAME.log
  cd ..
  cp spark-$SPARK_VERSION-bin-$NAME/spark-$SPARK_VERSION-bin-$NAME.tgz .

  echo "Copying and signing python distribution"
  PYTHON_DIST_NAME=pyspark-$PYSPARK_VERSION.tar.gz
Review comment: Without specifying a format, it seems the output distribution will be a zip file on Windows? Because in ...
Reply: I think packaging on Windows can be considered a future todo.
Reply: Also, realistically, I don't think packaging on Windows is well supported right now given how we are doing it with shell scripts.
Reply: Yeah, actually the reason I have this question is because ...
Reply: The goal is to eventually support building sdists on Windows - but I think porting the entire release process to Windows is out of scope.
Reply: We shouldn't port the release bash scripts to Windows. It's going to be a huge pain with little obvious benefit. Windows users who want to make release builds can just run this version of the script in a ...
  cp spark-$SPARK_VERSION-bin-$NAME/python/dist/$PYTHON_DIST_NAME .

  echo $GPG_PASSPHRASE | $GPG --passphrase-fd 0 --armour \
    --output $PYTHON_DIST_NAME.asc \
    --detach-sig $PYTHON_DIST_NAME
  echo $GPG_PASSPHRASE | $GPG --passphrase-fd 0 --print-md \
    MD5 $PYTHON_DIST_NAME.gz > \

Review comment: Do we have a wrongly appended ".gz" here? (The SHA512 line below uses $PYTHON_DIST_NAME without it.)

    $PYTHON_DIST_NAME.md5
  echo $GPG_PASSPHRASE | $GPG --passphrase-fd 0 --print-md \
    SHA512 $PYTHON_DIST_NAME > \
    $PYTHON_DIST_NAME.sha

  echo "Copying and signing regular binary distribution"
  cp spark-$SPARK_VERSION-bin-$NAME/spark-$SPARK_VERSION-bin-$NAME.tgz .
  echo $GPG_PASSPHRASE | $GPG --passphrase-fd 0 --armour \
    --output spark-$SPARK_VERSION-bin-$NAME.tgz.asc \
    --detach-sig spark-$SPARK_VERSION-bin-$NAME.tgz
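If that trailing .gz is indeed unintended, the corrected MD5 step would presumably mirror the SHA512 one (a sketch of the likely fix, not the committed change):

# Presumed fix: checksum the sdist under its real name, with no extra .gz suffix.
echo $GPG_PASSPHRASE | $GPG --passphrase-fd 0 --print-md \
  MD5 $PYTHON_DIST_NAME > \
  $PYTHON_DIST_NAME.md5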
@@ -187,10 +208,10 @@ if [[ "$1" == "package" ]]; then
  # We increment the Zinc port each time to avoid OOM's and other craziness if multiple builds
  # share the same Zinc server.
  FLAGS="-Psparkr -Phive -Phive-thriftserver -Pyarn -Pmesos"
  make_binary_release "hadoop2.3" "-Phadoop2.3 $FLAGS" "3033" &
  make_binary_release "hadoop2.4" "-Phadoop2.4 $FLAGS" "3034" &
  make_binary_release "hadoop2.6" "-Phadoop2.6 $FLAGS" "3035" &
  make_binary_release "hadoop2.7" "-Phadoop2.7 $FLAGS" "3036" &
  make_binary_release "hadoop2.3" "-Phadoop-2.3 $FLAGS" "3033" &
  make_binary_release "hadoop2.4" "-Phadoop-2.4 $FLAGS" "3034" &
  make_binary_release "hadoop2.6" "-Phadoop-2.6 $FLAGS" "3035" &
  make_binary_release "hadoop2.7" "-Phadoop-2.7 $FLAGS" "3036" &
Review comment: nit: Looks like these changes are not related and could be separated into another small PR?
Reply: So the packaging doesn't seem to currently build without this - and I wanted to test the packaging as part of this.
Reply: Curious why this has not been found before...
Reply: Yeah, I don't know if it's just that I have a slightly older maven or some weird plugin. I can take a look some more at this.
Reply: I think this is a new issue which was introduced in https://github.com/apache/spark/pull/14637/files#diff-01ca42240614718522afde4d4885b40dL189. I'd be in favor of fixing this separately. Do you mind splitting this change into a separate small PR which I'll merge right away?
Reply: Done - #15860
  make_binary_release "hadoop2.4-without-hive" "-Psparkr -Phadoop-2.4 -Pyarn -Pmesos" "3037" &
  make_binary_release "without-hadoop" "-Psparkr -Phadoop-provided -Pyarn -Pmesos" "3038" &
  wait
@@ -208,6 +229,7 @@ if [[ "$1" == "package" ]]; then
  # Re-upload a second time and leave the files in the timestamped upload directory:
  LFTP mkdir -p $dest_dir
  LFTP mput -O $dest_dir 'spark-*'
  LFTP mput -O $dest_dir 'pyspark-*'
  exit 0
fi
@@ -201,6 +201,12 @@ fi
# Copy data files
cp -r "$SPARK_HOME/data" "$DISTDIR"

# Make pip package
echo "Building python distribution package"
cd $SPARK_HOME/python
python setup.py sdist
Review comment: Add a parameter so we only build this if the parameter is given? I think we may not need to build this by default.
Reply: Almost everything else is built by default (except for maven profiles and the big tgz file). I don't think adding a parameter would really improve it (although if on some machines people have difficulty or it's slow we can revisit - but so far building the Python distribution is very fast).
Reply: Can we be sure that the machine building the distribution always has Python installed? If the user doesn't intend to use PySpark, the original procedure for making a distribution doesn't ask them to do anything with PySpark. But with this change, it seems we are assuming all users want the Python distribution.
Reply: Speed might not be a problem, as you said. My concern is that this step might require users to meet some requirements they don't expect and don't want, e.g., some Python modules.
Reply: I think if a user is publishing a release it's reasonable to assume they have Python and at least pandas and numpy. But if this isn't the case, working around it with an environment variable, same as the TGZ, sounds like a good suggestion.

cd ..

# Copy other things
mkdir "$DISTDIR"/conf
cp "$SPARK_HOME"/conf/*.template "$DISTDIR"/conf
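One way to act on the environment-variable suggestion, sketched with a hypothetical MAKE_PIP flag (the flag name is illustrative, not from the PR):

# Hypothetical guard: only build the pip package when explicitly requested,
# mirroring how other optional steps are toggled by environment variables.
if [ "$MAKE_PIP" == "true" ]; then
  echo "Building python distribution package"
  cd "$SPARK_HOME/python"
  python setup.py sdist
  cd ..
else
  echo "Skipping python distribution package"
fi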
@@ -0,0 +1,36 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

from __future__ import print_function

from pyspark.sql import SparkSession
import sys

if __name__ == "__main__":
    spark = SparkSession\
        .builder\
        .appName("PipSanityCheck")\
        .getOrCreate()
    sc = spark.sparkContext
    rdd = sc.parallelize(range(100), 10)
    value = rdd.reduce(lambda x, y: x + y)
    if (value != 4950):
        print("Value %d did not match expected value." % value, file=sys.stderr)
        sys.exit(-1)
    print("Successfuly ran pip sanity check")
Review comment: nit: Successfully.

    spark.stop()
@@ -0,0 +1,35 @@
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

FWDIR="$(cd "`dirname $0`"/..; pwd)"
cd "$FWDIR" | ||
|
||
# Run the tests, we wrap the underlying test script for cleanup and because early exit | ||
# doesn't always properly exit a virtualenv. | ||
$FWDIR/dev/run-pip-tests-2 | ||
export success=$? | ||
|
||
# Clean up the virtual env enviroment used if we created one. | ||
if [ -f ./virtual_env_tmp_dir ]; then | ||
Review comment: I think that you could combine both this and the ... and put that at the top of the script, before you actually create the temporary directory / virtualenv.

  rm -rf `cat ./virtual_env_temp_dir`
  rm ./virtual_env_temp_dir
fi

exit $success
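A sketch of the consolidation the reviewer appears to be suggesting, using a trap so the cleanup runs on any exit; this is my reading of the elided suggestion, not code from the PR:

# Hypothetical consolidated cleanup: register it before creating the virtualenv,
# so it runs on normal completion and on early failure alike.
VIRTUALENV_BASE="$(mktemp -d)"
cleanup() {
  rm -rf "$VIRTUALENV_BASE"
}
trap cleanup EXIT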
@@ -0,0 +1,77 @@
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Stop on error
set -e
# Set nullglob for when we are checking existence based on globs
shopt -s nullglob

FWDIR="$(cd "`dirname $0`"/..; pwd)"
cd "$FWDIR"
# Some systems don't have pip or virtualenv - in those cases our tests won't work.
if ! hash virtualenv 2>/dev/null; then
  echo "Missing virtualenv, skipping pip installability tests."
  exit 0
fi
if ! hash pip 2>/dev/null; then
  echo "Missing pip, skipping pip installability tests."
  exit 0
fi
if [ -d ~/.cache/pip/wheels/ ]; then
  echo "Cleaning up pip wheel cache so we install the fresh package"
  rm -rf ~/.cache/pip/wheels/
fi

# Create a temp directory for us to work in and save its name to a file for cleanup
echo "Constructing virtual env for testing"
mktemp -d > ./virtual_env_temp_dir
VIRTUALENV_BASE=`cat ./virtual_env_temp_dir`
echo "Using $VIRTUALENV_BASE for virtualenv"
virtualenv $VIRTUALENV_BASE
source $VIRTUALENV_BASE/bin/activate
# Upgrade pip
pip install --upgrade pip
echo "Creating pip installable source dist" | ||
cd python | ||
python setup.py sdist | ||
|
||
|
||
echo "Installing dist into virtual env" | ||
cd dist | ||
# Verify that the dist directory only contains one thing to install | ||
sdists=(*.tar.gz) | ||
if [ ${#sdists[@]} -ne 1 ]; then | ||
echo "Unexpected number of targets found in dist directory - please cleanup existing sdists first." | ||
exit -1 | ||
fi | ||
# Do the actual installation | ||
pip install --upgrade --force-reinstall *.tar.gz | ||
|
||
cd / | ||
|
||
echo "Run basic sanity check on pip installed version with spark-submit" | ||
spark-submit $FWDIR/dev/pip-sanity-check.py | ||
echo "Run basic sanity check with import based" | ||
python $FWDIR/dev/pip-sanity-check.py | ||
echo "Run the tests for context.py" | ||
python $FWDIR/python/pyspark/context.py | ||
|
||
exit 0 |
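For reference, the end-user workflow this test is emulating, assuming the package eventually lands on PyPI (at the time of this PR you would point pip at the locally built sdist instead; my_job.py is a placeholder):

pip install pyspark            # or: pip install path/to/pyspark-*.tar.gz built by setup.py sdist
pyspark                        # the launcher resolves SPARK_HOME via find_spark_home.py
spark-submit my_job.py         # works the same way from a pip-installed PySpark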
@@ -432,6 +432,12 @@ def run_python_tests(test_modules, parallelism):
    run_cmd(command)


def run_python_packaging_tests():
    set_title_and_block("Running PySpark packaging tests", "BLOCK_PYSPARK_PIP_TESTS")
    command = [os.path.join(SPARK_HOME, "dev", "run-pip-tests")]
    run_cmd(command)


def run_build_tests():
    set_title_and_block("Running build tests", "BLOCK_BUILD_TESTS")
    run_cmd([os.path.join(SPARK_HOME, "dev", "test-dependencies.sh")])
@@ -583,6 +589,7 @@ def main():
    modules_with_python_tests = [m for m in test_modules if m.python_test_goals]
    if modules_with_python_tests:
        run_python_tests(modules_with_python_tests, opts.parallelism)
        run_python_packaging_tests()
Review comment: Do we need to run the pip packaging test every time? It would be better if we could choose whether to run it.
Reply: It's really hard to determine whether a Python change is one that's going to break packaging - and the packaging tests are really quite fast. I think for now erring on the side of testing slightly more often than we need is the best course of action.
Reply: I would +1 on this given the logic in setup.py that should be checked.
Reply: +1 as well; this seems cheap to run and it's better to err on the side of running things more often.

    if any(m.should_run_r_tests for m in test_modules):
        run_sparkr_tests()
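For a local run outside Jenkins, the packaging tests can be invoked directly from a Spark checkout (they skip themselves if pip or virtualenv is missing, as shown above):

# Run only the pip packaging tests.
./dev/run-pip-tests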
Review comment: I'm fine with making that change - just pointing out that we were previously using backtick execution.