Update buildall script and the build doc [skip ci] #4029

32 changes: 18 additions & 14 deletions CONTRIBUTING.md
@@ -50,31 +50,35 @@
You can find all available build versions in the top level pom.xml file. If you are building
for Databricks then you should use the `jenkins/databricks/build.sh` script and modify it for
the version you want.

-To get an uber jar with more than 1 version you have to `mvn install` each version
-and then use one of the defined profiles in the dist module. See the next section
-for more details.
+To get an uber jar with more than 1 version you have to `mvn package` each version
+and then use one of the defined profiles in the dist module, or a comma-separated list of
+build versions. See the next section for more details.

### Building a Distribution for Multiple Versions of Spark

By default the distribution jar only includes code for a single version of Spark. If you want
-to create a jar with multiple versions we currently have 4 options.
+to create a jar with multiple versions we have the following options.

1. Build for all Apache Spark versions and CDH with no SNAPSHOT versions of Spark, only released. Use `-PnoSnapshots`.
2. Build for all Apache Spark versions and CDH including SNAPSHOT versions of Spark we have support for. Use `-Psnapshots`.
3. Build for all Apache Spark versions, CDH and Databricks with no SNAPSHOT versions of Spark, only released. Use `-PnoSnapshotsWithDatabricks`.
4. Build for all Apache Spark versions, CDH and Databricks including SNAPSHOT versions of Spark we have support for. Use `-PsnapshotsWithDatabricks`.
+5. Build for an arbitrary combination of comma-separated build versions using `-Dincluded_buildvers=<CSV list of build versions>`.
+   E.g., `-Dincluded_buildvers=312,330`, as in the sketch below.
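For example, a minimal sketch of option 5, assuming the 312 and 330 shims have already been packaged locally (see the build sequence below):

```shell script
# Build the dist jar with only the Spark 3.1.2 and 3.3.0 shims
# (assumes both shims were built with `mvn package` beforehand).
mvn -pl dist package -Dincluded_buildvers=312,330 -DskipTests
```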

-You must first build and install each of the versions of Spark and then build one final time using the profile for the option you want.
+You must first build each of the versions of Spark and then build one final time using the profile for the option you want.

You can also build some of the shims manually and then create a combined jar. For instance, to build non-snapshot versions:

```shell script
-mvn -Dbuildver=301 clean install -DskipTests
-mvn -Dbuildver=302 clean install -Drat.skip=true -DskipTests
-mvn -Dbuildver=303 clean install -Drat.skip=true -DskipTests
-mvn -Dbuildver=311 clean install -Drat.skip=true -DskipTests
-mvn -Dbuildver=312 clean install -Drat.skip=true -DskipTests
-mvn -Dbuildver=311cdh clean install -Drat.skip=true -DskipTests
+mvn clean
+mvn -Dbuildver=301 package -DskipTests
+mvn -Dbuildver=302 package -Drat.skip=true -DskipTests
+mvn -Dbuildver=303 package -Drat.skip=true -DskipTests
+mvn -Dbuildver=311 package -Drat.skip=true -DskipTests
+mvn -Dbuildver=312 package -Drat.skip=true -DskipTests
+mvn -Dbuildver=320 package -Drat.skip=true -DskipTests
+mvn -Dbuildver=311cdh package -Drat.skip=true -DskipTests
mvn -pl dist -PnoSnapshots package -DskipTests
```
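A quick way to sanity-check which shims landed in the combined jar is to list its top-level entries. The jar path and the `spark<buildver>/` parallel-world layout are assumptions here; adjust them to your build:

```shell script
# List the per-shim directories packed into the dist jar (path and layout assumed).
jar tf dist/target/rapids-4-spark_2.12-*.jar | grep -Eo '^spark[0-9]+[a-z]*/' | sort -u
```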
#### Building with buildall script
@@ -84,11 +88,11 @@
There is a build script `build/buildall` that automates the local build process.

By default, it builds everything that is needed to create a distribution jar for all released (noSnapshots) Spark versions except for Databricks. Other profiles that you can pass using `--profile=<distribution profile>` include
- `snapshots`
-- `minimumFeatureVersionMix` that currently includes 302, 311cdh, 312, 320 is recommended for catching incompatibilites already in the local development cycle
+- `minimumFeatureVersionMix` that currently includes 302, 311cdh, 312, 320 is recommended for catching incompatibilities early in the local development cycle

-For initial quick iterations we can use `--profile=<buildver>` to build a single-shim version. e.g., `-Dbuildver=320` for Spark 3.2.0
+For initial quick iterations we can use `--profile=<buildver>` to build a single-shim version, e.g., `--profile=301` for Spark 3.0.1.

-The option `--module=<module>` allows to limit the number of build steps. When iterating, we often don't have the need the entire build. We may be interested in building everything necessary just to run integration tests (`--module=integration_tests`), or we may want to just rebuild the distribution jar (`--module=dist`)
+The option `--module=<module>` allows limiting the number of build steps. When iterating, we often don't need the entire build. We may be interested in building everything necessary just to run integration tests (`--module=integration_tests`), or we may want to just rebuild the distribution jar (`--module=dist`).

By default, `buildall` builds up to 4 shims in parallel using `xargs -P <n>`. This can be adjusted by
specifying the environment variable `BUILD_PARALLEL=<n>`.
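Putting these flags together, a few illustrative invocations (all flags are documented above; the chosen values are examples only):

```shell script
# Default noSnapshots build with 2-way parallelism, stopping after the dist module.
BUILD_PARALLEL=2 build/buildall --module=dist

# Single-shim build plus everything needed to run integration tests.
build/buildall --profile=320 --module=integration_tests

# Dist jar for a custom comma-separated pair of shims.
build/buildall --profile=320,330
```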
32 changes: 18 additions & 14 deletions build/buildall
@@ -25,16 +25,19 @@
while [[ "$1" != "" ]] ; do
function print_usage() {
echo "Usage: buildall [OPTION]"
echo "Options:"
echo " --profile=DIST_PROFILE profile for the dist module, default: noSnapshots,"
echo " also supported: snapshots, minimumFeatureVersionMix, snapshotsWithDataBricks,"
echo " and noSnapshotsWithDataBricks. NOTE: the Databricks-related spark3XYdb shims"
echo " are not built locally, the jars are fetched prebuilt from a remote Maven repo."
echo " --profile=BUILDVER builds a single Spark shim, e.g. --profile=320"
echo " --module=MODULE after finishing parallel builds, resume from dist and build"
echo " up to and including module MODULE, e.g., integration_tests"
echo " --parallel=N build in parallel, N is passed via -P to xargs. NOTE: standard maven does not support"
echo " concurrent installs. If you see an error related to a maven-metadata-local.xml in"
echo " your local m2 cache, delete its parent directory and retry."
echo " -h, --help"
echo " print this help message"
echo " -p=DIST_PROFILE, --profile=DIST_PROFILE"
echo " use this profile for the dist module, default: noSnapshots, also supported: snapshots, minimumFeatureVersionMix,"
echo " snapshotsWithDatabricks, and noSnapshotsWithDatabricks. NOTE: the Databricks-related spark3XYdb shims"
echo " are not built locally, the jars are fetched prebuilt from a remote Maven repo."
echo " You can also supply a comma-separated list of build versions. E.g., --profile=320,330 will build only"
echo " the distribution jar only for 3.2.0 and 3.3.0"
echo " -m=MODULE, --module=MODULE"
echo " after finishing parallel builds, resume from dist and build up to and including module MODULE."
echo " E.g., --module=integration_tests"
echo " -P=N, --parallel=N"
echo " Build in parallel, N (4 by default) is passed via -P to xargs"
}

case "$1" in
@@ -115,7 +118,7 @@
case $DIST_PROFILE in
;;

3*)
-SPARK_SHIM_VERSIONS=($DIST_PROFILE)
+<<< $DIST_PROFILE IFS="," read -ra SPARK_SHIM_VERSIONS
+INCLUDED_BUILDVERS_OPT="-Dincluded_buildvers=$DIST_PROFILE"
unset DIST_PROFILE
;;
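For reference, a standalone sketch of the herestring parsing used in the `3*)` branch above:

```shell script
#!/usr/bin/env bash
# Split a comma-separated build-version string into an array, as the case branch does.
DIST_PROFILE="320,330"
<<< "$DIST_PROFILE" IFS="," read -ra SPARK_SHIM_VERSIONS
echo "${SPARK_SHIM_VERSIONS[@]}"  # prints: 320 330
```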

@@ -147,7 +151,7 @@
build_single_shim() {
SKIP_CHECKS=$( [[ "$BUILD_VER" == "$BASE_VER" ]] && echo false || echo true )

echo "#### REDIRECTING mvn output to $LOG_FILE ####"
-mvn -U install \
+mvn -U package \
-DskipTests \
-Dbuildver="$BUILD_VER" \
-Drat.skip="$SKIP_CHECKS" \
@@ -178,7 +182,7 @@
time (
xargs -t -I% -P "$BUILD_PARALLEL" -n 1 \
bash -c 'build_single_shim "$@"' _ %
echo "Resuming from dist build only using $BASE_VER"
-mvn install --resume-from dist $MODULE_OPT $MVN_PROFILE_OPT \
-  -Dbuildver="$BASE_VER" \
-  -DskipTests
+mvn package --resume-from dist $MODULE_OPT $MVN_PROFILE_OPT $INCLUDED_BUILDVERS_OPT \
+  -Dbuildver="$BASE_VER" \
+  -DskipTests -Dskip -Dmaven.javadoc.skip
)
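As a hedged illustration only: for a default `noSnapshots` run with no `--module` given, the final resume step would expand to roughly the following. The values of `$MODULE_OPT` (empty), `$MVN_PROFILE_OPT` (`-PnoSnapshots`), and `$BASE_VER` (`301`) are assumptions:

```shell script
# Approximate expansion of the final resume step under the assumptions above.
mvn package --resume-from dist -PnoSnapshots \
  -Dbuildver=301 \
  -DskipTests -Dskip -Dmaven.javadoc.skip
```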