feat: implement scripts for binary release build #932

Merged (10 commits), Sep 19, 2024
24 changes: 24 additions & 0 deletions Makefile
@@ -46,6 +46,30 @@ format:
./mvnw compile test-compile scalafix:scalafix -Psemanticdb $(PROFILES)
./mvnw spotless:apply $(PROFILES)

# build native libs for arm64 architecture Linux/MacOS on a Linux/arm64 machine/container
core-arm64-libs:
# if the environment variable HAS_OSXCROSS is defined
ifdef $(HAS_OSXCROSS)
Member

Do we need the macOS SDK installed for the HAS_OSXCROSS case?

Contributor Author

This part is a placeholder for future work to enable macOS. The macOS SDK has to be provided to the Dockerfile as input, and the build-release-comet script will copy it into the release builder's Docker image.
I removed the option because the build did not succeed, but left the work in place so we can fix it later. I can remove it if that makes things clearer.

cd native && cargo zigbuild -j 1 --target aarch64-apple-darwin --release
endif
cd native && cargo build -j 2 --release
Member

So for MacOSX build, we need to run both cargo zigbuild and cargo build?

Contributor Author

Oops, this was a mistake. I was experimenting with zigbuild for macOS. Removed.


# build native libs for amd64 architecture Linux/MacOS on a Linux/amd64 machine/container
core-amd64-libs:
cd native && cargo build -j 2 --release
Member

Don't we need to specify target for this?

Contributor Author

This will build the binary for the same architecture as the machine. So no need to specify target.

ifdef HAS_OSXCROSS
rustup target add x86_64-apple-darwin
cd native && cargo build -j 2 --target x86_64-apple-darwin --release
endif
Comment on lines +52 to +55
Member

Since L51 is not in an else block, if HAS_OSXCROSS is defined, will we additionally build the library for x86_64-apple-darwin? I.e., two libraries for core-amd64-libs?

Contributor Author

Yes. For the amd64 architecture, one for Linux and one for macOS.
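To make the answer above concrete, here is a small, hypothetical shell helper (not part of the PR) that lists the `libcomet` artifacts each `core-<arch>-libs` target is expected to leave under `native/target`, based on the container paths the release script later copies from. It treats HAS_OSXCROSS as enabled only when it equals `true`, the value the Dockerfile exports; make's `ifdef` itself accepts any non-empty value.

```shell
# Hypothetical helper: list the libcomet artifacts that `make core-<arch>-libs`
# is expected to produce, relative to the repository root.
# arch: amd64 or arm64; has_osxcross: "true" when HAS_OSXCROSS is set in the image.
expected_native_libs() {
  local arch="$1" has_osxcross="${2:-false}"
  # A plain `cargo build --release` targets the host, so the Linux library
  # always lands in native/target/release, whatever the architecture.
  echo "native/target/release/libcomet.so"
  if [ "$has_osxcross" = "true" ]; then
    # With osxcross available, a second, macOS library is built per architecture.
    case "$arch" in
      amd64) echo "native/target/x86_64-apple-darwin/release/libcomet.dylib" ;;
      arm64) echo "native/target/aarch64-apple-darwin/release/libcomet.dylib" ;;
      *) echo "unknown arch: $arch" >&2; return 1 ;;
    esac
  fi
}
```

For example, `expected_native_libs amd64 true` lists the two libraries discussed in this thread.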


# build native libs for arm64 architecture Linux/MacOS on a Linux/arm64 machine/container
core-arm64-libs:
Member

There are two core-arm64-libs targets?

Contributor Author

Removed.

cd native && cargo build -j 2 --release
ifdef HAS_OSXCROSS
rustup target add aarch64-apple-darwin
cd native && cargo build -j 2 --target aarch64-apple-darwin --release
endif

core-amd64:
rustup target add x86_64-apple-darwin
cd native && RUSTFLAGS="-Ctarget-cpu=skylake -Ctarget-feature=-prefer-256-bit" CC=o64-clang CXX=o64-clang++ cargo build --target x86_64-apple-darwin --release
53 changes: 53 additions & 0 deletions dev/release/README.md
@@ -81,6 +81,50 @@ python3 generate-changelog.py 0.0.0 HEAD 0.1.0 > ../changelog/0.1.0.md
Create a PR against the _main_ branch to add this change log and once this is approved and merged, cherry-pick the
commit into the release branch.

### Build the jars

#### Set up the build environment
The build process requires Docker. Download the latest Docker Desktop from https://www.docker.com/products/docker-desktop/.
If you have multiple Docker contexts, switch to the Docker Desktop context. For example:

```shell
$ docker context ls
NAME                  DESCRIPTION                               DOCKER ENDPOINT                               ERROR
default               Current DOCKER_HOST based configuration   unix:///var/run/docker.sock
desktop-linux         Docker Desktop                            unix:///Users/parth/.docker/run/docker.sock
my_custom_context *                                             tcp://192.168.64.2:2376

$ docker context use desktop-linux
```
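If you switch contexts often, a check like the following avoids running `docker context use` unnecessarily. This is a sketch, assuming a Docker CLI recent enough to provide `docker context show` and that the Docker Desktop context is named `desktop-linux`, as in the listing above; `ensure_docker_context` is a hypothetical helper name.

```shell
# Hypothetical helper: switch to the wanted Docker context only if another one is active.
# Defaults to "desktop-linux", the Docker Desktop context in the listing above.
ensure_docker_context() {
  local want="${1:-desktop-linux}"
  local current
  # `docker context show` prints the name of the currently active context.
  current="$(docker context show)" || return 1
  if [ "$current" != "$want" ]; then
    docker context use "$want"
  fi
}
```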
#### Run the build script
The `build-release-comet.sh` script creates a Docker image for each architecture and uses it to build the
platform-specific binaries. The builder images are built every time the script runs. The script optionally lets
you override the repository and branch from which the binaries are built. Note that the local git repo is not used
to build the native binaries, but it is used to build the final uber jar.

```shell
Usage: build-release-comet.sh [options]

This script builds comet native binaries inside docker images. The images are named
"comet-rm-arm64" and "comet-rm-amd64" and will be generated by this script.

Options are:

-r [repo] : git repo (default: https://github.com/apache/datafusion-comet.git)
-b [branch] : git branch (default: release)
-t [tag] : tag for the comet-rm builder docker images (default: "latest").
```

Example:

```shell
cd dev/release && ./build-release-comet.sh && cd ../..
```

#### Build output
The build output is installed into a temporary local Maven repository. The build script prints the repository
location at the end; you will need this location when deploying the artifacts to a staging repository.
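If you save the script's terminal output to a file (for example `build.log`, a hypothetical name), the repository location can be recovered from the `Installed to local repo: ...` line the script prints at the end. A minimal sketch, where `local_repo_from_log` is a hypothetical helper:

```shell
# Hypothetical helper: extract the staging repository path from a saved build log.
# Relies on the "Installed to local repo: <path>" line printed by build-release-comet.sh.
local_repo_from_log() {
  local log="$1"
  # Keep only the path after the prefix; take the last match if there are several.
  sed -n 's/^Installed to local repo: //p' "$log" | tail -n 1
}
```

Usage, assuming the log was saved as `build.log`: `LOCAL_REPO=$(local_repo_from_log build.log)`.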

### Tag the Release Candidate

Tag the release branch with `0.1.0-rc1` and push to the `apache` repo
@@ -105,6 +149,15 @@ Run the create-tarball script on the release candidate tag (`0.1.0-rc1`) to crea
GH_TOKEN=<TOKEN> ./dev/release/create-tarball.sh 0.1.0 1
```

### Publish the maven artifacts
#### Set up Maven
##### One-time project setup
Set up your project in the ASF Nexus Repository as described at https://infra.apache.org/publishing-maven-artifacts.html.
##### Release Manager setup
Set up your development environment as described at https://infra.apache.org/publishing-maven-artifacts.html.

TODO: build and publish a release candidate to nexus.

### Start an Email Voting Thread

Send the email that is generated in the previous step to `[email protected]`.
202 changes: 202 additions & 0 deletions dev/release/build-release-comet.sh
@@ -0,0 +1,202 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

andygrove marked this conversation as resolved.
set -e

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null && pwd)"
COMET_HOME_DIR=$SCRIPT_DIR/../..

function usage {
local NAME=$(basename $0)
cat <<EOF
Usage: $NAME [options]

This script builds comet native binaries inside docker images. The images are named
"comet-rm-arm64" and "comet-rm-amd64" and will be generated by this script.

Options are:

-r [repo] : git repo (default: ${REPO})
-b [branch] : git branch (default: ${BRANCH})
-t [tag] : tag for the comet-rm builder docker images (default: "latest").
EOF
exit 1
}

function cleanup()
{
if [ $CLEANUP != 0 ]
then
echo Cleaning up ...
if [ "$(docker ps -a | grep comet-arm64-builder-container)" != "" ]
then
docker rm comet-arm64-builder-container
fi
if [ "$(docker ps -a | grep comet-amd64-builder-container)" != "" ]
then
docker rm comet-amd64-builder-container
fi
CLEANUP=0
fi
}

trap cleanup SIGINT SIGTERM EXIT

CLEANUP=1

REPO="https://github.com/apache/datafusion-comet.git"
BRANCH="release"
andygrove marked this conversation as resolved.
MACOS_SDK=
HAS_MACOS_SDK="false"
IMGTAG=latest

while getopts "b:hr:t:" opt; do
case $opt in
r) REPO="$OPTARG";;
b) BRANCH="$OPTARG";;
t) IMGTAG="$OPTARG" ;;
h) usage ;;
\?) echo "Invalid option. Run with -h for help." >&2; exit 1 ;;
esac
done

echo "Building binaries from $REPO/$BRANCH"

WORKING_DIR="$SCRIPT_DIR/comet-rm/workdir"
cp $SCRIPT_DIR/../cargo.config $WORKING_DIR

# TODO: Search for Xcode (Once building macos binaries works)
#PS3="Select Xcode:"
#select xcode_path in `find . -name "${MACOS_SDK}"`
#do
# echo "found Xcode in $xcode_path"
# cp $xcode_path $WORKING_DIR
# break
#done

if [ -f "${WORKING_DIR}/${MACOS_SDK}" ]
then
HAS_MACOS_SDK="true"
fi
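The SDK check above relies on `-f` failing for `${WORKING_DIR}/` when MACOS_SDK is empty, because that path is a directory. A sketch that makes the empty-name case explicit, written as a hypothetical standalone function (not part of the script), might look like this:

```shell
# Hypothetical helper: print "true" only when a non-empty SDK file name was
# given and that file exists in the work directory, otherwise "false".
detect_macos_sdk() {
  local workdir="$1" sdk="$2"
  if [ -n "$sdk" ] && [ -f "${workdir}/${sdk}" ]; then
    echo "true"
  else
    echo "false"
  fi
}
```

With it, the assignment could read `HAS_MACOS_SDK=$(detect_macos_sdk "$WORKING_DIR" "$MACOS_SDK")`.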

BUILDER_IMAGE_ARM64="comet-rm-arm64:$IMGTAG"
BUILDER_IMAGE_AMD64="comet-rm-amd64:$IMGTAG"

# Build the docker image in which we will do the build
docker build \
--platform=linux/arm64 \
-t "$BUILDER_IMAGE_ARM64" \
--build-arg HAS_MACOS_SDK=${HAS_MACOS_SDK} \
--build-arg MACOS_SDK=${MACOS_SDK} \
"$SCRIPT_DIR/comet-rm"

docker build \
--platform=linux/amd64 \
-t "$BUILDER_IMAGE_AMD64" \
--build-arg HAS_MACOS_SDK=${HAS_MACOS_SDK} \
--build-arg MACOS_SDK=${MACOS_SDK} \
"$SCRIPT_DIR/comet-rm"

# Clean previous Java build
pushd $COMET_HOME_DIR && ./mvnw clean && popd

# Run the builder container for each architecture. The entrypoint script will build the binaries

# AMD64
echo "Building amd64 binary"
if ! docker run \
--name comet-amd64-builder-container \
--memory 24g \
--cpus 6 \
-it \
--platform linux/amd64 \
$BUILDER_IMAGE_AMD64 "${REPO}" "${BRANCH}" amd64
then
echo "Building amd64 binary failed."
exit 1
fi

# ARM64
echo "Building arm64 binary"
if ! docker run \
--name comet-arm64-builder-container \
--memory 24g \
--cpus 6 \
-it \
--platform linux/arm64 \
$BUILDER_IMAGE_ARM64 "${REPO}" "${BRANCH}" arm64
then
echo "Building arm64 binary failed."
exit 1
fi

echo "Building binaries completed"
echo "Copying to java build directories"

JVM_TARGET_DIR=$COMET_HOME_DIR/common/target/classes/org/apache/comet
mkdir -p $JVM_TARGET_DIR

mkdir -p $JVM_TARGET_DIR/linux/amd64
docker cp \
comet-amd64-builder-container:"/opt/comet-rm/comet/native/target/release/libcomet.so" \
$JVM_TARGET_DIR/linux/amd64/

if [ "$HAS_MACOS_SDK" == "true" ]
then
mkdir -p $JVM_TARGET_DIR/darwin/x86_64
docker cp \
comet-amd64-builder-container:"/opt/comet-rm/comet/native/target/x86_64-apple-darwin/release/libcomet.dylib" \
$JVM_TARGET_DIR/darwin/x86_64/
fi

mkdir -p $JVM_TARGET_DIR/linux/aarch64
docker cp \
comet-arm64-builder-container:"/opt/comet-rm/comet/native/target/release/libcomet.so" \
$JVM_TARGET_DIR/linux/aarch64/

if [ "$HAS_MACOS_SDK" == "true" ]
then
mkdir -p $JVM_TARGET_DIR/darwin/aarch64
docker cp \
comet-arm64-builder-container:"/opt/comet-rm/comet/native/target/aarch64-apple-darwin/release/libcomet.dylib" \
$JVM_TARGET_DIR/darwin/aarch64/
fi
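The destination directories above do not follow a single naming scheme: the Linux amd64 library goes to `linux/amd64` while the macOS amd64 library goes to `darwin/x86_64`, and both arm64 variants use `aarch64`. A hypothetical helper that centralizes this mapping, so the `mkdir -p` and `docker cp` destinations are derived from the same value, could look like this (a sketch; the directory names come from the script, the function does not):

```shell
# Hypothetical helper: map a target OS and builder architecture to the
# subdirectory of common/target/classes/org/apache/comet used by the script.
jvm_lib_subdir() {
  local os="$1" arch="$2"
  case "$os/$arch" in
    linux/amd64)  echo "linux/amd64" ;;
    linux/arm64)  echo "linux/aarch64" ;;
    darwin/amd64) echo "darwin/x86_64" ;;
    darwin/arm64) echo "darwin/aarch64" ;;
    *) echo "unsupported: $os/$arch" >&2; return 1 ;;
  esac
}
```

For example, `dest="$JVM_TARGET_DIR/$(jvm_lib_subdir darwin arm64)"` could then feed both `mkdir -p "$dest"` and the `docker cp` target.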

# Build final jar
echo "Building uber jar and publishing it locally"
pushd $COMET_HOME_DIR

GIT_HASH=$(git rev-parse --short HEAD)
LOCAL_REPO=$(mktemp -d /tmp/comet-staging-repo-XXXXX)

./mvnw "-Dmaven.repo.local=${LOCAL_REPO}" -P spark-3.4 -P scala-2.12 -DskipTests install
./mvnw "-Dmaven.repo.local=${LOCAL_REPO}" -P spark-3.4 -P scala-2.13 -DskipTests install
./mvnw "-Dmaven.repo.local=${LOCAL_REPO}" -P spark-3.3 -P scala-2.12 -DskipTests install
./mvnw "-Dmaven.repo.local=${LOCAL_REPO}" -P spark-3.3 -P scala-2.13 -DskipTests install
./mvnw "-Dmaven.repo.local=${LOCAL_REPO}" -P spark-3.5 -P scala-2.12 -DskipTests install
./mvnw "-Dmaven.repo.local=${LOCAL_REPO}" -P spark-3.5 -P scala-2.13 -DskipTests install

echo "Installed to local repo: ${LOCAL_REPO}"

Member

Do we need to remove the created docker image/container after installation?

Contributor Author

The container is removed in the cleanup part of the script which is invoked on exit or error.
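As the reply notes, `cleanup` removes the builder containers; the builder images are kept. The existing check pipes `docker ps -a` into a substring `grep`, so a container whose name merely contains `comet-amd64-builder-container` would also match. A sketch of an exact-name check using `docker ps -a --format '{{.Names}}'` and `grep -qxF` (the function name is hypothetical):

```shell
# Hypothetical helper: remove a container only if one with exactly this name exists.
# `docker ps -a --format '{{.Names}}'` prints one container name per line;
# `grep -qxF` matches the whole line as a fixed string, so no substring matches.
remove_container_if_exists() {
  local name="$1"
  if docker ps -a --format '{{.Names}}' | grep -qxF "$name"; then
    docker rm "$name"
  fi
}
```

If you also want to reclaim the disk space used by the builder images, `docker image rm comet-rm-amd64:latest comet-rm-arm64:latest` removes them, assuming the default `latest` tag was used.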

popd
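The six `mvnw install` invocations above differ only in their Spark and Scala profiles. Below is a sketch of the same builds as a loop, assuming the build order is not significant (the loop runs Spark 3.3 first, while the script starts with 3.4). `install_all_profiles` and the `MVN` override for dry runs are hypothetical.

```shell
# Hypothetical helper: run one install per Spark/Scala profile combination
# into the given local Maven repository. MVN defaults to ./mvnw; set MVN=echo
# to print the commands instead of running them.
install_all_profiles() {
  local local_repo="$1"
  local mvn="${MVN:-./mvnw}"
  local spark scala
  for spark in 3.3 3.4 3.5; do
    for scala in 2.12 2.13; do
      "$mvn" "-Dmaven.repo.local=${local_repo}" -P "spark-${spark}" -P "scala-${scala}" -DskipTests install || return 1
    done
  done
}
```

For a dry run, `MVN=echo install_all_profiles "$LOCAL_REPO"` prints the six commands without running Maven.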
91 changes: 91 additions & 0 deletions dev/release/comet-rm/Dockerfile
@@ -0,0 +1,91 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
ARG HAS_MACOS_SDK="false"
Member

https://hub.docker.com/r/messense/cargo-zigbuild claims they have MacOS X SDK pre-installed in their docker image. Can we reuse it to use MacOS X SDK for Comet build?

Contributor Author

I have not tried the cargo-zigbuild Docker image yet. I will try it, and if it works, we can remove the HAS_OSXCROSS portions entirely.
Follow-up issue: #947

Contributor Author

I tried the cargo-zigbuild Docker image and the build failed. I'll investigate the failure in the follow-up.


FROM ubuntu:20.04 AS base

USER root

# For apt to be noninteractive
ENV DEBIAN_FRONTEND=noninteractive
ENV DEBCONF_NONINTERACTIVE_SEEN=true

ENV LC_ALL=C
# Install prerequisites for Rust
RUN export LC_ALL=C \
&& apt-get update \
&& apt-get install --no-install-recommends -y \
ca-certificates \
build-essential \
curl \
wget \
git \
llvm \
clang \
libssl-dev \
lzma-dev \
liblzma-dev \
openssh-client \
cmake \
cpio \
libxml2-dev \
patch \
bzip2 \
libbz2-dev \
zlib1g-dev


# Install rust
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"
RUN cargo install cargo2junit

# Stage to add OSXCross if MacOSSDK is provided
FROM base AS with-macos-sdk-true
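ARG MACOS_SDK
# TARGETPLATFORM is a global build arg; redeclare it so the RUN below can see it
ARG TARGETPLATFORM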

COPY workdir/$MACOS_SDK /opt/xcode/

RUN if [ "$TARGETPLATFORM" = "linux/arm64" ]; then \
rustup target add aarch64-apple-darwin; \
elif [ "$TARGETPLATFORM" = "linux/amd64" ]; then \
rustup target add x86_64-apple-darwin; \
fi

# Build OSXCross
RUN cd /opt && git clone --depth 1 https://github.com/tpoechtrager/osxcross.git \
&& cd /opt/osxcross \
&& ./tools/gen_sdk_package_pbzx.sh /opt/xcode/${MACOS_SDK} \
&& cp /opt/osxcross/*.tar.xz tarballs \
&& UNATTENDED=1 ./build.sh
ENV PATH="/opt/osxcross/target/bin:${PATH}"
# Use osxcross toolchain for cargo
COPY workdir/cargo.config /root/.cargo/config
ENV HAS_OSXCROSS="true"

# Placeholder Stage if MacOSSDK is not provided
FROM base AS with-macos-sdk-false
RUN echo "Building without MacOS"


FROM with-macos-sdk-${HAS_MACOS_SDK} AS final

COPY build-comet-native-libs.sh /opt/comet-rm/build-comet-native-libs.sh
WORKDIR /opt/comet-rm

ENTRYPOINT [ "/opt/comet-rm/build-comet-native-libs.sh"]
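In the `with-macos-sdk-true` stage, the Apple Rust target is chosen from Docker's `TARGETPLATFORM` build argument. Docker only exposes `TARGETPLATFORM` inside a stage that declares `ARG TARGETPLATFORM`. The mapping itself can be sketched as a small shell function (hypothetical name; the platform and target pairs come from the Dockerfile):

```shell
# Hypothetical helper: map Docker's TARGETPLATFORM to the Apple Rust target
# that the with-macos-sdk-true stage adds for that builder architecture.
darwin_target_for_platform() {
  case "$1" in
    linux/arm64) echo "aarch64-apple-darwin" ;;
    linux/amd64) echo "x86_64-apple-darwin" ;;
    *) return 1 ;;  # no macOS cross target for other platforms
  esac
}
```

Inside a `RUN` step this could be used as `rustup target add "$(darwin_target_for_platform "$TARGETPLATFORM")"`, provided the stage declares `ARG TARGETPLATFORM`.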