[CI] Use precise terminology for image components
hcho3 committed Jan 11, 2025
1 parent b4a7cd1 commit cc839cc
Showing 18 changed files with 79 additions and 78 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/jvm_tests.yml
@@ -247,10 +247,10 @@ jobs:
matrix:
variant:
- name: cpu
container_id: xgb-ci.jvm
image_repo: xgb-ci.jvm
artifact_from: build-test-jvm-packages
- name: gpu
container_id: xgb-ci.jvm_gpu_build
image_repo: xgb-ci.jvm_gpu_build
artifact_from: build-jvm-gpu
scala_version: ['2.12', '2.13']
steps:
@@ -272,4 +272,4 @@ jobs:
- name: Deploy JVM packages to S3
run: |
bash ops/pipeline/deploy-jvm-packages.sh ${{ matrix.variant.name }} \
${{ matrix.variant.container_id }} ${{ matrix.scala_version }}
${{ matrix.variant.image_repo }} ${{ matrix.scala_version }}
39 changes: 20 additions & 19 deletions doc/contrib/ci.rst
@@ -45,7 +45,7 @@ To make changes to the CI container, carry out the following steps:
the proposed changes to the Dockerfile. Make note of the pull request number. Example: ``#204``
5. Clone `dmlc/xgboost <https://github.com/dmlc/xgboost>`_ and update all references to the
old container to point to the new container. More specifically, all Docker tags of format
``492475357299.dkr.ecr.us-west-2.amazonaws.com/[container_id]:main`` should have the last
``492475357299.dkr.ecr.us-west-2.amazonaws.com/[image_repo]:main`` should have the last
component replaced with ``PR-#``, where ``#`` is the pull request number. For the example above,
we'd replace ``492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main`` with
``492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:PR-204``.
@@ -83,11 +83,12 @@ and invoke ``containers/docker_build.sh`` as follows:
# For local testing, set them to "main"
export GITHUB_SHA="main"
export BRANCH_NAME="main"
bash containers/docker_build.sh CONTAINER_ID
bash containers/docker_build.sh IMAGE_REPO
where ``CONTAINER_ID`` identifies for the container. The wrapper script will look up the YAML file
``containers/ci_container.yml``. For example, when ``CONTAINER_ID`` is set to ``xgb-ci.gpu``,
the script will use the corresponding entry from ``containers/ci_container.yml``:
where ``IMAGE_REPO`` is the name of the container image. The wrapper script will look up the
YAML file ``containers/ci_container.yml``. For example, when ``IMAGE_REPO`` is set to
``xgb-ci.gpu``, the script will use the corresponding entry from
``containers/ci_container.yml``:

.. code-block:: yaml
@@ -114,7 +115,7 @@ the build arguments are:
The build arguments provide inputs to the ``ARG`` instructions in the Dockerfile.

When ``containers/docker_build.sh`` completes, you will have access to the container with tag
``492475357299.dkr.ecr.us-west-2.amazonaws.com/[container_id]:main``. The prefix
``492475357299.dkr.ecr.us-west-2.amazonaws.com/[image_repo]:main``. The prefix
``492475357299.dkr.ecr.us-west-2.amazonaws.com/`` was added so that the container could
later be uploaded to AWS Elastic Container Registry (ECR), a private Docker registry.
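
The naming this commit adopts, ``[registry]/[image_repo]:[image_tag]``, can be illustrated with a small sketch. The helpers ``split_image_uri`` and ``with_tag`` below are hypothetical and are not part of the repository; the sketch assumes only the three-part layout described above and a URI with an explicit tag.

```python
from typing import NamedTuple


class ImageURI(NamedTuple):
    """The three components of a fully qualified image URI."""

    registry: str
    image_repo: str
    image_tag: str


def split_image_uri(image_uri: str) -> ImageURI:
    """Split 'registry/image_repo:image_tag' into its components (hypothetical helper)."""
    # Everything before the first "/" is treated as the registry host.
    registry, _, rest = image_uri.partition("/")
    # The tag is whatever follows the last ":" in the remainder.
    image_repo, sep, image_tag = rest.rpartition(":")
    if not (registry and sep and image_repo and image_tag):
        raise ValueError(f"Not a fully qualified image URI: {image_uri}")
    return ImageURI(registry, image_repo, image_tag)


def with_tag(image_uri: str, new_tag: str) -> str:
    """Return the same image URI with its tag replaced (hypothetical helper)."""
    parts = split_image_uri(image_uri)
    return f"{parts.registry}/{parts.image_repo}:{new_tag}"


uri = "492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main"
parts = split_image_uri(uri)
# Point the reference at the image built from pull request #204.
pr_uri = with_tag(uri, "PR-204")
```

Here ``parts.image_repo`` corresponds to the ``IMAGE_REPO`` argument of ``containers/docker_build.sh``, and the full string is what ``--image-uri`` expects.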

@@ -126,7 +127,7 @@ Invoke ``ops/docker_run.py`` from the main ``dmlc/xgboost`` repo as follows:
.. code-block:: bash
python3 ops/docker_run.py \
--container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/[container_id]:main \
--image-uri 492475357299.dkr.ecr.us-west-2.amazonaws.com/[image_repo]:[image_tag] \
[--use-gpus] \
-- "command to run inside the container"
@@ -138,12 +139,12 @@ For example:
# Run without GPU
python3 ops/docker_run.py \
--container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.cpu:main \
--image-uri 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.cpu:main \
-- bash ops/pipeline/build-cpu-impl.sh cpu
# Run with NVIDIA GPU
python3 ops/docker_run.py \
--container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
--image-uri 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
--use-gpus \
-- bash ops/pipeline/test-python-wheel-impl.sh gpu
@@ -154,7 +155,7 @@ Optionally, you can specify ``--run-args`` to pass extra arguments to ``docker r
# Allocate extra space in /dev/shm to enable NCCL
# Also run the container with elevated privileges
python3 ops/docker_run.py \
--container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
--image-uri 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
--use-gpus \
--run-args='--shm-size=4g --privileged' \
-- bash ops/pipeline/test-python-wheel-impl.sh gpu
@@ -171,7 +172,7 @@ Examples: useful tasks for local development
export DOCKER_REGISTRY=492475357299.dkr.ecr.us-west-2.amazonaws.com
python3 ops/docker_run.py \
--container-tag ${DOCKER_REGISTRY}/xgb-ci.gpu_build_rockylinux8:main \
--image-uri ${DOCKER_REGISTRY}/xgb-ci.gpu_build_rockylinux8:main \
-- ops/pipeline/build-cuda-impl.sh
* Run Python tests
@@ -180,7 +181,7 @@ Examples: useful tasks for local development
export DOCKER_REGISTRY=492475357299.dkr.ecr.us-west-2.amazonaws.com
python3 ops/docker_run.py \
--container-tag ${DOCKER_REGISTRY}/xgb-ci.cpu:main \
--image-uri ${DOCKER_REGISTRY}/xgb-ci.cpu:main \
-- ops/pipeline/test-python-wheel-impl.sh cpu
* Run Python tests with GPU algorithm
@@ -189,7 +190,7 @@ Examples: useful tasks for local development
export DOCKER_REGISTRY=492475357299.dkr.ecr.us-west-2.amazonaws.com
python3 ops/docker_run.py \
--container-tag ${DOCKER_REGISTRY}/xgb-ci.gpu:main \
--image-uri ${DOCKER_REGISTRY}/xgb-ci.gpu:main \
--use-gpus \
-- ops/pipeline/test-python-wheel-impl.sh gpu
@@ -199,7 +200,7 @@ Examples: useful tasks for local development
export DOCKER_REGISTRY=492475357299.dkr.ecr.us-west-2.amazonaws.com
python3 ops/docker_run.py \
--container-tag ${DOCKER_REGISTRY}/xgb-ci.gpu:main \
--image-uri ${DOCKER_REGISTRY}/xgb-ci.gpu:main \
--use-gpus \
--run-args='--shm-size=4g' \
-- ops/pipeline/test-python-wheel-impl.sh mgpu
@@ -212,7 +213,7 @@ Examples: useful tasks for local development
export DOCKER_REGISTRY=492475357299.dkr.ecr.us-west-2.amazonaws.com
export SCALA_VERSION=2.12 # Specify Scala version (2.12 or 2.13)
python3 ops/docker_run.py \
--container-tag ${DOCKER_REGISTRY}/xgb-ci.jvm:main \
--image-uri ${DOCKER_REGISTRY}/xgb-ci.jvm:main \
--run-args "-e SCALA_VERSION" \
-- ops/pipeline/build-test-jvm-packages-impl.sh
@@ -224,7 +225,7 @@ Examples: useful tasks for local development
export SCALA_VERSION=2.12 # Specify Scala version (2.12 or 2.13)
export USE_CUDA=1
python3 ops/docker_run.py \
--container-tag ${DOCKER_REGISTRY}/xgb-ci.jvm_gpu_build:main \
--image-uri ${DOCKER_REGISTRY}/xgb-ci.jvm_gpu_build:main \
--use-gpus \
--run-args "-e SCALA_VERSION -e USE_CUDA --shm-size=4g" \
-- ops/pipeline/build-test-jvm-packages-impl.sh
@@ -456,7 +457,7 @@ For example, when you run ``bash containers/docker_build.sh xgb-ci.gpu``, the lo
# docker_build.sh calls docker_build.py...
python3 containers/docker_build.py --container-def gpu \
--container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
--image-uri 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
--build-arg CUDA_VERSION_ARG=12.4.1 --build-arg NCCL_VERSION_ARG=2.23.4-1 \
--build-arg RAPIDS_VERSION_ARG=24.10
@@ -480,14 +481,14 @@ Here is an example with ``docker_run.py``:
# Run without GPU
python3 ops/docker_run.py \
--container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.cpu:main \
--image-uri 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.cpu:main \
-- bash ops/pipeline/build-cpu-impl.sh cpu
# Run with NVIDIA GPU
# Allocate extra space in /dev/shm to enable NCCL
# Also run the container with elevated privileges
python3 ops/docker_run.py \
--container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
--image-uri 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
--use-gpus \
--run-args='--shm-size=4g --privileged' \
-- bash ops/pipeline/test-python-wheel-impl.sh gpu
12 changes: 6 additions & 6 deletions ops/docker_run.py
@@ -53,7 +53,7 @@ def fancy_print_cli_args(*, cli_args: list[str]) -> None:

def docker_run(
*,
container_tag: str,
image_uri: str,
command_args: list[str],
use_gpus: bool,
workdir: pathlib.Path,
@@ -71,7 +71,7 @@ def docker_run(
itertools.chain.from_iterable([["-e", f"{k}={v}"] for k, v in user_ids.items()])
)
docker_run_cli_args.extend(extra_args)
docker_run_cli_args.append(container_tag)
docker_run_cli_args.append(image_uri)
docker_run_cli_args.extend(command_args)

cli_args = ["docker", "run"] + docker_run_cli_args
@@ -90,7 +90,7 @@ def main(*, args: argparse.Namespace) -> None:
run_args.append("-it")

docker_run(
container_tag=args.container_tag,
image_uri=args.image_uri,
command_args=args.command_args,
use_gpus=args.use_gpus,
workdir=args.workdir,
@@ -102,18 +102,18 @@ def main(*, args: argparse.Namespace) -> None:
if __name__ == "__main__":
parser = argparse.ArgumentParser(
usage=(
f"{sys.argv[0]} --container-tag CONTAINER_TAG [--use-gpus] [--interactive] "
f"{sys.argv[0]} --image-uri IMAGE_URI [--use-gpus] [--interactive] "
"[--workdir WORKDIR] [--run-args RUN_ARGS] -- COMMAND_ARG "
"[COMMAND_ARG ...]"
),
description="Run tasks inside a Docker container",
)
parser.add_argument(
"--container-tag",
"--image-uri",
type=str,
required=True,
help=(
"Container tag to identify the container, e.g. "
"Fully qualified image URI to identify the container, e.g. "
"492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main"
),
)
6 changes: 3 additions & 3 deletions ops/pipeline/build-cpu-arm64.sh
@@ -13,17 +13,17 @@ source ops/pipeline/classify-git-branch.sh
source ops/pipeline/get-docker-registry-details.sh

WHEEL_TAG=manylinux_2_28_aarch64
CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.aarch64:main
IMAGE_URI=${DOCKER_REGISTRY_URL}/xgb-ci.aarch64:main

echo "--- Build CPU code targeting ARM64"
set -x
python3 ops/docker_run.py \
--container-tag ${CONTAINER_TAG} \
--image-uri ${IMAGE_URI} \
-- ops/pipeline/build-cpu-arm64-impl.sh

echo "--- Audit binary wheel to ensure it's compliant with ${WHEEL_TAG} standard"
python3 ops/docker_run.py \
--container-tag ${CONTAINER_TAG} \
--image-uri ${IMAGE_URI} \
-- auditwheel repair --only-plat \
--plat ${WHEEL_TAG} python-package/dist/*.whl
python3 -m wheel tags --python-tag py3 --abi-tag none --platform ${WHEEL_TAG} --remove \
6 changes: 3 additions & 3 deletions ops/pipeline/build-cpu.sh
@@ -6,7 +6,7 @@ set -euo pipefail
source ops/pipeline/classify-git-branch.sh
source ops/pipeline/get-docker-registry-details.sh

CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.cpu:main
IMAGE_URI=${DOCKER_REGISTRY_URL}/xgb-ci.cpu:main

echo "--- Build CPU code"
set -x
@@ -24,13 +24,13 @@ export UBSAN_OPTIONS='print_stacktrace=1:log_path=ubsan_error.log'
# Work around https://github.com/google/sanitizers/issues/1614
sudo sysctl vm.mmap_rnd_bits=28
python3 ops/docker_run.py \
--container-tag ${CONTAINER_TAG} \
--image-uri ${IMAGE_URI} \
--run-args '-e ASAN_SYMBOLIZER_PATH -e ASAN_OPTIONS -e UBSAN_OPTIONS
--cap-add SYS_PTRACE' \
-- bash ops/pipeline/build-cpu-impl.sh cpu-sanitizer

# Test without sanitizer
rm -rf build/
python3 ops/docker_run.py \
--container-tag ${CONTAINER_TAG} \
--image-uri ${IMAGE_URI} \
-- bash ops/pipeline/build-cpu-impl.sh cpu
12 changes: 6 additions & 6 deletions ops/pipeline/build-cuda.sh
@@ -11,10 +11,10 @@ fi

if [[ "$#" -lt 2 ]]
then
echo "Usage: $0 [container_id] {enable-rmm,disable-rmm}"
echo "Usage: $0 [image_repo] {enable-rmm,disable-rmm}"
exit 2
fi
container_id="$1"
image_repo="$1"
rmm_flag="$2"

# Validate RMM flag
@@ -35,8 +35,8 @@ source ops/pipeline/classify-git-branch.sh
source ops/pipeline/get-docker-registry-details.sh

WHEEL_TAG=manylinux_2_28_x86_64
BUILD_CONTAINER_TAG="${DOCKER_REGISTRY_URL}/${container_id}:main"
MANYLINUX_CONTAINER_TAG="${DOCKER_REGISTRY_URL}/xgb-ci.${WHEEL_TAG}:main"
BUILD_IMAGE_URI="${DOCKER_REGISTRY_URL}/${image_repo}:main"
MANYLINUX_IMAGE_URI="${DOCKER_REGISTRY_URL}/xgb-ci.${WHEEL_TAG}:main"

echo "--- Build with CUDA"

@@ -57,13 +57,13 @@ fi
set -x

python3 ops/docker_run.py \
--container-tag ${BUILD_CONTAINER_TAG} \
--image-uri ${BUILD_IMAGE_URI} \
--run-args='-e BUILD_ONLY_SM75 -e USE_RMM' \
-- ops/pipeline/build-cuda-impl.sh

echo "--- Audit binary wheel to ensure it's compliant with ${WHEEL_TAG} standard"
python3 ops/docker_run.py \
--container-tag ${MANYLINUX_CONTAINER_TAG} \
--image-uri ${MANYLINUX_IMAGE_URI} \
-- auditwheel repair --only-plat \
--plat ${WHEEL_TAG} python-package/dist/*.whl
python3 -m wheel tags --python-tag py3 --abi-tag none --platform ${WHEEL_TAG} --remove \
4 changes: 2 additions & 2 deletions ops/pipeline/build-gpu-rpkg.sh
@@ -11,12 +11,12 @@ fi
source ops/pipeline/classify-git-branch.sh
source ops/pipeline/get-docker-registry-details.sh

CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.gpu_build_r_rockylinux8:main
IMAGE_URI=${DOCKER_REGISTRY_URL}/xgb-ci.gpu_build_r_rockylinux8:main

echo "--- Build XGBoost R package with CUDA"
set -x
python3 ops/docker_run.py \
--container-tag ${CONTAINER_TAG} \
--image-uri ${IMAGE_URI} \
-- ops/pipeline/build-gpu-rpkg-impl.sh \
${GITHUB_SHA}

4 changes: 2 additions & 2 deletions ops/pipeline/build-jvm-doc.sh
@@ -19,10 +19,10 @@ fi

source ops/pipeline/get-docker-registry-details.sh

CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.jvm_gpu_build:main
IMAGE_URI=${DOCKER_REGISTRY_URL}/xgb-ci.jvm_gpu_build:main

echo "--- Build JVM packages doc"
set -x
python3 ops/docker_run.py \
--container-tag ${CONTAINER_TAG} \
--image-uri ${IMAGE_URI} \
-- ops/pipeline/build-jvm-doc-impl.sh ${BRANCH_NAME}
4 changes: 2 additions & 2 deletions ops/pipeline/build-jvm-gpu.sh
@@ -6,7 +6,7 @@ set -euo pipefail
source ops/pipeline/classify-git-branch.sh
source ops/pipeline/get-docker-registry-details.sh

CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.jvm_gpu_build:main
IMAGE_URI=${DOCKER_REGISTRY_URL}/xgb-ci.jvm_gpu_build:main

echo "--- Build libxgboost4j.so with CUDA"

@@ -32,5 +32,5 @@ mkdir -p build-gpu/
# TODO(hcho3): Remove this once new CUDA version ships with CCCL 2.6.0+
git clone https://github.com/NVIDIA/cccl.git -b v2.6.1 --quiet --depth 1
python3 ops/docker_run.py \
--container-tag ${CONTAINER_TAG} \
--image-uri ${IMAGE_URI} \
-- bash -c "${COMMAND}"
6 changes: 3 additions & 3 deletions ops/pipeline/build-jvm-manylinux2014.sh
@@ -10,19 +10,19 @@ then
fi

arch=$1
container_id="xgb-ci.manylinux2014_${arch}"
image_repo="xgb-ci.manylinux2014_${arch}"

source ops/pipeline/classify-git-branch.sh
source ops/pipeline/get-docker-registry-details.sh

CONTAINER_TAG="${DOCKER_REGISTRY_URL}/${container_id}:main"
IMAGE_URI="${DOCKER_REGISTRY_URL}/${image_repo}:main"

# Build XGBoost4J binary
echo "--- Build libxgboost4j.so (targeting glibc 2.17)"
set -x
mkdir build
python3 ops/docker_run.py \
--container-tag "${CONTAINER_TAG}" \
--image-uri "${IMAGE_URI}" \
-- bash -c \
"cd build && cmake .. -DJVM_BINDINGS=ON -DUSE_OPENMP=ON && make -j$(nproc)"
ldd lib/libxgboost4j.so