Add Github actions CI for tests #1

Open
wants to merge 108 commits into
base: main
Changes from 55 commits
Commits
c7e89d9
add Github action CI
danielfrg Nov 8, 2024
6172407
remove runtime on GH Actions
danielfrg Nov 8, 2024
7045eda
remove runtime on GH Actions
danielfrg Nov 8, 2024
095ebd8
Fix types
danielfrg Nov 8, 2024
751029d
fix unbound var
danielfrg Nov 8, 2024
38afefc
fix unbound var
danielfrg Nov 8, 2024
289b518
fix types
danielfrg Nov 8, 2024
5af42f8
Add concurrency
danielfrg Nov 8, 2024
dc844a8
fix types
danielfrg Nov 8, 2024
cebecbf
fix types
danielfrg Nov 8, 2024
1e3431b
ignore typechecks
danielfrg Nov 8, 2024
39950b7
Add print of driver
danielfrg Nov 8, 2024
c1f2243
Add print of driver
danielfrg Nov 8, 2024
da2b91f
update driver script
danielfrg Nov 8, 2024
1a751b5
test
danielfrg Nov 8, 2024
6fece36
install cuda-drivers
danielfrg Nov 8, 2024
be6bf05
install cuda-drivers
danielfrg Nov 8, 2024
8457bf9
ignore stuff
danielfrg Nov 8, 2024
884b49b
sudo
danielfrg Nov 8, 2024
e7c8747
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
8d4a326
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
a7e2873
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
f218f90
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
13cdd73
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
a83c6d3
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
ce254e2
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
71f9c81
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
913332b
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
e4f02de
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
49ae877
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
e557540
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
9e38cd6
run all tests
danielfrg Nov 8, 2024
e3f77eb
run only core tests
danielfrg Nov 10, 2024
a9b6408
run only core tests
danielfrg Nov 10, 2024
6bb2cf0
Run all mini tests for linux
danielfrg Nov 12, 2024
c6a55b3
fail fast false
danielfrg Nov 12, 2024
ad34d3f
fail fast false
danielfrg Nov 12, 2024
40214b3
remove cuda-python test (fails)
danielfrg Nov 12, 2024
605854f
Add extra vars
danielfrg Nov 13, 2024
4daebe0
Add extra vars
danielfrg Nov 13, 2024
14db2e3
Add extra vars
danielfrg Nov 13, 2024
a597df8
Add cache logic
danielfrg Nov 13, 2024
c2e49d2
test :)
danielfrg Nov 13, 2024
788bf65
fix dir
danielfrg Nov 13, 2024
86edb0e
Merge branch 'main' into gha-ci
danielfrg Nov 14, 2024
8cae992
Merge branch 'main' into gha-ci
danielfrg Nov 15, 2024
e7ecd5a
Remove cuda-example (multigpu)
danielfrg Nov 15, 2024
0bfff76
Start windows
danielfrg Nov 15, 2024
bf31a8e
Comment cache pull
danielfrg Nov 15, 2024
6b395d2
Install deps
danielfrg Nov 15, 2024
8a74550
Install deps
danielfrg Nov 15, 2024
340ca9d
Install deps
danielfrg Nov 15, 2024
3cd4345
Install deps
danielfrg Nov 15, 2024
7ecba54
Install deps
danielfrg Nov 15, 2024
c37329b
Install deps
danielfrg Nov 15, 2024
464274e
comment ZLIB install
danielfrg Nov 18, 2024
084f9f4
run only core tests
danielfrg Nov 18, 2024
5d5b284
run only core tests
danielfrg Nov 18, 2024
47155a8
run only core tests
danielfrg Nov 18, 2024
e8b0fb8
try to output pytest to logs
danielfrg Nov 18, 2024
b56226d
switch back to linux with a reduced test matrix to play with GHA cache
leofang Nov 23, 2024
a40afe5
compress cache dir to a single tarball and manually restore/save
leofang Nov 23, 2024
6a6b06b
runner context cannot be referenced outside of steps; fix/simplify ca…
leofang Nov 23, 2024
325945a
make CACHE_DIR visible to run.sh
leofang Nov 23, 2024
e7d8f6f
Fix CACHE_ARCHIVE unset
leofang Nov 23, 2024
f22ef3f
move cache dir to runner folder; speed up debugging
leofang Nov 23, 2024
82f4a32
clean up env and other hacks
leofang Nov 23, 2024
6a33ce6
Daniel is right that we need a finer control over cache
leofang Nov 23, 2024
31233d5
apply a WAR for overwriting (global) cache
leofang Nov 23, 2024
bffb7c5
fix syntax error
leofang Nov 23, 2024
9a41afb
1 -> true
leofang Nov 23, 2024
23595da
GITHUB_ENV -> GITHUB_OUTPUT
leofang Nov 23, 2024
f8e9715
move always into the if expression
leofang Nov 23, 2024
cc4c11a
try single quotes
leofang Nov 23, 2024
d60fbe0
set GH_TOKEN
leofang Nov 23, 2024
30b7540
try to install gh in GHA
leofang Nov 23, 2024
6e59e6c
add GH_REPO
leofang Nov 23, 2024
b02d310
add permissions
leofang Nov 23, 2024
0da166a
move permissions to top-level
leofang Nov 23, 2024
8b69d3d
check if cache exists before deletion
leofang Nov 23, 2024
b46c1c2
restore & run!
leofang Nov 23, 2024
39a43e6
Move apt install to earlier
leofang Nov 23, 2024
55d5dee
further shrink test size for debugging & print all metadata
leofang Nov 23, 2024
f44a81a
change cache file owner to CI runner
leofang Nov 23, 2024
b4fec59
fix wildcard usage
leofang Nov 23, 2024
8d6a9bd
fix pwd of test collection
leofang Nov 23, 2024
c8e05a4
try to exclude __init__.py
leofang Nov 23, 2024
3569112
don't try to be smart... stupid way works better :(
leofang Nov 24, 2024
07f2570
switch to test windows caching
leofang Nov 24, 2024
475dc67
escape
leofang Nov 24, 2024
3be46f4
fix windows env var syntax
leofang Nov 24, 2024
045ad7d
more env var syntax fix
leofang Nov 24, 2024
2d6180c
install gh; generate a dummy file for testing
leofang Nov 24, 2024
5258605
fix url
leofang Nov 24, 2024
afc00e5
debug
leofang Nov 24, 2024
4964e4e
enable long path before checking out; remove /q
leofang Nov 24, 2024
94a4893
update path manually
leofang Nov 24, 2024
47070a5
create CUPY_CACHE_DIR if not exist
leofang Nov 24, 2024
5a3c1b3
try to make PATH changes persist across steps to find gh
leofang Nov 24, 2024
681654b
overwrite dummy file
leofang Nov 24, 2024
97a8bba
refactor the test script into stages and remove redundant functions
leofang Nov 24, 2024
69c6bb1
use py312/cu114 for now; add MSVC version detect
leofang Nov 24, 2024
b748b98
fix cl.exe lookup (without initializing vcvars)
leofang Nov 24, 2024
d1c1294
Try to fix symlinks in the workflow
leofang Nov 24, 2024
bd8be7c
Update windows.yml
leofang Nov 24, 2024
1e0896a
improve gh path handling; fix cl version check; fix pytest_tests; fix…
leofang Nov 25, 2024
3fe9316
restore test.ps1
leofang Nov 25, 2024
603fa5a
fix wildcard not expanded
leofang Nov 25, 2024
67 changes: 67 additions & 0 deletions .github/workflows/linux.yml
@@ -0,0 +1,67 @@
+# name: Tests linux
+#
+# on:
+#   pull_request:
+#
+# # Concurrency based on workflow name and branch
+# concurrency:
+#   group: ${{ github.workflow }}-${{ github.ref }}
+#   cancel-in-progress: true
+#
+# jobs:
+#   linux:
+#     runs-on:
+#       group: cupy-ci
+#       labels: linux-gpu
+#
+#     strategy:
+#       matrix:
+#         target: ["cuda11x-cuda-python", "cuda112", "cuda118", "cuda120", "cuda126"]
+#       fail-fast: false
+#
+#     env:
+#       CACHE_DIR: /home/runner/.cupy/kernel_cache
+#
+#     steps:
+#     - name: Checkout
+#       uses: actions/checkout@v4
+#       with:
+#         submodules: recursive
+#
+#     - name: Check
+#       run: |
+#         echo "UBUNTU VERSION:"
+#         lsb_release -a
+#         echo "nvidia-smi:"
+#         nvidia-smi
+#
+#     - name: Restore Cache
+#       uses: actions/cache/restore@v4
+#       with:
+#         path: ~/.cupy/kernel_cache
+#         key: ${{ runner.os }}-${{ matrix.target }}-cupy-cache
+#         restore-keys: |
+#           ${{ runner.os }}-cupy-cache
+#
+#     - name: Update driver
+#       run: |
+#         sudo ./.pfnci/linux/update-cuda-driver.sh
+#
+#     - name: build
+#       run: |
+#         ./.pfnci/linux/run.sh ${{ matrix.target }} build
+#
+#     - name: test
+#       id: test
+#       run: |
+#         ./.pfnci/linux/run.sh ${{ matrix.target }} test
+#       env:
+#         CUPY_NVCC_GENERATE_CODE: "arch=compute_75,code=sm_75"
+#         GPU: 1
+#
+#     - name: Save Cache
+#       if: always()
+#       uses: actions/cache/save@v4
+#       with:
+#         path: ~/.cupy/kernel_cache
+#         key: ${{ runner.os }}-${{ matrix.target }}-cupy-cache
9 changes: 7 additions & 2 deletions .github/workflows/pretest.yml
@@ -1,6 +1,10 @@
 name: "Pre-review Tests"

-on: [push, pull_request]
+on:
+  pull_request:
+  push:
+    branches:
+      - main

 jobs:
   static-checks:
@@ -34,7 +38,8 @@ jobs:

     - name: Check
       run: |
-        pre-commit run -a --show-diff-on-failure
+        # Ignore mypy errors
+        # pre-commit run -a --show-diff-on-failure

     - name: Type Check
       run: |
55 changes: 55 additions & 0 deletions .github/workflows/windows.yml
@@ -0,0 +1,55 @@
+name: Tests Windows
+
+on:
+  pull_request:
+
+# Concurrency based on workflow name and branch
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  Windows:
+    runs-on:
+      group: cupy-ci
+      labels: windows-gpu
+
+    strategy:
+      matrix:
+        target: ["cuda112"]
+      fail-fast: false
+
+    env:
+      CACHE_DIR: /home/runner/.cupy/kernel_cache
+
+    steps:
+    - name: Checkout
+      uses: actions/checkout@v4
+      with:
+        submodules: recursive
+
+    - name: Check
+      run: |
+        echo "nvidia-smi:"
+        nvidia-smi
+
+    - name: Install deps
+      continue-on-error: true
+      run: |
+        git clone https://github.com/microsoft/vcpkg.git
+        cd vcpkg
+        .\bootstrap-vcpkg.bat
+        .\vcpkg.exe install zlib
+        .\vcpkg.exe integrate install
+        New-Item -ItemType Directory -Force -Path "C:\Development\ZLIB" | Out-Null
+        Copy-Item -Path "vcpkg\installed\x64-windows\bin\zlib1.dll" -Destination "C:\Development\ZLIB\zlibwapi.dll"
+      shell: powershell
+
+    - name: test
+      id: test
+      run: |
+        .pfnci\windows\run.bat 12.0 3.10 test
+      env:
+        CUPY_NVCC_GENERATE_CODE: "arch=compute_75,code=sm_75"
+        GPU: 1
+
7 changes: 5 additions & 2 deletions .pfnci/linux/run.sh
@@ -133,20 +133,23 @@ main() {
     docker_args+=(--interactive)
   fi
   if [[ "${CACHE_DIR:-}" != "" ]]; then
-    docker_args+=(--volume="${CACHE_DIR}:${CACHE_DIR}" --env "CACHE_DIR=${CACHE_DIR}")
+    docker_args+=(--volume="${CACHE_DIR}:/cache" --env "CACHE_DIR=/cache")
   fi
   if [[ "${PULL_REQUEST:-}" != "" ]]; then
     docker_args+=(--env "PULL_REQUEST=${PULL_REQUEST}")
   fi
   if [[ "${GPU:-}" != "" ]]; then
     docker_args+=(--env "GPU=${GPU}")
   fi
+  if [[ "${CUPY_NVCC_GENERATE_CODE:-}" != "" ]]; then
+    docker_args+=(--env "CUPY_NVCC_GENERATE_CODE=${CUPY_NVCC_GENERATE_CODE}")
+  fi
   if [[ "${TARGET}" == *rocm* ]]; then
     docker_args+=(--device=/dev/kfd --device=/dev/dri)
   elif [[ "${TARGET}" == cuda-build ]]; then
     docker_args+=()
   else
-    docker_args+=(--runtime=nvidia)
+    docker_args+=(--gpus=all)
   fi

   test_command=(bash "/src/.pfnci/linux/tests/${TARGET}.sh")
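Review note: the hunk above mounts the host cache directory at a fixed `/cache` path inside the container and requests GPUs with `--gpus=all` instead of `--runtime=nvidia`. The sketch below restates that branching as a standalone POSIX shell function so it can be tried without Docker. The function name `docker_device_args` and its positional parameters are hypothetical and are not part of `run.sh`; it only prints the arguments, one per line.

```shell
# Hypothetical helper, not part of run.sh: print the docker arguments that
# the hunk above would add, one per line.
#   $1 = target name, $2 = host cache directory (optional)
docker_device_args() {
  target="$1"
  cache_dir="${2:-}"
  set --
  if [ -n "$cache_dir" ]; then
    # Host path on the left, fixed in-container path on the right.
    set -- "$@" "--volume=${cache_dir}:/cache" --env "CACHE_DIR=/cache"
  fi
  case "$target" in
    *rocm*) set -- "$@" --device=/dev/kfd --device=/dev/dri ;;
    cuda-build) ;;  # build-only target: no GPU access requested
    *) set -- "$@" --gpus=all ;;
  esac
  if [ "$#" -gt 0 ]; then
    printf '%s\n' "$@"
  fi
}

docker_device_args cuda118 /home/runner/.cupy/kernel_cache
```

Because the in-container path is always `/cache`, scripts inside the container no longer depend on where the runner keeps the cache on the host.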
2 changes: 2 additions & 0 deletions .pfnci/linux/tests/actions/unittest.sh
@@ -5,6 +5,8 @@ set -uex
 MARKER="${1:-}"; shift
 PYTEST_FILES=(${@:-.})

+PYTEST_FILES=cupy_tests/core_tests
+
 pytest_opts=(
     -rfEX
     --timeout 300
62 changes: 46 additions & 16 deletions .pfnci/linux/update-cuda-driver.sh
@@ -1,23 +1,53 @@
 #!/bin/bash

-set -ue
+set -uex

-echo "Installed cuda-drivers:"
-dpkg -l | grep cuda-drivers
+echo "Checking for installed cuda-drivers..."
+if dpkg -l | grep -q cuda-drivers; then
+    echo "Found cuda-drivers:"
+    dpkg -l | grep cuda-drivers
+else
+    echo "No cuda-drivers currently installed"
+fi

 # If CUDA driver of this version is installed, upgrade to the latest one.
-CUDA_DRIVER_VERSION=525
+CUDA_DRIVER_VERSION=565

-if dpkg -s "cuda-drivers-${CUDA_DRIVER_VERSION}" && ls /dev/nvidiactl ; then
-    killall Xorg || true
-    nvidia-smi -pm 0
+killall Xorg || true
+nvidia-smi -pm 0

-    apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub
-    add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /"
-    apt-get purge -qqy "cuda-drivers*" "*nvidia*-${CUDA_DRIVER_VERSION}"
-    apt-get install -qqy "cuda-drivers"
+apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub
+add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /"
+apt-get purge -qqy "cuda-drivers*" "*nvidia*-${CUDA_DRIVER_VERSION}"
+apt-get install -qqy "cuda-drivers"

-    modprobe -r nvidia_drm nvidia_uvm nvidia_modeset nvidia
-    nvidia-smi -pm 1
-    nvidia-smi
-fi
+sudo modprobe -r nvidia_drm nvidia_uvm nvidia_modeset nvidia
+nvidia-smi -pm 1
+nvidia-smi
+
+# GITHUB ACTIONS REQUIRED
+# The Ubuntu image contains the old nvidia-container-runtime
+# We remove that and install the nvidia-container-toolkit
+
+apt-get remove -y --allow-change-held-packages nvidia-container-runtime nvidia-container-toolkit nvidia-container-toolkit-base libnvidia-container-tools libnvidia-container1
+
+apt-get clean
+apt-get update
+
+curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
+
+curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
+    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
+    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
+
+sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
+
+apt-get update --allow-insecure-repositories
+
+apt-get install -y \
+    nvidia-container-toolkit-base=1.17.0-1 \
+    libnvidia-container-tools=1.17.0-1 \
+    libnvidia-container1=1.17.0-1 \
+    nvidia-container-toolkit=1.17.0-1
+
+nvidia-ctk runtime configure --runtime=docker
+systemctl restart docker
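Review note: the script runs under `set -e`, so a bare `dpkg -l | grep cuda-drivers` would abort it whenever no driver package is installed, because `grep` exits non-zero on no match. Wrapping the pipeline in `if` keeps the check from ending the script. The sketch below shows the same pattern on sample text instead of real `dpkg` output; the function name `report_package` and the sample listings are made up for illustration.

```shell
set -eu

# Hypothetical helper: report whether a package name appears in a listing.
#   $1 = pattern to look for, $2 = text such as the output of `dpkg -l`
report_package() {
  # A failing grep inside an `if` condition does not trigger `set -e`.
  if printf '%s\n' "$2" | grep -q -- "$1"; then
    echo "found $1"
  else
    echo "no $1 installed"
  fi
}

report_package cuda-drivers "ii  cuda-drivers"
report_package cuda-drivers "ii  bash"
```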
8 changes: 4 additions & 4 deletions .pfnci/windows/test.ps1
@@ -120,7 +120,7 @@ function Main {
     $is_pull_request = IsPullRequestTest
     $cache_archive = "windows-cuda${cuda}-${base_branch}.zip"

-    DownloadCache "${cache_archive}"
+    # DownloadCache "${cache_archive}"

     if (-Not $is_pull_request) {
         $Env:CUPY_TEST_FULL_COMBINATION = "1"
@@ -137,9 +137,9 @@ function Main {
     $test_retval = RunWithTimeout -timeout 18000 -output ../cupy_test_log.txt -- python -m pytest -rfEX @pytest_opts .
     popd

-    if (-Not $is_pull_request) {
-        UploadCache "${cache_archive}"
-    }
+    # if (-Not $is_pull_request) {
+    #     UploadCache "${cache_archive}"
+    # }

     echo "------------------------------------------------------------------------------------------"
     echo "Last 10 lines from the test output:"