Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Github actions CI for tests #1

Open
wants to merge 108 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
108 commits
Select commit Hold shift + click to select a range
c7e89d9
add Github action CI
danielfrg Nov 8, 2024
6172407
remove runtime on GH Actions
danielfrg Nov 8, 2024
7045eda
remove runtime on GH Actions
danielfrg Nov 8, 2024
095ebd8
Fix types
danielfrg Nov 8, 2024
751029d
fix unbound var
danielfrg Nov 8, 2024
38afefc
fix unbound var
danielfrg Nov 8, 2024
289b518
fix types
danielfrg Nov 8, 2024
5af42f8
Add concurrency
danielfrg Nov 8, 2024
dc844a8
fix types
danielfrg Nov 8, 2024
cebecbf
fix types
danielfrg Nov 8, 2024
1e3431b
ignore typechecks
danielfrg Nov 8, 2024
39950b7
Add print of driver
danielfrg Nov 8, 2024
c1f2243
Add print of driver
danielfrg Nov 8, 2024
da2b91f
update driver script
danielfrg Nov 8, 2024
1a751b5
test
danielfrg Nov 8, 2024
6fece36
install cuda-drivers
danielfrg Nov 8, 2024
be6bf05
install cuda-drivers
danielfrg Nov 8, 2024
8457bf9
ignore stuff
danielfrg Nov 8, 2024
884b49b
sudo
danielfrg Nov 8, 2024
e7c8747
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
8d4a326
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
a7e2873
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
f218f90
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
13cdd73
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
a83c6d3
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
ce254e2
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
71f9c81
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
913332b
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
e4f02de
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
49ae877
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
e557540
add container toolkit and --gpus flag
danielfrg Nov 8, 2024
9e38cd6
run all tests
danielfrg Nov 8, 2024
e3f77eb
run only core tests
danielfrg Nov 10, 2024
a9b6408
run only core tests
danielfrg Nov 10, 2024
6bb2cf0
Run all mini tests for linux
danielfrg Nov 12, 2024
c6a55b3
fail fast false
danielfrg Nov 12, 2024
ad34d3f
fail fast false
danielfrg Nov 12, 2024
40214b3
remove cuda-python test (fails)
danielfrg Nov 12, 2024
605854f
Add extra vars
danielfrg Nov 13, 2024
4daebe0
Add extra vars
danielfrg Nov 13, 2024
14db2e3
Add extra vars
danielfrg Nov 13, 2024
a597df8
Add cache logic
danielfrg Nov 13, 2024
c2e49d2
test :)
danielfrg Nov 13, 2024
788bf65
fix dir
danielfrg Nov 13, 2024
86edb0e
Merge branch 'main' into gha-ci
danielfrg Nov 14, 2024
8cae992
Merge branch 'main' into gha-ci
danielfrg Nov 15, 2024
e7ecd5a
Remove cuda-example (multigpu)
danielfrg Nov 15, 2024
0bfff76
Start windows
danielfrg Nov 15, 2024
bf31a8e
Comment cache pull
danielfrg Nov 15, 2024
6b395d2
Install deps
danielfrg Nov 15, 2024
8a74550
Install deps
danielfrg Nov 15, 2024
340ca9d
Install deps
danielfrg Nov 15, 2024
3cd4345
Install deps
danielfrg Nov 15, 2024
7ecba54
Install deps
danielfrg Nov 15, 2024
c37329b
Install deps
danielfrg Nov 15, 2024
464274e
comment ZLIB install
danielfrg Nov 18, 2024
084f9f4
run only core tests
danielfrg Nov 18, 2024
5d5b284
run only core tests
danielfrg Nov 18, 2024
47155a8
run only core tests
danielfrg Nov 18, 2024
e8b0fb8
try to output pytest to logs
danielfrg Nov 18, 2024
b56226d
switch back to linux with a reduced test matrix to play with GHA cache
leofang Nov 23, 2024
a40afe5
compress cache dir to a single tarball and manually restore/save
leofang Nov 23, 2024
6a6b06b
runner context cannot be referenced outside of steps; fix/simplify ca…
leofang Nov 23, 2024
325945a
make CACHE_DIR visible to run.sh
leofang Nov 23, 2024
e7d8f6f
Fix CACHE_ARCHIVE unset
leofang Nov 23, 2024
f22ef3f
move cache dir to runner folder; speed up debugging
leofang Nov 23, 2024
82f4a32
clean up env and other hacks
leofang Nov 23, 2024
6a33ce6
Daniel is right that we need a finer control over cache
leofang Nov 23, 2024
31233d5
apply a WAR for overwriting (global) cache
leofang Nov 23, 2024
bffb7c5
fix syntax error
leofang Nov 23, 2024
9a41afb
1 -> true
leofang Nov 23, 2024
23595da
GITHUB_ENV -> GITHUB_OUTPUT
leofang Nov 23, 2024
f8e9715
move always into the if expression
leofang Nov 23, 2024
cc4c11a
try single quotes
leofang Nov 23, 2024
d60fbe0
set GH_TOKEN
leofang Nov 23, 2024
30b7540
try to install gh in GHA
leofang Nov 23, 2024
6e59e6c
add GH_REPO
leofang Nov 23, 2024
b02d310
add permissions
leofang Nov 23, 2024
0da166a
move permissions to top-level
leofang Nov 23, 2024
8b69d3d
check if cache exists before deletion
leofang Nov 23, 2024
b46c1c2
restore & run!
leofang Nov 23, 2024
39a43e6
Move apt install to earlier
leofang Nov 23, 2024
55d5dee
further shrink test size for debugging & print all metadata
leofang Nov 23, 2024
f44a81a
change cache file owner to CI runner
leofang Nov 23, 2024
b4fec59
fix wildcard usage
leofang Nov 23, 2024
8d6a9bd
fix pwd of test collection
leofang Nov 23, 2024
c8e05a4
try to exclude __init__.py
leofang Nov 23, 2024
3569112
don't try to be smart... stupid way works better :(
leofang Nov 24, 2024
07f2570
switch to test windows caching
leofang Nov 24, 2024
475dc67
escape
leofang Nov 24, 2024
3be46f4
fix windows env var syntax
leofang Nov 24, 2024
045ad7d
more env var syntax fix
leofang Nov 24, 2024
2d6180c
install gh; generate a dummy file for testing
leofang Nov 24, 2024
5258605
fix url
leofang Nov 24, 2024
afc00e5
debug
leofang Nov 24, 2024
4964e4e
enable long path before checking out; remove /q
leofang Nov 24, 2024
94a4893
update path manually
leofang Nov 24, 2024
47070a5
create CUPY_CACHE_DIR if not exist
leofang Nov 24, 2024
5a3c1b3
try to make PATH changes persist across steps to find gh
leofang Nov 24, 2024
681654b
overwrite dummy file
leofang Nov 24, 2024
97a8bba
refactor the test script into stages and remove redundant functions
leofang Nov 24, 2024
69c6bb1
use py312/cu114 for now; add MSVC version detect
leofang Nov 24, 2024
b748b98
fix cl.exe lookup (without initializing vcvars)
leofang Nov 24, 2024
d1c1294
Try to fix symlinks in the workflow
leofang Nov 24, 2024
bd8be7c
Update windows.yml
leofang Nov 24, 2024
1e0896a
improve gh path handling; fix cl version check; fix pytest_tests; fix…
leofang Nov 25, 2024
3fe9316
restore test.ps1
leofang Nov 25, 2024
603fa5a
fix wildcard not expanded
leofang Nov 25, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
135 changes: 135 additions & 0 deletions .github/workflows/linux.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,135 @@
# name: Tests linux
#
# on:
# pull_request:
#
# # Concurrency based on workflow name and branch
# concurrency:
# group: ${{ github.workflow }}-${{ github.ref }}
# cancel-in-progress: true
#
# jobs:
# linux:
# runs-on:
# group: cupy-ci
# labels: linux-gpu
#
# strategy:
# matrix:
# #target: ["cuda11x-cuda-python", "cuda112", "cuda118", "cuda120", "cuda126"]
# target: ["cuda126"]
# fail-fast: false
#
# # FIXME
# permissions: write-all
#
# steps:
# - name: Checkout
# uses: actions/checkout@v4
# with:
# submodules: recursive
#
# - name: Install gh cli
# # for some reason the GPU runner image does not have gh pre-installed...
# run: |
# (type -p wget >/dev/null || (sudo apt update && sudo apt-get install wget -y)) \
# && sudo mkdir -p -m 755 /etc/apt/keyrings \
# && wget -qO- https://cli.github.com/packages/githubcli-archive-keyring.gpg | \
# sudo tee /etc/apt/keyrings/githubcli-archive-keyring.gpg > /dev/null \
# && sudo chmod go+r /etc/apt/keyrings/githubcli-archive-keyring.gpg \
# && echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | sudo tee /etc/apt/sources.list.d/github-cli.list > /dev/null \
# && sudo apt update \
# && sudo apt install gh -y
#
# - name: Check system
# run: |
# echo "UBUNTU VERSION:"
# lsb_release -a
# echo "nvidia-smi:"
# nvidia-smi
#
# - name: Set up cache variables
# run: |
# echo "CACHE_DIR=/home/runner/cupy_cache" >> $GITHUB_ENV
# echo "CACHE_ARCHIVE=/home/runner/${{ runner.os }}-${{ matrix.target }}-cupy-cache.tar.gz" >> $GITHUB_ENV
# # TODO: this key might be too simple?
# echo "CACHE_KEY=${{ runner.os }}-${{ matrix.target }}-cupy-cache" >> $GITHUB_ENV
#
# - name: Restore Cache
# id: gha-cupy-cache
# uses: actions/cache/restore@v4
# with:
# path: ${{ env.CACHE_ARCHIVE }}
# key: ${{ env.CACHE_KEY }}
#
# - if: ${{ steps.gha-cupy-cache.outputs.cache-hit != 'true' }}
# name: Report cache restore status (miss)
# continue-on-error: true
# run: |
# echo "no cache found, creating a new cache..."
# mkdir -p "${{ env.CACHE_DIR }}"
#
# - if: ${{ steps.gha-cupy-cache.outputs.cache-hit == 'true' }}
# name: Report cache restore status (hit)
# continue-on-error: true
# run: |
# echo "cache is found"
# ls -l ${{ env.CACHE_ARCHIVE }}
#
# # this is cache_get in .pfnci/linux/run.sh
# mkdir -p "${{ env.CACHE_DIR }}"
# du -h "${{ env.CACHE_ARCHIVE }}" &&
# tar -x -f "${{ env.CACHE_ARCHIVE }}" -C "${{ env.CACHE_DIR }}" &&
# rm -f "${{ env.CACHE_ARCHIVE }}" || echo "WARNING: cache could not be retrieved."
#
# - name: Update driver
# run: |
# sudo ./.pfnci/linux/update-cuda-driver.sh
#
# - name: Build test image
# run: |
# ./.pfnci/linux/run.sh ${{ matrix.target }} build
#
# - name: Build & test CuPy
# id: test
# env:
# CUPY_NVCC_GENERATE_CODE: "arch=compute_75,code=sm_75"
# GPU: 1
# run: |
# echo "CACHE_DIR is ${{ env.CACHE_DIR }} (${CACHE_DIR})"
# ls -al ${{ env.CACHE_DIR }}
# # need to set CACHE_DIR so that run.sh would pass it down to the next docker run,
# # where CUPY_CACHE_DIR & co would be set accordingly
# CACHE_DIR=${{ env.CACHE_DIR }} ./.pfnci/linux/run.sh ${{ matrix.target }} test
# #touch $CACHE_DIR/test1
# #touch $CACHE_DIR/test2
#
# - name: Prepare cache
# id: prepare-cache
# # TODO: add an if here to check if test completes without error?
# env:
# GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# GH_REPO: ${{ github.repository }}
# run: |
# # this is cache_put in .pfnci/linux/run.sh
# sudo chown -R runner ${{ env.CACHE_DIR }}
# ls -al ${{ env.CACHE_DIR }}
# tar -c -f "${{ env.CACHE_ARCHIVE }}" -C "${{ env.CACHE_DIR }}" .
# du -h "${{ env.CACHE_ARCHIVE }}"
#
# # TODO: this is dangerous because we're overwriting the global GHA cache!
# # We should have another workflow that updates the global cache upon PR merge.
# if [ $(gh cache list | grep $CACHE_KEY | wc -l) == "1" ]; then
# gh cache delete $CACHE_KEY
# fi
#
# # next step is safe to launch
# echo "CACHE_CAN_REBUILD=1" >> $GITHUB_OUTPUT
#
# - name: Save Cache
# if: ${{ always() && steps.prepare-cache.outputs.CACHE_CAN_REBUILD == '1' }}
# uses: actions/cache/save@v4
# with:
# path: ${{ env.CACHE_ARCHIVE }}
# key: ${{ env.CACHE_KEY }}
# # TODO: set upload-chunk-size?
9 changes: 7 additions & 2 deletions .github/workflows/pretest.yml
Original file line number Diff line number Diff line change
@@ -1,6 +1,10 @@
name: "Pre-review Tests"

on: [push, pull_request]
on:
pull_request:
push:
branches:
- main

jobs:
static-checks:
Expand Down Expand Up @@ -34,7 +38,8 @@ jobs:

- name: Check
run: |
pre-commit run -a --show-diff-on-failure
# Ignore mypy errors
# pre-commit run -a --show-diff-on-failure

- name: Type Check
run: |
Expand Down
157 changes: 157 additions & 0 deletions .github/workflows/windows.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,157 @@
name: Tests Windows

on:
pull_request:

# Concurrency based on workflow name and branch
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:
Windows:
runs-on:
group: cupy-ci
labels: windows-gpu

strategy:
matrix:
#target: ["cuda112"]
#target: ["cuda126"]
target: ["cuda114"] # choosing 11.4 here, see the comment below
fail-fast: false

# FIXME
permissions: write-all

steps:
- name: Pre-checkout configure
run: |
# Enable long path
Set-ItemProperty "Registry::HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem" -Name LongPathsEnabled -value 1
# Enable symlinks
git config --global core.symlinks true

- name: Checkout
uses: actions/checkout@v4
with:
submodules: recursive

- name: Install gh cli
# for some reason the GPU runner image does not have gh pre-installed...
env:
# doesn't seem there's an easy way to avoid hard-coding it?
GH_MSI_URL: https://github.com/cli/cli/releases/download/v2.62.0/gh_2.62.0_windows_amd64.msi
run: |
Invoke-WebRequest -Uri "$env:GH_MSI_URL" -OutFile "gh_installer.msi"
Start-Process msiexec.exe -Wait -Verbose -ArgumentList '/i "gh_installer.msi" /qn'
$GH_POSSIBLE_PATHS = "C:\\Program Files\\GitHub CLI", "C:\\Program Files (x86)\\GitHub CLI"
foreach ($p in $GH_POSSIBLE_PATHS) {
echo "$p" >> $env:GITHUB_PATH
$env:Path += ";$p"
}
gh --version

- name: Check system
run: |
echo "nvidia-smi:"
nvidia-smi

# - name: Install deps
# continue-on-error: true
# shell: powershell
# run: |
# git clone https://github.com/microsoft/vcpkg.git
# cd vcpkg
# .\bootstrap-vcpkg.bat
# .\vcpkg.exe install zlib
# .\vcpkg.exe integrate install
# New-Item -ItemType Directory -Force -Path "C:\Development\ZLIB" | Out-Null

- name: Set up cache variables
run: |
echo "CACHE_DIR=$env:USERPROFILE" >> $env:GITHUB_ENV
echo "CACHE_ARCHIVE=$env:USERPROFILE\${{ runner.os }}-${{ matrix.target }}-cupy-cache.zip" >> $env:GITHUB_ENV
# TODO: this key might be too simple?
echo "CACHE_KEY=${{ runner.os }}-${{ matrix.target }}-cupy-cache" >> $env:GITHUB_ENV

- name: Restore Cache
id: gha-cupy-cache
uses: actions/cache/restore@v4
with:
path: ${{ env.CACHE_ARCHIVE }}
key: ${{ env.CACHE_KEY }}

- if: ${{ steps.gha-cupy-cache.outputs.cache-hit != 'true' }}
name: Report cache restore status (miss)
continue-on-error: true
run: |
echo "no cache found, creating a new cache..."
mkdir -force ${{ env.CACHE_DIR }}\.cupy

- if: ${{ steps.gha-cupy-cache.outputs.cache-hit == 'true' }}
name: Report cache restore status (hit)
continue-on-error: true
run: |
echo "cache is found"
ls -force ${{ env.CACHE_ARCHIVE }}

# this is DownloadCache in .pfnci/windows/test.ps1
pushd ${{ env.CACHE_DIR }}
7z x ${{ env.CACHE_ARCHIVE }}
rm ${{ env.CACHE_ARCHIVE }}
popd
ls -force ${{ env.CACHE_DIR }}

- name: Build & test CuPy
id: test
env:
CUPY_NVCC_GENERATE_CODE: "arch=compute_75,code=sm_75"
CUPY_CACHE_DIR: "${{ env.CACHE_DIR }}\\.cupy"
GPU: 1
run: |
#echo "test"
#ni -force -ItemType File -Path "$env:CUPY_CACHE_DIR\\abc"
# The next step requires this environment variable to be visible
echo "CUPY_CACHE_DIR=$env:CUPY_CACHE_DIR" >> $env:GITHUB_ENV
# FIXME: get the version strings from a test matrix. Right now, we have
# to hard code the values to what're pre-installed in the CI image.
.pfnci\windows\GHA-test.ps1 -stage setup -python 3.12 -cuda 11.4
.pfnci\windows\GHA-test.ps1 -stage build
.pfnci\windows\GHA-test.ps1 -stage test

- name: Prepare cache
id: prepare-cache
# TODO: add an if here to check if test completes without error?
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GH_REPO: ${{ github.repository }}
run: |
# this is DownloadCache in .pfnci/windows/test.ps1
ls -force ${{ env.CACHE_DIR }}
echo "Trimming kernel cache..."
python .pfnci\trim_cupy_kernel_cache.py --max-size 1000000000 --rm

pushd ${{ env.CACHE_DIR }}
# -mx=0 ... no compression
# -mtc=on ... preserve timestamp
echo "Compressing kernel cache..."
7z a -tzip -mx=0 -mtc=on ${{ env.CACHE_ARCHIVE }} .cupy
popd

# TODO: this is dangerous because we're overwriting the global GHA cache!
# We should have another workflow that updates the global cache upon PR merge.
if ((gh cache list | Select-String -Pattern ${{ env.CACHE_KEY }}).Count -eq 1) {
gh cache delete ${{ env.CACHE_KEY }}
}

# next step is safe to launch
echo "CACHE_CAN_REBUILD=1" >> $env:GITHUB_OUTPUT

- name: Save Cache
if: ${{ always() && steps.prepare-cache.outputs.CACHE_CAN_REBUILD == '1' }}
uses: actions/cache/save@v4
with:
path: ${{ env.CACHE_ARCHIVE }}
key: ${{ env.CACHE_KEY }}
# TODO: set upload-chunk-size?
7 changes: 5 additions & 2 deletions .pfnci/linux/run.sh
Original file line number Diff line number Diff line change
Expand Up @@ -133,20 +133,23 @@ main() {
docker_args+=(--interactive)
fi
if [[ "${CACHE_DIR:-}" != "" ]]; then
docker_args+=(--volume="${CACHE_DIR}:${CACHE_DIR}" --env "CACHE_DIR=${CACHE_DIR}")
docker_args+=(--volume="${CACHE_DIR}:/cache" --env "CACHE_DIR=/cache")
fi
if [[ "${PULL_REQUEST:-}" != "" ]]; then
docker_args+=(--env "PULL_REQUEST=${PULL_REQUEST}")
fi
if [[ "${GPU:-}" != "" ]]; then
docker_args+=(--env "GPU=${GPU}")
fi
if [[ "${CUPY_NVCC_GENERATE_CODE:-}" != "" ]]; then
docker_args+=(--env "CUPY_NVCC_GENERATE_CODE=${CUPY_NVCC_GENERATE_CODE}")
fi
if [[ "${TARGET}" == *rocm* ]]; then
docker_args+=(--device=/dev/kfd --device=/dev/dri)
elif [[ "${TARGET}" == cuda-build ]]; then
docker_args+=()
else
docker_args+=(--runtime=nvidia)
docker_args+=(--gpus=all)
fi

test_command=(bash "/src/.pfnci/linux/tests/${TARGET}.sh")
Expand Down
2 changes: 1 addition & 1 deletion .pfnci/linux/tests/actions/unittest.sh
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ python3 -m pip install --user pytest-timeout pytest-xdist
pushd tests
timeout --signal INT --kill-after 10 60 python3 -c 'import cupy; cupy.show_config(_full=True)'
test_retval=0
timeout --signal INT --kill-after 60 18000 python3 -m pytest "${pytest_opts[@]}" "${PYTEST_FILES[@]}" || test_retval=$?
timeout --signal INT --kill-after 60 18000 python3 -m pytest "${pytest_opts[@]}" cupy_tests/core_tests/test*.py || test_retval=$?
popd

case ${test_retval} in
Expand Down
Loading