Explore ways to maximize coverage while minimizing cost of the CI test matrix #5
Capturing some miscellaneous thoughts from interacting with the matrices on rapidsai/shared-workflows#166. In general, I think whatever decisions we make about the support matrix should be:
Idea 1: matrices in workflow configs should make more use of shared variables. For example, it seems that we only do … Today that looks like:

export MATRIX="
- { CUDA_VER: '11.8.0', LINUX_VER: 'ubuntu22.04', ARCH: 'amd64', PY_VER: '3.10' }
- { CUDA_VER: '11.8.0', LINUX_VER: 'ubuntu22.04', ARCH: 'arm64', PY_VER: '3.10' }
- { CUDA_VER: '12.0.1', LINUX_VER: 'ubuntu22.04', ARCH: 'amd64', PY_VER: '3.10' }
- { CUDA_VER: '12.0.1', LINUX_VER: 'ubuntu22.04', ARCH: 'arm64', PY_VER: '3.10' }
"

The intention there would be clearer, in my opinion, like this:

BUILD_OS="ubuntu22.04"
CUDA_PREVIOUS="11.8.0"
CUDA_CURRENT="12.0.1"
PYTHON_VERSION="3.10"
export MATRIX="
- { CUDA_VER: '${CUDA_PREVIOUS}', LINUX_VER: '${BUILD_OS}', ARCH: 'amd64', PY_VER: '${PYTHON_VERSION}' }
- { CUDA_VER: '${CUDA_PREVIOUS}', LINUX_VER: '${BUILD_OS}', ARCH: 'arm64', PY_VER: '${PYTHON_VERSION}' }
- { CUDA_VER: '${CUDA_CURRENT}', LINUX_VER: '${BUILD_OS}', ARCH: 'amd64', PY_VER: '${PYTHON_VERSION}' }
- { CUDA_VER: '${CUDA_CURRENT}', LINUX_VER: '${BUILD_OS}', ARCH: 'arm64', PY_VER: '${PYTHON_VERSION}' }
"

That'd make the patterns more obvious and reduce the diffs for changes like updating the CUDA or Python version. If the post-processing of those variables that's already done with …

Idea 2: desirable properties of matrices should be enforced in CI. Here are some constraints I can imagine being desirable:
Instead of relying on code comments or convention, I think it's worth considering whether those constraints could be enforced automatically in … I'm imagining here a little script that reads in the matrix configuration, renders the full matrices, asserts all the conditions we want to be true, and raises a big, loud error if any of them are violated.
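A minimal sketch of such a script, assuming bash and the shared variable names from idea 1. `render_matrix`, `check_matrix`, and the example constraint (every CUDA version is tested on both `amd64` and `arm64`) are hypothetical illustrations, not existing shared-workflows code.

```bash
#!/usr/bin/env bash
# Sketch: render the build matrix from shared variables (idea 1) and fail
# loudly if a desired property is violated (idea 2). render_matrix,
# check_matrix, and the example constraint are hypothetical.
set -euo pipefail

BUILD_OS="ubuntu22.04"
CUDA_PREVIOUS="11.8.0"
CUDA_CURRENT="12.0.1"
PYTHON_VERSION="3.10"

# Render every CUDA version x architecture combination, one entry per line.
render_matrix() {
  local cuda arch
  for cuda in "${CUDA_PREVIOUS}" "${CUDA_CURRENT}"; do
    for arch in amd64 arm64; do
      echo "- { CUDA_VER: '${cuda}', LINUX_VER: '${BUILD_OS}', ARCH: '${arch}', PY_VER: '${PYTHON_VERSION}' }"
    done
  done
}

# Example constraint: every CUDA version must be tested on both amd64 and arm64.
# Prints an error to stderr and returns non-zero on the first violation.
check_matrix() {
  local matrix="$1" cuda arch
  for cuda in "${CUDA_PREVIOUS}" "${CUDA_CURRENT}"; do
    for arch in amd64 arm64; do
      if ! grep -q -- "CUDA_VER: '${cuda}'.*ARCH: '${arch}'" <<< "${matrix}"; then
        echo "ERROR: no matrix entry for CUDA ${cuda} on ${arch}" >&2
        return 1
      fi
    done
  done
}

MATRIX="$(render_matrix)"
check_matrix "${MATRIX}"
export MATRIX
```

Because this particular matrix is generated from the same loops, the check trivially passes here; it earns its keep when entries are hand-edited or filtered (for example, when trimming the PR matrix), since `set -e` turns a failed check into a failed job.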
One other passing thought: right now, all wheels are tested only against the latest driver supported on the NVIDIA-hosted runners. We should consider how much coverage of older drivers, if any, we want when testing wheels.
I really like this idea in theory. I don't know how well it'll work in practice, but I definitely think it's worth experimenting with.
From discussion in rapidsai/shared-workflows#176 (comment), adding another example of a constraint that could be enforced automatically: "the nightly matrix should be a strict superset of the PR matrix".
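That superset constraint could be checked with standard `sort` and `comm`, as a sketch that assumes both matrices are rendered one entry per line in the same canonical format. `is_strict_superset` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: check that the nightly matrix is a strict superset of the PR
# matrix. Both matrices are assumed to be rendered one entry per line in
# the same canonical format; is_strict_superset is a hypothetical helper.
set -euo pipefail

is_strict_superset() {
  local nightly="$1" pr="$2" missing extra
  # comm requires sorted input; LC_ALL=C keeps sort and comm consistent.
  # comm -23 prints lines only in the first input (PR-only entries).
  missing="$(LC_ALL=C comm -23 <(LC_ALL=C sort -u <<< "${pr}") <(LC_ALL=C sort -u <<< "${nightly}"))"
  # comm -13 prints lines only in the second input (nightly-only entries).
  extra="$(LC_ALL=C comm -13 <(LC_ALL=C sort -u <<< "${pr}") <(LC_ALL=C sort -u <<< "${nightly}"))"
  if [[ -n "${missing}" ]]; then
    echo "ERROR: PR matrix entries missing from the nightly matrix:" >&2
    echo "${missing}" >&2
    return 1
  fi
  if [[ -z "${extra}" ]]; then
    echo "ERROR: nightly matrix adds nothing beyond the PR matrix" >&2
    return 1
  fi
}
```

In CI it could be called as `is_strict_superset "${NIGHTLY_MATRIX}" "${PR_MATRIX}"` once both matrices are rendered by the same script; a non-zero return then fails the job and lists the offending entries.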
I discussed this with @vyasr and @ajschmidt8 in the context of rapidsai/shared-workflows#184 and rapidsai/cudf#15201. We are planning to implement this (assume the latest driver unless otherwise noted):

24.04 PR jobs
- conda: amd64, CUDA 12 (newest Ubuntu, newest Python)
- wheel: amd64, CUDA 12 (Rocky 8, oldest Python)

24.06 PR jobs (note that we drop CentOS 7, and "oldest Ubuntu" will change from 18.04 to 20.04 here; see #23)
- conda: amd64, CUDA 12 (newest Ubuntu, newest Python)
- wheel: amd64, CUDA 12 (Rocky 8, oldest Python)

Reasoning: we came to this set of entries from the following steps:
@vyasr There are still some potential areas for exploration here, but nothing I think is important in the short term. It seems like our recent changes to the CI matrix have improved things a lot. Would you want to identify any action items here, or should we consider closing this?
Since we're about to add back full wheel arm runs, let's see how things look after that. I suspect that we might be partially back in a state where we would benefit from some of the changes in rapidsai/cudf#15201.
Closing in favor of #95.
There are various changes we could make to our CI matrix to improve test coverage of critical components while reducing the overall load. Some possible improvements that have been suggested at various times include: