-
Notifications
You must be signed in to change notification settings - Fork 906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
standardize and consolidate wheel installations in testing scripts #16575
Conversation
ci/cudf_pandas_scripts/run_tests.sh
Outdated
RAPIDS_PY_WHEEL_NAME="cudf_${RAPIDS_PY_CUDA_SUFFIX}" rapids-download-wheels-from-s3 ./dist | ||
# echo to expand wildcard before adding `[extra]` requires for pip | ||
python -m pip install \ | ||
"$(echo ./dist/cudf*.whl)[test,cudf-pandas-tests]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
splitting some of these single-line installations into multiple lines, in anticipation of them picking up pylibcudf
and / or libcudf
lines as well when those packages are split out of cudf
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One change request, but I don't need to review again.
"$(echo ./dist/cudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)[test,pandas-tests]" \ | ||
"$(echo ./dist/pylibcudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)[test,pandas-tests]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't caused by this PR, but let's get these extras fixed here since they aren't quite right. pylibcudf doesn't have a pandas-tests extra. It does have a test extra, but I wouldn't expect it to be needed in this test run.
"$(echo ./dist/cudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)[test,pandas-tests]" \ | |
"$(echo ./dist/pylibcudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)[test,pandas-tests]" | |
"$(echo ./dist/cudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)[test,pandas-tests]" \ | |
"$(echo ./dist/pylibcudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like with this combined approach of installation, even though you're explicitly specifying to install pylibcudf that package doesn't get included in dependency resolution for cudf. The two step process might be necessary, or you can switch to adding --find-links
to the command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do see the failure at https://github.com/rapidsai/cudf/actions/runs/10425017056/job/28875933840?pr=16575.
./dist./distLooking in indexes: https://pypi.org/simple, https://pypi.anaconda.org/rapidsai-wheels-nightly/simple, https://pypi.nvidia.com/
Processing ./dist/cudf_cu12-24.10.0a205-cp310-cp310-manylinux_2_28_x86_64.whl
Processing ./dist/pylibcudf_cu12-24.10.0a205-cp310-cp310-manylinux_2_28_x86_64.whl
...
ERROR: Could not find a version that satisfies the requirement pylibcudf-cu12==24.10.*,>=0.0.0a0 (from cudf-cu12[test]) (from versions: none)
ERROR: No matching distribution found for pylibcudf-cu12==24.10.*,>=0.0.0a0
But did you notice that the CUDA 11.8 version of that same job, using the same script, succeeded?
Processing ./dist/cudf_cu11-24.10.0a205-cp311-cp311-manylinux_2_28_aarch64.whl (from cudf-cu11==24.10.0a205)
Processing ./dist/pylibcudf_cu11-24.10.0a205-cp311-cp311-manylinux_2_28_aarch64.whl (from pylibcudf-cu11==24.10.0a205)
...
Successfully installed ... cudf-cu11-24.10.0a205 ... pylibcudf-cu11-24.10.0a205 ...
If this method of dependency resolution didn't work, I'd have expected those jobs to both fail in the same way. I'm going to merge in latest branch-24.10
+ apply your suggestion, let's see what happens on a re-run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, that thought flashed in and out of my brain while reviewing your PR and left no trace 😂 let's see what we see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is still failing, so there is something strange happening here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, thanks!! I hadn't noticed the different pip
versions!!
That's totally it.
I was able to reproduce the failure locally.
code to do that (click me)
docker run \
--rm \
--env RAPIDS_REPOSITORY=rapidsai/cudf \
--env _RAPIDS_SHA=7a12b67bd17056d4c980a3c6e89cc8ec9a6bd7c5_ \
--env RAPIDS_REF_NAME=pull-request/16575 \
--env RAPIDS_BUILD_TYPE="pull-request" \
-it rapidsai/citestwheel:cuda12.5.1-ubuntu22.04-py3.10 \
bash
RAPIDS_PY_CUDA_SUFFIX="$(rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"
# Download the cudf and pylibcudf built in the previous step
RAPIDS_PY_WHEEL_NAME="cudf_${RAPIDS_PY_CUDA_SUFFIX}" rapids-download-wheels-from-s3 ./dist
RAPIDS_PY_WHEEL_NAME="pylibcudf_${RAPIDS_PY_CUDA_SUFFIX}" rapids-download-wheels-from-s3 ./dist
python -m pip install -v \
"$(echo ./dist/cudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)[test]" \
"$(echo ./dist/pylibcudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)[test]"
Inspected the package metadata with pkginfo
, didn't see any issues... names, versions, and dependency lists all looked correct.
Upgraded pip
(to 24.2) and tried again:
pip install --upgrade pip
python -m pip install -v \
"$(echo ./dist/cudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)[test]" \
"$(echo ./dist/pylibcudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)[test]"
That succeeded 🎉
This makes me think we should be running pip install --upgrade pip
every time the images at https://github.com/rapidsai/ci-imgs are rebuilt. Can you think of strong reason we shouldn't do that? If not, I'll put up a PR proposing that and see what others say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main argument against updating pip unconditionally everywhere is that we shouldn't require our users to always have the latest version of pip to be able to install our software. This has come up before in devcontainers, for instance, where the latest version of pip available as a system-installable package is significantly older than the latest version available from PyPI, and the question is raised whether it's really a good idea to require a newer one.
For now, for the purpose of CI I think it's OK to always update so that at least we have consistent behavior. At some point we might want to introduce some form of testing to ensure that our packages are actually installable with different versions of pip, though. We have in the past done things (e.g. the rapids-dask-dependency shenanigans) that ultimately require a fairly new version of pip, which could be an onerous constraint on some users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, ok yeah that makes sense.
I've opened rapidsai/ci-imgs#171 proposing upgrading pip
in the wheel CI images.
At some point we might want to introduce some form of testing to ensure that our packages are actually installable with different versions of pip, though.
I agree. This type of thing should be tested directly (similar to rapidsai/build-planning#81) if we want to have high confidence in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that the new CI images are published (https://github.com/rapidsai/ci-imgs/actions/runs/10457672191), I've restarted CI here. Merge latest branch-24.10
into this (figured I might as well, to get a bit more useful test coverage, if I'm going to re-run so many jobs anyway).
Hopefully with newer pip
across all the CI images, this newer, stricter pattern for these scripts will work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 looks like that worked!
https://github.com/rapidsai/cudf/actions/runs/10458119570/job/28961862201?pr=16575
Thanks a lot for the help, @vyasr !
Co-authored-by: Vyas Ramasubramani <[email protected]>
/merge |
Removes unnecessary installation of `cudf` wheels in wheel testing for `cudf_polars`. `cudf_polars` doesn't depend on `cudf`, and neither do its tests. However, right now it's downloading `cudf` during it's wheel tests. I mistakenly introduced that in #16575. This introduced a race condition that could lead to CI failures whenever the `cudf` wheels aren't published yet by the time the `cudf_polars` tests. Because the `cudf_polars` wheel tests (rightly) do not wait for `cudf` wheels to be available: https://github.com/rapidsai/cudf/blob/555734dee7a8fb10f50c8609a8e4fb2c025e6305/.github/workflows/pr.yaml#L154-L155 https://github.com/rapidsai/cudf/blob/555734dee7a8fb10f50c8609a8e4fb2c025e6305/.github/workflows/pr.yaml#L145-L146 Noticed this in #16611 ```text [rapids-download-from-s3] Downloading and decompressing s3://rapids-downloads/ci/cudf/pull-request/16611/a6b7eff/cudf_wheel_python_cudf_cu12_py310_x86_64.tar.gz into ./dist download failed: s3://rapids-downloads/ci/cudf/pull-request/16611/a6b7eff/cudf_wheel_python_cudf_cu12_py310_x86_64.tar.gz to - An error occurred (404) when calling the HeadObject operation: Not Found ``` ([build link](https://github.com/rapidsai/cudf/actions/runs/10472939821/job/29004728278?pr=16611)) Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Bradley Dice (https://github.com/bdice) URL: #16612
Contributes to rapidsai/build-planning#33. Proposes the following for `cuspatial` wheels: * add build and runtime dependencies on `libcudf` wheels * stop vendoring copies of `libcudf.so`, `libnvcomp.so`, `libnvcomp_bitcomp.so`, and `libnvcomp_gdeflate.so` - *(load `libcudf.so` dynamically at runtime instead)* And other related changes for development/CI: * combine all `pip install` calls into 1 in wheel-testing scripts - *like rapidsai/cudf#16575 - *to improve the chance that packaging issues are discovered in CI* * `dependencies.yaml` changes: - more use of YAML anchors = less duplication - use dedicated `depends_on_librmm` and `depends_on_libcudf` groups * explicitly pass a package type to `gha-tools` wheel uploading/downloading scripts ## Notes for Reviewers ### Benefits of these changes Unblocks CI in this repo (ref: #1444 (comment), #1441 (comment)). Reduces wheel sizes for `cuspatial` wheels by about 125MB 😁 | wheel | size (before) | size (this PR) | |:-----------:|-------------:|---------------:| | `cuspatial` | 146.0M | 21M | | `cuproj ` | 0.9M | 0.9M | |**TOTAL** | **146.9M** | **21.9M** | *NOTES: size = compressed, "before" = 2024-08-21 nightlies (c60bd4d), CUDA = 12, Python = 3.11* <details><summary>how I calculated those (click me)</summary> ```shell # note: 2024-08-21 because that was the most recent date with # successfully-built cuspatial nightlies # docker run \ --rm \ -v $(pwd):/opt/work:ro \ -w /opt/work \ --network host \ --env RAPIDS_NIGHTLY_DATE=2024-08-21 \ --env RAPIDS_NIGHTLY_SHA=c60bd4d \ --env RAPIDS_PR_NUMBER=1447 \ --env RAPIDS_PY_CUDA_SUFFIX=cu12 \ --env RAPIDS_REPOSITORY=rapidsai/cuspatial \ --env WHEEL_DIR_BEFORE=/tmp/wheels-before \ --env WHEEL_DIR_AFTER=/tmp/wheels-after \ -it rapidsai/ci-wheel:cuda12.5.1-rockylinux8-py3.11 \ bash mkdir -p "${WHEEL_DIR_BEFORE}" mkdir -p "${WHEEL_DIR_AFTER}" py_projects=( cuspatial cuproj ) for project in "${py_projects[@]}"; do # before RAPIDS_BUILD_TYPE=nightly \ RAPIDS_PY_WHEEL_NAME="${project}_${RAPIDS_PY_CUDA_SUFFIX}" \ RAPIDS_REF_NAME="branch-24.10" \ RAPIDS_SHA=${RAPIDS_NIGHTLY_SHA} \ rapids-download-wheels-from-s3 python "${WHEEL_DIR_BEFORE}" # after RAPIDS_BUILD_TYPE=pull-request \ RAPIDS_PY_WHEEL_NAME="${project}_${RAPIDS_PY_CUDA_SUFFIX}" \ RAPIDS_REF_NAME="pull-request/${RAPIDS_PR_NUMBER}" \ rapids-download-wheels-from-s3 python "${WHEEL_DIR_AFTER}" done du -sh ${WHEEL_DIR_BEFORE}/* du -sh ${WHEEL_DIR_BEFORE} du -sh ${WHEEL_DIR_AFTER}/* du -sh ${WHEEL_DIR_AFTER} ``` </details> Reduces the amount of additional work required to start shipping `libcuspatial` wheels. ### Background This is part of ongoing work towards packaging `libcuspatial` as a wheel. relevant prior work: * packaging `libcudf` wheels: rapidsai/cudf#15483 * consolidating `pip install` calls in CI scripts for `cudf`: rapidsai/cudf#16575 * `cudf` dropping its Arrow library dependency: rapidsai/cudf#16640 ### How I tested this Confirmed in local builds and CI logs that `cudf` is being *found*, not *built*, in `cuspatial` builds. ```text -- CPM: Using local package [email protected] ``` ([build link](https://github.com/rapidsai/cuspatial/actions/runs/10602971716/job/29386288614?pr=1447#step:9:23472)) Built `cuspatial` wheels locally and ran all the unit tests, without issue. # Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) - Matthew Roeschke (https://github.com/mroeschke) URL: #1447
## Summary Follow-up to #4690. Proposes consolidating stuff like this in CI scripts: ```shell pip install A pip install B pip install C ``` Into this: ```shell pip install A B C ``` ## Benefits of these changes Reduces the risk of creating a broken environment with incompatible packages. Unlike `conda`, `pip` does not evaluate the requirements of all installed packages when you run `pip` install. Installing `torch` and `cugraph-dgl` at the same time, for example, gives us a chance to find out about packaging issues like *"`cugraph-dgl` and `torch` have conflicting requirements on `{other_package}`"* at CI time. Similar change from `cudf`: rapidsai/cudf#16575 Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) - Alex Barghi (https://github.com/alexbarghi-nv) URL: #4701
Description
I noticed some common changes to wheel-testing scripts in the PRs splitting off
pylibcudf
(#16299) andlibcudf
(#15483).pip install
's into 1pip
replacing a previously-installed CI package with another one from a remote package repository)These can go onto
branch-24.10
right now, so proposing them in a separate PR so thatcudf
CI can benefit from them without having to wait on those large PRs.Checklist
Notes for Reviewers
This will have conflicts with #16299. I'll update this once that's merged.