
standardize and consolidate wheel installations in testing scripts #16575

Merged
8 commits merged into rapidsai:branch-24.10 on Aug 19, 2024

Conversation


@jameslamb jameslamb commented Aug 15, 2024

Description

I noticed some common changes to wheel-testing scripts in the PRs splitting off pylibcudf (#16299) and libcudf (#15483).

  • consolidating multiple `pip install` calls into one
    • (this is safer, as it removes the risk of pip replacing a previously-installed CI package with another one from a remote package repository)
  • standardizing the approach used for "install some wheels built earlier in this same CI run"

These changes can go onto branch-24.10 right now, so I'm proposing them in a separate PR so that cudf CI can benefit from them without having to wait on those large PRs.
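As a rough illustration of the kind of change (the wheel names and extras below are just examples, not the exact scripts touched here):

```shell
# before: separate installs; a later call can replace a wheel installed by an
# earlier one with a different build pulled from a remote package index
#   python -m pip install ./dist/pylibcudf*.whl
#   python -m pip install "$(echo ./dist/cudf*.whl)[test]"

# after: a single install, so pip resolves everything together in one step
python -m pip install \
    "$(echo ./dist/pylibcudf*.whl)" \
    "$(echo ./dist/cudf*.whl)[test]"
```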

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Notes for Reviewers

This will have conflicts with #16299. I'll update this once that's merged.

@jameslamb jameslamb added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Aug 15, 2024
RAPIDS_PY_WHEEL_NAME="cudf_${RAPIDS_PY_CUDA_SUFFIX}" rapids-download-wheels-from-s3 ./dist
# echo to expand wildcard before adding `[extra]` requires for pip
python -m pip install \
  "$(echo ./dist/cudf*.whl)[test,cudf-pandas-tests]"
Member Author

Splitting some of these single-line installations into multiple lines, in anticipation of them picking up pylibcudf and/or libcudf lines as well when those packages are split out of cudf.

@jameslamb jameslamb changed the title WIP: standardize and consolidate wheel installations in testing scripts standardize and consolidate wheel installations in testing scripts Aug 16, 2024
@jameslamb jameslamb marked this pull request as ready for review August 16, 2024 14:52
@jameslamb jameslamb requested a review from a team as a code owner August 16, 2024 14:52
@jameslamb jameslamb mentioned this pull request Aug 16, 2024
@jameslamb jameslamb mentioned this pull request Aug 16, 2024

@vyasr vyasr left a comment

One change request, but I don't need to review again.

ci/cudf_pandas_scripts/pandas-tests/run.sh (resolved)
Comment on lines 21 to 22
"$(echo ./dist/cudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)[test,pandas-tests]" \
"$(echo ./dist/pylibcudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)[test,pandas-tests]"
Contributor

It wasn't caused by this PR, but let's get these extras fixed here since they aren't quite right. `pylibcudf` doesn't have a `pandas-tests` extra. It does have a `test` extra, but I wouldn't expect it to be needed in this test run.

Suggested change

From:
"$(echo ./dist/cudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)[test,pandas-tests]" \
"$(echo ./dist/pylibcudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)[test,pandas-tests]"

To:
"$(echo ./dist/cudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)[test,pandas-tests]" \
"$(echo ./dist/pylibcudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)"

Contributor

It looks like, with this combined approach to installation, even though you're explicitly specifying that pylibcudf should be installed, that package doesn't get included in dependency resolution for cudf. The two-step process might be necessary, or you can switch to adding --find-links to the command.
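(For reference, a rough sketch of the `--find-links` variant being suggested, assuming the wheels are already in `./dist`; not necessarily the exact change intended.)

```shell
# tell pip it can also satisfy requirements (like cudf's pylibcudf dependency)
# from the wheels already downloaded into ./dist, not only the remote indexes
python -m pip install \
    --find-links ./dist \
    "$(echo ./dist/cudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)[test,pandas-tests]"
```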

Member Author
@jameslamb jameslamb Aug 16, 2024

I do see the failure at https://github.com/rapidsai/cudf/actions/runs/10425017056/job/28875933840?pr=16575.

Looking in indexes: https://pypi.org/simple, https://pypi.anaconda.org/rapidsai-wheels-nightly/simple, https://pypi.nvidia.com/
Processing ./dist/cudf_cu12-24.10.0a205-cp310-cp310-manylinux_2_28_x86_64.whl
Processing ./dist/pylibcudf_cu12-24.10.0a205-cp310-cp310-manylinux_2_28_x86_64.whl
...
ERROR: Could not find a version that satisfies the requirement pylibcudf-cu12==24.10.*,>=0.0.0a0 (from cudf-cu12[test]) (from versions: none)
ERROR: No matching distribution found for pylibcudf-cu12==24.10.*,>=0.0.0a0

But did you notice that the CUDA 11.8 version of that same job, using the same script, succeeded?

Processing ./dist/cudf_cu11-24.10.0a205-cp311-cp311-manylinux_2_28_aarch64.whl (from cudf-cu11==24.10.0a205)
Processing ./dist/pylibcudf_cu11-24.10.0a205-cp311-cp311-manylinux_2_28_aarch64.whl (from pylibcudf-cu11==24.10.0a205)
...
Successfully installed ... cudf-cu11-24.10.0a205 ... pylibcudf-cu11-24.10.0a205 ...

(build link)

If this method of dependency resolution didn't work, I'd have expected both of those jobs to fail in the same way. I'm going to merge in the latest branch-24.10 and apply your suggestion; let's see what happens on a re-run.

Contributor

Good point, that thought flashed in and out of my brain while reviewing your PR and left no trace 😂 let's see what we see.

Contributor

It is still failing, so there is something strange happening here.

Member Author

ah, thanks!! I hadn't noticed the different pip versions!!

That's totally it.

I was able to reproduce the failure locally.

code to do that (click me)
docker run \
    --rm \
    --env RAPIDS_REPOSITORY=rapidsai/cudf \
    --env RAPIDS_SHA=7a12b67bd17056d4c980a3c6e89cc8ec9a6bd7c5 \
    --env RAPIDS_REF_NAME=pull-request/16575 \
    --env RAPIDS_BUILD_TYPE="pull-request" \
    -it rapidsai/citestwheel:cuda12.5.1-ubuntu22.04-py3.10 \
    bash

RAPIDS_PY_CUDA_SUFFIX="$(rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"

# Download the cudf and pylibcudf built in the previous step
RAPIDS_PY_WHEEL_NAME="cudf_${RAPIDS_PY_CUDA_SUFFIX}" rapids-download-wheels-from-s3 ./dist
RAPIDS_PY_WHEEL_NAME="pylibcudf_${RAPIDS_PY_CUDA_SUFFIX}" rapids-download-wheels-from-s3 ./dist

python -m pip install -v \
  "$(echo ./dist/cudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)[test]" \
  "$(echo ./dist/pylibcudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)[test]"

I inspected the package metadata with pkginfo and didn't see any issues: names, versions, and dependency lists all looked correct.

Upgraded pip (to 24.2) and tried again:

pip install --upgrade pip

python -m pip install -v \
  "$(echo ./dist/cudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)[test]" \
  "$(echo ./dist/pylibcudf_${RAPIDS_PY_CUDA_SUFFIX}*.whl)[test]"

That succeeded 🎉

This makes me think we should be running pip install --upgrade pip every time the images at https://github.com/rapidsai/ci-imgs are rebuilt. Can you think of a strong reason we shouldn't do that? If not, I'll put up a PR proposing that and see what others say.

Contributor

The main argument against updating pip unconditionally everywhere is that we shouldn't require our users to always have the latest version of pip to be able to install our software. This has come up before in devcontainers, for instance, where the latest version of pip available as a system-installable package is significantly older than the latest version available from PyPI, and the question is raised whether it's really a good idea to require a newer one.

For now, for the purpose of CI I think it's OK to always update so that at least we have consistent behavior. At some point we might want to introduce some form of testing to ensure that our packages are actually installable with different versions of pip, though. We have in the past done things (e.g. the rapids-dask-dependency shenanigans) that ultimately require a fairly new version of pip, which could be an onerous constraint on some users.

Member Author

Thanks, ok yeah that makes sense.

I've opened rapidsai/ci-imgs#171 proposing upgrading pip in the wheel CI images.

> At some point we might want to introduce some form of testing to ensure that our packages are actually installable with different versions of pip, though.

I agree. This type of thing should be tested directly (similar to rapidsai/build-planning#81) if we want to have high confidence in it.
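(Sketching what such a check might look like; the pip versions and wheel path below are placeholders, not an agreed-on test matrix.)

```shell
# rough sketch: try installing a built wheel under several pip versions
for pip_version in 23.0 23.3.2 24.0 24.2; do
    venv_dir="/tmp/venv-pip-${pip_version}"
    python -m venv "${venv_dir}"
    "${venv_dir}/bin/python" -m pip install "pip==${pip_version}"
    "${venv_dir}/bin/python" -m pip install "$(echo ./dist/cudf_*.whl)[test]"
done
```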

Member Author

Now that the new CI images are published (https://github.com/rapidsai/ci-imgs/actions/runs/10457672191), I've restarted CI here. I also merged the latest branch-24.10 into this branch (figured I might as well, to get a bit more useful test coverage, if I'm going to re-run so many jobs anyway).

Hopefully with newer pip across all the CI images, this newer, stricter pattern for these scripts will work.

@jameslamb jameslamb (Member Author)

/merge

@rapids-bot rapids-bot bot merged commit 6ccc2c2 into rapidsai:branch-24.10 Aug 19, 2024
84 of 86 checks passed
@jameslamb jameslamb deleted the misc/ci-installs branch August 19, 2024 21:39
rapids-bot bot pushed a commit that referenced this pull request Aug 20, 2024
Removes unnecessary installation of `cudf` wheels in wheel testing for `cudf_polars`.

`cudf_polars` doesn't depend on `cudf`, and neither do its tests. However, right now it's downloading `cudf` during its wheel tests. I mistakenly introduced that in #16575.

This introduced a race condition that could lead to CI failures whenever the `cudf` wheels aren't published yet by the time the `cudf_polars` tests run, because the `cudf_polars` wheel tests (rightly) do not wait for `cudf` wheels to be available:

https://github.com/rapidsai/cudf/blob/555734dee7a8fb10f50c8609a8e4fb2c025e6305/.github/workflows/pr.yaml#L154-L155

https://github.com/rapidsai/cudf/blob/555734dee7a8fb10f50c8609a8e4fb2c025e6305/.github/workflows/pr.yaml#L145-L146

I noticed this in #16611:

```text
[rapids-download-from-s3] Downloading and decompressing s3://rapids-downloads/ci/cudf/pull-request/16611/a6b7eff/cudf_wheel_python_cudf_cu12_py310_x86_64.tar.gz into ./dist
download failed: s3://rapids-downloads/ci/cudf/pull-request/16611/a6b7eff/cudf_wheel_python_cudf_cu12_py310_x86_64.tar.gz to - An error occurred (404) when calling the HeadObject operation: Not Found
```

([build link](https://github.com/rapidsai/cudf/actions/runs/10472939821/job/29004728278?pr=16611))

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #16612
rapids-bot bot pushed a commit to rapidsai/cuspatial that referenced this pull request Aug 29, 2024
Contributes to rapidsai/build-planning#33.

Proposes the following for `cuspatial` wheels:

* add build and runtime dependencies on `libcudf` wheels
* stop vendoring copies of `libcudf.so`, `libnvcomp.so`, `libnvcomp_bitcomp.so`, and `libnvcomp_gdeflate.so`
  - *(load `libcudf.so` dynamically at runtime instead)*

And other related changes for development/CI:

* combine all `pip install` calls into 1 in wheel-testing scripts
  - *like rapidsai/cudf#16575*
  - *to improve the chance that packaging issues are discovered in CI*
* `dependencies.yaml` changes:
   - more use of YAML anchors = less duplication
   - use dedicated `depends_on_librmm` and `depends_on_libcudf` groups
* explicitly pass a package type to `gha-tools` wheel uploading/downloading scripts

## Notes for Reviewers

### Benefits of these changes

Unblocks CI in this repo (ref: #1444 (comment), #1441 (comment)).

Reduces wheel sizes for `cuspatial` wheels by about 125MB 😁 

| wheel       | size (before) | size (this PR) |
|:-----------:|--------------:|---------------:|
| `cuspatial` |        146.0M |            21M |
| `cuproj`    |          0.9M |           0.9M |
| **TOTAL**   |    **146.9M** |      **21.9M** |

*NOTES: size = compressed, "before" = 2024-08-21 nightlies (c60bd4d), CUDA = 12, Python = 3.11*

<details><summary>how I calculated those (click me)</summary>

```shell
# note: 2024-08-21 because that was the most recent date with
#           successfully-built cuspatial nightlies
#
docker run \
    --rm \
    -v $(pwd):/opt/work:ro \
    -w /opt/work \
    --network host \
    --env RAPIDS_NIGHTLY_DATE=2024-08-21 \
    --env RAPIDS_NIGHTLY_SHA=c60bd4d \
    --env RAPIDS_PR_NUMBER=1447 \
    --env RAPIDS_PY_CUDA_SUFFIX=cu12 \
    --env RAPIDS_REPOSITORY=rapidsai/cuspatial \
    --env WHEEL_DIR_BEFORE=/tmp/wheels-before \
    --env WHEEL_DIR_AFTER=/tmp/wheels-after \
    -it rapidsai/ci-wheel:cuda12.5.1-rockylinux8-py3.11 \
    bash

mkdir -p "${WHEEL_DIR_BEFORE}"
mkdir -p "${WHEEL_DIR_AFTER}"

py_projects=(
    cuspatial
    cuproj
)

for project in "${py_projects[@]}"; do
    # before
    RAPIDS_BUILD_TYPE=nightly \
    RAPIDS_PY_WHEEL_NAME="${project}_${RAPIDS_PY_CUDA_SUFFIX}" \
    RAPIDS_REF_NAME="branch-24.10" \
    RAPIDS_SHA=${RAPIDS_NIGHTLY_SHA} \
        rapids-download-wheels-from-s3 python "${WHEEL_DIR_BEFORE}"

    # after
    RAPIDS_BUILD_TYPE=pull-request \
    RAPIDS_PY_WHEEL_NAME="${project}_${RAPIDS_PY_CUDA_SUFFIX}" \
    RAPIDS_REF_NAME="pull-request/${RAPIDS_PR_NUMBER}" \
        rapids-download-wheels-from-s3 python "${WHEEL_DIR_AFTER}"
done

du -sh ${WHEEL_DIR_BEFORE}/*
du -sh ${WHEEL_DIR_BEFORE}
du -sh ${WHEEL_DIR_AFTER}/*
du -sh ${WHEEL_DIR_AFTER}
```

</details>

Reduces the amount of additional work required to start shipping `libcuspatial` wheels.

### Background

This is part of ongoing work towards packaging `libcuspatial` as a wheel.

relevant prior work:

* packaging `libcudf` wheels: rapidsai/cudf#15483
* consolidating `pip install` calls in CI scripts for `cudf`: rapidsai/cudf#16575
* `cudf` dropping its Arrow library dependency: rapidsai/cudf#16640

### How I tested this

Confirmed in local builds and CI logs that `cudf` is being *found*, not *built*, in `cuspatial` builds.

```text
-- CPM: Using local package [email protected]
```

([build link](https://github.com/rapidsai/cuspatial/actions/runs/10602971716/job/29386288614?pr=1447#step:9:23472))

Built `cuspatial` wheels locally and ran all the unit tests, without issue.

#

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #1447
rapids-bot bot pushed a commit to rapidsai/cugraph that referenced this pull request Oct 10, 2024
## Summary

Follow-up to #4690.

Proposes consolidating stuff like this in CI scripts:

```shell
pip install A
pip install B
pip install C
```

Into this:

```shell
pip install A B C
```

## Benefits of these changes

Reduces the risk of creating a broken environment with incompatible packages. Unlike `conda`, `pip` does not evaluate the requirements of all installed packages when you run `pip install`.

Installing `torch` and `cugraph-dgl` at the same time, for example, gives us a chance to find out about packaging issues like *"`cugraph-dgl` and `torch` have conflicting requirements on `{other_package}`"* at CI time.
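An illustrative sketch of the difference (package names here are just examples):

```shell
# sequential installs: each call resolves in isolation, so the second one can
# quietly replace or conflict with something the first one installed
pip install cugraph-dgl
pip install torch

# combined install: pip resolves both requirement sets together and fails
# loudly if the two packages disagree about a shared dependency
pip install cugraph-dgl torch
```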

Similar change from `cudf`: rapidsai/cudf#16575

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)
  - Alex Barghi (https://github.com/alexbarghi-nv)

URL: #4701