Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: Test against old versions of key dependencies #16570

Merged
merged 33 commits into from
Sep 4, 2024

Conversation

seberg
Copy link
Contributor

@seberg seberg commented Aug 15, 2024

This adds explicit tests with old versions of key dependencies. Specifically:

  • numba==0.57
  • numpy==1.23
  • pandas==2.0
  • fsspec==0.6.0 excluded it. transformers==4.39.3 requires huggingface_hub which requires fsspec>=2023.5.0. In principle one could include it e.g. only for conda which doesn't pull in transformers, but that seemed not worth the trouble?
  • cupy==12.0.0
  • pyarrow==16.1.0

See also rapidsai/build-planning#81

(Marking as draft until I see that things work.)

@seberg seberg added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Aug 15, 2024
This adds explicit tests with old versions of key dependencies.
Specifically:
- `numba==0.57`
- `numpy==1.23`
- `pandas==2.0`
- `fsspec==0.6.0`
- `cupy==12.0.0`
- `pyarrow==16.1.0`
@seberg seberg force-pushed the test-oldest branch 2 times, most recently from 5af6f41 to 8edb468 Compare August 16, 2024 11:14
Copy link

copy-pr-bot bot commented Aug 16, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

huggingface is pulled in for some jobs, which in turns pulls in
fsspec but with a higher version constraint of 2023.5.0.
So while the old fsspec is probably fine as a direct dependency
we cannot (generally) enforce it in all tests.

Signed-off-by: Sebastian Berg <[email protected]>
@seberg
Copy link
Contributor Author

seberg commented Aug 16, 2024

@jameslamb maybe you can have a look w.r.t. to the setup/yaml? One of the builds currently failed, but I doubt a bit that it is related to the change (because build steps should be unaffected).

@mroeschke pinging maybe you know well what to do with the failures already showing up? Many seem just simple but e.g. this run has a surprising amount of failures.

(Have to check that this doesn't accidentally drop running "latest" tests for any setup.)

EDIT: Hmmm, I guess #15100 might be a good start for things that need to be readded if running <2.2.

@github-actions github-actions bot added Python Affects Python cuDF API. cudf.pandas Issues specific to cudf.pandas labels Aug 16, 2024
ci/test_python_common.sh Outdated Show resolved Hide resolved
@jameslamb
Copy link
Member

@jameslamb maybe you can have a look w.r.t. to the setup/yaml? One of the builds currently failed, but I doubt a bit that it is related to the change (because build steps should be unaffected).

At a quick glance, changes look like what I'd expect. Nice trick including cupy== alongside cupy-cu{major}x together! I agree with you, that's a nice advantage of --constraint over --requirement.

If you fix conflicts + apply the one small suggestion I left, hopefully you'll see builds pass here. Once you've done that, @ me and I'd be happy to review again and more carefully.

@jameslamb
Copy link
Member

@seberg I just merged #16575 which is going to create even more merge conflicts for you, sorry 😬

Could you try to follow the patterns from that PR? They're a little bit stricter and safer than what we had before in the case where a CI job is installing multiple other CI artifacts (e.g. cudf_polars jobs installing cudf, pylibcudf, and cudf_polars wheels).

@mroeschke
Copy link
Contributor

pinging maybe you know well what to do with the failures already showing up?

From a glance the flavors of errors appears to be:

  • A potential bug/behavior that existed in pandas 2.0 that is different on pandas 2.2.x (which cudf tries to align with)
  • A new API that was introduced post pandas 2.0 (I recently made cudf APIs align with pandas 2.2.x)

IIRC from offline discussion of testing multiple pandas versions that we didn't want to overly complicate the test suite with version checking due to discrepancies in minor versions. I suspect we can largely use this pytest mark pattern for these:

@pytest.mark.skipif(
    PANDAS_VERSION < PANDAS_CURRENT_SUPPORTED_VERSION,
    reason="Fails in older versions of pandas",
)

dependencies.yaml Outdated Show resolved Hide resolved
Comment on lines 21 to 28
if [[ $RAPIDS_DEPENDENCIES == "oldest" ]]; then
# `test_python` constraints are for `[test]` not `[cudf-pandas-tests]`
rapids-dependency-file-generator \
--output requirements \
--file-key test_python \
--matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION};dependencies=${RAPIDS_DEPENDENCIES}" \
| tee ./constraints.txt
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we think this is going to be a common pattern that's worth extracting into a gha-tool generate-oldest-constraints $file_key or similar? @jameslamb WDYT? It occurs many times in this PR and I assume we'll do something similar in other repos.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think that's worth doing!

The only part that will vary from repo-to-repo is --file-key, everything else here is mechanical and a great candidate for a gha-tools script. I could put one up in a day or two, unless you or @seberg want to do it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would suggest one of the two of you pick this up. Sebastian is out for a bit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you could tackle this that would be great @jameslamb. No rush. @mroeschke if you get everything else here finalized and have it ready to merge, I'm fine merging this PR with this code in and refactoring in a follow-up when James has the gha-tool ready.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put up a proposal in rapidsai/gha-tools#114

ci/test_wheel_cudf.sh Outdated Show resolved Hide resolved

else:

def expect_inner(left, right, sort):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain which tests you've chosen to xfail and which ones are skipped in this PR? AFAIK since all the not PANDAS_GE_220 checks are negated, all the conditions are applying to tests that failed in older versions of pandas but pass on current pandas, which seems like they would be just as OK to skip. My comment above may have been unclear; to me, xfailing is most useful when we anticipate eventually getting an xpass as a signal. In these cases I don't think we ever will because we're xfailing tests on old versions that will only pass on newer versions. I'm fine using a different rationale if you think it's appropriate, but that's the source of my confusion here.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with all the changes here! Thanks team.

Copy link
Member

@charlesbluca charlesbluca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving dask-cuDF code changes

@mroeschke
Copy link
Contributor

Looks like with this new configuration, any ad-hoc job using test_python_common dependency will use the oldest dependencies by default e.g unit-tests-cudf-pandas, pandas-tests, wheel-tests-cudf-polars, wheel-tests-cudf-dask

Shouldn't we use the latest dependencies by default?

Specifically, the pandas-tests job clone the pandas unit tests from the version 2.2 tests but this job installs pandas 2.0, so there may be some incompatibility that's causing this job to time out

@vyasr
Copy link
Contributor

vyasr commented Sep 4, 2024

The oldest dependency usage is entirely governed by the logic in the various ci/* scripts, which are reading the value of RAPIDS_DEPENDENCIES and setting up a constraint accordingly. If there are jobs that should not respect that and should always run with the latest deps (and you're right, I think there are) then those scripts should remove that logic entirely.

@mroeschke
Copy link
Contributor

mroeschke commented Sep 4, 2024

Do other have thoughts whether the wheel-tests-cudf-polars or wheel-tests-cudf-dask jobs should just run with the oldest dependencies or should they run with the latest dependencies?

@vyasr
Copy link
Contributor

vyasr commented Sep 4, 2024

I think it's sufficient for just the one job running the main cudf tests to be run with oldest dependencies for now. We'll probably want to do something similar for polars once we start supporting a range. For dask, I don't think we'll ever be able to support a range anyway; we always pin tightly.

@mroeschke
Copy link
Contributor

/merge

@rapids-bot rapids-bot bot merged commit ad1369d into rapidsai:branch-24.10 Sep 4, 2024
88 checks passed
@seberg seberg deleted the test-oldest branch September 10, 2024 16:49
res-life pushed a commit to res-life/cudf that referenced this pull request Sep 11, 2024
This adds explicit tests with old versions of key dependencies. Specifically:
- `numba==0.57`
- `numpy==1.23`
- `pandas==2.0`
- ~`fsspec==0.6.0`~ excluded it. `transformers==4.39.3` requires `huggingface_hub` which requires `fsspec>=2023.5.0`.  In principle one could include it e.g. only for conda which doesn't pull in `transformers`, but that seemed not worth the trouble?
- `cupy==12.0.0`
- `pyarrow==16.1.0`

See also rapidsai/build-planning#81

(Marking as draft until I see that things work.)

Authors:
  - Sebastian Berg (https://github.com/seberg)
  - Matthew Roeschke (https://github.com/mroeschke)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Matthew Roeschke (https://github.com/mroeschke)
  - Bradley Dice (https://github.com/bdice)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Charles Blackmon-Luca (https://github.com/charlesbluca)

URL: rapidsai#16570
rapids-bot bot pushed a commit that referenced this pull request Oct 31, 2024
…CI (#17131)

Follow-up to #16570 (comment)

Proposes using the new `rapids-generate-pip-constraints` tool from `gha-tools` to generate a list of pip constraints pinning to the oldest supported verisons of dependencies here.

## Notes for Reviewers

### How I tested this

rapidsai/gha-tools#114 (comment)

You can also see one the most recent `wheel-tests-cudf` builds here:

* oldest-deps: numpy 1.x ([build link](https://github.com/rapidsai/cudf/actions/runs/11615430314/job/32347576688?pr=17131)) 
* latest-deps: numpy 2.x ([build link](https://github.com/rapidsai/cudf/actions/runs/11615430314/job/32347577095?pr=17131))

#

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #17131
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cudf.pandas Issues specific to cudf.pandas improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

9 participants