[BUG] wheel tests do not fail when `raft-dask` wheel has unsatisfiable dependency requirements #2348
Comments
Wow, well written! Thanks for documenting this bug.
rapids-bot (bot) pushed a commit that referenced this issue on Jun 3, 2024:
… run, fix wheel dependencies (#2349)

Fixes #2348

#2331 introduced `rapids-build-backend` (https://github.com/rapidsai/rapids-build-backend) as the build backend for `pylibraft`, `raft-dask`, and `raft-ann-bench`. That library handles automatically modifying a wheel's dependencies based on the target CUDA version.

Unfortunately, we missed a few cases in #2331, and as a result the last few days of nightly `raft-dask` wheels had the following issues:

* depending on `pylibraft` - *(does not exist, it's called `pylibraft-cu12`)*
* depending on `ucx-py==0.39.*` - *(does not exist, it's called `ucx-py-cu12`)*
* depending on `distributed-ucxx-cu11==0.39.*` instead of `distributed-ucxx-cu11==0.39.*,>=0.0.0a0` - *(without that alpha specifier, `pip install --pre` is required to install pre-release versions)*

This wasn't caught in `raft`'s CI, but in downstream CI like `cuml` and `cugraph`, with errors like this:

```text
ERROR: ResolutionImpossible: The conflict is caused by:
    raft-dask-cu12 24.8.0a20 depends on pylibraft==24.8.* and >=0.0.0a0
    raft-dask-cu12 24.8.0a19 depends on pylibraft==24.8.* and >=0.0.0a0
```

([example cugraph build](https://github.com/rapidsai/cugraph/actions/runs/9315062495/job/25656684762?pr=4454#step:7:1811))

This PR:

* fixes those dependency issues
* modifies `raft`'s CI so that similar issues would be caught here in the future, before publishing wheels

## Notes for Reviewers

### What was the root cause of CI missing this, and how does this PR fix it?

The `raft-dask` test CI jobs use this pattern to install the `raft-dask` wheel built earlier in the CI pipeline.

```shell
pip install "raft_dask-cu12[test]>=0.0.0a0" --find-links dist/
```

As described in the `pip` docs ([link](https://pip.pypa.io/en/stable/cli/pip_install/#finding-packages)), `--find-links` just adds a directory to the list of other places `pip` searches for packages. Because the wheel there had unsatisfiable constraints (e.g. `pylibraft==24.8.*` does not exist anywhere), `pip install` silently disregarded that locally-downloaded `raft_dask` wheel and backtracked (i.e. downloaded older and older wheels from https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/) until it found one that wasn't problematic.

This PR ensures that won't happen by telling `pip` to install **exactly that locally-downloaded file**, like this:

```shell
pip install "$(echo ./dist/raft_dask_cu12*.whl)[test]"
```

If that file is uninstallable, `pip install` fails and you find out via a CI failure.

### How I tested this

Initially pushed a commit with just the changes to the test script. Saw the `wheel-tests-raft-dask` CI jobs fail in the expected way, instead of silently falling back to an older wheel and passing 🎉 .

```text
ERROR: Could not find a version that satisfies the requirement ucx-py-cu12==0.39.* (from raft-dask-cu12[test]) (from versions: 0.32.0, 0.33.0, 0.34.0, 0.35.0, 0.36.0, 0.37.0, 0.38.0a4, 0.38.0a5, 0.38.0a6, 0.39.0a0)
ERROR: No matching distribution found for ucx-py-cu12==0.39.*
```

([build link](https://github.com/rapidsai/raft/actions/runs/9323598882/job/25668146747?pr=2349))

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #2349
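As an aside (not part of the PR), one way to see which `raft_dask` wheel `pip` would actually pick under the `--find-links` pattern is a resolve-only run; this sketch assumes `pip >= 22.2`, where `--dry-run` and `--report` are available:

```shell
# Resolve (but don't install) under the same --find-links pattern, writing an
# installation report so we can see which wheel pip selected.
python -m pip install --dry-run --report resolve-report.json \
    "raft_dask-cu12[test]>=0.0.0a0" --find-links dist/

# Print the URLs of the resolved distributions; if the raft_dask entry points at
# the index rather than the local ./dist file, pip has silently backtracked.
python -c "import json; print([i['download_info']['url'] for i in json.load(open('resolve-report.json'))['install']])"
```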
**Describe the bug**

We recently observed a situation where `raft-dask` nightly wheels were being published with duplicated dependencies:

* `pylibraft-cu12==24.8.*,>=0.0.0a0` AND `pylibraft==24.8.*,>=0.0.0a0`
* `ucx-py-cu12==0.39.*` AND `ucx-py==0.39.*`

The unsuffixed ones are a mistake, fixed in #2347. However... that was only caught by `cugraph`'s CI (build link). It should have been caught here in `raft`'s CI, probably here:

`raft/ci/test_wheel_raft_dask.sh`, line 14 at `8ef71de`
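For context, the script line referenced above is presumably the `--find-links`-based install that the reproduction steps below mirror:

```shell
# Presumed content of ci/test_wheel_raft_dask.sh line 14 (at 8ef71de), matching
# the command reproduced in the steps below. --find-links only ADDS ./dist as one
# more place to search, so pip may silently fall back to an older index wheel.
python -m pip -v install "raft_dask-${RAPIDS_PY_CUDA_SUFFIX}[test]>=0.0.0a0" --find-links dist/
```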
**Steps/Code to reproduce bug**

Trying to reproduce a very recent CI build that passed despite using wheels that suffer from the issue fixed in #2347 (build link).

Ran a container mimicking what was used in that CI run, and then the following code mirroring `ci/test_wheel_raft_dask.sh` (code link), with a bit of extra debugging added.

*(collapsed details: setup mimicking what happens in CI)*
Checked whether there was extra `pip` configuration set up in the image. Just one item: an index URL.
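(A check along these lines is enough to surface image-level `pip` configuration; the actual index URL found in the image is not reproduced here.)

```shell
# Show pip configuration coming from config files, plus any PIP_* environment variables.
pip config list
env | grep -i '^PIP_' || true
```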
Checked the version of `pip`:

`pip --version  # 23.0.1`
Installed `pkginfo` to inspect the wheels. Downloaded wheels from the same CI run and put them in separate directories.
Inspected them to confirm that:

* the `name` fields have a `-cu12` suffix
* the `raft_dask` wheel depends on both `pylibraft-cu12` and `pylibraft`

They do.
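A minimal sketch of that inspection using `pkginfo`'s Python API (the `./dist` wheel path is assumed, matching the layout used above):

```shell
python -c "
from glob import glob
from pkginfo import Wheel

# Assumed layout: the locally built raft_dask wheel sits in ./dist, as in the CI job.
whl = glob('./dist/raft_dask_cu12*.whl')[0]
w = Wheel(whl)
print('name:', w.name)          # expect the -cu12-suffixed name
for req in w.requires_dist:     # look for BOTH pylibraft-cu12 and pylibraft here
    print('requires:', req)
"
```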
Installed the `pylibraft` wheel, just as the test script does:

`python -m pip -v install --no-deps ./local-pylibraft-dep/pylibraft*.whl`

That worked as expected.
With that set up (a `raft_dask-cu12` wheel in `./dist` and `pylibraft-cu12` already installed), I ran the following:

`python -m pip -v install "raft_dask-${RAPIDS_PY_CUDA_SUFFIX}[test]>=0.0.0a0" --find-links dist/`
Just like we observed in CI: `pylibraft` (unsuffixed) is not mentioned in the logs, but all the other dependencies are.

HOWEVER... this alternative form fails in the expected way (given an explicit path, `pip` has no other candidates to fall back to, so the unsatisfiable pins surface as a hard error):

`python -m pip -v install ./dist/*.whl`
**Expected behavior**

I expected CI to fail because the constraints `pylibraft==24.8.*` and `ucx-py==0.39.*` are not satisfiable (those packages do not exist).

**Environment details (please complete the following information):**
*(collapsed details: nvidia-smi output)*
**Additional context**

The particular unsatisfiable dependency issue was likely introduced by recent changes adding `rapids-build-backend` (#2331, for rapidsai/build-planning#31). But in theory this could just as easily happen with some other, unrelated dependency issue, like a typo of the form `joblibbbbb` or something.

I am actively investigating this (along with @bdice and @nv-rliu). Just posting for documentation purposes.