Re-enable cucim and xgboost in CUDA 12 rapids builds. #669
Conversation
This reverts commit cc272c4.
I also enabled testing of conda packages. I think that even if we can't import GPU libraries, we should at least be able to ensure that the packages contained in the metapackages can be installed together.
Thanks Bradley! 🙏
Made a few minor comments below
Co-authored-by: jakirkham <[email protected]>
We're going to have to disable tests for
Updating this thread: sounds like we need PR ( rapidsai/cudf#13769 ) to fix a couple more of these.
That PR has been merged and the packages have been uploaded.
…ration into enable-cuda-12-cucim-xgboost
Thanks Bradley! 🙏
Think we may need --use-local to pick up locally built packages when testing.
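A hypothetical sketch of where that flag would go (the recipe path conda/recipes/rapids and the surrounding command are assumptions for illustration, not taken from this diff): without --use-local, the test environment for the metapackage cannot resolve the rapids-xgboost package that was just built into the local conda-bld channel.

  # Hypothetical sketch; the recipe path is an assumption for illustration.
  # --use-local adds the local conda-bld output as a channel, so packages
  # built earlier in the same CI job can satisfy the metapackage's requirements.
  rapids-mamba-retry mambabuild \
    --use-local \
    --variant-config-files "${CONDA_CONFIG_FILE}" \
    conda/recipes/rapids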
Co-authored-by: jakirkham <[email protected]>
Seeing this on CI:
Think we need to set the environment variable CONDA_OVERRIDE_CUDA.
…ration into enable-cuda-12-cucim-xgboost
Should we add this to both of these (as we've done with other recipes like cuDF)?

  test:
    requires:
      - cuda-version ={{ cuda_version }}

Added suggestions below.
Co-authored-by: jakirkham <[email protected]>
We'll need to disable the import tests because this runs on CPU, but just solving the environment with no imports is better than no testing.
Looks like we can't actually run the tests on the runner we're using because it doesn't have a working CUDA installation, but at least testing that the environment solves is helpful.
Jinx
@@ -6,20 +6,21 @@ set -euo pipefail
source rapids-env-update

CONDA_CONFIG_FILE="conda/recipes/versions.yaml"
export CONDA_OVERRIDE_CUDA="${RAPIDS_CUDA_VERSION}"
We should probably consider the implications here. Today, I think rapids is installable even on CPU-only machines. The new rapids-xgboost package design requires __cuda to install. Keeping CPU-only installs working is important for cases like HPC systems with CPU login nodes and GPU worker nodes that use the same environment.
Yeah, they can also install by setting CONDA_OVERRIDE_CUDA to some value. In any event, this requirement is coming from libxgboost, so we could move it to just rapids-xgboost if we prefer.
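As a rough illustration of that workaround (the channel list and version pin below are placeholders, not taken from this PR): CONDA_OVERRIDE_CUDA tells the solver to assume the __cuda virtual package is present, so a package that requires __cuda can still be installed on a machine without a GPU, e.g. an HPC login node.

  # Illustrative only; channels and the version pin are placeholders.
  # Pretend CUDA 12.0 is available so the solver can satisfy run
  # requirements on the __cuda virtual package on a CPU-only machine.
  CONDA_OVERRIDE_CUDA="12.0" conda install -c rapidsai -c conda-forge rapids=23.08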
Regarding "the new rapids-xgboost package design": what is the relevant new change? Is it in libxgboost, or in how rapids-xgboost is packaged?
Okay. Well, maybe this requirement already existed. I'm not sure.
The question is whether __cuda should be a hard requirement for installation; it's coming from the xgboost-related packages. I'm not sure if it was that way for the old xgboost packages we shipped in 23.06 or not. Regardless, it feels funny that no other RAPIDS package has this requirement besides xgboost.
Dropping this in PR ( #673 ), which pulls in the new xgboost packages.
imports:              # [linux64]
  - cucim             # [linux64]
  - cudf              # [linux64]
  - cudf_kafka        # [linux64]
  - cugraph           # [linux64]
  - cuml              # [linux64]
  {% if cuda_major == "11" %}
  - cusignal          # [linux64]
  {% endif %}
  - cuspatial         # [linux64]
  - custreamz         # [linux64]
  - cuxfilter         # [linux64]
  - dask_cuda         # [linux64]
  - dask_cudf         # [linux64]
  - pylibcugraph      # [linux64]
  - rmm               # [linux64]
We could run these through pkgutil.find_loader in a run_test.py script in the recipe. This would let us test for their existence without needing to import them (and thus not need a GPU to test).
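A minimal sketch of that idea, assuming it lives in the recipe's test script (the comment proposes a run_test.py; an equivalent shell loop is shown here, with an abbreviated package list):

  # Hypothetical test-script sketch: verify each package exists on disk
  # without importing it, so no GPU or CUDA driver is needed at test time.
  for pkg in cucim cudf cuml rmm; do
    python -c "import pkgutil, sys; sys.exit(pkgutil.find_loader('${pkg}') is None)" \
      || { echo "package not found: ${pkg}"; exit 1; }
  done
  echo "all packages found"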
I don't think this is necessary for this PR, maybe file an issue or PR with this proposal later on. I feel comfortable with the current level of testing, which is higher than what we had before.
Co-authored-by: jakirkham <[email protected]>
LGTM. Thanks all! 🙏
Had a few comments above, but none of them are blocking
Potentially blocking issue: the xgboost builds require __cuda.

Previous release installation did not require __cuda; that install command succeeds on a CPU-only system. The proposed changes here would require __cuda; the equivalent install command fails on CPU-only systems (this is testing the current nightly packages).

I think this issue is blocking for the 23.08 release, but not necessarily for this PR. I'd be fine with merging this PR and discussing this issue separately. I can revisit this tomorrow with others.
After discussion offline, we concluded the XGBoost issue is non-blocking for this PR, so it should be good to merge. We are evaluating options to fix the XGBoost packages so they do not require __cuda.
  --variant-config-files "${CONDA_CONFIG_FILE}" \
  conda/recipes/rapids-xgboost

rapids-logger "Build rapids"

rapids-mamba-retry mambabuild \
  --no-test \
  --use-local \
Just want to mention that for other repos, we use --channel "${RAPIDS_CONDA_BLD_OUTPUT_DIR}": https://github.com/rapidsai/cudf/blob/abb59c83128f956c7edcb4d7744cb0faecf0026c/ci/build_python.sh#L18-L39

RAPIDS_CONDA_BLD_OUTPUT_DIR is set in our CI images: https://github.com/rapidsai/ci-imgs/blob/75cad918c44c6e00480001b24cb764e1b43fa0a5/Dockerfile#L108-L113

It looks like --use-local works; I'm just pointing this out in case we want consistency.
Yeah, --use-local will check the same things. Please see this list of paths that Conda checks when --use-local is set.
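For comparison, a hypothetical sketch of the --channel variant used in other repos (flags besides the channel are copied from the script shown above; the recipe path is an assumption):

  # Hypothetical alternative to --use-local: point explicitly at the conda-bld
  # output directory that the CI images expose as RAPIDS_CONDA_BLD_OUTPUT_DIR.
  rapids-mamba-retry mambabuild \
    --no-test \
    --channel "${RAPIDS_CONDA_BLD_OUTPUT_DIR}" \
    --variant-config-files "${CONDA_CONFIG_FILE}" \
    conda/recipes/rapids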
/merge
PR #664 temporarily disabled CUDA 12 packages for cucim and xgboost in rapids. This re-enables those. This reverts commit cc272c4.
This can be merged once the following issues are closed: