-
Notifications
You must be signed in to change notification settings - Fork 309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
split up CUDA-suffixed dependencies in dependencies.yaml #4552
Conversation
I think we need this applied to rapidsai/cugraph-gnn as well |
Sure @alexbarghi-nv I can do that! You're right, I'd missed that because |
I've updated this based on the suggestions from rapidsai/cudf#16183. Ran |
@alexbarghi-nv I created the following in
|
dependencies.yaml
Outdated
# NOTE: this is not broken into groups by a 'cuda_suffixed' selector like | ||
# other packages with -cu{nn}x suffixes in this file because devcontainers-based | ||
# builds that build it from source expect a package with a `-cuda{nn}x` suffix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that build it from source
The antecedent is a little unclear here. "it" refers to cugraph, not cupy here. We never build cupy from source in RAPIDS. Maybe we can simplify the comment like:
# NOTE: this is not broken into groups by a 'cuda_suffixed' selector like | |
# other packages with -cu{nn}x suffixes in this file because devcontainers-based | |
# builds that build it from source expect a package with a `-cuda{nn}x` suffix | |
# cupy is not built from source in devcontainers, so cuda_suffixed isn't needed |
or even
# NOTE: this is not broken into groups by a 'cuda_suffixed' selector like | |
# other packages with -cu{nn}x suffixes in this file because devcontainers-based | |
# builds that build it from source expect a package with a `-cuda{nn}x` suffix | |
# cupy is not built from source so cuda_suffixed is not needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My choice to say "devcontainers-based builds" was a bad one. It's very unclear.
I should have said "DLFW builds".
We do not build cupy
from source in RAPIDS devcontainers, but we DO build it from source in the RAPIDS DLFW builds. And there, pass the following through to pip wheel
.
--build-option="--cupy-package-name=cupy-cuda$(cut -d'.' -f1 <<< "${CUDA_VERSION}")x"
Here's where cupy
exposes that option in its configuration:
So it's important, in the DFLW context, that cugraph
depend on e.g. "cupy-cuda12x"
, not "cupy"
.
Maybe we should change that in DLFW in the future, or maybe the cupy
dependency should be broken out into a separate block called like depends_on_cupy
here in dependencies.yaml
, so that these cuda_suffixed: "false"
entries could be removed.
But with code freeze for 24.08 so close, I wasn't looking to change any behavior... just to reflect the current state in the smallest, least controversial changeset possible to get these merged in.
Given all that... would you support leaving this as-is and instead having a comment like the one I'm proposing on cuml
's cupy
dependency over in rapidsai/cuml#5974?
# NOTE: cupy still has a "-cuda12x" suffix here, because it's suffixed
# in DLFW builds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are on the same page with everything but the wording of this comment -- the behavior here is what I expect, but I don't think the comment needs to describe only DLFW.
To the best of my knowledge, there is no such pip package called cupy
(public or internal/DLFW), so there's never a case where we want cupy to be unsuffixed in requirements
or pyproject
lists. cupy
is only ever used as the conda package name.
Note that for RAPIDS, devcontainers (including outside of DLFW) need the unsuffixed name because that matches the exclusion list.
So either way, nothing here is DLFW-specific, if I understand correctly. We should be able to reword this to not mention DLFW and just say devcontainers. Perhaps the "from source" piece of my previous suggestion is irrelevant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think about it and merge whatever you think fits best for the comment. The behavior is what I expect, so I don't want to get stuck on the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok thanks!
I was thinking cugraph
went into code freeze today, but I realize that wasn't correct. So we do have a bit more time with this one to make it look the way we long-term want it to. I'll push a change here (tomorrow) with a separated-out depends_on_cupy
list and a less-confusing comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need the unsuffixed name because that matches the exclusion list.
And just to note... the exclusion list there includes both the suffixed and unsuffixed versions. That's what this line there does:
pip_noinstall+=("${pkg}" "${pkg}-cu.*");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, cupy is not captured by that exclusion list -- that list is constructed from RAPIDS packages (things in manifest.yaml
). cupy is handled in a special case here, where we transform it into:
cupy-cuda[0-9]+x
-> cupy-cuda${cuda_version_major}x
This is to work around issues in previous releases' dependencies.yaml
that always output cupy-cuda11x
instead of outputting the correct version based on the cuda
matrix entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying that, appreciated!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pushed, updating the comment to this:
# NOTE: This is intentionally not broken into groups by a 'cuda_suffixed' selector like
# other packages with -cu{nn}x suffixes in this file.
# All RAPIDS wheel builds (including in devcontainers) expect cupy to be suffixed.
Hopefully that more clearly captures the intent.
...maybe the
cupy
dependency should be broken out into a separate block called like depends_on_cupy here independencies.yaml
🤦🏻 that's already how it works here in cugraph
, sorry... too many very-similar-looking PRs yesterday.
Cancelling CI here (and will on other commits in the next 24 hours) to save some CI capacity for other projects that are about to enter code freeze. |
conda tests here are failing like this:
full stacktrace (click me)
Manually retriggering all jobs so builds will be done against latest |
@jameslamb this is a known issue. I think @nv-rliu is looking into it. |
Ah ok thanks @alexbarghi-nv ! Then I'll cancel the CI run here, to give back some capacity while it's being worked on. @nv-rliu if you have an issue or PR, could you link it here so I can follow along? |
restarted all CI here, to hopefully pick up new artifacts with the fixes from rapidsai/raft#2394 and rapidsai/raft#2398 |
I've cancelled the 3 These have been stuck for about 2 hours. There's some issue causing 1 test to not complete (unrelated to this PR), and it's being temporarily skipped in #4559. |
/merge |
Description
Contributes to rapidsai/build-planning#31
In short, RAPIDS DLFW builds want to produce wheels with unsuffixed dependencies, e.g.
cudf
depending onrmm
, notrmm-cu12
.This PR is part of a series across all of RAPIDS to try to support that type of build by setting up CUDA-suffixed and CUDA-unsuffixed dependency lists in
dependencies.yaml
.For more details, see:
Notes for Reviewers
Why target 24.08?
This is targeting 24.08 because: