Add pynvjitlink as a dependency #14763
@brandon-b-miller Here are some TODO items:
|
Do we know what is needed for the devcontainer build? Had looked at the log, but wasn't quite grasping what the error was |
The most thorough way of testing would be to add a CI job that has CUDA driver If we were to start building cuDF packages with CUDA |
I think it's this:
|
Thanks Brandon! 🙏 That's very helpful. Where did you find this in the logs? Think this PR ( rapidsai/pynvjitlink#45 ) should fix it |
In the failing job, click the dropdown arrow next to the "Run build in devcontainer" job marked by the red x. From there, there's a smaller dropdown that can be clicked into through the small white triangle under the step marked "run command in container". This expands the log inside which I found the error. |
Ah now I see. Thank you! 🙏
This is what I was missing |
The fix above is now in Restarting the failed CI jobs. Let's see how things go |
Looks like there was an issue with |
Ok with Bradley's help we now have |
It looks like |
Ok, looks like the wheel build has an issue: ModuleNotFoundError: No module named 'scikit_build_core'. Note: had to look at the raw logs (as the GHA GUI had some issue rendering) |
Merging in latest from |
Failing due to timeout. Likely this issue (conda/infrastructure#869). Let's wait for that to clear up and then retry the failing builds |
@brandon-b-miller @bdice I'm not sure how close this PR is to merging, but #14770 is needed in the interim. If you anticipate this merging today then we can probably wait. Otherwise let's merge the other PR and revert it as part of this changeset. |
To me the pieces seem to be in place for everything to work as expected fairly soon, although I have a murkier picture of some of the conda issues we've encountered on the pynvjitlink side. |
conda tests seem to be failing at collection time in CUDA 11.x with
I think there's some extra logic needed here since |
Hoping 92c6bb1 resolves the latest set of failures. |
@@ -135,7 +132,9 @@ def _setup_numba(): | |||
if driver_version < (12, 0): |
Can we update the comment above to mention pynvjitlink and the corresponding role of that package? This comment:
# ptxcompiler is a requirement for cuda 11.x packages but not
# cuda 12.x packages. However its version checking machinery
# is still necessary. If a user happens to have ptxcompiler
# in a cuda 12 environment, it's use for the purposes of
# checking the driver and runtime versions is harmless
@brandon-b-miller I would generally advocate reviewing this entire file and any other files that relate to ptxcompiler/pynvjitlink to make sure things are named sensibly, etc. in a way that will support both CUDA 11 and CUDA 12+. I want the code comments and docs to reflect the implemented design going forward.
Keep in mind that we don't want to name things "CUDA 12" in the code if we can avoid it if it is likely that later versions will act in the same way.
how about something like 7dbf9f2 ?
In a CUDA 12.x environment, ptxcompiler provides version checking, but not MVC directly
Is this true? We don't use ptxcompiler in CUDA 12 environments. No environment should have both ptxcompiler and pynvjitlink installed at the same time.
It's technically _ptxcompiler.py in this case - our slimmed down, vendored version of the few functions we need.
Ooooo. But I don't know how to distinguish ptxcompiler the package (only used when on CUDA 11) from _ptxcompiler.py the internal helper file (always active) from the text of this comment. Documenting that kind of thing clearly is what I want to achieve before merging this.
some reworking in e8a90b9
Much clearer! Thanks for iterating on this.
Approving with one typo fix that I will commit.
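For readers following the thread, the split being discussed can be sketched roughly as below. This is only an illustration under assumptions, not the code in this PR: `select_mvc_backend` is a hypothetical helper name, and the sketch models just the `driver_version < (12, 0)` branch visible in the diff hunk. Per the discussion above, the vendored _ptxcompiler.py helper handles version checking in both cases, so it does not appear in the branch.

```python
def select_mvc_backend(driver_version):
    """Return the name of the package expected to provide MVC.

    Hypothetical helper for illustration only.
    driver_version is a (major, minor) tuple, e.g. (11, 8) or (12, 2).
    """
    if driver_version < (12, 0):
        # CUDA 11.x: the external ptxcompiler package provides MVC.
        return "ptxcompiler"
    # CUDA 12 and later: pynvjitlink provides MVC. The branch is written
    # as "not CUDA 11" rather than "is CUDA 12" so that later major
    # versions fall through to the same path.
    return "pynvjitlink"


if __name__ == "__main__":
    for version in [(11, 8), (12, 0), (12, 3)]:
        print(version, select_mvc_backend(version))
```

Writing the check as "older than 12.0" versus "everything else" follows the request above to avoid naming things "CUDA 12" when later versions are likely to behave the same way.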
@@ -98,6 +98,7 @@ requirements: | |||
# xref: https://github.com/rapidsai/cudf/issues/12822 | |||
- cuda-nvrtc | |||
- cuda-python >=12.0,<13.0a0 | |||
- pynvjitlink |
Once we have a clearer idea on intended compatibility ( rapidsai/pynvjitlink#48 ), we may want to add some version constraints here
This could be done in a separate PR though
Yes, this is reasonable. John proposed pynvjitlink >=0.1.11,<0.2.0a0 offline, which seems appropriate to me.
Yeah though let's discuss in the issue and we can do this as follow up (after this PR is merged)
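To make the proposed range concrete, here is a minimal sketch of which final release versions the pin `>=0.1.11,<0.2.0a0` would admit. It is an approximation under assumptions: the helper names are hypothetical, only plain numeric versions such as `0.1.11` are handled, and the `a0` upper bound is treated as equivalent to `<0.2.0` for final releases (it additionally excludes 0.2.0 pre-releases, which this sketch rejects as unparseable rather than comparing). It is not how conda itself evaluates version specs.

```python
import re


def parse_release(version):
    """Parse a plain release version such as '0.1.11' into a 3-tuple of ints.

    Hypothetical helper; pre-release strings like '0.2.0a0' are rejected.
    """
    if not re.fullmatch(r"\d+(\.\d+){0,2}", version):
        raise ValueError(f"not a plain release version: {version!r}")
    parts = tuple(int(part) for part in version.split("."))
    # Pad to three components so that '0.2' compares equal to '0.2.0'.
    return parts + (0,) * (3 - len(parts))


def satisfies_proposed_pin(version):
    """True if a final release falls within >=0.1.11,<0.2.0."""
    return (0, 1, 11) <= parse_release(version) < (0, 2, 0)


if __name__ == "__main__":
    for candidate in ["0.1.10", "0.1.11", "0.1.14", "0.2.0"]:
        print(candidate, satisfies_proposed_pin(candidate))
```

Versions are compared as integer tuples because plain string comparison would order "0.1.9" after "0.1.11".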
/merge |
The merge is being blocked by what seems like unrelated issues building the libcudf docs
This branch has the latest though, so it's possibly a problem on 24.02 - this ring any bells to anyone? |
Yeah this error probably comes from merging #13846 without it being fully up-to-date because some other PR merged bad docs changes. I'll take a look. |
Hoping that #14780 resolves this. |
/merge |
Thanks all! 🙏 |
This PR adds pynvjitlink as a hard dependency for cuDF. This should allow for MVC when launching numba kernels across minor versions of CUDA 12 up to the version of nvjitlink statically shipped with pynvjitlink.
cc @bdice