Add `pynvjitlink` as a dependency #14763

brandon-b-miller · 2024-01-16T17:22:36Z

This PR adds pynvjitlink as a hard dependency for cuDF. This should enable minor version compatibility (MVC) when launching numba kernels across minor versions of CUDA 12, up to the version of nvjitlink statically bundled with pynvjitlink.

cc @bdice

bdice · 2024-01-16T19:01:12Z

@brandon-b-miller Here are some TODO items:

Update cudf's conda recipe to depend on pynvjitlink (conda/recipes/cudf/meta.yaml)
Update the error message if pynvjitlink isn't present?
Is there any configuration / logic needed to align our use of pynvjitlink with the way we've supported ptxcompiler/cubinlinker?
Add tests? Or are we covered already?

jakirkham · 2024-01-16T19:53:28Z

Do we know what is needed for the devcontainer build?

Had looked at the log, but wasn't quite grasping what the error was

brandon-b-miller · 2024-01-16T19:56:42Z

Add tests? Or are we covered already?

The most thorough way of testing would be to add a CI job that has CUDA driver 12.X and CTK 12.Y where Y>X. In this situation, every numba kernel launched would require pynvjitlink to work, even outside of contexts where cuDF is involved at all.

If we were to start building cuDF packages with CUDA 12.Y, and did not update to the 12.Y driver in CI, then certain cuDF tests would invoke the pynvjitlink codepath. If the driver, cuDF build version, and CTK all align in CI, the pynvjitlink codepath will not be tested.
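A minimal sketch of the condition described above, assuming driver and toolkit versions are available as (major, minor) tuples. The helper name needs_nvjitlink_mvc is hypothetical and not part of cuDF; it only models the CUDA 12+ case discussed in this PR.

```python
from __future__ import annotations


def needs_nvjitlink_mvc(driver: tuple[int, int], toolkit: tuple[int, int]) -> bool:
    """Return True when a numba kernel compiled with `toolkit` would need
    pynvjitlink to link and load on a machine running `driver`.

    Hypothetical helper: models only the CUDA 12+ scenario from this PR.
    """
    driver_major, driver_minor = driver
    toolkit_major, toolkit_minor = toolkit
    if toolkit_major < 12:
        # CUDA 11.x MVC goes through ptxcompiler/cubinlinker instead.
        return False
    if driver_major != toolkit_major:
        # Across major versions this is not a minor version compatibility case.
        return False
    # An older driver cannot JIT PTX emitted by a newer toolkit minor version.
    return toolkit_minor > driver_minor
```

With a 12.0 driver and a 12.2 toolkit this returns True, while an aligned 12.2/12.2 setup returns False, which is why a CI job where driver, build, and CTK versions all match never exercises the pynvjitlink codepath.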

brandon-b-miller · 2024-01-16T19:57:56Z

Do we know what is needed for the devcontainer build?

Had looked at the log, but wasn't quite grasping what the error was

I think it's this:

  Could not solve for environment specs
  The following packages are incompatible
  ├─ cuda-version 12.0**  is requested and can be installed;
  └─ pynvjitlink is not installable because it requires
     └─ cuda-version >=12.2,<12.3 , which conflicts with any installable versions previously reported.
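The solver error boils down to an empty intersection of version ranges: the environment pins cuda-version to 12.0, but this pynvjitlink build only accepts >=12.2,<12.3 (rapidsai/pynvjitlink#45 relaxes that cuda-version constraint). A rough sketch of the check, assuming only the >= and < operators on plain numeric versions; the helpers parse_version and satisfies are hypothetical and much simpler than conda's real MatchSpec handling.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted numeric version such as '12.2' into (12, 2)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version: str, spec: str) -> bool:
    """Check `version` against a comma-separated spec like '>=12.2,<12.3'.

    Simplified model: only the '>=' and '<' operators are supported.
    """
    v = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            if not v >= parse_version(clause[2:]):
                return False
        elif clause.startswith("<"):
            if not v < parse_version(clause[1:]):
                return False
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True
```

For example, satisfies("12.0", ">=12.2,<12.3") is False, matching the conflict reported by the solver.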

jakirkham · 2024-01-16T20:06:55Z

Thanks Brandon! 🙏

That's very helpful. Where did you find this in the logs?

I think this PR (rapidsai/pynvjitlink#45) should fix it.

brandon-b-miller · 2024-01-16T20:10:18Z

That's very helpful. Where did you find this in the logs?

In the failing job, click the dropdown arrow next to the "Run build in devcontainer" job, marked by the red x. Inside it, click the small white triangle under the step marked "run command in container" to open a nested dropdown. That expands the log where I found the error.

jakirkham · 2024-01-16T20:12:21Z

Ah now I see. Thank you! 🙏

...the small white triangle under the step marked "run command in container"

This is what I was missing

jakirkham · 2024-01-17T01:07:18Z

The fix above is now in pynvjitlink version 0.1.10 with packages already published

Restarting the failed CI jobs. Let's see how things go

jakirkham · 2024-01-17T01:43:25Z

Looks like there was an issue with pin_compatible. Fixing as part of a 0.1.11 release of pynvjitlink in PR (rapidsai/pynvjitlink#47). I included conda render results later in that PR to confirm we are getting the expected behavior.

jakirkham · 2024-01-17T02:26:07Z

Ok, with Bradley's help we now have 0.1.11. Rerunning CI.

jakirkham · 2024-01-17T02:31:31Z

It looks like pynvjitlink is now being installed 🎉

jakirkham · 2024-01-17T02:50:58Z

Ok looks like the wheel build has an issue

ModuleNotFoundError: No module named 'scikit_build_core'

Note: Had to look at the raw logs (as the GHA GUI had some issue rendering)

jakirkham · 2024-01-17T02:51:51Z

Merging in latest from branch-24.02 in case that helps

jakirkham · 2024-01-17T03:07:28Z

Failing due to timeout. Likely this issue (conda/infrastructure#869).

Let's wait for that to clear up and retry the failing builds then

ci/build_wheel.sh

vyasr · 2024-01-17T17:10:12Z

@brandon-b-miller @bdice I'm not sure how close this PR is to merging, but #14770 is needed in the interim. If you anticipate this merging today then we can probably wait. Otherwise let's merge the other PR and revert it as part of this changeset.

brandon-b-miller · 2024-01-17T17:12:38Z

To me the pieces seem to be in place for everything to work as expected fairly soon, although I have a murkier picture of some of the conda issues we've encountered on the pynvjitlink side.

bdice · 2024-01-17T17:20:43Z

@vyasr I'd like to finalize and merge this PR today. I will approve #14770 in the meantime but let's only merge if this PR is stalled.

brandon-b-miller · 2024-01-17T18:00:42Z

conda tests seem to be failing at collection time in CUDA 11.x with

E   ModuleNotFoundError: No module named 'pynvjitlink'

I think some extra logic is needed here, since pynvjitlink is only a CUDA 12.x dependency. I'll update.

brandon-b-miller · 2024-01-17T18:04:34Z

Hoping 92c6bb1 resolves the latest set of failures.
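The idea behind 92c6bb1 ("locally import patch_numba_linker") can be sketched as follows: the import moves inside the branch that only runs on CUDA 12+, so CUDA 11 environments, where pynvjitlink is not installed, never attempt it and test collection no longer fails. The function setup_linker and its runtime_version argument are hypothetical; patch_numba_linker is the patching function named in the commit history, and the module path pynvjitlink.patch is an assumption here.

```python
from __future__ import annotations


def setup_linker(runtime_version: tuple[int, int]) -> str:
    """Pick the minor version compatibility backend for numba kernels.

    Hypothetical sketch; returns the name of the backend it selected.
    """
    if runtime_version < (12, 0):
        # CUDA 11.x: MVC is handled by ptxcompiler/cubinlinker, so
        # pynvjitlink is never imported and need not be installed.
        return "ptxcompiler"

    # CUDA 12+: import locally rather than at module top level, so that
    # importing this module (e.g. during pytest collection) cannot fail
    # with ModuleNotFoundError on CUDA 11.
    try:
        from pynvjitlink.patch import patch_numba_linker
    except ImportError as err:
        raise ImportError(
            "pynvjitlink is required for minor version compatibility "
            "on CUDA 12 and later"
        ) from err
    patch_numba_linker()
    return "pynvjitlink"
```

A top-level import would have been evaluated as soon as the module loaded, regardless of CUDA version; deferring it to the branch ties the dependency to the environments that actually need it.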

bdice · 2024-01-17T19:37:29Z

python/cudf/cudf/utils/_numba.py

@@ -135,7 +132,9 @@ def _setup_numba():
            if driver_version < (12, 0):


Can we update the comment above to mention pynvjitlink and the corresponding role of that package? This comment:

  # ptxcompiler is a requirement for cuda 11.x packages but not
  # cuda 12.x packages. However its version checking machinery
  # is still necessary. If a user happens to have ptxcompiler
  # in a cuda 12 environment, it's use for the purposes of
  # checking the driver and runtime versions is harmless

@brandon-b-miller I would generally advocate reviewing this entire file and any other files that relate to ptxcompiler/pynvjitlink to make sure things are named sensibly, etc., in a way that will support both CUDA 11 and CUDA 12+. I want the code comments and docs to reflect the implemented design going forward.

Keep in mind that we don't want to name things "CUDA 12" in the code if we can avoid it, when later versions are likely to behave the same way.

How about something like 7dbf9f2?

In a CUDA 12.x environment, ptxcompiler provides version checking, but not MVC directly

Is this true? We don't use ptxcompiler in CUDA 12 environments. No environment should have both ptxcompiler and pynvjitlink installed at the same time.

It's technically _ptxcompiler.py in this case - our slimmed down, vendored version of the few functions we need.

Ooooo. But I don't know how to distinguish ptxcompiler the package (only used when on CUDA 11) from _ptxcompiler.py the internal helper file (always active) from the text of this comment. Documenting that kind of thing clearly is what I want to achieve before merging this.

some reworking in e8a90b9

Much clearer! Thanks for iterating on this.
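The "version checking machinery" discussed in this thread compares driver and runtime versions. The CUDA driver and runtime APIs (for example cudaDriverGetVersion) report a version as a single integer, 1000 * major + 10 * minor, so CUDA 12.2 is reported as 12020. A minimal sketch of decoding that integer into a comparable tuple; decode_cuda_version is a hypothetical name, and this is not the actual contents of _ptxcompiler.py.

```python
from __future__ import annotations


def decode_cuda_version(encoded: int) -> tuple[int, int]:
    """Decode a CUDA version integer such as 12020 into (12, 2).

    Assumes the 1000 * major + 10 * minor encoding used by the CUDA APIs.
    """
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return major, minor
```

Decoded tuples compare element-wise, so decode_cuda_version(12000) < decode_cuda_version(12020) is True, which is the driver-older-than-runtime case that MVC has to handle.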

bdice · 2024-01-17T23:17:25Z

Approving with one typo fix that I will commit.

jakirkham · 2024-01-17T23:46:02Z

conda/recipes/cudf/meta.yaml

@@ -98,6 +98,7 @@ requirements:
    # xref: https://github.com/rapidsai/cudf/issues/12822
    - cuda-nvrtc
    - cuda-python >=12.0,<13.0a0
+    - pynvjitlink


Once we have a clearer idea of the intended compatibility (rapidsai/pynvjitlink#48), we may want to add some version constraints here.

This could be done in a separate PR though

Yes, this is reasonable. John proposed pynvjitlink >=0.1.11,<0.2.0a0 offline, which seems appropriate to me.

Yeah, though let's discuss in the issue, and we can do this as a follow-up (after this PR is merged).
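On the proposed pin pynvjitlink >=0.1.11,<0.2.0a0: in conda's version ordering, pre-releases such as 0.2.0rc1 sort before 0.2.0, so an upper bound of <0.2.0 would still admit them, while <0.2.0a0 keeps 0.2.0 alpha, beta, and release-candidate builds out. The sketch below models only that behavior; version_key and in_pin are hypothetical helpers that assume X.Y.Z versions with an optional a/b/rc suffix, far simpler than conda's real comparison rules.

```python
from __future__ import annotations

import re

# Pre-release tags rank below a final release (rank 3) in this model.
_PRE_RANK = {"a": 0, "b": 1, "rc": 2}
_FINAL_RANK = 3
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:(a|b|rc)(\d+))?$")


def version_key(text: str) -> tuple[int, int, int, int, int]:
    """Map 'X.Y.Z' or 'X.Y.Z{a,b,rc}N' to a sortable tuple."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"unsupported version: {text!r}")
    major, minor, patch, tag, num = match.groups()
    rank = _PRE_RANK[tag] if tag else _FINAL_RANK
    return (int(major), int(minor), int(patch), rank, int(num or 0))


def in_pin(text: str, lower: str = "0.1.11", upper: str = "0.2.0a0") -> bool:
    """Model the constraint '>=lower,<upper'."""
    return version_key(lower) <= version_key(text) < version_key(upper)
```

Here in_pin("0.1.12") is True, while in_pin("0.2.0rc1") is False even though version_key("0.2.0rc1") < version_key("0.2.0"), which is the gap the a0 suffix closes.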

bdice · 2024-01-18T00:42:38Z

/merge

brandon-b-miller · 2024-01-18T15:07:14Z

The merge is being blocked by what seem to be unrelated failures building the libcudf docs:

/__w/cudf/cudf/docs/cudf/source/libcudf_docs/api_docs/column_classes.rst:4: WARNING: Error when parsing function declaration.
If the function has no return type:
  Error in declarator or parameters-and-qualifiers
  Invalid C++ declaration: Expecting "(" in parameters-and-qualifiers. [error at 24]
    inline CUDF_HOST_DEVICE column_device_view slice (size_type offset, size_type size) const noexcept
    ------------------------^

This branch has the latest changes, though, so it's possibly a problem on branch-24.02. Does this ring any bells for anyone?

vyasr · 2024-01-18T15:30:17Z

Yeah, this error probably comes from merging #13846 while it wasn't fully up to date, because some other PR had merged bad docs changes. I'll take a look.

vyasr · 2024-01-18T16:42:31Z

Hoping that #14780 resolves this.

brandon-b-miller · 2024-01-19T03:57:33Z

/merge

jakirkham · 2024-01-19T19:02:06Z

Thanks all! 🙏

add pynvjitlink to run_cudf

532da9f

github-actions bot added the conda label Jan 16, 2024

jakirkham added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 16, 2024

brandon-b-miller added 2 commits January 16, 2024 11:39

add pynvjitlink to meta.yaml for cudf

02ef286

unconditionally import patching function from pynvjitlink

c6ae80c

github-actions bot added the Python Affects Python cuDF API. label Jan 16, 2024

jakirkham mentioned this pull request Jan 16, 2024

Relax cuda-version constraint rapidsai/pynvjitlink#45

Merged

Merge branch 'branch-24.02' into add-pynvjitlink-dep

6c110b4

brandon-b-miller added 2 commits January 17, 2024 06:48

Merge branch 'branch-24.02' into add-pynvjitlink-dep

482be0f

add pynvjitlink to test_python_common

d301cb9

brandon-b-miller marked this pull request as ready for review January 17, 2024 14:51

brandon-b-miller requested review from a team as code owners January 17, 2024 14:51

brandon-b-miller requested review from shwina and bdice January 17, 2024 14:51

Remove testing dependency on pynvjitlink.

aefd0c0

brandon-b-miller commented Jan 17, 2024

ci/build_wheel.sh (outdated)

bdice and others added 3 commits January 17, 2024 07:56

Remove double suffix.

c6b3982

Merge branch 'branch-24.02' into add-pynvjitlink-dep

c424796

style

687225a

locally import patch_numba_linker

92c6bb1

bdice reviewed Jan 17, 2024

brandon-b-miller added 2 commits January 17, 2024 11:46

update comment

7dbf9f2

update comment

e8a90b9

bdice approved these changes Jan 17, 2024

bdice added 2 commits January 17, 2024 17:17

Fix typo.

698f1a2

Merge branch 'branch-24.02' into add-pynvjitlink-dep

011038d

raydouglass approved these changes Jan 17, 2024

jakirkham reviewed Jan 17, 2024

vyasr mentioned this pull request Jan 18, 2024

Ignore numba CEC warning for now #14770

Closed

bdice and others added 2 commits January 18, 2024 17:37

Merge branch 'branch-24.02' into add-pynvjitlink-dep

b0972a9

Merge branch 'branch-24.02' into add-pynvjitlink-dep

016c237

Merge branch 'branch-24.02' into add-pynvjitlink-dep

d5c1efa

rapids-bot bot merged commit e0905ac into rapidsai:branch-24.02 Jan 19, 2024
66 of 67 checks passed
