Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pynvjitlink as a dependency #14763

Merged
merged 19 commits into from
Jan 19, 2024

Conversation

brandon-b-miller
Copy link
Contributor

This PR adds pynvjitlink as a hard dependency for cuDF. This should allow for MVC when launching numba kernels across minor versions of CUDA 12 up to the version of nvjitlink statically shipped with pynvjitlink.

cc @bdice

@github-actions github-actions bot added the conda label Jan 16, 2024
@jakirkham jakirkham added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 16, 2024
@bdice
Copy link
Contributor

bdice commented Jan 16, 2024

@brandon-b-miller Here are some TODO items:

  • Update cudf's conda recipe to depend on pynvjitlink (conda/recipes/cudf/meta.yaml)
  • Update the error message if pynvjitlink isn't present?
  • Is there any configuration / logic needed to align our use of pynvjitlink with the way we've supported ptxcompiler/cubinlinker?
  • Add tests? Or are we covered already?

@github-actions github-actions bot added the Python Affects Python cuDF API. label Jan 16, 2024
@jakirkham
Copy link
Member

jakirkham commented Jan 16, 2024

Do we know what is needed for the devcontainer build?

Had looked at the log, but wasn't quite grasping what the error was

@brandon-b-miller
Copy link
Contributor Author

brandon-b-miller commented Jan 16, 2024

  • Add tests? Or are we covered already?

The most thorough way of testing would be to add a CI job that has CUDA driver 12.X and CTK 12.Y where Y>X. In this situation, every numba kernel launched would require pynvjitlink to work, even outside of contexts where cuDF is involved at all.

If we were to start building cuDF packages with CUDA 12.Y, and did not update to the 12.Y driver in CI, then certain cuDF tests would invoke the pynvjitlink codepath. If the driver, cuDF build version, and CTK all align in CI, the pynvjitlink codepath will not be tested.

@brandon-b-miller
Copy link
Contributor Author

Do we know what is needed for the devcontainer build?

Had looked at the log, but wasn't quite grasping what the error was

I think it's this:

  Could not solve for environment specs
  The following packages are incompatible
  ├─ cuda-version 12.0**  is requested and can be installed;
  └─ pynvjitlink is not installable because it requires
     └─ cuda-version >=12.2,<12.3 , which conflicts with any installable versions previously reported.

@jakirkham
Copy link
Member

Thanks Brandon! 🙏

That's very helpful. Where did you find this in the logs?

Think this PR ( rapidsai/pynvjitlink#45 ) should fix it

@brandon-b-miller
Copy link
Contributor Author

That's very helpful. Where did you find this in the logs?

In the failing job, click the dropdown arrow next to the "Run build in devcontainer" job marked by the red x. From there, there's a smaller dropdown that can be clicked into through the small white triangle under the step marked "run command in container". This expands the log inside which I found the error.

@jakirkham
Copy link
Member

Ah now I see. Thank you! 🙏

...the small white triangle under the step marked "run command in container"

This is what I was missing

@jakirkham
Copy link
Member

The fix above is now in pynvjitlink version 0.1.10 with packages already published

Restarting the failed CI jobs. Let's see how things go

@jakirkham
Copy link
Member

Looks like there was an issue with pin_compatible. Fixing as part of a 0.1.11 release of pynvjitlink in PR ( rapidsai/pynvjitlink#47 ). Included results from conda render later in the PR to confirm we are getting the expected behavior

@jakirkham
Copy link
Member

Ok with Bradley's help we no have 0.1.11. Rerunning CI

@jakirkham
Copy link
Member

It looks like pynvjitlink is now being installed 🎉

@jakirkham
Copy link
Member

Ok looks like the wheel build has an issue

ModuleNotFoundError: No module named 'scikit_build_core'

Note: Had to look at the raw logs (as the GHA GUI had some issue rendering)

@jakirkham
Copy link
Member

Merging in latest from branch-24.02 in case that helps

@jakirkham
Copy link
Member

Failing due to timeout. Likely this issue ( conda/infrastructure#869 )

Let's wait for that to clear up and retry the failing builds then

@brandon-b-miller brandon-b-miller marked this pull request as ready for review January 17, 2024 14:51
@brandon-b-miller brandon-b-miller requested review from a team as code owners January 17, 2024 14:51
ci/build_wheel.sh Outdated Show resolved Hide resolved
@vyasr
Copy link
Contributor

vyasr commented Jan 17, 2024

@brandon-b-miller @bdice I'm not sure how close this PR is to merging, but #14770 is needed in the interim. If you anticipate this merging today then we can probably wait. Otherwise let's merge the other PR and revert it as part of this changeset.

@brandon-b-miller
Copy link
Contributor Author

To me the pieces seem to be in place for everything to work as expected fairly soon, although I have a murkier picture of some of the conda issues we've encountered on the pynvjitlink side.

@bdice
Copy link
Contributor

bdice commented Jan 17, 2024

@vyasr I'd like to finalize and merge this PR today. I will approve #14770 in the meantime but let's only merge if this PR is stalled.

@brandon-b-miller
Copy link
Contributor Author

conda tests seem to be failing at collection time in CUDA 11.x with

E   ModuleNotFoundError: No module named 'pynvjitlink'

I think there's some extra logic needed here since pynvjitlink is a cuda 12.x dependency. I'll update.

@brandon-b-miller
Copy link
Contributor Author

Hoping 92c6bb1 resolves the latest set of failures.

@@ -135,7 +132,9 @@ def _setup_numba():
if driver_version < (12, 0):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we update the comment above to mention pynvjitlink and the corresponding role of that package? This comment:

    # ptxcompiler is a requirement for cuda 11.x packages but not
    # cuda 12.x packages. However its version checking machinery
    # is still necessary. If a user happens to have ptxcompiler
    # in a cuda 12 environment, it's use for the purposes of
    # checking the driver and runtime versions is harmless

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brandon-b-miller I would generally advocate reviewing this entire file and any other files that relate to ptxcompiler/pynvjitlink to make sure things are named sensibly, etc. in a way that will support both CUDA 11 and CUDA 12+. I want the code comments and docs to reflect the implemented design going forward.

Keep in mind that we don't want to name things "CUDA 12" in the code if we can avoid it if it is likely that later versions will act in the same way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about something like 7dbf9f2 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a CUDA 12.x environment, ptxcompiler provides version checking, but not MVC directly

Is this true? We don't use ptxcompiler in CUDA 12 environments. No environment should have both ptxcompiler and pynvjitlink installed at the same time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's technically _ptxcompiler.py in this case - our slimmed down, vendored version of the few functions we need.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooooo. But I don't know how to distinguish ptxcompiler the package (only used when on CUDA 11) from _ptxcompiler.py the internal helper file (always active) from the text of this comment. Documenting that kind of thing clearly is what I want to achieve before merging this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some reworking in e8a90b9

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much clearer! Thanks for iterating on this.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving with one typo fix that I will commit.

python/cudf/cudf/utils/_numba.py Outdated Show resolved Hide resolved
@@ -135,7 +132,9 @@ def _setup_numba():
if driver_version < (12, 0):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much clearer! Thanks for iterating on this.

@@ -98,6 +98,7 @@ requirements:
# xref: https://github.com/rapidsai/cudf/issues/12822
- cuda-nvrtc
- cuda-python >=12.0,<13.0a0
- pynvjitlink
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we have a clearer idea on intended compatibility ( rapidsai/pynvjitlink#48 ), we may want to add some version constraints here

This could be done in a separate PR though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is reasonable. John proposed pynvjitlink >=0.1.11,<0.2.0a0 offline, which seems appropriate to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah though let's discuss in the issue and we can do this as follow up (after this PR is merged)

@bdice
Copy link
Contributor

bdice commented Jan 18, 2024

/merge

@brandon-b-miller
Copy link
Contributor Author

The merge is being blocked by what seems like unrelated issues building the libcudf docs

/__w/cudf/cudf/docs/cudf/source/libcudf_docs/api_docs/column_classes.rst:4: WARNING: Error when parsing function declaration.
If the function has no return type:
  Error in declarator or parameters-and-qualifiers
  Invalid C++ declaration: Expecting "(" in parameters-and-qualifiers. [error at 24]
    inline CUDF_HOST_DEVICE column_device_view slice (size_type offset, size_type size) const noexcept
    ------------------------^

This branch has the latest though, so it's possibly a problem on 24.02 - this ring any bells to anyone?

@vyasr
Copy link
Contributor

vyasr commented Jan 18, 2024

Yeah this error probably comes from merging #13846 without it being fully up-to-date because some other PR merged bad docs changes. I'll take a look.

@vyasr
Copy link
Contributor

vyasr commented Jan 18, 2024

Hoping that #14780 resolves this.

@vyasr vyasr mentioned this pull request Jan 18, 2024
3 tasks
@brandon-b-miller
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit e0905ac into rapidsai:branch-24.02 Jan 19, 2024
66 of 67 checks passed
@jakirkham
Copy link
Member

Thanks all! 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants