Update to CUDA 12.6 #97

Merged
merged 5 commits into from
Oct 24, 2024

Conversation

jakirkham
Member

@jakirkham jakirkham commented Aug 3, 2024

  • Bump CUDA version to 12.6

@jakirkham jakirkham requested review from a team as code owners August 3, 2024 02:06
@jakirkham jakirkham requested a review from AyodeAwe August 3, 2024 02:06
@jakirkham jakirkham marked this pull request as draft August 3, 2024 02:06
@jakirkham jakirkham force-pushed the cuda_12.6 branch 2 times, most recently from e368ff3 to 1a0b9ae Compare August 3, 2024 03:01
@jakirkham jakirkham changed the title [WIP] Update to CUDA 12.6 Update to CUDA 12.6 Aug 3, 2024
@jakirkham jakirkham marked this pull request as ready for review August 3, 2024 03:05
@jakirkham
Member Author

With CUDA 12.6, I am seeing the following test failure on CI:

___________________ test_duplicate_symbols_cubin_and_fatbin ____________________

device_functions_cubin = ('test_device_functions.cubin', b"\x7fELF\x02\x01\x013\x07\x00\x00\x00\x00\x00\x00\x00\x01\x00\xbe\x00x\x00\x00\x00\x0...x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\t\x00\x00\x18\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00")
device_functions_fatbin = ('test_device_functions.fatbin', b'P\xedU\xba\x01\x00\x10\x00\x00\t\x00\x00\x00\x00\x00\x00\x02\x00\x01\x01@\x00\x00\x...x1e\x08@\x00\x1f\x08@\x00\x00\x1f\xa4@\x00\x05\x1e\t@\x00\x1a\t@\x00P\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
gpu_arch_flag = '-arch=sm_70'

    def test_duplicate_symbols_cubin_and_fatbin(
        device_functions_cubin, device_functions_fatbin, gpu_arch_flag
    ):
        # This link errors because the cubin and the fatbin contain the same
        # symbols.
        nvjitlinker = NvJitLinker(gpu_arch_flag)
        name, cubin = device_functions_cubin
        nvjitlinker.add_cubin(cubin, name)
        name, fatbin = device_functions_fatbin
>       with pytest.raises(NvJitLinkError, match="NVJITLINK_ERROR_INVALID_INPUT error"):
E       Failed: DID NOT RAISE <class 'pynvjitlink.api.NvJitLinkError'>

test_pynvjitlink_api.py:90: Failed

Contributor

@bdice bdice left a comment

Seems fine to me.

@brandon-b-miller
Contributor

The CI failure seems to come from a test where the underlying library is correctly erroring, but I think the error somehow isn't being translated into an NvJitLinkError correctly. Looking into it.
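For readers following along, here is a minimal sketch of the kind of status-to-exception translation being discussed. It is not pynvjitlink's actual code: `LinkError`, `STATUS_NAMES`, `check_status`, and `did_raise` are hypothetical names, and the numeric status values are made up for illustration. It only shows that a `pytest.raises` block can end in `DID NOT RAISE` in two ways: the translation layer drops a non-zero status, or the native call returns success in the first place.

```python
# Hypothetical stand-ins: none of these names are pynvjitlink internals.
class LinkError(RuntimeError):
    """Stand-in for an exception such as NvJitLinkError."""


# Illustrative status table; the numeric values are invented for this sketch.
STATUS_NAMES = {0: "SUCCESS", 1: "ERROR_INVALID_INPUT"}


def check_status(status: int, call: str) -> None:
    """Raise LinkError for any non-zero status returned by a native call."""
    if status != 0:
        name = STATUS_NAMES.get(status, f"UNKNOWN({status})")
        raise LinkError(f"{name} error when calling {call}")


def did_raise(status: int) -> bool:
    """Report whether a given native status would surface as LinkError."""
    try:
        check_status(status, "add_data")
    except LinkError:
        return True
    return False
```

Under these assumptions, `did_raise(1)` is true and `did_raise(0)` is false, so a test that expects an exception will fail whenever the native layer reports success for the duplicate-symbol input, even if the translation itself is sound.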

Member Author

@jakirkham jakirkham left a comment

Fixed the package hashing error noted above: #97 (comment)

This uses a variant file generated at build time, as noted below.

ci/build_conda.sh (outdated, resolved)
ci/build_conda.sh (outdated, resolved)
conda/recipes/pynvjitlink/meta.yaml (outdated, resolved)

@jakirkham
Member Author

Have separated the bulk of these changes into PR: #101

As that is now in, will rebase this so it contains only the CUDA 12.6 update

@bdice
Contributor

bdice commented Sep 20, 2024

@brandon-b-miller @gmarkall Would one of you be able to follow up on this failure?

This appears to be a change in upstream nvJitLink behaviour.

@jakirkham
Member Author

Thanks for the assist Bradley! 🙏

@jakirkham jakirkham merged commit 7d9a7ee into rapidsai:main Oct 24, 2024
39 checks passed
@jakirkham jakirkham deleted the cuda_12.6 branch October 24, 2024 23:12
@jakirkham jakirkham mentioned this pull request Oct 24, 2024
gmarkall pushed a commit that referenced this pull request Oct 25, 2024
* Fix building tests in multi-gpu environment (#98)
* Change cmake.verbose = true to build.verbose = true (#99)
* Use build-system.requires to set scikit-build-core minimum version (#100)
* Set CUDA version in one file (and use everywhere else) (#101)
* Drop Python 3.9 support (#102)
* Use CI workflow branch 'branch-24.10' again (#105)
* Use conda strict channel priority. (#109)
* Update to CUDA 12.6 (#97)