Update PyTorch to >=2.4.0 to get fix for CUDA array interface bug, and drop CUDA 11 PyTorch tests. #17475

Merged 1 commit into rapidsai:branch-25.02 on Dec 2, 2024

Conversation

@bdice (Contributor) commented on Dec 2, 2024

Description

This PR updates our PyTorch lower bound to 2.4.0 to get the bugfix from pytorch/pytorch#121458.

Also, this PR drops CUDA 11 tests because conda-forge no longer produces CUDA 11 builds of PyTorch. This was causing a failure on Hopper GPUs because the last available CUDA 11 builds from conda-forge do not include sm90 support.
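
For context, the affected test exercises the zero-copy handoff that the `__cuda_array_interface__` protocol provides between cuDF and PyTorch. A minimal sketch of that interop (illustrative only, not the exact code in `tests/test_cuda_array_interface.py`):

```python
import cudf
import torch

# Build a cuDF Series on the GPU.
s = cudf.Series([1.0, 2.0, 3.0])

# s.values is a CuPy array that exposes __cuda_array_interface__;
# torch.as_tensor can wrap it without copying the data off the device.
t = torch.as_tensor(s.values, device="cuda")

assert t.is_cuda
print(t)
```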

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

copy-pr-bot bot commented Dec 2, 2024

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

@bdice (Contributor, Author) commented on Dec 2, 2024

/ok to test

@github-actions bot added labels Python (Affects Python cuDF API) and cudf.pandas (Issues specific to cudf.pandas) on Dec 2, 2024
@bdice added labels improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change), and removed labels Python and cudf.pandas, on Dec 2, 2024
@bdice (Contributor, Author) commented on Dec 2, 2024

The failing Hopper tests looked like this:

=========================== short test summary info ============================
FAILED tests/test_cuda_array_interface.py::test_cuda_array_interface_pytorch - torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: 
NVIDIA H100 PCIe with CUDA capability sm_90 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_35 sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_89 compute_89.
If you want to use the NVIDIA H100 PCIe GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

This occurs because pytorch 2.5.1 cuda118_py310h920319e_303 from conda-forge lacks sm90 support.
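
For anyone hitting this locally, the mismatch can be confirmed by comparing the device's compute capability against the architectures the installed PyTorch build was compiled for. A minimal sketch using the public torch.cuda APIs (torch.cuda.get_device_capability and torch.cuda.get_arch_list):

```python
import torch

# Compute capability of the first visible GPU, e.g. (9, 0) on an H100.
major, minor = torch.cuda.get_device_capability(0)

# Architectures baked into this PyTorch build, e.g. ['sm_80', 'sm_86', 'sm_89', ...].
arch_list = torch.cuda.get_arch_list()

if f"sm_{major}{minor}" not in arch_list:
    print(f"This PyTorch build does not target sm_{major}{minor}; supported: {arch_list}")
```

Note that on a badly mismatched build, CUDA initialization itself may warn or fail before this check runs.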

@bdice bdice marked this pull request as ready for review December 2, 2024 22:29
@bdice bdice requested review from a team as code owners December 2, 2024 22:29
@bdice (Contributor, Author) commented on Dec 2, 2024

/merge

@rapids-bot rapids-bot bot merged commit 852338e into rapidsai:branch-25.02 Dec 2, 2024
112 checks passed