Update PyTorch to >=2.4.0 to get fix for CUDA array interface bug, and drop CUDA 11 PyTorch tests. #17475

Merged 1 commit into rapidsai:branch-25.02 on Dec 2, 2024

Conversation

@bdice (Contributor) commented on Dec 2, 2024

Description

This PR updates our PyTorch lower bound to 2.4.0 to get the bugfix from pytorch/pytorch#121458.

Also, this PR drops CUDA 11 tests because conda-forge no longer produces CUDA 11 builds of PyTorch. This was causing a failure on Hopper GPUs because the last available CUDA 11 builds from conda-forge do not include sm90 support.
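
For context, the affected test exercises the zero-copy handoff that the `__cuda_array_interface__` protocol provides between cuDF and PyTorch. A minimal sketch of that interop (illustrative only, not the exact code in `tests/test_cuda_array_interface.py`):

```python
import cudf
import torch

# Build a cuDF Series on the GPU.
s = cudf.Series([1.0, 2.0, 3.0])

# s.values is a CuPy array that exposes __cuda_array_interface__;
# torch.as_tensor can wrap it without copying the data off the device.
t = torch.as_tensor(s.values, device="cuda")

assert t.is_cuda
print(t)
```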

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

copy-pr-bot bot commented Dec 2, 2024

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

@bdice (Contributor, Author) commented on Dec 2, 2024

/ok to test

@github-actions bot added labels Python (Affects Python cuDF API) and cudf.pandas (Issues specific to cudf.pandas) on Dec 2, 2024
@bdice added labels improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change), and removed labels Python and cudf.pandas, on Dec 2, 2024
@bdice (Contributor, Author) commented on Dec 2, 2024

The failing Hopper tests looked like this:

=========================== short test summary info ============================
FAILED tests/test_cuda_array_interface.py::test_cuda_array_interface_pytorch - torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: 
NVIDIA H100 PCIe with CUDA capability sm_90 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_35 sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_89 compute_89.
If you want to use the NVIDIA H100 PCIe GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

This occurs because pytorch 2.5.1 cuda118_py310h920319e_303 from conda-forge lacks sm90 support.
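
For anyone hitting this locally, the mismatch can be confirmed by comparing the device's compute capability against the architectures the installed PyTorch build was compiled for. A minimal sketch using the public torch.cuda APIs (torch.cuda.get_device_capability and torch.cuda.get_arch_list):

```python
import torch

# Compute capability of the first visible GPU, e.g. (9, 0) on an H100.
major, minor = torch.cuda.get_device_capability(0)

# Architectures baked into this PyTorch build, e.g. ['sm_80', 'sm_86', 'sm_89', ...].
arch_list = torch.cuda.get_arch_list()

if f"sm_{major}{minor}" not in arch_list:
    print(f"This PyTorch build does not target sm_{major}{minor}; supported: {arch_list}")
```

Note that on a badly mismatched build, CUDA initialization itself may warn or fail before this check runs.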

@bdice bdice marked this pull request as ready for review December 2, 2024 22:29
@bdice bdice requested review from a team as code owners December 2, 2024 22:29
@bdice (Contributor, Author) commented on Dec 2, 2024

/merge

@rapids-bot rapids-bot bot merged commit 852338e into rapidsai:branch-25.02 Dec 2, 2024
112 checks passed