Set TORCH_CUDA_ARCH_LIST=8.0;8.6+PTX #1

weiji14 · 2024-05-07T22:21:13Z

Compile for CUDA compute capability 8.0 and above (Ampere generation or newer), continuing the work from conda-forge/staged-recipes#26239 (comment).

Note that build number is kept at 0, since the initial commit's (c75ac8e) build failed due to Azure pipelines running out of disk space.
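For readers unfamiliar with the `TORCH_CUDA_ARCH_LIST` syntax in the title, here is a minimal sketch of how such a string maps to `nvcc` `-gencode` flags. It is a simplified approximation of what PyTorch's extension build helper does, not a copy of it: the function name `arch_list_to_gencode` is hypothetical, and named architectures (e.g. "Ampere") are not handled.

```python
def arch_list_to_gencode(arch_list: str) -> list[str]:
    """Translate a TORCH_CUDA_ARCH_LIST-style string into nvcc -gencode flags.

    Hypothetical helper for illustration only; handles numeric entries such as
    "8.0" or "8.6+PTX" separated by ";" (or spaces).
    """
    flags = []
    for entry in arch_list.replace(" ", ";").split(";"):
        entry = entry.strip()
        if not entry:
            continue
        want_ptx = entry.endswith("+PTX")
        major, minor = entry.removesuffix("+PTX").split(".")
        num = f"{major}{minor}"
        # Native machine code (SASS) for this exact architecture.
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if want_ptx:
            # Also embed PTX so newer GPUs can JIT-compile the kernels.
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags


for flag in arch_list_to_gencode("8.0;8.6+PTX"):
    print(flag)
```

Each extra entry in the list adds another compilation pass over every kernel, which is why the length of the list matters so much for build time and disk usage.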

Checklist

Used a personal fork of the feedstock to propose changes
Bumped the build number (if the version is unchanged)
Reset the build number to 0 (if the version changed)
Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
Ensured the license file is being packaged.

Continuing from conda-forge/staged-recipes#26239

conda-forge-webservices · 2024-05-07T22:21:20Z

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

weiji14 · 2024-05-07T22:41:53Z

@conda-forge-admin, please rerender

carterbox · 2024-05-08T17:00:10Z

Probably, there isn't enough time on Azure to complete these builds. If it does build in the 6 hours, great! Let's publish at least one build.

In parallel, if you feel there is significant performance to be gained by building for '8.0,9.0+PTX' or even more archs, then please start working through the steps in this checklist to get this feedstock and yourself added to the allowlist for the gpu-long-running runners, which have a time limit longer than 6 hours.

weiji14 · 2024-05-08T20:09:43Z

> Probably, there isn't enough time on Azure to complete these builds. If it does build in the 6 hours, great! Let's publish at least one build.

So I've tried:

8.0;8.6;8.9+PTX at 5c1c507: all builds failed at about the 25/49 mark.
8.0;8.6+PTX at 3348c5a: 2 builds finished on time; the other 2 failed at the 31/49 and 37/49 marks, respectively.
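As a rough back-of-the-envelope check on those numbers, the sketch below extrapolates how long each failed build might have needed. It assumes (and this is only an assumption) that the failures were hits on the 6-hour Azure limit, that the full 6 hours went into compilation, and that all 49 steps take about the same time. The names `projected_hours`, `TIME_LIMIT_HOURS`, and `TOTAL_STEPS` are made up for illustration.

```python
TIME_LIMIT_HOURS = 6   # Azure time limit mentioned above
TOTAL_STEPS = 49       # total compile steps reported in the build logs


def projected_hours(steps_done: int,
                    total: int = TOTAL_STEPS,
                    limit: float = TIME_LIMIT_HOURS) -> float:
    """Linear extrapolation: if `steps_done` steps took `limit` hours,
    estimate the hours needed for all `total` steps."""
    return limit * total / steps_done


attempts = [
    ("8.0;8.6;8.9+PTX (failed at 25/49)", 25),
    ("8.0;8.6+PTX (failed at 31/49)", 31),
    ("8.0;8.6+PTX (failed at 37/49)", 37),
]
for label, done in attempts:
    print(f"{label}: ~{projected_hours(done):.1f} h")
```

Under those assumptions the estimates come out at roughly 12, 9.5, and 8 hours, all beyond the 6-hour limit, which is consistent with needing the longer-running runners for the larger arch lists.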

> In parallel, if you feel there is significant performance to be gained by building for '8.0,9.0+PTX' or even more archs, then please start working through the steps in this checklist to get this feedstock and yourself added to the allowlist for the gpu-long-running runners, which have a time limit longer than 6 hours.

Oo yes, I was keeping an eye on those, thanks for pointing to the docs! I'll take a look at that.

For now, let's just stick with 8.0+PTX so we have at least one build completed as you said. I'll close this PR, and focus on #2 instead.
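To illustrate why a single `8.0+PTX` build is still usable on newer GPUs, here is a small sketch of the general CUDA compatibility rules: native code built for compute capability X.y runs on devices X.z with z >= y, and embedded PTX can be JIT-compiled on any device of equal or higher capability. The helpers `parse_arch_list` and `how_it_runs` are hypothetical and only model these rules; they do not query a real GPU.

```python
def parse_arch_list(arch_list: str):
    """Return (native_targets, ptx_targets) as lists of (major, minor) tuples."""
    native, ptx = [], []
    for entry in arch_list.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        has_ptx = entry.endswith("+PTX")
        major, minor = (int(p) for p in entry.removesuffix("+PTX").split("."))
        native.append((major, minor))
        if has_ptx:
            ptx.append((major, minor))
    return native, ptx


def how_it_runs(device: tuple, arch_list: str) -> str:
    """Classify how kernels would run on a device with the given capability."""
    native, ptx = parse_arch_list(arch_list)
    # Native code is usable within the same major version, at equal or higher minor.
    if any(m == device[0] and n <= device[1] for m, n in native):
        return "native"
    # PTX can be JIT-compiled for any equal or newer compute capability.
    if any((m, n) <= tuple(device) for m, n in ptx):
        return "ptx-jit"
    return "unsupported"


for cc in [(7, 5), (8, 0), (8, 6), (8, 9), (9, 0)]:
    print(cc, how_it_runs(cc, "8.0+PTX"))
```

The trade-off is that devices served through PTX JIT or through an older minor version's native code may miss architecture-specific optimizations, which is the performance question raised earlier in the thread.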

weiji14 added 2 commits May 8, 2024 10:12

Free up disk space on Azure Pipelines

763f76b

Xref git@github.com:weiji14/flash-attn-feedstock.git

Set TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0+PTX

cb8b616

Expand to CUDA compatibility 8.0 and above, xref https://developer.nvidia.com/cuda-gpus. Also increase MAX_JOBS.

weiji14 self-assigned this May 7, 2024

weiji14 mentioned this pull request May 7, 2024

Add flash-attn conda-forge/staged-recipes#26239

Merged

10 tasks

conda-forge-webservices[bot] and others added 3 commits May 7, 2024 22:43

MNT: Re-rendered with conda-build 24.3.0, conda-smithy 3.35.0, and conda-forge-pinning 2024.05.07.15.53.14

963d3ca

Set swapfile_size to 10GiB

95eecf0

Xref https://conda-forge.org/docs/maintainer/conda_forge_yml/#azure

MNT: Re-rendered with conda-build 24.3.0, conda-smithy 3.35.0, and conda-forge-pinning 2024.05.07.15.53.14

aa65558

weiji14 mentioned this pull request May 8, 2024

Rebuild for CUDA 12 w/arch + python312 #2

Merged

Skip compiling for CUDA compute capability 8.9 and 9.0

3348c5a

weiji14 changed the title ~~Set TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0+PTX~~ Set TORCH_CUDA_ARCH_LIST=8.0;8.6+PTX May 8, 2024

weiji14 closed this May 8, 2024

weiji14 mentioned this pull request May 8, 2024

Build on OpenStack server for TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0+PTX #4

Closed

weiji14 deleted the recompile branch May 27, 2024 22:32
