Set TORCH_CUDA_ARCH_LIST=8.0;8.6+PTX #1

Closed
wants to merge 6 commits into from

weiji14
Member

@weiji14 weiji14 commented May 7, 2024

Compile for CUDA compute capability 8.0 and above (Ampere generation or newer), continuing the work from conda-forge/staged-recipes#26239 (comment)

Note that build number is kept at 0, since the initial commit's (c75ac8e) build failed due to Azure pipelines running out of disk space.
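To illustrate what a value such as `8.0;8.6+PTX` asks the compiler to do, here is a minimal Python sketch that expands a `TORCH_CUDA_ARCH_LIST` string into `nvcc` `-gencode` flags. It is an approximation of how PyTorch's extension builder is understood to interpret the variable, not a copy of its code; the function name `arch_list_to_gencode_flags` is hypothetical. Each listed architecture adds a native binary (`sm_XY`) target, and a `+PTX` suffix additionally embeds PTX for that architecture so that newer GPUs can JIT-compile it.

```python
from typing import List


def arch_list_to_gencode_flags(arch_list: str) -> List[str]:
    """Expand a TORCH_CUDA_ARCH_LIST-style string into nvcc -gencode flags.

    Hypothetical helper for illustration only; it assumes entries look like
    "8.0" or "8.6+PTX" and are separated by semicolons or spaces.
    """
    flags = []
    for entry in arch_list.replace(" ", ";").split(";"):
        if not entry:
            continue  # tolerate stray separators
        emit_ptx = entry.endswith("+PTX")
        version = entry[: -len("+PTX")] if emit_ptx else entry
        num = version.replace(".", "")  # "8.6" -> "86"
        # Native machine code for this exact compute capability.
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if emit_ptx:
            # Forward-compatible PTX that newer GPUs can JIT-compile.
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags


if __name__ == "__main__":
    for flag in arch_list_to_gencode_flags("8.0;8.6+PTX"):
        print(flag)
```

Every extra entry in the list adds another compilation target for each CUDA source file, which is why widening the list lengthens the build.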

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

Continuing from conda-forge/staged-recipes#26239

weiji14 added 2 commits May 8, 2024 10:12
Xref git@github.com:weiji14/flash-attn-feedstock.git
Expand to CUDA compatibility 8.0 and above, xref https://developer.nvidia.com/cuda-gpus. Also increase MAX_JOBS.
@weiji14 weiji14 self-assigned this May 7, 2024
@conda-forge-webservices
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@weiji14
Member Author

weiji14 commented May 7, 2024

@conda-forge-admin, please rerender

@weiji14 weiji14 changed the title from "Set TORCH_CUDA_ARCH_LIST=8.0;8.6;8.9;9.0+PTX" to "Set TORCH_CUDA_ARCH_LIST=8.0;8.6+PTX" on May 8, 2024
@carterbox
Member

Probably, there isn't enough time on Azure to complete these builds. If it does build in the 6 hours, great! Let's publish at least one build.

In parallel, if you feel there is significant performance to be gained by building for '8.0,9.0+PTX' or even more archs, then please start doing the steps in this checklist in order to get this feedstock and yourself added to the allowlist for the gpu-long-running runners, which have a time limit longer than 6 hours.

@weiji14
Member Author

weiji14 commented May 8, 2024

> Probably, there isn't enough time on Azure to complete these builds. If it does build in the 6 hours, great! Let's publish at least one build.

So I've tried:

  • 8.0;8.6;8.9+PTX at 5c1c507: all builds failed at about the 25/49 mark.
  • 8.0;8.6+PTX at 3348c5a: 2 builds finished on time; the other 2 failed at the 31/49 and 37/49 marks, respectively.
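As a rough way to read those progress marks, the sketch below extrapolates how long a full build might take. It rests on two assumptions that the thread does not confirm: that each failed job stopped because it hit the 6-hour Azure limit, and that all 49 build steps take about the same time. The function name `estimate_total_hours` is hypothetical.

```python
def estimate_total_hours(steps_done: int, steps_total: int, hours_elapsed: float) -> float:
    """Linearly extrapolate total build time from partial progress.

    Assumes (unverified) that every build step costs about the same time.
    """
    if steps_done <= 0 or steps_total <= 0:
        raise ValueError("step counts must be positive")
    return hours_elapsed * steps_total / steps_done


if __name__ == "__main__":
    time_limit_hours = 6.0  # Azure limit mentioned in the thread
    # Progress marks reported above; assumed to be where the limit was hit.
    for label, done in [("8.0;8.6;8.9+PTX", 25), ("8.0;8.6+PTX", 31), ("8.0;8.6+PTX", 37)]:
        total = estimate_total_hours(done, 49, time_limit_hours)
        print(f"{label}: stopped at {done}/49, estimated full build ~{total:.1f} h")
```

Under those assumptions, every failed job would need well over 6 hours to finish, which is consistent with the suggestion to use runners with a longer time limit.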

> In parallel, if you feel there is significant performance to be gained by building for '8.0,9.0+PTX' or even more archs, then please start doing the steps in this checklist in order to get this feedstock and yourself added to the allowlist for the gpu-long-running runners, which have a time limit longer than 6 hours.

Oo yes, I was keeping an eye on those, thanks for pointing to the docs! I'll take a look at that.

For now, let's just stick with 8.0+PTX so we have at least one build completed as you said. I'll close this PR, and focus on #2 instead.
