
Update c_stdlib_version for aarch + cuda 12.6 #6745

Closed

Conversation

@hmaarrfk commented Nov 27, 2024

I think this was missed in the cuda 12.6 upgrade.

In our previous attempt at building for CUDA 12.6, we had the c_stdlib_version increased for aarch, and even for linux64:
https://github.com/conda-forge/pytorch-cpu-feedstock/pull/293/files#diff-ff61408cdc05bc9667deeadb55e4aaceb1371972076b6bf6934f9008920f2bd2
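(For context: a bump like this normally lives in the feedstock's recipe/conda_build_config.yaml. A minimal sketch of the kind of override in question -- the selector and the exact glibc value here are illustrative assumptions, not copied from the linked diff:)

c_stdlib_version:    # [linux and aarch64]
  - "2.28"           # [linux and aarch64]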

cc: @h-vetinari

conda-forge/pytorch-cpu-feedstock#293 (comment)

Closes: conda-forge/cudnn-feedstock#85

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

@hmaarrfk hmaarrfk requested a review from a team as a code owner November 27, 2024 10:22
@conda-forge-admin

Hi! This is the friendly automated conda-forge-linting service.

I was trying to look for recipes to lint for you, but it appears we have a merge conflict. Please try to merge or rebase with the base branch to resolve this conflict.

Please ping the 'conda-forge/core' team (using the @ notation in a comment) if you believe this is a bug.

@h-vetinari left a comment

In our previous attempt at building for CUDA 12.6 we had the c_stdlib version increased for aarch, and even linux64:

That's because at the time, we still had a coupling of c_stdlib_version with docker_image, and changing the image implied changing the stdlib. When using the newest docker images though (as we recently started doing), the bump to the c_stdlib_version is not necessary anymore -- except for packages which fail during compilation against our default sysroot due to some missing symbols or somesuch.
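(Illustrative aside: with the current infrastructure the image is selected via os_version in conda-forge.yml, independently of the stdlib pin; a sketch only, assuming alma8 is the image in question:)

# conda-forge.yml -- sketch; the chosen value is an assumption
os_version:
  linux_aarch64: alma8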

To the best of my understanding, the situation in pytorch would not be changed by this PR:

  • The failure happens at runtime, not at compilation time.
  • It happens when loading a dependency that likely has incorrect metadata.
  • Even then, the failure makes no sense (as I was trying to say in the comment you referenced), because the image that the build was running in was an alma8 image, which has glibc 2.28 available (and that should be the only one!).

The fact that libcufile.so fails to load with /lib64/libm.so.6: version 'GLIBC_2.27' not found -- in other words, on a symbol from an older glibc version than the one present in the image (and one that should thus definitely be there) -- sounds to me like something went seriously wrong somewhere (or, alternatively, that I severely misunderstand some aspect of this).

@conda-forge-admin

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

@h-vetinari commented Nov 27, 2024

Even then, the failure makes no sense (as I was trying to say in the comment you referenced), because the image that the build was running in was an alma8 image, which has glibc 2.28 available (and that should be the only one!).

OK, that got me thinking. I rechecked the logs (warning, large), and in the test environment, there's actually another glibc hiding:

TEST START: /home/conda/feedstock_root/build_artifacts/linux-aarch64/pytorch-2.5.1-cuda126_py312h8a24fa9_204.conda
WARNING: Multiple meta files found. The meta.yaml file in the base directory (/tmp/tmpzrij0llj/info/recipe) will be used.
Reloading output folder (local): ...working... done
Solving environment (_test_env): ...working... done

## Package Plan ##

  environment location: $PREFIX


The following NEW packages will be INSTALLED:

    [...]
    libcufile:                     1.11.1.6-h3d08a35_2             conda-forge
    [...]
    pytorch:                       2.5.1-cuda126_py312h8a24fa9_204 local
    [...]
    sysroot_linux-aarch64:         2.17-h5b4a56d_18                conda-forge   !!!!
    [...]

So what this means is that loading libcufile.so fails not only in a too-old image; it also fails if there's a too-old sysroot lying around in the environment (in this case probably pulled in as a transitive dependency).

So the immediate solution there is to add

run_constrained:
  - sysroot_{{ target_platform }} >={{ c_stdlib_version }}

to libcufile. The large-scale solution would be conda-forge/linux-sysroot-feedstock#63.
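In the libcufile recipe, that constraint would sit under the requirements of the affected output, roughly like so (a sketch only; the surrounding keys are assumed for illustration, not copied from the actual feedstock):

requirements:
  build:
    - {{ stdlib("c") }}
  run_constrained:
    # keep any sysroot in the run environment at least as new as the stdlib used at build time
    - sysroot_{{ target_platform }} >={{ c_stdlib_version }}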

@hmaarrfk

OK, I'll let you work with the CUDA team to help streamline this and hopefully get to close conda-forge/cudnn-feedstock#85.

@hmaarrfk closed this Nov 27, 2024