
Initial CUDA 11.3 Conda Packages #62

Closed
jakirkham opened this issue Jul 14, 2021 · 51 comments

Comments

@jakirkham
Member

We have published some new Conda packages for public consumption. These contain the redistributable libraries, compilers, profiling tools, etc., and currently live in the nvidia channel ( https://anaconda.org/nvidia ). To get started, one can just run conda install -c nvidia cuda=11.3, which includes everything that cudatoolkit contains today.

We would like to collect some feedback from the community here to inform how we package these going forward. Once we are more sure these fill the needs here, we can follow up on integrating them into conda-forge.

@jakirkham
Member Author

Just to add: the ARM packages are for SBSA, so they will not work on Jetson, for example.

@jaimergp
Member

jaimergp commented Jul 19, 2021

Nice, I'll have a look now.

First thing I notice: the version strings are sometimes different across components:

$> CONDA_SUBDIR="linux-64" mamba create -n cuda cuda=11.3 -c nvidia
...

  cuda                           11.3.0  h3b286be_0  nvidia/linux-64       2 KB
  cuda-command-line-tools        11.3.0  h3b286be_0  nvidia/linux-64       2 KB
  cuda-compiler                  11.3.0  h3b286be_0  nvidia/linux-64       2 KB
  cuda-cudart                   11.3.58  hc1aae59_0  nvidia/linux-64       1 MB
  cuda-cuobjdump                11.3.58  hc78e225_0  nvidia/linux-64     115 KB
  cuda-cupti                    11.3.58  h9a3dd33_0  nvidia/linux-64      19 MB
  cuda-cuxxfilt                 11.3.58  he670d9e_0  nvidia/linux-64      32 KB
  cuda-gdb                      11.3.58  h531059a_0  nvidia/linux-64      39 MB
  cuda-libraries                 11.3.0  h3b286be_0  nvidia/linux-64       2 KB
  cuda-libraries-dev             11.3.0  h3b286be_0  nvidia/linux-64       2 KB
  cuda-memcheck                 11.3.58  h8711ecb_0  nvidia/linux-64     156 KB
  cuda-nvcc                     11.3.58  h2467b9f_0  nvidia/linux-64      54 MB
  cuda-nvdisasm                 11.3.58  hd2ea46e_0  nvidia/linux-64      32 MB
  cuda-nvml-dev                 11.3.58  h70090ce_0  nvidia/linux-64      62 KB
  cuda-nvprof                   11.3.58  h860cd9e_0  nvidia/linux-64       4 MB
  cuda-nvprune                  11.3.58  hb917323_0  nvidia/linux-64      47 KB
  cuda-nvrtc                    11.3.58  he300756_0  nvidia/linux-64      30 MB
  cuda-nvtx                     11.3.58  h3fa534a_0  nvidia/linux-64      44 KB
  cuda-nvvp                     11.3.58  hd16380c_0  nvidia/linux-64     115 MB
  cuda-runtime                   11.3.0  h3b286be_0  nvidia/linux-64       2 KB
  cuda-samples                  11.3.58  hc6eff01_0  nvidia/linux-64      65 MB
  cuda-sanitizer-api            11.3.58  h58da6c8_0  nvidia/linux-64      15 MB
  cuda-thrust                   11.3.58  h7b74f08_0  nvidia/linux-64       1 MB
  cuda-toolkit                   11.3.0  h3b286be_0  nvidia/linux-64       2 KB
  cuda-tools                     11.3.0  h3b286be_0  nvidia/linux-64       2 KB
  cuda-visual-tools              11.3.0  h3b286be_0  nvidia/linux-64       2 KB
  libcublas                11.4.2.10064  h8a72295_0  nvidia/linux-64     375 MB
  libcufft                    10.4.2.58  h58ccd86_0  nvidia/linux-64     358 MB
  libcurand                   10.2.4.58  h99380db_0  nvidia/linux-64      98 MB
  libcusolver                 11.1.1.58  hec68242_0  nvidia/linux-64     150 MB
  libcusparse                 11.5.0.58  hf5aa513_0  nvidia/linux-64     294 MB
  libnpp                      11.3.3.44  h8df316f_0  nvidia/linux-64     202 MB
  libnvjpeg                   11.4.1.58  h3d06750_0  nvidia/linux-64       4 MB

I see 11.1, 11.3, 11.4, 10.4 (?), 10.2...

It'd be nice to see the underlying recipe(s). I guess they are included in the metadata, but it would be more convenient to have a Gist or something like that, if possible.

@jaimergp
Member

The dependency tree, if anybody is curious:

$> CONDA_SUBDIR=linux-64 mamba repoquery depends -q -c nvidia --tree cuda

Executing the query cuda

cuda[11.3.0]
  ├─ cuda-runtime[11.3.0]
  │  └─ cuda-libraries[11.3.0]
  │     ├─ cuda-cudart[11.3.58]
  │     ├─ cuda-nvrtc[11.3.58]
  │     ├─ libcublas[11.4.2.10064]
  │     ├─ libcufft[10.4.2.58]
  │     ├─ libcurand[10.2.4.58]
  │     ├─ libcusolver[11.1.1.58]
  │     ├─ libcusparse[11.5.0.58]
  │     ├─ libnpp[11.3.3.44]
  │     └─ libnvjpeg[11.4.1.58]
  └─ cuda-toolkit[11.3.0]
     ├─ cuda-compiler[11.3.0]
     │  ├─ cuda-cuobjdump[11.3.58]
     │  ├─ cuda-cuxxfilt[11.3.58]
     │  ├─ cuda-nvcc[11.3.58]
     │  └─ cuda-nvprune[11.3.58]
     ├─ cuda-libraries already visited
     ├─ cuda-libraries-dev[11.3.0]
     │  └─ cuda-thrust[11.3.58]
     ├─ cuda-nvml-dev[11.3.58]
     ├─ cuda-samples[11.3.58]
     └─ cuda-tools[11.3.0]
        ├─ cuda-command-line-tools[11.3.0]
        │  ├─ cuda-cupti[11.3.58]
        │  ├─ cuda-gdb[11.3.58]
        │  ├─ cuda-memcheck[11.3.58]
        │  ├─ cuda-nvdisasm[11.3.58]
        │  ├─ cuda-nvprof[11.3.58]
        │  ├─ cuda-nvtx[11.3.58]
        │  └─ cuda-sanitizer-api[11.3.58]
        └─ cuda-visual-tools[11.3.0]
           ├─ cuda-libraries-dev already visited
           ├─ cuda-nvml-dev already visited
           └─ cuda-nvvp[11.3.58]

@jakirkham
Member Author

Thanks Jaime! Would be great to have your feedback 😄

Also cc-ing @h-vetinari who may also be interested in these and have thoughts 🙂

@jakirkham
Member Author

First thing I notice: the version strings are sometimes different across components

These are multiple separate recipes, so it makes sense that the versions differ. They would probably live as different feedstocks unless there is a reason to do something different.

@jaimergp
Member

How many feedstocks would that be? This includes several packages and a handful of metapackages as well, which might be difficult to maintain and keep in sync. I don't know if it would be better to have them all in the same recipe, with multiple outputs, or if that would get extra complicated (I need to take a look at the recipes to see how bad the situation would get in a single meta.yaml).

The current split makes sense to me. I like that users can choose different levels of granularity. It'd be nice to have the cudatoolkit-major and cudatoolkit-major-minor metapackages too, so maintainers can choose the level of ABI compat they want to keep, as per #48. (Although with the current subdivision, they could even choose the individual components they depend on, so... maybe there's less of a reason?)
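
For illustration, a downstream recipe could then pin at whichever level it prefers. A minimal sketch, assuming such metapackages existed under these names (both are hypothetical):

requirements:
  run:
    - cudatoolkit-11      # hypothetical major-only metapackage: any 11.x
    # - cudatoolkit-11.3  # hypothetical major-minor metapackage: stay on 11.3.*
    # - libcufft          # or depend directly on individual components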

Also, is nvcc now redistributable? Or conda-forge wouldn't have access to that one?

@jakirkham
Member Author

AIUI combining them into a single recipe would be complicated.

Is that still needed with CUDA Enhanced Compatibility?

My understanding is all of this would be available to conda-forge. I don't know about redistribution rights more generally.

@jaimergp
Member

Having nvcc as a conda package would simplify parts of the setup, but I guess we still need the drivers in place.

I guess we'll have to make do with several feedstocks, and have cudatoolkit-feedstock as the main hub for discussions.

@jaimergp
Member

jaimergp commented Jul 26, 2021

I am trying out the packages to build OpenMM locally (maybe I'll open a PR to showcase how it works once I get there).

One thing I noticed is that cuda-nvcc is not platform-suffixed, so compiler('cuda') (mapped with cuda_compiler: cuda-nvcc in conda_build_config.yaml) won't work because conda will try to find cuda-nvcc_linux-64 :/
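
To spell out the mapping I mean (a minimal conda_build_config.yaml sketch; the version value is illustrative):

cuda_compiler:
  - cuda-nvcc
cuda_compiler_version:
  - "11.3"

conda-build's compiler("cuda") Jinja function expands to "<cuda_compiler>_<target_platform>", so it renders as cuda-nvcc_linux-64 here, and no package by that name exists on the nvidia channel.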

@jaimergp
Member

jaimergp commented Jul 27, 2021

Ok, so I have this working with OpenMM / 11.3 / Linux-64! Check conda-forge/openmm-feedstock#60. It's failing only because I am using mambabuild to speed up the debugging.

A few notes:

  • OpenMM does not use CMake's lang features, only FindCUDAToolkit.
  • For the autodetection to work, I needed to add both cuda-nvcc and cuda-cudart to build. No extra CMake / env variables were needed. It was all automatic!
  • host needed libcufft, cuda-cudart, cuda-nvrtc, cuda-nvprof.
  • The packages are not setting run_exports at all, so these need to be added manually to run; conda-build will emit helpful warnings otherwise.
  • I used the 11.2 Docker image. I believe this is not interfering too much (just providing a recent enough driver I guess?), since the CUDA location found is BUILD_PREFIX, version is 11.3.58, and the build does not work if I remove the cuda-* packages from BUILD_PREFIX.

So, in short, the feedback for now is super positive (at least for OpenMM), but:

  • we need run_exports definitions.
  • cuda-nvcc needs to be renamed compiler-style with platform suffixes.

I'd recommend that CUDA-heavier projects take this example and adapt it in their feedstocks in a WIP PR to confirm my findings.
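
For reference, here is a condensed sketch of the requirements split described above (package names from the nvidia channel listing; the run section is written out by hand precisely because there are no run_exports yet):

requirements:
  build:
    - cmake
    - {{ compiler("cxx") }}
    - cuda-nvcc     # see the note above about the missing platform suffix
    - cuda-cudart   # needed next to nvcc for FindCUDAToolkit autodetection
  host:
    - cuda-cudart
    - cuda-nvrtc
    - cuda-nvprof
    - libcufft
  run:
    - cuda-cudart   # added manually; would normally come from run_exports
    - cuda-nvrtc
    - libcufft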

@jaimergp
Member

jaimergp commented Jul 30, 2021

Trying out Windows now. Initial feedback below:

  • The contents tree uses the root PREFIX directly. Previous cudatoolkit packages used PREFIX\Library aka %LIBRARY_PREFIX%. This causes some issues with the expected paths.
  • Had to add all CUDA packages to both build and host for headers to be found. I don't know if that's making CMake find headers in BUILD_PREFIX, or whether CMake is detecting PREFIX as the CUDA location.

@jaimergp
Member

One more thought: it looks like CMake needs both nvcc and cudart present in the same PREFIX to detect the CUDA location accurately. In that case, we might just leave the packages named as they are (i.e. no need for cuda-nvcc_linux-64 and the like) and have our nvcc wrapper point to these dependencies for CUDA 11.3+, while still using the wrapper as it is today for CUDA <=11.2?
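
A rough sketch of how both schemes could coexist in conda-forge's conda_build_config.yaml (key names follow the existing pinning conventions; the values are assumptions):

cuda_compiler:
  - nvcc        # existing wrapper package, for CUDA <=11.2
  - cuda-nvcc   # new nvidia-channel package, for CUDA 11.3+
cuda_compiler_version:
  - "11.2"
  - "11.3"
zip_keys:
  -
    - cuda_compiler
    - cuda_compiler_version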

@jakirkham
Member Author

It might make sense to do away with building against every minor version at the same time we move to these packages, and use CUDA Enhanced Compatibility instead. This would cut down the amount of rebuilding we need to do while still supporting CUDA 11.0+.
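
A hedged sketch of what that could look like in a recipe, assuming cuda-cudart serves as the version anchor (the exact pinning scheme is precisely what would need to be decided):

requirements:
  host:
    - cuda-cudart 11.0.*      # build once against the oldest supported 11.x
  run:
    - cuda-cudart >=11.0,<12  # Enhanced Compatibility covers the rest of 11.x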

@jakirkham
Member Author

cc @seibert

@jaimergp
Member

@jakirkham Is there anything else you want me to test/try here?

@jaimergp
Member

Oh, by the way, I forgot to mention that thanks to these packages, the Windows setup script for CUDA is not needed! Note how the script fails because 11.3 is not available there yet, and still the build succeeds for OpenMM.

@leofang

This comment has been minimized.

@leofang
Member

leofang commented Aug 16, 2021

Ah, it's cuda-nvprof.

@leofang
Member

leofang commented Aug 17, 2021

Hi @jakirkham, it looks like the entire folder /usr/local/cuda/include/crt/ is not included in cuda-cudart, even though it should be?

@leofang
Member

leofang commented Aug 22, 2021

It seems CuPy can be compiled with the new package layout on linux-64 and Windows (conda-forge/cupy-feedstock#143). The steps I took are similar to @jaimergp's tests for OpenMM, but with some differences. Here they are:

For CBC:

  1. Add nvidia to channel_sources
  2. For linux64, use the non-CUDA cos7 docker image (to avoid accidentally using non-conda CTK)
  3. Change nvcc to cuda_nvcc in cuda_compiler
  4. Add CUDA 11.3 (to ensure CTK comes from nvidia and not from conda-forge); a sketch of steps 1-4 follows this list
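
A sketch of those conda_build_config.yaml changes (channel ordering and image name are illustrative, not necessarily the exact values from the PR):

channel_sources:
  - nvidia,conda-forge,defaults                   # step 1
docker_image:                                     # step 2: non-CUDA cos7 image
  - quay.io/condaforge/linux-anvil-cos7-x86_64
cuda_compiler:                                    # step 3
  - cuda_nvcc
cuda_compiler_version:                            # step 4
  - "11.3"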

For meta.yaml:

  1. For linux64, append -L${PREFIX}/lib/stubs to CFLAGS (to let the linker find the libcuda.so stub, as I do not use any docker image with CUDA pre-installed)
  2. For linux64, append -ccbin $CXX to nvcc so that it knows we're using conda-forge's compiler (this is a job done in nvcc-feedstock and should also be done in cuda-nvcc [1]).
  3. Set CUDA_PATH to $PREFIX (again, a job done in nvcc-feedstock); steps 1-3 are sketched after this list
  4. Temporarily remove all optional "satellite" CUDA dependencies like cuDNN, NCCL, etc (they need to be rebuilt against the new CUDA package to fix their dependencies)
  5. Remove {{ compiler("cuda") }} in the build requirement [2]
  6. Add cuda-nvcc and cuda-cudart to the host requirement [2]
  7. Add needed CUDA components to the host section (in CuPy's case this includes cuda-cudart, cuda-nvrtc, cuda-nvprof and all math libraries) [3]
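
A condensed sketch of steps 1-3 as an inline build script (the actual recipe uses build.sh, and the NVCC variable name is an assumption on my part):

build:
  script: |
    # step 1: let the linker find the libcuda.so stub (no CUDA docker image)
    export CFLAGS="${CFLAGS} -L${PREFIX}/lib/stubs"
    # step 2: make nvcc use conda-forge's host compiler (variable name assumed)
    export NVCC="nvcc -ccbin ${CXX}"
    # step 3: point tools that expect a CUDA installation at the conda prefix
    export CUDA_PATH="${PREFIX}"
    {{ PYTHON }} -m pip install . --no-deps -vv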

[1]: On Windows, nvcc for some reason picks up the host compiler correctly, but I find it a bit suspicious, considering the CI setup might do something when installing CUDA on the CI runner...

[2]: I had a setup like this

build:
  - ...
  - cuda-nvcc

host:
  - cuda-nvcc
  - cuda-cudart
  - ...

but apparently the nvcc from build's cuda-nvcc alone cannot find cuda_runtime.h from host's cuda-cudart, so they must come as a pair. I bet they can both be moved to the build section. Another reason they must come as a pair: without cuda-cudart, the CUDA runtime headers in include/crt cannot be found. I believe this makes the naive substitution s/{{ compiler("cuda") }}/cuda-nvcc difficult.

[3]: Currently there's no version pinning of each CUDA component to its parent CTK version. So, for example, when I request libcufft, the one installed is actually from CTK 11.4.1, not from 11.3.x. I can, of course, find out the exact version that comes with CTK 11.3.0 and use that info, but it's too tedious for downstream feedstock maintainers and I consider it a bug to be fixed.
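
Until that is fixed, downstream recipes have to hard-pin each component by hand, e.g. (version taken from the 11.3.0 listing earlier in this thread):

host:
  - libcufft 10.4.2.*   # the libcufft that ships with CTK 11.3.0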

Note: I haven't checked if all CUDA components have the correct run_exports -- I suspect they don't, given @jaimergp's discussions above.

cc: @kmaehashi @jakirkham

@leofang
Member

leofang commented Aug 22, 2021

I haven't checked if all CUDA components have the correct run_exports -- I suspect they don't

Confirmed run_exports is not correctly set up, so during the test phase libcudart.so cannot be found:
https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=366350&view=logs&j=d0d954b5-f111-5dc4-4d76-03b6c9d0cf7e&t=841356e0-85bb-57d8-dbbc-852e683d1642&l=6833
Looks like among all the components only cuda-nvcc has run_exports?
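
For reference, a minimal sketch of the kind of run_exports stanza each library package could declare (pinning to the major version here is just an assumption; the right level is part of the open design question):

build:
  run_exports:
    - {{ pin_subpackage("cuda-cudart", max_pin="x") }}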

@jakirkham
Member Author

So to summarize what are the changes we would like to see here? Could we come up with a bullet point list?

leofang mentioned this issue Nov 22, 2021
@kkraus14
Contributor

Hey all, are there any updates or plans on upstreaming the new packages being produced in the nvidia channel to conda-forge? They're a huge quality-of-life improvement versus the current state in conda-forge!

@jakirkham
Member Author

@jaimergp would you have time to look at the 11.6 packages on the nvidia channel and let us know if your concerns have been addressed? 🙂

@jaimergp
Member

I'll try next week!

@jakirkham
Member Author

@jaimergp have you had a chance to look? If not, no worries 🙂

@jaimergp
Member

jaimergp commented Feb 4, 2022

I opened a PR on the openmm-feedstock as usual: conda-forge/openmm-feedstock#69

It's the same as what we had for 11.5, but using 11.6. Let's see if we can clean it up!

@hansenms

@jakirkham, I have just randomly bounced into this thread. I have been using the cuda packages from the nvidia channel for a little while, but recently (yesterday) the 11.6.0 subchannel (label) packages broke. I wrote this post on the nvidia developer forum: https://forums.developer.nvidia.com/t/conda-packages-from-the-nvidia-label-cuda-11-6-0-failing-to-install/204281, but I am not actually sure anybody is picking it up. Where is the right place to provide feedback on these packages?

@jakirkham
Member Author

@hansenms Keeping that discussion in the developer forum seems like the right choice at the moment.

This thread is about integrating those packages into conda-forge, as well as periodic testing to confirm that previously raised issues have been addressed.

@hansenms

@jakirkham I will see if somebody has thoughts or a response on the developer forum. If you have any contacts you can direct there, it would be much appreciated. I would at least like to understand it a bit better. Thanks.

@jakirkham
Member Author

I've already asked someone internally to take a look 🙂

@hansenms

@jakirkham, I hate to ping you on this again, but nobody has responded to that thread on the nvidia developer forum. Is there a better way to reach the team that makes the conda packages in the nvidia channel?

@hansenms

hansenms commented Mar 7, 2022

@jakirkham not sure if there is anything you could do, but I would really like to know where to post an issue/bug/problem with the official nvidia conda packages, if not on the developer forum (where there is no response) and not here. Any good suggestions for where to raise the issue would be much appreciated.

@hansenms

@jakirkham the problem mentioned above just happened overnight again for the cuda 11.6.1 packages. It is just massively disruptive that these packages keep getting broken overnight and nobody is responding to the issue raised on the nvidia developer forum. Where can one go to raise an issue about these packages?

@leofang
Member

leofang commented Mar 25, 2022

Looks like the old packages are removed whenever a new package is uploaded.

@hansenms

@leofang yes, but that is massively disruptive. And nvidia's own instructions are broken; for instance:

conda install -c nvidia/label/cuda-11.6.1 cuda-toolkit

will not work, because they removed the samples package and other packages in that label.

@ngam

ngam commented Jun 7, 2022

@jakirkham let's get this done 🚀 🌕 🙌 💎

boom boom, buzz buzz

@ngam

ngam commented Jun 7, 2022

Is there anything non-nvidia people can do to help get this done? Are there any blockers or can we get an update on the status of this? Thanks all for the great work!

@ngam

ngam commented Aug 15, 2022

Another gentle nudge to get this going. The JAX package requires ptxas, and without it our hard work to provide cuda-enabled builds is incomplete. (Likewise for newer optimizations in tensorflow, btw.) It would be really good to push this through. Please let us know how we can help, or if there is any way to get ptxas added to the cudatoolkit package.

@conda-forge/core, is there anything we can do to push this forward?

@jakirkham
Member Author

As we are now doing this work and it is being tracked in issue ( conda-forge/staged-recipes#21382 ), closing this issue in favor of discussion there and reviews of PRs linked from that staged-recipes issue. Thanks everyone! 🙏
