CI Cuda Jobs #3611

Closed
RudolfWeeber opened this issue Mar 31, 2020 · 5 comments

@RudolfWeeber
Contributor

This is the suggested list of GPU CI jobs to run in the future:

  • Ubuntu 20.04 with Ubuntu-supplied CUDA 10.1
  • Ubuntu 18.04 with Ubuntu-supplied CUDA 9.1
  • Nvidia-supplied Ubuntu with CUDA-latest (currently 10.2)
  • OSX with CUDA 10.2
  • Ubuntu 20.04 with ROCm (3, I suppose?)

This ticket only pertains to GPU jobs. Everything else related to CI/Docker is a different matter.

@mkuron
Member

mkuron commented Mar 31, 2020

We can drop CUDA on OSX. Apple hasn't shipped Macs with Nvidia GPUs since the Kepler generation and CUDA 10.2 is the last version to support Kepler.

@jngrad
Member

jngrad commented Apr 25, 2020

Here's the status:

  • Ubuntu 20.04 with Ubuntu-supplied CUDA 10.1 -> done, we can drop our cuda10.1 image
  • Ubuntu 18.04 with Ubuntu-supplied CUDA 9.1 -> done, we can drop our cuda9.2 image
  • Ubuntu 20.04 with ROCm -> base image dev-ubuntu-20.04 not yet available
  • Nvidia-supplied Ubuntu with CUDA-latest -> base image nvidia/cuda:10.2-devel-ubuntu20.04 not yet available
  • OSX with CUDA 10.2 -> removed
  • Ubuntu 16.04 with CUDA 9 and Clang 6 -> removed

jngrad added a commit that referenced this issue Apr 27, 2020
Description of changes:
- reduce number of CI images for CUDA jobs (partial fix for #3611)
- test CUDA 9.1 and 10.1 using compatible compilers without patches (fixes #3654)
- drop support for Ubuntu 16.04
- bump minimal Boost version to 1.65 (partial fix for #3093)
- bump Python packages to the versions available in Ubuntu 18.04 (partial fix for #3421)
- add missing lxml package (fixes #3686)
- fix issues in docs revealed by the new Doxygen and Sphinx versions

@jngrad
Member

jngrad commented Jun 15, 2020

Concerning the Nvidia-supplied Ubuntu image, we run into the following CMake error when looking for the CUDA library cublas with find_library(CUDA_CUBLAS_LIBRARIES NAMES cublas PATHS ...):

CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_cublas_device_LIBRARY (ADVANCED)
    linked by target "sd_gpu" in directory /home/espresso/espresso/libs/stokesian_dynamics/src

-- Configuring incomplete, errors occurred!

According to https://stackoverflow.com/a/54350198, that's because CMake versions below 3.12.2 don't know that cublas_device was removed in CUDA 9.2. I can confirm that in the intel:19 image: several CMake versions <= 3.12.0 fail, while CMake 3.13.0 succeeds. Stokesian Dynamics currently fails CI on the intel image. Our Ubuntu image with CUDA 10.1 succeeds though, so the maintainers of nvidia-cuda-toolkit must have patched it.
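
To make the version reasoning above concrete, here is a minimal sketch of the rule as a small Python check. The two thresholds (CMake 3.12.2 and CUDA 9.2) are taken from the comment above, not verified against the CMake sources, and the function names are hypothetical, not part of any build script in this project. It also does not model distribution patches such as the one suspected in the Ubuntu nvidia-cuda-toolkit package.

```python
def parse_version(text):
    """Turn a dotted version string such as '3.12.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Assumed thresholds, copied from the discussion above.
FIXED_IN_CMAKE = parse_version("3.12.2")
CUBLAS_DEVICE_GONE_IN_CUDA = parse_version("9.2")


def expects_cublas_device_error(cmake_version, cuda_version):
    """Return True if an unpatched CMake of this version would likely still
    look for cublas_device with this CUDA version and report NOTFOUND."""
    cmake = parse_version(cmake_version)
    cuda = parse_version(cuda_version)
    return cmake < FIXED_IN_CMAKE and cuda >= CUBLAS_DEVICE_GONE_IN_CUDA


if __name__ == "__main__":
    # Combinations similar to the ones mentioned in this thread.
    for cmake_version, cuda_version in [("3.12.0", "10.2"), ("3.13.0", "10.2")]:
        print(cmake_version, cuda_version,
              expects_cublas_device_error(cmake_version, cuda_version))
```

Under these assumptions, the check flags CMake 3.12.0 with CUDA 10.2 and clears CMake 3.13.0 with the same toolkit, which matches what was observed in the intel:19 image.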

@KaiSzuttor
Member

@jngrad any update here?

@jngrad
Member

jngrad commented Aug 3, 2020

I think the issue with the NVidia-supplied Ubuntu image was solved in the intel:19 image. So we're mostly done with this ticket. The last item in the list is ROCm, for which there is still no Ubuntu 20.04 image. The ROCm 3.5.0/3.5.1 releases are also not working properly due to an incorrect path in their HIP wrapper script.

Since this ticket was opened, CUDA 11 was released. If we want to test it in CI, we'll need the Nvidia-supplied Ubuntu image. Providing support for CUDA 11 shouldn't be too difficult: IIRC, we only need to update a library path (a few shared objects are now in the CUDA compat folder) and fix a compiler warning about a C-style string.

kodiakhq bot added a commit that referenced this issue Aug 21, 2020
Follow-up to #3611

Description of changes:
- Add support for CUDA 11
- Add CUDA 11 CI job