enable building for CUDA #1

Merged: 13 commits into conda-forge:master from the gpu branch on Jun 6, 2020
Conversation

h-vetinari (Member)

Checklist

  • Used a fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

Follow-up from conda-forge/staged-recipes#11337.

@conda-forge-linter

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@h-vetinari (Member, Author)

@conda-forge-admin, please rerender

@h-vetinari (Member, Author)

@mdouze @beauby
Do you package builds for CUDA 9.2? Here it fails with:

/usr/local/cuda/bin/nvcc -I /usr/local/cuda/targets/x86_64-linux/include/ -Xcompiler -fPIC -Xcudafe --diag_suppress=unrecognized_attribute -gencode=arch=compute_35,code=compute_35 -gencode=arch=compute_52,code=compute_52 -gencode=arch=compute_60,code=compute_60 -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_75,code=compute_75 -lineinfo -ccbin /home/conda/feedstock_root/build_artifacts/faiss-split_1589216379077/_build_env/bin/x86_64-conda_cos6-linux-gnu-c++ -std=c++11 -DFAISS_USE_FLOAT16 -I. -g -O3 -c gpu/GpuDistance.cu -o gpu/GpuDistance.o
nvcc fatal   : Unsupported gpu architecture 'compute_75'

It seems we could take out the compilation for that architecture, or is it already unsupported on your side?

@h-vetinari (Member, Author) commented May 11, 2020

The cuda >=10 builds fail with:

> /usr/local/cuda/bin/nvcc -I /usr/local/cuda/targets/x86_64-linux/include/ -Xcompiler -fPIC -Xcudafe --diag_suppress=unrecognized_attribute -gencode=arch=compute_35,code=compute_35 -gencode=arch=compute_52,code=compute_52 -gencode=arch=compute_60,code=compute_60 -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_75,code=compute_75 -lineinfo -ccbin /home/conda/feedstock_root/build_artifacts/faiss-split_1589216379886/_build_env/bin/x86_64-conda_cos6-linux-gnu-c++ -std=c++11 -DFAISS_USE_FLOAT16 -I. -g -O3 -c gpu/impl/PQScanMultiPassNoPrecomputed.cu -o gpu/impl/PQScanMultiPassNoPrecomputed.o

gpu/impl/PQScanMultiPassNoPrecomputed.cu(27): error: function "faiss::gpu::isSupportedNoPrecomputedSubDimSize" has already been defined
gpu/impl/PQScanMultiPassNoPrecomputed.cu(51): error: class template "faiss::gpu::LoadCodeDistances" has already been defined
gpu/impl/PQScanMultiPassNoPrecomputed.cu(123): error: function template "faiss::gpu::pqScanNoPrecomputedMultiPass" has already been defined
3 errors detected in the compilation of "/tmp/tmpxft_00000ed3_00000000-11_PQScanMultiPassNoPrecomputed.compute_75.cpp1.ii".

@mdouze @beauby @jakirkham, I have no experience with building GPU packages, so comments/pointers are welcome!

Since the makefiles are quite vanilla, I'd be tempted to say this is an error in the upstream code, but then, you've managed to build it for the pytorch channel...

@h-vetinari (Member, Author)

@dantegd: [...] are there plans to package also the C++ API in the conda package/binary at some point? That would be super useful.

Hey @dantegd, not sure if you unsubscribed from the frenzy of commits in conda-forge/staged-recipes#11337, but if yes, the good news is that the C++ API is now in fact being packaged separately. Right now, it's only the CPU-part of the C++-API, but once this PR is successfully merged, you should be able to just use libfaiss =*=*gpu in your dependencies (you can replace the first asterisk with the desired faiss-version).
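To make the build-string pin concrete, here is a hedged sketch of what such a dependency could look like in a consumer's environment file. The version 1.6.3 is only an example taken from the build logs above; check the actual build strings published on conda-forge before relying on them:

```yaml
# environment.yml sketch (illustrative; exact build strings may differ)
name: uses-libfaiss-gpu
channels:
  - conda-forge
dependencies:
  - libfaiss=1.6.3=*gpu    # pin a specific version, gpu variant
  # - libfaiss=*=*gpu      # or: any version, gpu variant
```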

@mdouze commented May 11, 2020

The cuda >=10 builds fail with:

gpu/impl/PQScanMultiPassNoPrecomputed.cu(27): error: function "faiss::gpu::isSupportedNoPrecomputedSubDimSize" has already been defined
gpu/impl/PQScanMultiPassNoPrecomputed.cu(51): error: class template "faiss::gpu::LoadCodeDistances" has already been defined
gpu/impl/PQScanMultiPassNoPrecomputed.cu(123): error: function template "faiss::gpu::pqScanNoPrecomputedMultiPass" has already been defined
3 errors detected in the compilation of "/tmp/tmpxft_00000ed3_00000000-11_PQScanMultiPassNoPrecomputed.compute_75.cpp1.ii".

@mdouze @beauby @jakirkham, I have no experience with building GPU packages, so comments/pointers are welcome!

Since the makefiles are quite vanilla, I'd be tempted to say this is an error in the upstream code, but then, you've managed to build it for the pytorch channel...

@wickedfoo, any idea what this could be?

@h-vetinari (Member, Author) commented May 11, 2020

I mentioned in the other PR that one remaining difference is that we're not passing $CUDA_ARCH, and hence this seems to be building for several of them...

@h-vetinari: Also note that upstream uses --with-cuda as well as --with-cuda-arch, but I'm not sure what to pass as $CUDA_ARCH.

However, it doesn't read to me like the error is related to that (because several modules got correctly compiled by nvcc before), but in any case, here's the full call that failed:

> /usr/local/cuda/bin/nvcc -I /usr/local/cuda/targets/x86_64-linux/include/ -Xcompiler -fPIC -Xcudafe --diag_suppress=unrecognized_attribute -gencode=arch=compute_35,code=compute_35 -gencode=arch=compute_52,code=compute_52 -gencode=arch=compute_60,code=compute_60 -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_75,code=compute_75 -lineinfo -ccbin [...snip...]/_build_env/bin/x86_64-conda_cos6-linux-gnu-c++ -std=c++11 -DFAISS_USE_FLOAT16 -I. -g -O3 -c gpu/impl/PQScanMultiPassNoPrecomputed.cu -o gpu/impl/PQScanMultiPassNoPrecomputed.o

@wickedfoo

@h-vetinari @mdouze The compilation error looks like you have a stray faiss/gpu/impl/PQScanMultiPassNoPrecomputed.cu file which has been deleted from the repository as of this commit:

facebookresearch/faiss@64dd988#diff-948369eef79ac94c4b1a946328ee607f

@h-vetinari (Member, Author)

@wickedfoo
Thanks for the quick tip!

Seems we'll need to restrict $CUDA_ARCH in some way (cf. my last comment):

+ python conda/faiss-gpu/run_test.py
Faiss assertion 'err__ == cudaSuccess' failed in int faiss::gpu::getNumDevices() at gpu/utils/DeviceUtils.cu:36; details: CUDA error 35 CUDA driver version is insufficient for CUDA runtime version
/home/conda/feedstock_root/build_artifacts/faiss-split_1589265142331/test_tmp/run_test.sh: line 8:  9487 Aborted                 (core dumped) python conda/faiss-gpu/run_test.py
Tests failed for faiss-1.6.3-py37h4315b5e_1_gpu.tar.bz2 - moving package to /home/conda/feedstock_root/build_artifacts/broken

By default, it's building for:

compute_35
compute_52
compute_60
compute_61
compute_70
compute_75
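A sketch of how the arch list could be restricted per toolkit version (a hypothetical helper, not this recipe's code; the version cut-offs follow NVIDIA's docs that compute_70 needs CUDA >= 9.0 and compute_75 needs CUDA >= 10.0, the `;`-separated output format is only illustrative, and GNU `sort -V` is assumed):

```shell
# Hypothetical helper: emit only the compute capabilities that a given
# CUDA toolkit's nvcc actually knows about, so we never pass an
# unsupported -gencode (cf. the compute_75 failure on CUDA 9.2 above).
cuda_arch_for() {
    ver="$1"
    archs="35;52;60;61"
    # CUDA 9.0+ supports Volta (compute_70)
    if [ "$(printf '%s\n' "$ver" 9.0 | sort -V | head -n1)" = "9.0" ]; then
        archs="$archs;70"
    fi
    # CUDA 10.0+ supports Turing (compute_75)
    if [ "$(printf '%s\n' "$ver" 10.0 | sort -V | head -n1)" = "10.0" ]; then
        archs="$archs;75"
    fi
    echo "$archs"
}

cuda_arch_for 9.2    # prints 35;52;60;61;70  (no compute_75, so no nvcc fatal)
cuda_arch_for 10.1   # prints 35;52;60;61;70;75
```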

Not sure how this is handled elsewhere in conda-forge @conda-forge/core (side note: how about forming a team @conda-forge/help-gpu ...?)

@wickedfoo

@h-vetinari You might also want these commits, which fix CUDA 8 and CUDA 10 warnings, since it appears you are hitting this warning in your CUDA 10 build:

2020-05-12T07:08:21.5654836Z gpu/utils/DeviceUtils.cu:114:19: warning: 'cudaPointerAttributes::memoryType' is deprecated [-Wdeprecated-declarations]
2020-05-12T07:08:21.5655909Z    } else if (att.memoryType == cudaMemoryTypeHost) {

CUDA 8 fixes:
facebookresearch/faiss@e05f773

CUDA 10 fixes:
facebookresearch/faiss@ba061ff

@h-vetinari (Member, Author) commented May 12, 2020

@wickedfoo
Thanks a bunch!
The compiler warnings weren't high enough on my list of priorities to tackle so far, but I've included the patches you mentioned. If you look at the runs for linux-cpu as well as OSX, you'll see several more of them (many more in the case of OSX).

@h-vetinari h-vetinari force-pushed the gpu branch 2 times, most recently from d568885 to 5e8fb54 Compare May 12, 2020 18:54
@jakirkham (Member)

@mdouze @beauby @jakirkham, I have no experience with building GPU packages, so comments/pointers are welcome!

Thanks for the ping! Am taking some time off this week. Will take a look next week though 🙂

@h-vetinari (Member, Author)

@jakirkham: Thanks for the ping! Am taking some time off this week. Will take a look next week though 🙂

No worries, thanks for responding nevertheless!

@h-vetinari h-vetinari force-pushed the gpu branch 2 times, most recently from 6ea0585 to 9a2f309 Compare May 13, 2020 07:42
@h-vetinari (Member, Author) commented May 13, 2020

Pinging the collaborators on the linux-anvil-cuda docker image... @mbargull @isuruf @scopatz @bdice

It seems that my hunt for the right CUDA_ARCH was a red herring, as I always get the error

Faiss assertion 'err__ == cudaSuccess' failed in int faiss::gpu::getNumDevices() at gpu/utils/DeviceUtils.cu:36; details: CUDA error 35 CUDA driver version is insufficient for CUDA runtime version

This made me wonder: are there NVIDIA drivers in the image at all (it doesn't seem so from the dockerfiles for linux-anvil-cuda or the upstream nvidia/cuda:${CUDA_VER}-devel-centos6), and if not, how should I install them?

Looking around for "cuda driver" in conda-forge, it seems that pyarrow has a similar problem?
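As an illustration of how a test script could avoid the hard abort on driver-less CI machines, here is a minimal, hypothetical guard (not part of this PR) that probes for a usable driver via `libcuda` before any faiss GPU call. It assumes a Linux system where the NVIDIA driver, if present, installs `libcuda.so.1`:

```python
import ctypes

def cuda_driver_available() -> bool:
    """True only if the NVIDIA *driver* (libcuda) is present and initialises.

    The CUDA *toolkit* inside the build image is not enough: without a
    driver, cuInit fails much like faiss's "CUDA error 35" above.
    """
    try:
        # libcuda.so.1 is shipped by the driver, not the toolkit
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return False
    return libcuda.cuInit(0) == 0  # 0 == CUDA_SUCCESS

if __name__ == "__main__":
    if cuda_driver_available():
        print("CUDA driver found; GPU tests can run")
    else:
        print("no usable CUDA driver; skipping GPU tests")
```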

(three review threads on recipe/build-lib.sh, resolved)
@teju85 left a comment

The proposed changes in the build script, especially the supported GPU archs, LGTM (please also add sm_50 to the list, as discussed above). Explicitly tagging @cjnolet @dantegd in case they can bring another perspective here.

@h-vetinari (Member, Author)

Gentle ping @jakirkham :)
I think this should be in good order, but since it's my first GPU build, I'd really like to confirm that everything works as it should, and I can't, for lack of a GPU in the CI...

@jakirkham (Member)

Yep, thanks for the reminder. It's on my list :)

@h-vetinari (Member, Author)

Another gentle ping @jakirkham (I'm sure that list of yours is mighty full, but new week, new luck 😉)

@jakirkham (Member) left a comment

Thanks @h-vetinari both for all of the work and the reminder! Apologies for letting this slip.

Largely things seem good here. I made a few small comments, but I think all of the reviewers' comments and your work have put this in a pretty good place already. 🙂

Please let me know if you have any questions or specific things you'd like me to look at and will happily look at those as well.

# docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#major-components__table-cuda-toolkit-driver-versions
# docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#gpu-feature-list

# the following are all the x86-relevant gpu arches; for building aarch64-packages, add: 53, 62, 72
@jakirkham (Member):
Should we go ahead and add an if for this case in preparation?

@h-vetinari (Member, Author):
I propose to cross this bridge when we come to it. ;-)
My comment is the preparation 😋

@@ -8,7 +28,7 @@ fi
# Build vanilla version (no avx)
./configure --prefix=${PREFIX} --exec-prefix=${PREFIX} \
--with-blas=-lblas --with-lapack=-llapack \
${CUDA_CONFIG_ARG}
${CUDA_CONFIG_ARG} --with-cuda-arch="${CUDA_ARCH}" || exit 1
@jakirkham (Member):
Would it make sense to add this to the CUDA_CONFIG_ARG above?

@h-vetinari (Member, Author):
I tried this first, of course, but couldn't get the quotes passed through correctly, and then the compilation would fail.
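For what it's worth, one common workaround for exactly this quoting problem is a bash array (a hypothetical sketch, not the recipe's actual code; the `CUDA_ARCH` value is made up): each array element expands as exactly one word, so a space-containing value survives intact:

```shell
#!/bin/bash
# Hypothetical sketch: collect configure flags in an array so that values
# containing spaces stay a single argument when expanded.
CUDA_ARCH="35 52 60 61 70"
CONFIGURE_ARGS=(--with-blas=-lblas --with-lapack=-llapack)
CONFIGURE_ARGS+=(--with-cuda=/usr/local/cuda "--with-cuda-arch=${CUDA_ARCH}")

# "${CONFIGURE_ARGS[@]}" expands to one word per element, quoting preserved:
printf '<%s>\n' "${CONFIGURE_ARGS[@]}"
# ./configure "${CONFIGURE_ARGS[@]}"   # the real call would go here
```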

Comment on lines 227 to +230
For best performance, the maintainers of the package
[recommend](https://github.com/conda-forge/staged-recipes/pull/11337#issuecomment-623718460)
using the MKL implementation of blas/lapack. You can ensure that this is installed
by adding "libblas =*=mkl" to your dependencies.
by adding "libblas =*=*mkl" to your dependencies.
@jakirkham (Member):
Should we add this to a post-link script for greater visibility?

@h-vetinari (Member, Author):
OK, will look into this.
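For reference, a minimal sketch of what such a post-link script could look like (hypothetical, not this PR's code): conda sets `$PREFIX` when it runs a package's post-link.sh and displays whatever the script appends to `$PREFIX/.messages.txt` after the transaction. Here PREFIX points at a scratch directory so the sketch is runnable standalone:

```shell
# post-link.sh sketch: surface the MKL recommendation at install time.
# In a real package conda exports PREFIX; we use a scratch dir for the demo.
PREFIX=$(mktemp -d)

cat >> "${PREFIX}/.messages.txt" <<'EOF'
For best faiss performance, the maintainers recommend the MKL implementation
of blas/lapack; add "libblas=*=*mkl" to your dependencies.
EOF

cat "${PREFIX}/.messages.txt"
```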

@h-vetinari (Member, Author) left a comment
Thanks for taking a look @jakirkham!

Would you also have a GPU locally, to check whether the GPU-specific tests run through?


@cjnolet commented Jun 3, 2020

@h-vetinari I've followed your instructions here, but for some reason it's not picking up the CUDA compiler version when I run the build-locally.py script. Any idea as to what I might be doing wrong?

I've also updated the conda_build_config.yml changing None to 10.2.

valid configs are {'linux_cuda_compiler_versionNone', 'osx_'}
config not selected, please choose from the following:

1. linux_cuda_compiler_versionNone
2. osx_

I should point out that I'm on CentOS 7.

@h-vetinari (Member, Author)

@cjnolet
Thanks a bunch for giving this a shot! Not sure what other tweaks are necessary to build a GPU package locally. I remember seeing some instructions at https://github.com/AnacondaRecipes/aggregate/tree/master/pytorch-feedstock/recipe, but not sure they apply generally (probably should use the condaforge/linux-anvil-cuda:10.2 image instead, for example). Any tips @jakirkham @isuruf?

@jakirkham (Member)

@cjnolet, do you have shyaml installed?

@h-vetinari (Member, Author)

Merging this; let's see how it fares in real life. The post-link script can be added in another PR.

@h-vetinari h-vetinari changed the title retry building for GPU enable building for CUDA Jun 6, 2020
@h-vetinari h-vetinari merged commit b2e44fc into conda-forge:master Jun 6, 2020
@h-vetinari h-vetinari mentioned this pull request Jun 7, 2020
@h-vetinari (Member, Author)

With the disk space issues figured out now, GPU builds for faiss have been live for ~2 days. Has anyone had time to give them a twirl so far?

@h-vetinari h-vetinari deleted the gpu branch June 15, 2020 21:20
@jakirkham (Member)

With the disk space issues figured out now, GPU builds for faiss have been live for ~2 days. Has anyone had time to give them a twirl so far?

@cjnolet @teju85 @dantegd ? 🙂

@jakirkham (Member)

With the disk space issues figured out now, GPU builds for faiss have been live for ~2 days. Has anyone had time to give them a twirl so far?

We tried. Unfortunately it looks like we are missing some packages for CUDA 10.1 and 10.2. Raised as issue ( #9 ).

@h-vetinari h-vetinari mentioned this pull request Jun 24, 2020
@h-vetinari (Member, Author)

@jakirkham: We tried. Unfortunately it looks like we are missing some packages for CUDA 10.1 and 10.2. Raised as issue ( #9 ).

Now that #9 is closed, would you mind re-trying? 🙂

@jakirkham (Member)

Yep, certainly. Thanks for the help here @h-vetinari 😄

9 participants