CUDA 11 (libfaiss) conda package triggers JIT compilation on Turing GPUs #36
Comments
I'm guessing you have already looked at nvidia-smi or used other profiling tools to see what is going on. Would be interesting to include that info if you have it. Maybe that sheds light on where things are getting stuck.
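For example, one quick way to keep an eye on the GPU while the slow first call is running (a minimal sketch; any profiler such as Nsight Systems would also work):

```sh
# Refresh the standard nvidia-smi view every second; during a PTX JIT
# compile the process typically shows up under "Processes" while GPU
# utilization stays near 0%.
nvidia-smi -l 1

# Or stream just the per-device utilization counters.
nvidia-smi dmon -s u
```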
Hey, sorry to hear this isn't working well. I'll happily admit that the GPU build options around real vs. virtual, JIT, PTX, etc. are a bit over my head; I've often received help on this from the NVIDIA folks (e.g. @kkraus14 & @teju85 helping out in #1). With the upstream build system's switch to CMake, I've done the best I can based on the CMake documentation, but it's possible that I'm doing this wrong; the pertinent parts are in https://github.com/conda-forge/faiss-split-feedstock/blob/master/recipe/build-lib.sh. My goal was building for maximum compatibility, and at the time, PTX JIT compilation was recommended to me. If this should be removed and/or amended somehow, I'll happily accept PRs (or guidance on what to do).
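For context, a minimal sketch of how such an architecture list is passed to CMake (illustrative only; the actual flags used for the package are in the build-lib.sh linked above, and the FAISS_ENABLE_GPU option is taken from the upstream CMake build):

```sh
# With CMake >= 3.18, each CMAKE_CUDA_ARCHITECTURES entry can carry a suffix:
#   NN-real    -> embed SASS (cubin) for that arch only
#   NN-virtual -> embed PTX, which newer GPUs JIT-compile at first use
#   NN         -> embed both SASS and PTX for that arch
cmake -S . -B _build \
      -DFAISS_ENABLE_GPU=ON \
      -DCMAKE_CUDA_ARCHITECTURES="60-real;70-real;75-real;80"
cmake --build _build -j
```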
@h-vetinari thanks for the response! Indeed, @teju85's advice is the best recommendation, and it is what faiss did explicitly in version 1.6.3 with the prior build system, so faiss 1.6.3 works smoothly for all its intended archs. I think the issue was a minor mixup in the usage of the fairly recent -DCMAKE_CUDA_ARCHITECTURES=52-virtual;60-virtual;61-virtual;70-virtual;75-virtual;80-virtual;86-virtual;86-real, which causes it to include device code for compute 86 (i.e. 3070/80/90) and PTX for everything under it; that is what triggers a JIT compilation when, say, Turing (75) or Pascal (60s) calls it. We can also inspect the library (as @teju85 recommended):

(ns0311-110) ➜ lib cuobjdump libfaiss.so -lelf
ELF file 1: libfaiss.1.sm_80.cubin
ELF file 2: libfaiss.2.sm_80.cubin
ELF file 3: libfaiss.3.sm_80.cubin
ELF file 4: libfaiss.4.sm_80.cubin
...

Now most RAPIDS libraries are in the process of migrating to -DCMAKE_CUDA_ARCHITECTURES=60-real;70-real;75-real;80. This produces what is (if I'm not mistaken) our intended result: device code for the supported archs (so that supported GPUs can just use cuDF without needing a long JIT compilation step), plus PTX for 80, so that, say, a future GPU with compute 90+ (or, say, 50, assuming compatibility) would still be able to JIT-compile and use cuDF. Inspecting it we can see:

(ns0311-110) ➜ lib cuobjdump libcudf.so -lelf
ELF file 1: libcudf.1.sm_60.cubin
ELF file 2: libcudf.2.sm_70.cubin
ELF file 3: libcudf.3.sm_75.cubin
ELF file 4: libcudf.4.sm_80.cubin
ELF file 5: libcudf.5.sm_60.cubin
ELF file 6: libcudf.6.sm_70.cubin
ELF file 7: libcudf.7.sm_75.cubin
ELF file 8: libcudf.8.sm_80.cubin
...

Which is similar to how faiss 1.6.3 was:

lib cuobjdump libfaiss.so -lelf
ELF file 1: GpuIndex.sm_35.cubin
ELF file 2: GpuIndex.sm_50.cubin
ELF file 3: GpuIndex.sm_52.cubin
ELF file 4: GpuIndex.sm_60.cubin
ELF file 5: GpuIndex.sm_61.cubin
ELF file 6: GpuIndex.sm_70.cubin
ELF file 7: GpuIndex.sm_75.cubin
ELF file 8: GpuIndex.sm_80.cubin
ELF file 9: GpuIndexBinaryFlat.sm_35.cubin
ELF file 10: GpuIndexBinaryFlat.sm_50.cubin
ELF file 11: GpuIndexBinaryFlat.sm_52.cubin
ELF file 12: GpuIndexBinaryFlat.sm_60.cubin
ELF file 13: GpuIndexBinaryFlat.sm_61.cubin
ELF file 14: GpuIndexBinaryFlat.sm_70.cubin
ELF file 15: GpuIndexBinaryFlat.sm_75.cubin
ELF file 16: GpuIndexBinaryFlat.sm_80.cubin
...

So, the difference in the names of the cubins aside, that was a very verbose way of describing the solution proposed in #37.
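For anyone who wants to double-check a given build, both the embedded SASS and the embedded PTX can be listed (a small sketch; the path to libfaiss.so will differ per environment):

```sh
# Cubins (real device code): GPUs of these archs run without any JIT step.
cuobjdump libfaiss.so -lelf | sed 's/.*\(sm_[0-9]*\).*/\1/' | sort -u

# PTX: archs not covered above fall back to JIT-compiling this at first use.
cuobjdump libfaiss.so -lptx | sed 's/.*\(sm_[0-9]*\).*/\1/' | sort -u
```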
Thanks for the analysis!
Issue: Installing the current CUDA 11 conda package (libfaiss in particular) on machines with Turing GPUs (tested on an RTX 8000 and a 2070S) triggers a JIT compilation on the first call that uses GPU resources, causing a delay of minutes. It works fine on Ampere GPUs (tested on a 3080), and it also works fine on CUDA 10.2 with Turing.

The packages are:

Reproduced with the following code:
This was an issue we first saw in cuML (which uses FAISS): rapidsai/cuml#3602
Environment (conda list):

Details about conda and system (conda info):
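For anyone reproducing this, the relevant environment details can be gathered with (a minimal sketch; exact package names may differ per environment):

```sh
# faiss-related packages (libfaiss and friends) and the CUDA toolkit conda resolved
conda list faiss
conda list cudatoolkit
# conda version, platform and channel configuration
conda info
```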
cc @viclafargue @hcho3 @jakirkham