Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor for cmake-based infra in 1.6.4; add py3.9 & cpu-only windows #17

Merged
merged 31 commits into from
Dec 12, 2020

Conversation

h-vetinari
Copy link
Member

@h-vetinari h-vetinari commented Oct 26, 2020

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

Not sure why the regro bot didn't pick up the new version, but let's try building for it. The build system was refactored pretty extensively upstream (cf. facebookresearch/faiss#1313), and windows support has been added, so this will probably take a bit to get right.

Closes #2
Closes #18

@conda-forge-linter
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@h-vetinari
Copy link
Member Author

@conda-forge-admin, please rerender

@h-vetinari
Copy link
Member Author

@conda-forge/core

I'm having a failure coming from conda inspect linkages -p $PREFIX $PKG_NAME (that I added out of an abundance of caution in the original PR), but this is a part of the conda bowels I've never had exposure to, and would really appreciate some help here:

linux
+ conda inspect linkages -p /home/conda/feedstock_root/build_artifacts/faiss-split_1606946911821/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac libfaiss
WARNING: pyldd disagrees with ldd/otool. This will not cause any
WARNING: problems for this build, but please file a bug at:
WARNING: https://github.com/conda/conda-build
WARNING: and (if possible) attach file $PREFIX/lib/libfaiss.so
WARNING: 
ldd/otool gives:
('linux-vdso.so.1', '')
('libblas.so.3', '$PREFIX/lib/./libblas.so.3')
('libgomp.so.1', '$PREFIX/lib/./libgomp.so.1')
('libpthread.so.0', '/lib64/libpthread.so.0')
('libstdc++.so.6', '$PREFIX/lib/./libstdc++.so.6')
('libm.so.6', '/lib64/libm.so.6')
('libgcc_s.so.1', '$PREFIX/lib/./libgcc_s.so.1')
('libc.so.6', '/lib64/libc.so.6')
('libgfortran.so.5', '$PREFIX/lib/././libgfortran.so.5')
('librt.so.1', '/lib64/librt.so.1')
('libdl.so.2', '/lib64/libdl.so.2')
('libquadmath.so.0', '$PREFIX/lib/././libquadmath.so.0')
pyldd gives:
('libblas.so.3', 'libblas.so.3')
('liblapack.so.3', 'liblapack.so.3')
('libgomp.so.1', 'libgomp.so.1')
('libpthread.so.0', 'libpthread.so.0')
('libstdc++.so.6', 'libstdc++.so.6')
('libm.so.6', 'libm.so.6')
('libgcc_s.so.1', 'libgcc_s.so.1')
('libc.so.6', 'libc.so.6')
('ld-linux-x86-64.so.2', 'ld-linux-x86-64.so.2')
('libgfortran.so.5', 'libgfortran.so.5')
('libquadmath.so.0', 'libquadmath.so.0')
('librt.so.1', 'librt.so.1')
('libdl.so.2', 'libdl.so.2')

Diffs
{('libstdc++.so.6', '$PREFIX/lib/./libstdc++.so.6'), ('libm.so.6', '/lib64/libm.so.6'), ('linux-vdso.so.1', ''), ('libblas.so.3', '$PREFIX/lib/./libblas.so.3'), ('libgomp.so.1', '$PREFIX/lib/./libgomp.so.1'), ('libc.so.6', '/lib64/libc.so.6'), ('librt.so.1', '/lib64/librt.so.1'), ('libgfortran.so.5', '$PREFIX/lib/././libgfortran.so.5'), ('libgcc_s.so.1', '$PREFIX/lib/./libgcc_s.so.1'), ('libpthread.so.0', '/lib64/libpthread.so.0'), ('libquadmath.so.0', '$PREFIX/lib/././libquadmath.so.0'), ('libdl.so.2', '/lib64/libdl.so.2')}
Diffs
{('libgfortran.so.5', 'libgfortran.so.5'), ('libgcc_s.so.1', 'libgcc_s.so.1'), ('libblas.so.3', 'libblas.so.3'), ('liblapack.so.3', 'liblapack.so.3'), ('libpthread.so.0', 'libpthread.so.0'), ('librt.so.1', 'librt.so.1'), ('libc.so.6', 'libc.so.6'), ('ld-linux-x86-64.so.2', 'ld-linux-x86-64.so.2'), ('libgomp.so.1', 'libgomp.so.1'), ('libquadmath.so.0', 'libquadmath.so.0'), ('libdl.so.2', 'libdl.so.2'), ('libm.so.6', 'libm.so.6'), ('libstdc++.so.6', 'libstdc++.so.6')}
libfaiss
--------

_openmp_mutex-4.5-1_gnu:
    libgomp.so.1 (lib/libgomp.so.1)

libblas-3.9.0-3_openblas:
    libblas.so.3 (lib/libblas.so.3)

libgcc-ng-9.3.0-h5dbcf3e_17:
    libgcc_s.so.1 (lib/libgcc_s.so.1)
    libquadmath.so.0 (lib/libquadmath.so.0)

libgfortran5-9.3.0-he4bcb1c_17:
    libgfortran.so.5 (lib/libgfortran.so.5)

libstdcxx-ng-9.3.0-h2ae2ef3_17:
    libstdc++.so.6 (lib/libstdc++.so.6)

system:
    libc.so.6 (/lib64/libc.so.6)
    libdl.so.2 (/lib64/libdl.so.2)
    libm.so.6 (/lib64/libm.so.6)
    libpthread.so.0 (/lib64/libpthread.so.0)
    librt.so.1 (/lib64/librt.so.1)
    linux-vdso.so.1 ()

not found:

osx
+ conda inspect linkages -p /Users/runner/miniforge3/conda-bld/faiss-split_1606947140101/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl libfaiss
WARNING: pyldd disagrees with ldd/otool. This will not cause any
WARNING: problems for this build, but please file a bug at:
WARNING: https://github.com/conda/conda-build
WARNING: and (if possible) attach file $PREFIX/lib/libfaiss.dylib
WARNING: 
ldd/otool gives:
('libfaiss.dylib', '@rpath/libfaiss.dylib')
('Accelerate', '/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate')
('libSystem.B.dylib', '/usr/lib/libSystem.B.dylib')
('libomp.dylib', '@rpath/libomp.dylib')
('libc++.1.dylib', '@rpath/libc++.1.dylib')
pyldd gives:
('Accelerate', '/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate')
('libSystem.B.dylib', '/usr/lib/libSystem.B.dylib')
('libomp.dylib', '@rpath/libomp.dylib')
('libc++.1.dylib', '@rpath/libc++.1.dylib')

Diffs
{('libfaiss.dylib', '@rpath/libfaiss.dylib')}
Diffs
set()
libfaiss
--------
libcxx-11.0.0-h4c3b8ed_1:
    libc++.1.dylib (lib/libc++.1.dylib)

libfaiss-1.6.4-hea9d0ce_0_cpu:
    libfaiss.dylib (lib/libfaiss.dylib)

llvm-openmp-11.0.0-h73239a0_1:
    libomp.dylib (lib/libomp.dylib)

system:
    Accelerate (/System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate)
    libSystem.B.dylib (/usr/lib/libSystem.B.dylib)

not found:


+ conda inspect objects -p /Users/runner/miniforge3/conda-bld/faiss-split_1606947140101/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl libfaiss
libfaiss
--------

lib/libfaiss.dylib
  filetype: DYLIB
  rpath: @loader_path

@h-vetinari
Copy link
Member Author

Also, the python-detection of the conda-packaged cmake doesn't seem to work out of the box...

@beckermr
Copy link
Member

beckermr commented Dec 3, 2020

You can remove this test. It is now in conda build and is not needed.

@h-vetinari
Copy link
Member Author

Thanks @beckermr. Would you also happen to have any tips for python-detection in cmake on windows? 🙃

@h-vetinari
Copy link
Member Author

OK, linux builds now. On OSX, I have Symbol not found: _PyCObject_Type, somewhere related to swigfaiss.so:

import: 'faiss'
Traceback (most recent call last):
  File "/Users/runner/[...]/lib/python3.8/site-packages/faiss/loader.py", line 34, in <module>
    from .swigfaiss import *
  File "/Users/runner/[...]/lib/python3.8/site-packages/faiss/swigfaiss.py", line 13, in <module>
    from . import _swigfaiss
ImportError: dlopen(/Users/runner/[...]/lib/python3.8/site-packages/faiss/_swigfaiss.so, 2): Symbol not found: _PyCObject_Type
  Referenced from: /Users/runner/[...]/lib/python3.8/site-packages/faiss/_swigfaiss.so
  Expected in: flat namespace

@beckermr
Copy link
Member

beckermr commented Dec 3, 2020

CMake is a mystery to me. Someone from @conda-forge/core might be able to help?

@h-vetinari
Copy link
Member Author

OK, linux builds now. On OSX, I have Symbol not found: _PyCObject_Type, somewhere related to swigfaiss.so:

Found this info from the pyarrow dev mailing list:

Hi Quang,

It sounds like you have compiled Arrow against a Python 2 install but
are now trying to use it with Python 3. This won't work, the same
Python version must be used when compiling and when using PyArrow.

("PyCObject" is a Python 2-specific API that doesn't exist anymore in
Python 3)

Regards
Antoine.

And indeed, while the environment is obviously based on python3, and CMake finds the correct version at build time of libfaiss...

-- Found PythonInterp: /Users/runner/miniforge3/bin/python (found version "3.8.6") 

... it then proceeds to pick up the wrong python version while building faiss-cpu:

-- Found OpenMP_CXX: -fopenmp=libomp (found version "5.0") 
-- Found OpenMP: TRUE (found version "5.0")  
-- Found Python: /System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 (found version "2.7.16") found components: Development NumPy Interpreter Development.Module Development.Embed 
-- Configuring done
-- Generating done

@h-vetinari
Copy link
Member Author

@conda-forge-admin, please rerender

@github-actions
Copy link
Contributor

github-actions bot commented Dec 6, 2020

Hi! This is the friendly automated conda-forge-webservice.
I tried to rerender for you, but it looks like there was nothing to do.

@h-vetinari
Copy link
Member Author

Apparently I cannot easily skip python 3.6 builds here (neither by explicit skips, nor by trickery with conda_build_config.yaml), which is really annoying, because I think I solved the cmake issues otherwise. Opened conda/conda-build#4148

@h-vetinari h-vetinari mentioned this pull request Dec 6, 2020
@mbargull
Copy link
Member

mbargull commented Dec 6, 2020

@h-vetinari, builds with multiple outputs work on the same build files. As such the CMake caches have to be cleaned manually between builds. I've pushed a commit that just removes the build dir after each Python package has been built.

recipe/build-pkg.sh Outdated Show resolved Hide resolved
recipe/build-pkg.sh Outdated Show resolved Hide resolved
Copy link
Member Author

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look! :)

recipe/meta.yaml Show resolved Hide resolved
recipe/meta.yaml Show resolved Hide resolved
@h-vetinari
Copy link
Member Author

I've seen the OSX failure in #18 already, where I pinged the upstream maintainers because I couldn't figure out how it could be that the same job that previously passed CI now fails. Seeing the parallel_mode=2, (and the passing parallel_mode=1 above), could this be a bug in the updated llvm-openmp?

parallel_mode=1
sizes [2 0 0 0 2 0 0 0 9 3 0 1 4 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 9 0 0 0 0 0 0 0 0
 0 0 0 3 0 2 2 0 0 0 1 2 0 0 0 0 4 0 0 1 0 4 0 2 2 2 3 0 7 0 0 0 0 0 0 0 0
 1 0 0 0 0 0 0 3 1 0 1 0 0 0 0 0 2 0 0 5 0 0 0 5 0 0 1 0 3 0 0 0 2 2 0 0 0
 0 0 0 3 2 1 4 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 1 0 2 5 0 0 0 0 0 0
 0 0 2 0 2 0 1 0 0 0 1 2 0 0 0 0 0 1 0 0 2 0 0 1 1 1 0 0 0 0 0 0 0 0 1 0 0
 1 0 0 6 0 0 0 0 0 0 0 0 0 0 0]
parallel_mode=2
Faiss assertion '!qres || i > qres->qno' failed in void faiss::IndexIVF::range_search_preassigned(faiss::Index::idx_t, const float *, float, const faiss::Index::idx_t *, const float *, faiss::RangeSearchResult *, bool, const faiss::IVFSearchParameters *) const at /Users/runner/miniforge3/conda-bld/faiss-split_1607290716685/work/faiss/IndexIVF.cpp:654
/Users/runner/miniforge3/conda-bld/faiss-split_1607290716685/test_tmp/run_test.sh: line 7:  8331 Abort trap: 6           python -m unittest discover tests
Tests failed for faiss-1.6.4-py38he44c006_0_cpu.tar.bz2 - moving package to /Users/runner/miniforge3/conda-bld/broken

@mbargull
Copy link
Member

mbargull commented Dec 6, 2020

could this be a bug in the updated llvm-openmp?

No idea. If you wanted to try that theory out, you could add set the compiler version back to the previous version in conda_build_config.yaml. Or, for temporary testing purposes, just change the generated .ci_support/osx_64_.yaml ;).
[edit: just saw you did the downgrade already ^^ ]

@h-vetinari
Copy link
Member Author

No idea. If you wanted to try that theory out, you could add set the compiler version back to the previous version in conda_build_config.yaml.

I've found changing the values of cxx_compiler_version in a local conda_build_config.yaml to be really painful... I ended up with the same kind of manual hack in that other PR.

@h-vetinari h-vetinari marked this pull request as draft December 6, 2020 22:26
@h-vetinari
Copy link
Member Author

Uh-oh, that doesn't look very good:

    libclang-cpp10:     10.0.1-default_hf57f61e_1 conda-forge
    libcxx:             11.0.0-h4c3b8ed_1         conda-forge
    libllvm10:          10.0.1-h009f743_3         conda-forge
    libllvm11:          11.0.0-hf85e3d2_0         conda-forge
    llvm-tools:         10.0.1-h1341992_3         conda-forge

Seems that libcxx is not linked to cxx_compiler_version?

Same goes for llvm-openmp:

    llvm-openmp:        11.0.0-h73239a0_1         conda-forge
    llvm-tools:         10.0.1-h1341992_3         conda-forge

@h-vetinari
Copy link
Member Author

Looks like this is finally converging :)
The blasversions-vs.-openmp check was successful as well. 🥳

@h-vetinari h-vetinari marked this pull request as ready for review December 9, 2020 17:53
Copy link
Member Author

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbargull @jakirkham @isuruf @beckermr
This is finally ready. Would appreciate some review on this PR as the changes (e.g. re:deps, much less the build scripts) are relatively extensive.

Comment on lines +4 to +6
cxx_compiler_version: # [unix]
# need to downgrade on osx due to a bug that breaks the test suite
- 10 # [osx]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use some further exploration (though I think it can come after)

- cmake
- libgomp # [linux]
- llvm-openmp # [osx or win]
- autotools_clang_conda # [win]
- llvm-openmp # [osx]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llvm-openmp doesn't seem to be necessary on windows after all (even testing against other blas versions)

Comment on lines +117 to +119
# On windows, faiss.lib is an "import library";
# Deleting it breaks the faiss-builds
- if not exist %LIBRARY_LIB%\faiss.lib exit 1 # [win]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment can probably be improved; I tried deleting faiss.lib during the build-lib.bat, but then the cmake-invocation in build-pkg.bat fails.

recipe/meta.yaml Show resolved Hide resolved
@kkraus14 kkraus14 closed this Dec 10, 2020
@mbargull mbargull reopened this Dec 10, 2020
@mbargull
Copy link
Member

@h-vetinari, this looks fine to me!
However, note that even though I helped out, I actually don't know anything about faiss and very little about CUDA, LLVM+Windows, etc. ;D.

@h-vetinari
Copy link
Member Author

Planning to merge this in ~24h, if anyone from @conda-forge/core or the GPU side of things was still planning to have a look.

@h-vetinari
Copy link
Member Author

h-vetinari commented Dec 11, 2020

Btw, I noticed that the <arch>-real vs. <arch>-virtual from the CMake docs does not seem to match what @teju85 said in the initial CUDA PR:

@teju85: after this line, in order to support PTX JITing, add another gencode along the lines of:
CUDA_ARCH="${CUDA_ARCH} -gencode arch=compute_${last_arch},code=compute_${last_arch}"
where last_arch is the last element in the ARCHES array.

The mismatch I mean is that passing -DCMAKE_CUDA_ARCHITECTURES=35-virtual;50-virtual;52-virtual;60-virtual;61-virtual;70-virtual;75-virtual;75-real to cmake does not result in something that has both arch=compute_* and code=compute_* for the final arch (note the 4 entries for 75 at the end):

 Id flags: --keep;--keep-dir;tmp;-ccbin=[...];-gencode=arch=compute_35,code=sm_35;-gencode=arch=compute_50,code=sm_50;-gencode=arch=compute_52,code=sm_52;-gencode=arch=compute_60,code=sm_60;-gencode=arch=compute_61,code=sm_61;-gencode=arch=compute_70,code=sm_70;-gencode=arch=compute_75,code=sm_75;-gencode=arch=compute_75,code=sm_75  -v

CMake doesn't show how it's resolving this in the linux logs, I just saw this by accident on the windows side in #19 (so strictly speaking, it could also just be a cmake-on-windows thing, but wanted to double-check).

CC @kkraus14

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants