Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build with AVX2 support #27

Merged
merged 21 commits into from
Feb 19, 2021
Merged

Build with AVX2 support #27

merged 21 commits into from
Feb 19, 2021

Conversation

h-vetinari
Copy link
Member

@h-vetinari h-vetinari commented Feb 4, 2021

From the discussion in #23, it became clear that the archspec-stuff isn't ready for primetime yet, and that it's better to follow the upstream model of compiling both for a generic target as well as for avx2, and distinguish by availability at load time.

Since the patch for avx2-detection would have been non-trivial, I upstreamed this first in facebookresearch/faiss#1600.

Not sure that I've translated everything correctly from the upstream build-lib.sh / build-pkg.sh, or even if -DFAISS_OPT_LEVEL=avx2 is respected (or working) on windows.

PS. The extension stuff in setup.py was a learning from #22, but is required here as well. Once everything is ironed out, I'll happily upstream this as well.

PPS. Currently based on #26, will rebase once that goes in.

@conda-forge-linter
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@h-vetinari h-vetinari force-pushed the avx2 branch 10 times, most recently from 9591644 to 82c60e5 Compare February 5, 2021 15:55
@h-vetinari
Copy link
Member Author

h-vetinari commented Feb 5, 2021

@beauby @mdouze

FYI, this is pretty much how I'm thinking of supporting the AVX2 stuff; also testing both with and without AVX2 support (as you can see from the sequence of commands, resp. the logs).

The patch for EXT_SUFFIX is actually something that's been necessary for PyPy anyway (see #22) - not sure what your appetite is for supporting that, but I'd be happy to upstream a version of this as well.

I'm planning to merge this latest on Monday, in case you have opinions about any of it. 🙃

Edit: obviously, that did not happen...

@h-vetinari
Copy link
Member Author

@beauby @mdouze @wickedfoo
Not sure what -DFAISS_OPT_LEVEL=avx2 does behind the scenes (i.e. does it also support windows, for example?), but could it perhaps be necessary to pass other explicit compiler flags?

I ask because I would have (naïvely) expected the AVX2-enabled test suite to run through faster, but the times are very comparable between with & without AVX2 support.

So either the tests (or the problem sizes) don't really allow the AVX2 acceleration to make a significant impact, or something is not working yet with the integration (either in this PR, or generally).

@h-vetinari h-vetinari force-pushed the avx2 branch 2 times, most recently from 56a173e to d8f2037 Compare February 6, 2021 20:33
@h-vetinari
Copy link
Member Author

Ping @beauby @mdouze
Would love to hear your thoughts on this one. :)

@beauby
Copy link

beauby commented Feb 11, 2021

@h-vetinari Apologies for the late reply. So the FAISS_OPT_LEVEL=avx2 triggers this: https://github.com/facebookresearch/faiss/blob/master/faiss/CMakeLists.txt#L153, which basically sets some compile options (-mavx2 -mfma -mf16c -mpopcnt) enabling some optimized instruction sets, and changes the target output to libfaiss_avx2. No other flags should be needed. We haven't tested those options with VC++.

Regarding test times, I'm not sure a large difference can be observed (@mdouze can you confirm?) due to the limited problem size. In order to make sure the resulting library was indeed compiled with AVX instructions, you can check objdump -d libfaiss*.a | awk '/[ \t]vxorps[ \t]/' (it should output something when FAISS_OPT_LEVELis set toavx2` and nothing otherwise). Next thing to be extra cautious is to make sure the loader loads the right library.

Regarding the EXT_SUFFIX patch, it looks good, feel free to open a PR on our repo and I'll review it.

@h-vetinari
Copy link
Member Author

Cool, thanks for the feedback! :)

Next thing to be extra cautious is to make sure the loader loads the right library.

That's exactly the reason I'm switching to INFO level for the logger so that I can see which library was loaded, and testing it twice with & without AVX2. 🙃

That got me thinking - this check can even be automated. Have a look at the test set-up, maybe it's interesting for you as well (fingers crossed that it works on first try, but you get the idea).

@h-vetinari
Copy link
Member Author

Good lesson for always automating things - I had overlooked that things weren't working under windows (and with py36).

@h-vetinari
Copy link
Member Author

@beauby, seems that CMake Switch isn't windows compatible (just looking at the logs in more detail, because the avx2-lib couldn't be loaded on windows):

cl : Command line warning D9002: ignoring unknown option '-mavx2' [D:\bld\faiss-split_1613049237715\work\_build_avx2\faiss\faiss.vcxproj]
cl : Command line warning D9002: ignoring unknown option '-mfma' [D:\bld\faiss-split_1613049237715\work\_build_avx2\faiss\faiss.vcxproj]
cl : Command line warning D9002: ignoring unknown option '-mf16c' [D:\bld\faiss-split_1613049237715\work\_build_avx2\faiss\faiss.vcxproj]
cl : Command line warning D9002: ignoring unknown option '-mpopcnt' [D:\bld\faiss-split_1613049237715\work\_build_avx2\faiss\faiss.vcxproj]

@h-vetinari h-vetinari force-pushed the avx2 branch 8 times, most recently from 0a5e606 to 6d8ffed Compare February 12, 2021 01:05
facebook-github-bot pushed a commit to facebookresearch/faiss that referenced this pull request Feb 17, 2021
Summary:
While working on conda-forge/faiss-split-feedstock#27, it turned out I needed
to patch `setup.py` anyway. In order to unify how the extension of the built lib is set, I fell back
to another patch that would/will become necessary if faiss ever wants to support PyPy
(see discussion in conda-forge/faiss-split-feedstock#22).

It would be nice if this was done natively by CMake, but as far as I can tell from
https://gitlab.kitware.com/cmake/cmake/-/issues/21070, cmake is not likely to do that right away.

I didn't particularly expect this patch to be upstreamed (especially if there is no interest for PyPy support, for example),
but beauby [invited](conda-forge/faiss-split-feedstock#27 (comment))
me to post it so here goes (plus necessary adaptations to the conda recipes)

Related to #1600, #1680, #1681

PS. I thought about using `logger.INFO` in case of an import failure for AVX2, but since it's a setup file,
I thought `print` would actually be more useful. Happy to change or remove if desired.

Pull Request resolved: #1682

Reviewed By: wickedfoo

Differential Revision: D26484393

Pulled By: beauby

fbshipit-source-id: 6cd2598838c4070dbf83d6f27ce15ce9faa6bf20
also:
- switch to pytest
- yaml + jinja2 seem to hate colons or URLs in test.commands section
- NPY_DISABLE_CPU_FEATURES is documented in
  https://github.com/numpy/numpy/blob/v1.20.0/numpy/core/src/common/npy_cpu_features.c.src#L178
this is done by adding a pytest arg to see logger-messages
from loader.py, and inspecting those for the required lines

also: skip AVX2-run in OSX CI
for (hopefully) better AVX2 / FMA / etc. support
Otherwise, we'd be packaging an two libs within libfaiss, where one is
effectively invisible (unless someone knows exactly what to look for).

By introducing a separate output, we can avoid duplicating the artefact
size (for those who only use libfaiss), and the additional output only
gets added for faiss itself (but is available for separate use if someone
doesn't mind depending on the availability of AVX2).

Also, re-render to pick up new output.
facebook-github-bot pushed a commit to facebookresearch/faiss that referenced this pull request Feb 18, 2021
Summary:
Trying to compile windows for AVX2 in conda-forge/faiss-split-feedstock#27
(after #1600) surfaced a bunch of things (#1680, #1681, #1682), but the most voluminous problem
was MSVC being much worse at dealing with operator overloads and casts around `__m128` / `__m256`.

This lead to loads of errors that looked as follows:
```
[...]\faiss\utils\distances_simd.cpp(411): error C2676: binary '+=': '__m128' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(440): error C2676: binary '-': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(441): error C2676: binary '*': 'const __m256' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(446): error C2676: binary '+=': '__m128' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(451): error C2676: binary '-': '__m128' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(452): error C2676: binary '*': 'const __m128' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(459): error C2676: binary '-': '__m128' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(460): error C2676: binary '*': '__m128' does not define this operator or a conversion to a type acceptable to the predefined operator
[...]\faiss\utils\distances_simd.cpp(471): error C2440: '<function-style-cast>': cannot convert from '__m256i' to '__m256'
```

I've followed https://software.intel.com/sites/landingpage/IntrinsicsGuide/ to try to replace everything correctly,
but this will surely require close review, because I'm not sure how well these code-paths are checked by the
test suite.

In any case, with the commits from #1600 #1666 #1680 #1681 #1682, I was able to build `libfaiss` & `faiss`
for AVX2 on windows (while remaining "green" on linux/osx, both with & without AVX2).

Sidenote: the issues in the last commit (26fc7cf)
were uncovered by adding the `__SSE3__` compat macros in #1681.

Pull Request resolved: #1684

Test Plan: buck test //faiss/tests/...

Reviewed By: beauby

Differential Revision: D26454443

Pulled By: mdouze

fbshipit-source-id: 70df0818e357f1ecea6a056d619618df0236e0eb
@h-vetinari h-vetinari added the automerge Merge the PR when CI passes label Feb 19, 2021
@github-actions github-actions bot merged commit df8cd2e into conda-forge:master Feb 19, 2021
@github-actions
Copy link
Contributor

Hi! This is the friendly conda-forge automerge bot!

I considered the following status checks when analyzing this PR:

  • linter: passed
  • azure: passed

Thus the PR was passing and merged! Have a great day!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge Merge the PR when CI passes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants