Clustering & Minor Patches in JS, Rust, & Java SDKs #503

Merged
merged 39 commits into from
Oct 29, 2024
39 commits
d68fb62
Improve: Clustering
ashvardanian Aug 23, 2024
f0c250e
Add: `kmeans_clustering_gt` plugin
ashvardanian Sep 2, 2024
0f4c98c
Add: Early exit strategies for KMeans
ashvardanian Sep 2, 2024
7416247
Chore: Revert integrated K-Means attempts
ashvardanian Sep 2, 2024
9c799e1
Fix: Ambiguous `destroy_at` call
ashvardanian Sep 2, 2024
e35eab8
Improve: Export aggregate distances
ashvardanian Sep 2, 2024
db0f44d
Add: PyBind11 API for K-Means
ashvardanian Sep 2, 2024
3af4801
Add: Clustering benchmark
ashvardanian Sep 2, 2024
ce1af09
Fix: Stride arithmetic
ashvardanian Sep 9, 2024
9d53d18
Improve: Cleanup clustering
ashvardanian Sep 9, 2024
f12314c
Merge branch 'main-dev' into main-dev-cluster
ashvardanian Sep 9, 2024
e448ce3
Fix: Type-casting
ashvardanian Sep 9, 2024
1a6753b
Make: Rust CI build and test
CCnut Sep 29, 2024
f1c158f
Make: Android CI build and test
CCnut Sep 29, 2024
189bb0b
Fix: Android build
CCnut Sep 29, 2024
7425cd7
Fix: JavaScript Change the return value of index.count() to a number
abetomo Oct 10, 2024
f315979
Merge pull request #502 from abetomo/js-fix-count-method
ashvardanian Oct 10, 2024
d6fd1eb
Fix: Raise exceptions from `add()` in JS (#486)
abetomo Oct 10, 2024
16dec63
Fix: Reserve after deserialization in JS (#484)
abetomo Oct 11, 2024
08c835d
Fix: Skip JS `view()` on Windows (#504)
abetomo Oct 14, 2024
9969f10
Merge branch 'unum-cloud:main-dev' into main-dev
CCnut Oct 16, 2024
8fa3090
Merge pull request #499 from CCnut/main-dev
ashvardanian Oct 16, 2024
c27c99d
Fix: Remove from a read-only index (#506)
abetomo Oct 22, 2024
113a786
Add: Metadata for observability (#508)
mbautin Oct 22, 2024
aa2ddd7
Doc: Error message type (#509)
abetomo Oct 28, 2024
19a9d0b
Add: `kmeans` Python API
ashvardanian Oct 28, 2024
730815f
Docs: Spelling
ashvardanian Oct 28, 2024
64142ee
Merge branch 'main-dev' of https://github.com/unum-cloud/usearch into…
ashvardanian Oct 28, 2024
295b3d6
Make: Bump SimSIMD
ashvardanian Oct 28, 2024
0dac789
Make: Bump SimSIMD for Turin builds
ashvardanian Oct 29, 2024
e057feb
Improve: Clustering benchmarks
ashvardanian Oct 29, 2024
91c0bcb
Merge branch 'main-dev' into main-dev-cluster
ashvardanian Oct 29, 2024
f9fc617
Merge pull request #513 from unum-cloud/main-dev-cluster
ashvardanian Oct 29, 2024
3f5a4ef
Make: Dynamic dispatch in Java
ashvardanian Oct 29, 2024
f80ded9
Fix: Initializing `std::atomic`
ashvardanian Oct 29, 2024
fbcab99
Fix: Narrowing static casts for Py build
ashvardanian Oct 29, 2024
e4afcf3
Fix: Avoid `std::accumulate`
ashvardanian Oct 29, 2024
6f132bb
Fix: Missing `ssize_t` on Windows
ashvardanian Oct 29, 2024
512c6aa
Fix: Generating 64-bit unsigned seeds
ashvardanian Oct 29, 2024
109 changes: 105 additions & 4 deletions .github/workflows/prerelease.yml
@@ -11,6 +11,8 @@ env:
PYTHONUTF8: 1
PYTHON_VERSION: 3.11
DOTNET_VERSION: 7.0.x
ANDROID_NDK_VERSION: 26.3.11579264
ANDROID_SDK_VERSION: 21

# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
permissions:
@@ -101,11 +103,15 @@ jobs:
run: npm test

# Rust
- name: Set up Rust
run: |
rustup update stable
rustup default stable
rustc -vV
- name: Build Rust
run: cargo build
- name: Test Rust
uses: actions-rs/toolchain@v1
with:
toolchain: stable
override: true
run: cargo test

# Java
- name: Setup Java
@@ -193,6 +199,17 @@ jobs:
- name: Test Python
run: pytest

# Rust
- name: Set up Rust
run: |
rustup update stable
rustup default stable
rustc -vV
- name: Build Rust
run: cargo build
- name: Test Rust
run: cargo test

# C#
- name: Setup .NET ${{ env.DOTNET_VERSION }}
uses: actions/setup-dotnet@v3
@@ -260,6 +277,17 @@ jobs:
- name: Test ObjC/Swift
run: swift test

# Rust
- name: Set up Rust
run: |
rustup update stable
rustup default stable
rustc -vV
- name: Build Rust
run: cargo build
- name: Test Rust
run: cargo test

# C#
- name: Setup .NET ${{ env.DOTNET_VERSION }}
uses: actions/setup-dotnet@v3
@@ -318,6 +346,17 @@ jobs:
- name: Test JavaScript
run: npm test

# Rust
- name: Set up Rust
run: |
rustup update stable
rustup default stable
rustc -vV
- name: Build Rust
run: cargo build
- name: Test Rust
run: cargo test

# C#
- name: Setup .NET ${{ env.DOTNET_VERSION }}
uses: actions/setup-dotnet@v3
@@ -421,3 +460,65 @@ jobs:
run: |
test -e build_artifacts/libusearch_c.so
test -e build_artifacts/libusearch_sqlite.so

test_ubuntu_android_ndk:
name: Android NDK Build
runs-on: ubuntu-22.04
strategy:
matrix:
include:
- processor: armv7a
abi: armeabi-v7a
target: armv7-linux-androideabi
- processor: aarch64
abi: arm64-v8a
target: aarch64-linux-android
steps:
- uses: actions/checkout@v4
- run: git submodule update --init --recursive

- name: Install NDK ndk
run: |
${ANDROID_HOME}/cmdline-tools/latest/bin/sdkmanager --install "ndk;${{ env.ANDROID_NDK_VERSION }}"

- name: Build C/C++
run: |
cmake -B build_artifacts \
-D CMAKE_BUILD_TYPE=RelWithDebInfo \
-D CMAKE_EXPORT_COMPILE_COMMANDS=1 \
-D CMAKE_TOOLCHAIN_FILE=${ANDROID_HOME}/ndk/${{ env.ANDROID_NDK_VERSION }}/build/cmake/android.toolchain.cmake \
-D CMAKE_ANDROID_STL_TYPE=c++_static \
-D ANDROID_PLATFORM=${{ env.ANDROID_SDK_VERSION }} \
-D ANDROID_ABI=${{ matrix.abi }} \
-D USEARCH_BUILD_LIB_C=1 \
-D USEARCH_BUILD_TEST_CPP=0 \
-D USEARCH_BUILD_BENCH_CPP=0

cmake --build build_artifacts --config RelWithDebInfo

# We can't run the produced builds, but we can make sure they exist
- name: Test artifacts presence
run: |
test -e build_artifacts/libusearch_c.so

# Rust
- name: Set up Rust
run: |
rustup update stable
rustup default stable
rustup target add ${{ matrix.target }}
rustc -vV

- name: Set up Rust Env
run: |
TOOLCHAIN=${ANDROID_HOME}/ndk/${{ env.ANDROID_NDK_VERSION }}/toolchains/llvm/prebuilt/linux-x86_64/bin/
NDK_CLANG=$(find ${TOOLCHAIN} -name "${{ matrix.processor }}*${{ env.ANDROID_SDK_VERSION }}-clang")
echo "CC_${{ matrix.target }}=${NDK_CLANG}" >> ${GITHUB_ENV}
echo "CXX_${{ matrix.target }}=${NDK_CLANG}++" >> ${GITHUB_ENV}
echo "AR_${{ matrix.target }}=${TOOLCHAIN}/llvm-ar" >> ${GITHUB_ENV}
echo "CARGO_${{ matrix.target }}=${NDK_CLANG}" >> ${GITHUB_ENV}
echo "CARGO_${{ matrix.target }}=${TOOLCHAIN}/llvm-ar" >> ${GITHUB_ENV}

- name: Build Rust
run: |
cargo build --target ${{ matrix.target }}
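The `Set up Rust Env` step relies on two naming conventions. First, NDK r19 and newer ship per-API-level clang wrappers named `<triple><api>-clang`, and the 32-bit ARM wrappers use the `armv7a-linux-androideabi` prefix. That is why `matrix.processor` is `armv7a` while the Rust target is `armv7-linux-androideabi`. Second, the `cc` crate reads `CC_<target>`-style variables. The following is a minimal POSIX-shell sketch that runs against a mock toolchain directory instead of a real NDK. It reproduces the workflow's `find` lookup and also prints Cargo's documented per-target linker variable, `CARGO_TARGET_<TRIPLE>_LINKER`, where the triple is upper-cased and each `-` is replaced by `_`. The helpers `ndk_clang_for` and `cargo_linker_var` are hypothetical and are not part of the workflow.

```sh
#!/bin/sh
# Sketch only: mimics the "Set up Rust Env" lookup against a mock NDK
# toolchain directory, so matrix values can be checked without the NDK.
set -eu

# Hypothetical helper: find the API-level-specific clang wrapper using the
# same -name pattern as the workflow.
# $1 = matrix.processor, $2 = ANDROID_SDK_VERSION, $3 = toolchain bin dir
ndk_clang_for() {
  find "$3" -name "$1*$2-clang"
}

# Hypothetical helper: Cargo's per-target linker variable name, with the
# triple upper-cased and '-' turned into '_'.
cargo_linker_var() {
  upper=$(printf '%s' "$1" | tr 'abcdefghijklmnopqrstuvwxyz-' 'ABCDEFGHIJKLMNOPQRSTUVWXYZ_')
  printf 'CARGO_TARGET_%s_LINKER\n' "$upper"
}

# Mock toolchain with wrapper names in the NDK's <triple><api>-clang style.
toolchain=$(mktemp -d)
for wrapper in armv7a-linux-androideabi21-clang aarch64-linux-android21-clang; do
  touch "$toolchain/$wrapper" "$toolchain/$wrapper++"
done

# Same processor/target pairs as the workflow's build matrix.
for pair in armv7a:armv7-linux-androideabi aarch64:aarch64-linux-android; do
  processor=${pair%%:*}   # matrix.processor
  target=${pair#*:}       # matrix.target
  clang=$(ndk_clang_for "$processor" 21 "$toolchain")
  echo "CC_${target}=${clang}"
  echo "$(cargo_linker_var "$target")=${clang}"
done

rm -rf "$toolchain"
```

For each matrix entry, the script prints one `CC_<target>` line and one `CARGO_TARGET_<TRIPLE>_LINKER` line, both pointing at the matching wrapper in the temporary directory. The `-clang++` files are ignored because the `-name` pattern must match the whole file name.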
3 changes: 3 additions & 0 deletions .vscode/settings.json
@@ -157,6 +157,7 @@
"ivdep",
"jaccard",
"Jemalloc",
"kmeans",
"Kullback",
"Leibler",
"libjemalloc",
@@ -173,6 +174,7 @@
"nlevels",
"Numba",
"numpy",
"NVME",
"objc",
"OPENMP",
"preprocess",
@@ -198,6 +200,7 @@
"uninitialize",
"unumusearch",
"usearch",
"usecase",
"usecases",
"Vardanian",
"vectorize",
50 changes: 31 additions & 19 deletions BENCHMARKS.md
@@ -10,11 +10,11 @@ All major HNSW implementation share an identical list of hyper-parameters:

The default values vary drastically.

| Library | Connectivity | EF @ A | EF @ S |
| :-------: | :----------: | :----: | :----: |
| `hnswlib` | 16 | 200 | 10 |
| `FAISS` | 32 | 40 | 16 |
| `USearch` | 16 | 128 | 64 |
| Library | Connectivity | EF @ A | EF @ S |
| :-------- | -----------: | -----: | -----: |
| `hnswlib` | 16 | 200 | 10 |
| `FAISS` | 32 | 40 | 16 |
| `USearch` | 16 | 128 | 64 |

Below are the performance numbers for a benchmark running on the 64 cores of AWS `c7g.metal` "Graviton 3"-based instances.
The main columns are:
@@ -26,27 +26,27 @@ The main columns are:
### Different "connectivity"

| Vectors | Connectivity | EF @ A | EF @ S | __Add__, QPS | __Search__, QPS | __Recall @1__ |
| :--------- | :----------: | :----: | :----: | :----------: | :-------------: | ------------: |
| `f32` x256 | 16 | 128 | 64 | 75'640 | 131'654 | 99.3% |
| `f32` x256 | 12 | 128 | 64 | 81'747 | 149'728 | 99.0% |
| `f32` x256 | 32 | 128 | 64 | 64'368 | 104'050 | 99.4% |
| :--------- | -----------: | -----: | -----: | -----------: | --------------: | ------------: |
| `f32` x256 | 16 | 128 | 64 | 75'640 | 131'654 | 99.3% |
| `f32` x256 | 12 | 128 | 64 | 81'747 | 149'728 | 99.0% |
| `f32` x256 | 32 | 128 | 64 | 64'368 | 104'050 | 99.4% |

### Different "expansion factors"

| Vectors | Connectivity | EF @ A | EF @ S | __Add__, QPS | __Search__, QPS | __Recall @1__ |
| :--------- | :----------: | :----: | :----: | :----------: | :-------------: | ------------: |
| `f32` x256 | 16 | 128 | 64 | 75'640 | 131'654 | 99.3% |
| `f32` x256 | 16 | 64 | 32 | 128'644 | 228'422 | 97.2% |
| `f32` x256 | 16 | 256 | 128 | 39'981 | 69'065 | 99.2% |
| :--------- | -----------: | -----: | -----: | -----------: | --------------: | ------------: |
| `f32` x256 | 16 | 128 | 64 | 75'640 | 131'654 | 99.3% |
| `f32` x256 | 16 | 64 | 32 | 128'644 | 228'422 | 97.2% |
| `f32` x256 | 16 | 256 | 128 | 39'981 | 69'065 | 99.2% |

### Different vectors "quantization"

| Vectors | Connectivity | EF @ A | EF @ S | __Add__, QPS | __Search__, QPS | __Recall @1__ |
| :----------- | :----------: | :----: | :----: | :----------: | :-------------: | ------------: |
| `f32` x256 | 16 | 128 | 64 | 87'995 | 171'856 | 99.1% |
| `f16` x256 | 16 | 128 | 64 | 87'270 | 153'788 | 98.4% |
| `f16` x256 ✳️ | 16 | 128 | 64 | 71'454 | 132'673 | 98.4% |
| `i8` x256 | 16 | 128 | 64 | 115'923 | 274'653 | 98.9% |
| :----------- | -----------: | -----: | -----: | -----------: | --------------: | ------------: |
| `f32` x256 | 16 | 128 | 64 | 87'995 | 171'856 | 99.1% |
| `f16` x256 | 16 | 128 | 64 | 87'270 | 153'788 | 98.4% |
| `f16` x256 ✳️ | 16 | 128 | 64 | 71'454 | 132'673 | 98.4% |
| `i8` x256 | 16 | 128 | 64 | 115'923 | 274'653 | 98.9% |

As seen on the chart, for `f16` quantization, performance may differ depending on native hardware support for that numeric type.
Also worth noting, 8-bit quantization results in almost no quantization loss and may perform better than `f16`.
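The two sentences above compare rows of the quantization table. As a quick worked check, the shell sketch below divides the Search QPS figures copied from that table. `ratio` is a hypothetical helper, and rounding to two decimals is an arbitrary choice. The second comparison uses the unmarked `f16` row, and the third compares the `f16` row marked ✳️ against the unmarked one.

```sh
#!/bin/sh
# Sketch only: throughput ratios from the quantization table (Search, QPS).
set -eu

# Hypothetical helper: prints NUM / DEN rounded to two decimals.
ratio() {
  awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f\n", n / d }'
}

echo "i8 vs f32 search:          $(ratio 274653 171856)"   # 1.60
echo "i8 vs f16 search:          $(ratio 274653 153788)"   # 1.79
echo "marked f16 vs f16 search:  $(ratio 132673 153788)"   # 0.86
```

The same rows show Recall @1 dropping from 99.1% (`f32`) to 98.9% (`i8`). On this benchmark, the roughly 1.6x search speedup of `i8` over `f32` therefore costs 0.2 percentage points of recall.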
@@ -58,9 +58,12 @@ Within this repository you will find two commonly used utilities:
- `cpp/bench.cpp`, which produces the `bench_cpp` binary for broad USearch benchmarks.
- `python/bench.py` and `python/bench.ipynb` for interactive charts against FAISS.

### C++ Benchmarking Utilities

To achieve the best results, we suggest compiling locally for the target architecture.

```sh
git submodule update --init --recursive
cmake -DUSEARCH_BUILD_BENCH_CPP=1 -DUSEARCH_BUILD_TEST_C=1 -DUSEARCH_USE_OPENMP=1 -DUSEARCH_USE_SIMSIMD=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -B build_profile
cmake --build build_profile --config RelWithDebInfo -j
build_profile/bench_cpp --help
@@ -146,11 +149,20 @@ build_profile/bench_cpp \
--cos
```


> Optional parameters include `connectivity`, `expansion_add`, `expansion_search`.

For Python, just open the Jupyter Notebook and start playing around.

### Python Benchmarking Utilities

Several benchmarking suites are available for Python: approximate search, exact search, and clustering.

```sh
python/scripts/bench.py --help
python/scripts/bench_exact.py --help
python/scripts/bench_cluster.py --help
```

## Datasets

BigANN benchmark is a good starting point, if you are searching for large collections of high-dimensional vectors.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -200,7 +200,7 @@ cibuildwheel --platform macos # works only on MacOS
cibuildwheel --platform windows # works only on Windows
```

You may need root previligies for multi-architecture builds:
You may need root privileges for multi-architecture builds:

```sh
sudo $(which cibuildwheel) --platform linux
15 changes: 15 additions & 0 deletions build.gradle
@@ -2,6 +2,7 @@ import org.gradle.internal.jvm.Jvm

plugins {
id 'java-library'
id 'c'
id 'cpp'
id 'maven-publish'
id 'signing'
@@ -66,6 +67,15 @@ model {
srcDirs "include", "fp16/include", "simsimd/include", "${Jvm.current().javaHome}/include"
}
}
c {
source {
srcDirs "simsimd/c/"
include "**/*.c"
}
exportedHeaders {
srcDirs "simsimd/include"
}
}
}
binaries.withType(StaticLibraryBinarySpec) {
buildable = false
@@ -83,6 +93,11 @@ model {
cppCompiler.args "-I${Jvm.current().javaHome}/include/win32"
cppCompiler.args '/std:c++11'
}
cppCompiler.args '-DUSEARCH_USE_FP16LIB=1'
cppCompiler.args '-DUSEARCH_USE_SIMSIMD=1'
cppCompiler.args '-DSIMSIMD_DYNAMIC_DISPATCH=1'
cppCompiler.args '-DSIMSIMD_NATIVE_BF16=0'
cppCompiler.args '-DSIMSIMD_NATIVE_F16=0'
}
}
}