Benchmark against FAISS & nmslib? #4
In case further motivation is needed, here are the types of algorithms I need to benchmark: https://github.com/google-deepmind/xtr - the FAISS parts are here: https://github.com/google-deepmind/xtr/blob/main/xtr_evaluation_on_beir_miracl.ipynb
Hello, I ran some benchmarks comparing VSS against the FAISS extension and posted them here: https://github.com/arjenpdevries/faiss/blob/main/README.md
Such a benchmark would be super helpful for deciding which in-browser use cases this extension can support :)
https://github.com/nmslib/hnswlib
For example, I have a few databases ready to go:
- 20 years of census data - https://jaanli.github.io/american-community-survey/new-york-area/income-by-race
- 15 million hospital claims - https://onefact.github.io/synthetic-healthcare-data/
- All of NYC real estate - https://jaanli.github.io/new-york-real-estate/
And I really want to visualize the 30,000+ Mandarin characters by their phono-semantic specificity/etymological origins on a map.
All of these require high-dimensional similarity search, but at very different scales. The UI/UX interactions (e.g., the very early ones from 2017 here: https://jaan.io/food2vec-augmented-cooking-machine-intelligence/) will therefore be constrained by the queries per second this DuckDB extension supports.
Hope that makes sense, and happy to help! 🙏 Super exciting that this is now feasible!!