Elasticsearch #174
Comments
I'd love to have the numbers for this too. The docs say it uses HNSW from NMSLIB, so it should be similar to that, but there may be some overheads that lead to differences in performance.
I believe this is being implemented under #180.
So far I've implemented ES-based nearest-neighbors for the stock vector functionality that comes with X-Pack (#186) and for my own vector search plugin: https://github.com/alexklibisz/elastiknn (#189). I'm hoping to also find some time to implement it using Amazon's open-distro plugin, which was linked in the original issue comment above.
I'm using opendistro at work and am familiar with the KNN plugin. I have it working with ann-benchmarks but am still hitting random timeouts that I'm troubleshooting. @alexklibisz, do you mind if I fix and push this one?
Go for it! I haven't had the time to even start on it yet. Also, obviously feel free to borrow from the Elastiknn and Elasticsearch docker images and algos, and post questions about your timeouts. I found some things very tricky to set up with Elasticsearch.
Oh yes, your work on the other elastic images helped tremendously. I'm mainly facing timeouts during refresh. I increased the timeout from the default 10 to 100 and it still fails on some runs. Wondering if I should increase it further or find some other way to handle it. I hope to update my fork over this weekend so that I can share the code for a clearer picture. Thank you!
Good to hear. 10 seconds definitely seems way too low. With regular ES under the hood, I wouldn't be surprised if refreshing and merging 1M docs into a single segment takes 2-3 minutes. You also have to factor in that you're using an HNSW binary under the hood, so maybe you can get an idea of reasonable times by fitting some HNSW models without ES in the loop.
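One possible way to handle the refresh timeouts discussed above, rather than picking a single larger value, is to retry the refresh with a growing timeout. The sketch below is only an illustration: `refresh_with_backoff`, its parameters, and the timeout schedule are hypothetical names and values, not code from ann-benchmarks or any plugin. The `refresh` callable is assumed to wrap whatever client call performs the index refresh and to raise `TimeoutError` (or another exception listed in `retry_on`) when the request times out.

```python
import time
from typing import Callable, Optional, Sequence, Tuple, Type


def refresh_with_backoff(
    refresh: Callable[[float], None],
    timeouts: Sequence[float] = (100.0, 200.0, 400.0),  # seconds; illustrative values
    pause: float = 0.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
) -> float:
    """Call refresh(timeout) with each timeout in turn.

    Returns the timeout that succeeded, or raises TimeoutError
    once every timeout in the schedule has been tried.
    """
    last_error: Optional[BaseException] = None
    for timeout in timeouts:
        try:
            refresh(timeout)
            return timeout
        except retry_on as err:
            # Remember the failure, wait briefly, then try a longer timeout.
            last_error = err
            time.sleep(pause)
    raise TimeoutError(
        f"refresh failed after timeouts {list(timeouts)}"
    ) from last_error


def slow_refresh_stub(timeout: float) -> None:
    """Stand-in for a real refresh call: pretends the refresh needs 150 s."""
    if timeout < 150:
        raise TimeoutError("refresh did not finish in time")
```

With the stub, `refresh_with_backoff(slow_refresh_stub)` fails at 100 seconds and succeeds at 200. In a real run the stub would be replaced by a small wrapper around the client's refresh request, with `retry_on` set to whatever timeout exception that client raises.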
Closing this for now since it's added, right?
I guess it depends how granular you want to make the issues. Right now there are three ways to do KNN on elasticsearch:
So far 1 and 2 are implemented and merged. @stephenleo is working on 3.
Got it, I think it's good enough for now? Is there a huge difference between 3 and 1/2?
I would not be surprised if there actually is a pretty substantial difference. 1 and 2 use the JVM exclusively, which is pretty darn slow for CPU-bound number crunching. 3 uses the C/C++ HNSW binary under the hood, which is an extra operational consideration, but if they implemented it well, it should be clearly faster.
Oh ok, interesting. Looking forward to any PR for #3!
Adds Open Distro Elastic Search's KNN plugin support. Closes #174.
Would be interesting to add: https://opendistro.github.io/for-elasticsearch/features/knn.html