Load knn vectors format with mmapfs #78724

Merged Oct 7, 2021 (1 commit)

jtibshirani
Contributor

Previously, the format used niofs. The current knn vectors implementation is based on
the HNSW algorithm, which is designed for the case where the graph and vectors
are held in memory. Switching from niofs to mmapfs made a big difference in
ANN benchmarks, speeding up some searches by over 3x.
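
To illustrate the access pattern at stake, here is a minimal Python sketch (not the Lucene or Elasticsearch code) that fetches fixed-width float vectors by ordinal in two ways: with an explicit seek-and-read per lookup, loosely analogous to niofs, and through a memory-mapped view of the file, loosely analogous to mmapfs. The file layout, `DIM`, and all function names are assumptions made up for this example.

```python
import mmap
import os
import struct
import tempfile

DIM = 4               # hypothetical vector dimension
VEC_BYTES = DIM * 4   # each component stored as a little-endian float32
FMT = f"<{DIM}f"


def write_vectors(path, vectors):
    """Write vectors back to back so vector i starts at byte i * VEC_BYTES."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack(FMT, *vec))


def read_vector_with_read(f, ordinal):
    """Explicit seek + read: every lookup copies bytes out via a read call."""
    f.seek(ordinal * VEC_BYTES)
    return struct.unpack(FMT, f.read(VEC_BYTES))


def read_vector_with_mmap(mapped, ordinal):
    """Memory-mapped lookup: decode directly from the mapped pages."""
    return struct.unpack_from(FMT, mapped, ordinal * VEC_BYTES)


def demo():
    # Values are exactly representable in float32, so round-trips are exact.
    vectors = [(float(i), i + 0.5, i + 0.25, i + 0.125) for i in range(8)]
    ordinals = (5, 0, 7)  # random-access order, as in a graph traversal
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vectors.bin")
        write_vectors(path, vectors)
        with open(path, "rb", buffering=0) as f:
            via_read = [read_vector_with_read(f, i) for i in ordinals]
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
                via_mmap = [read_vector_with_mmap(mapped, i) for i in ordinals]
    return vectors, via_read, via_mmap
```

Both paths return the same vectors. The difference is only in how the bytes are reached, which matters when a graph search performs many small random reads against data that is already resident in memory.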

Relates to #78473.

@jtibshirani added the :Search/Search and v8.0.0 labels on Oct 5, 2021
@jtibshirani
Contributor Author

Example benchmark results on glove-100-angular with M=32, efConstruction=100.

nio (NIOFSDirectory)

Algorithm                    Recall  QPS
EsDenseVector(n_cands=10)    0.468   311.834
EsDenseVector(n_cands=50)    0.737   195.817
EsDenseVector(n_cands=80)    0.788   161.091
EsDenseVector(n_cands=100)   0.811   133.426
EsDenseVector(n_cands=500)   0.924    55.674
EsDenseVector(n_cands=800)   0.946    39.257

mmap (MMapDirectory)

Algorithm                    Recall  QPS
EsDenseVector(n_cands=10)    0.468   403.680    
EsDenseVector(n_cands=50)    0.737   363.926
EsDenseVector(n_cands=80)    0.788   336.437
EsDenseVector(n_cands=100)   0.811   269.075   
EsDenseVector(n_cands=500)   0.924   176.996
EsDenseVector(n_cands=800)   0.946   134.560
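
As a quick check on the two tables, the following Python sketch computes the mmap-over-nio QPS ratio for each `n_cands` value. The QPS numbers are copied from the tables above; the variable and function names are just for this example.

```python
# QPS per n_cands, copied from the nio and mmap tables above.
NIO_QPS = {10: 311.834, 50: 195.817, 80: 161.091,
           100: 133.426, 500: 55.674, 800: 39.257}
MMAP_QPS = {10: 403.680, 50: 363.926, 80: 336.437,
            100: 269.075, 500: 176.996, 800: 134.560}


def speedups(baseline, candidate):
    """Return candidate QPS divided by baseline QPS for each shared key."""
    return {key: candidate[key] / baseline[key] for key in baseline}


SPEEDUPS = speedups(NIO_QPS, MMAP_QPS)

for n_cands, ratio in SPEEDUPS.items():
    print(f"n_cands={n_cands:<4} speedup={ratio:.2f}x")
```

Recall is identical in both tables, so the ratio isolates the throughput change. It grows from roughly 1.3x at `n_cands=10` to above 3x at `n_cands=500` and `n_cands=800`, which is where the "over 3x" figure comes from.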

@jtibshirani marked this pull request as ready for review on October 6, 2021 00:09
@elasticmachine added the Team:Search label on Oct 6, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@mayya-sharipova (Contributor) left a comment

@jtibshirani Oh very nice, impressive gains indeed, especially at higher recall: more than 3x! Great catch, Julie!

@jtibshirani merged commit bfba7fa into elastic:master on Oct 7, 2021
@jtibshirani deleted the vectors-format branch on October 7, 2021 00:15
@jtibshirani mentioned this pull request on Oct 7, 2021
17 tasks
wjp719 added a commit to wjp719/elasticsearch that referenced this pull request Oct 7, 2021
…' into feature/data_stream_support_routing

* wjp/feature/data_stream_support_routing: (44 commits)
  Revert "Adjust /_cat/templates not to request all metadata (elastic#78812)"
  Allow indices lookup to be built lazily (elastic#78745)
  [DOCS] Document default security in alpha2 (elastic#78227)
  Add cluster applier stats (elastic#77552)
  Fix failing URLDecodeProcessorTests::testProcessor test (elastic#78690)
  Upgrade to lucene snapshot ba75dc5e6bf (elastic#78817)
  Adjust /_cat/templates not to request all metadata (elastic#78812)
  Simplify build plugin license handling (elastic#77009)
  Fix SearchableSnapshotsBlobStoreCacheIntegTests.testBlobStoreCache (elastic#78616)
  Improve Docker image caching and testing (elastic#78552)
  Load knn vectors format with mmapfs (elastic#78724)
  Fix date math zone test to use negative minutes (elastic#78796)
  Changing name of shards field in node/stats api to shard_stats (elastic#78531)
  [DOCS] Fix system index refs in restore tutorial (elastic#78582)
  Add previously removed settings back for 8.0 (elastic#78784)
  TSDB: Fix template name in test
  Add a system property to forcibly format everything (elastic#78768)
  Revert "Adding config so that some tests will break if over-the-wire encryption fails (elastic#78409)" (elastic#78787)
  Must date math test failure
  Adding config so that some tests will break if over-the-wire encryption fails (elastic#78409)
  ...
@jtibshirani added the :Search Relevance/Vectors label and removed the :Search/Search label on Jul 21, 2022
Labels: :Search Relevance/Vectors, Team:Search, v8.0.0-beta1