Add Lucene specific file extensions to core HybridFS #721


martin-gaievski
Member

Signed-off-by: Martin Gaievski [email protected]

Description

This change adds the file extensions of Lucene 9.4 vector value files to the list of extensions that core OpenSearch opens with MMap file I/O under the HybridFS store type. In the local test below, the ingestion test's total time (test_took) fell from 2706990 to 748055, driven by a roughly tenfold drop in force-merge time, and query latency fell by about two thirds at p50, p90, and p99. The setting is abstracted at the engine level, with a specific implementation for Lucene. It is set at the cluster defaults level, so index-specific overrides take priority over it.
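
As a rough illustration of the layering described above (not the code in this PR), the sketch below shows one way an engine-level abstraction could feed a cluster-level default that an index-level override replaces. Every name in it (`VectorEngine`, `LuceneEngine`, `clusterDefault`, `effective`) is hypothetical, the extensions `vec` and `vex` are assumed to be the Lucene vector value file extensions in question, and the core defaults shown are only an illustrative subset.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class MmapExtensionsSketch {

    /** Hypothetical engine-level abstraction: each engine reports the extensions it wants memory-mapped. */
    interface VectorEngine {
        List<String> mmapFileExtensions();
    }

    /** Lucene-specific implementation; "vec" and "vex" are assumed extension names. */
    static final class LuceneEngine implements VectorEngine {
        @Override
        public List<String> mmapFileExtensions() {
            return List.of("vec", "vex");
        }
    }

    /** An engine with nothing to add, to show that the abstraction allows an empty answer. */
    static final class OtherEngine implements VectorEngine {
        @Override
        public List<String> mmapFileExtensions() {
            return List.of();
        }
    }

    /** Cluster-level default: core extensions plus whatever each engine contributes, without duplicates. */
    static List<String> clusterDefault(List<String> coreDefaults, List<VectorEngine> engines) {
        Set<String> merged = new LinkedHashSet<>(coreDefaults);
        for (VectorEngine engine : engines) {
            merged.addAll(engine.mmapFileExtensions());
        }
        return new ArrayList<>(merged);
    }

    /** An index-specific override (non-null) takes priority over the cluster default. */
    static List<String> effective(List<String> clusterDefault, List<String> indexOverride) {
        return indexOverride != null ? indexOverride : clusterDefault;
    }

    public static void main(String[] args) {
        // Illustrative subset of core defaults, not the full list.
        List<String> core = List.of("nvd", "dvd", "tim");
        List<String> defaults = clusterDefault(core, List.of(new LuceneEngine(), new OtherEngine()));

        System.out.println("cluster default: " + effective(defaults, null));
        System.out.println("with override:   " + effective(defaults, List.of("nvd")));
    }
}
```

In this sketch an index that sets its own list gets exactly that list, so the added vector extensions apply only where no override exists.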

Tested locally on a 1M dataset.

Before change (baseline):
data ingestion

  "results": {
    "test_took": 2706990.31884,
    "create_index_took_total": 2702.2581160000004,
    "ingest_took_total": 513438.04862599995,
    "refresh_index_store_kb_total": 2810943.876953125,
    "refresh_index_took_total": 641.9670379999616,
    "force_merge_took_total": 2190208.04506
  }

query

  "results": {
    "test_took": 347624.7275845,
    "clear_cache_took_total": 22.227584499987294,
    "query_took_total": 347602.5,
    "query_took_p50": 32.5,
    "query_took_p90": 45.5,
    "query_took_p99": 83.0,
    "query_memory_kb_total": 0.0,
    "query_recall@K_total": 0.991061,
    "query_recall@1_total": 1.0
  }

With the change:
data ingestion

  "results": {
    "test_took": 748055.190353,
    "create_index_took_total": 2722.189422,
    "ingest_took_total": 535081.281891,
    "refresh_index_store_kb_total": 2814413.49609375,
    "refresh_index_took_total": 1110.5574829999796,
    "force_merge_took_total": 209141.16155700004
  }

query

  "results": {
    "test_took": 111353.221852,
    "clear_cache_took_total": 110.22185199999768,
    "query_took_total": 111243.0,
    "query_took_p50": 9.5,
    "query_took_p90": 16.0,
    "query_took_p99": 28.5,
    "query_memory_kb_total": 0.0,
    "query_recall@K_total": 0.990751,
    "query_recall@1_total": 1.0
  }
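
To make the before/after comparison easier to read, the ratios can be computed directly from the numbers above. This is a minimal sketch: the class and method names are made up for illustration, and the values are copied from the results in this description.

```java
public class BenchmarkDelta {

    /** How many times smaller the "after" value is than the "before" value. */
    static double ratio(double before, double after) {
        return before / after;
    }

    /** Signed change from "before" to "after", as a percentage of "before". */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    static void report(String metric, double before, double after) {
        System.out.printf("%-24s before=%.2f after=%.2f ratio=%.2fx change=%.1f%%%n",
                metric, before, after, ratio(before, after), percentChange(before, after));
    }

    public static void main(String[] args) {
        // Data ingestion run
        report("ingest_took_total", 513438.04862599995, 535081.281891);
        report("force_merge_took_total", 2190208.04506, 209141.16155700004);

        // Query run
        report("query_took_p50", 32.5, 9.5);
        report("query_took_p90", 45.5, 16.0);
        report("query_took_p99", 83.0, 28.5);
        report("query_recall@K_total", 0.991061, 0.990751);
    }
}
```

A ratio above 1 means the metric went down with the change. Recall is included so that any latency gain can be read alongside its effect on result quality.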

Issues Resolved

#637

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@martin-gaievski martin-gaievski added Features Introduces a new unit of functionality that satisfies a requirement backport 2.x v2.5.0 'Issues and PRs related to version v2.5.0' labels Jan 11, 2023
@martin-gaievski martin-gaievski requested a review from a team January 11, 2023 23:03
@martin-gaievski martin-gaievski force-pushed the add-mmap-file-extensions branch from d47eca6 to 1d38554 Compare January 11, 2023 23:04
@navneet1v
Collaborator

Overall, the code looks good to me. Please reply to Jack's comment about adding another file type.

navneet1v previously approved these changes Jan 11, 2023
codecov-commenter commented Jan 12, 2023

Codecov Report

Merging #721 (1c81f29) into main (5f4bef9) will decrease coverage by 0.21%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##               main     #721      +/-   ##
============================================
- Coverage     84.64%   84.43%   -0.22%     
  Complexity     1072     1072              
============================================
  Files           151      152       +1     
  Lines          4345     4356      +11     
  Branches        389      389              
============================================
  Hits           3678     3678              
- Misses          489      498       +9     
- Partials        178      180       +2     
| Impacted Files | Coverage Δ |
|---|---|
| .../java/org/opensearch/knn/index/util/KNNEngine.java | 100.00% <100.00%> (ø) |
| ...java/org/opensearch/knn/index/util/KNNLibrary.java | 100.00% <100.00%> (ø) |
| ...ain/java/org/opensearch/knn/index/util/Lucene.java | 93.75% <100.00%> (+0.41%) ⬆️ |
| ...main/java/org/opensearch/knn/plugin/KNNPlugin.java | 100.00% <100.00%> (ø) |
| ...va/org/opensearch/knn/index/KNNCircuitBreaker.java | 60.00% <0.00%> (-20.00%) ⬇️ |
| ...ain/java/org/opensearch/knn/index/KNNSettings.java | 80.88% <0.00%> (-2.21%) ⬇️ |


@martin-gaievski martin-gaievski merged commit 8a2aa04 into opensearch-project:main Jan 12, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 12, 2023
* Add lucene vector specific file extensions for io with mmap

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit 8a2aa04)
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 12, 2023
* Add lucene vector specific file extensions for io with mmap

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit 8a2aa04)
martin-gaievski added a commit that referenced this pull request Jan 12, 2023
* Add lucene vector specific file extensions for io with mmap

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit 8a2aa04)

Co-authored-by: Martin Gaievski <[email protected]>
martin-gaievski added a commit that referenced this pull request Jan 12, 2023
* Add lucene vector specific file extensions for io with mmap

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit 8a2aa04)

Co-authored-by: Martin Gaievski <[email protected]>
Labels
backport 2.x backport 2.5 Features Introduces a new unit of functionality that satisfies a requirement v2.5.0 'Issues and PRs related to version v2.5.0'