Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests (opensearch-project#926)

Signed-off-by: Navneet Verma <[email protected]>
navneet1v committed Jun 14, 2023
1 parent 079f669 commit 2b1c47e
Showing 15 changed files with 355 additions and 57 deletions.
47 changes: 33 additions & 14 deletions benchmarks/perf-tool/README.md
@@ -13,18 +13,36 @@ file.

 ## Install Prerequisites

-### Python
+### Setup

-Python 3.7 or above is required.
+K-NN perf requires Python 3.8 or greater to be installed. One of
+the easier ways to do this is through Conda, a package and environment
+management system for Python.

-### Pip
+First, follow the
+[installation instructions](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html)
+to install Conda on your system.

-Use pip to install the necessary requirements:
+Next, create a Python 3.8 environment:
+```
+conda create -n knn-perf python=3.8
+```
+
+After the environment is created, activate it:
+```
+source activate knn-perf
+```

+Lastly, clone the k-NN repo and install all required python packages:
 ```
+git clone https://github.com/opensearch-project/k-NN.git
+cd k-NN/benchmarks/perf-tool
 pip install -r requirements.txt
 ```

+After all of this completes, you should be ready to run your first performance benchmarks!
+
+
 ## Usage

 ### Quick Start
@@ -72,16 +90,17 @@ The output will be the delta between the two metrics.

 ### Test Parameters

-| Parameter Name | Description | Default |
-| ----------- | ----------- | ----------- |
-| endpoint | Endpoint OpenSearch cluster is running on | localhost |
-| test_name | Name of test | No default |
-| test_id | String ID of test | No default |
-| num_runs | Number of runs to execute steps | 1 |
-| show_runs | Whether to output each run in addition to the total summary | false |
-| setup | List of steps to run once before metric collection starts | [] |
-| steps | List of steps that make up one test run. Metrics will be collected on these steps. | No default |
-| cleanup | List of steps to run after each test run | [] |
+| Parameter Name | Description | Default |
+|----------------|------------------------------------------------------------------------------------|------------|
+| endpoint | Endpoint OpenSearch cluster is running on | localhost |
+| port | Port on which OpenSearch Cluster is running on | 9200 |
+| test_name | Name of test | No default |
+| test_id | String ID of test | No default |
+| num_runs | Number of runs to execute steps | 1 |
+| show_runs | Whether to output each run in addition to the total summary | false |
+| setup | List of steps to run once before metric collection starts | [] |
+| steps | List of steps that make up one test run. Metrics will be collected on these steps. | No default |
+| cleanup | List of steps to run after each test run | [] |

 ### Steps

5 changes: 5 additions & 0 deletions benchmarks/perf-tool/okpt/io/config/parsers/test.py
@@ -23,6 +23,7 @@ class TestConfig:
     test_name: str
     test_id: str
     endpoint: str
+    port: int
     num_runs: int
     show_runs: bool
     setup: List[Step]
@@ -48,6 +49,9 @@ def parse(self, file_obj: TextIOWrapper) -> TestConfig:
         if 'endpoint' in config_obj:
             implicit_step_config['endpoint'] = config_obj['endpoint']

+        if 'port' in config_obj:
+            implicit_step_config['port'] = config_obj['port']
+
         # Each step should have its own parse - take the config object and check if its valid
         setup = []
         if 'setup' in config_obj:
@@ -62,6 +66,7 @@ def parse(self, file_obj: TextIOWrapper) -> TestConfig:

         test_config = TestConfig(
             endpoint=config_obj['endpoint'],
+            port=config_obj['port'],
             test_name=config_obj['test_name'],
             test_id=config_obj['test_id'],
             num_runs=config_obj['num_runs'],
3 changes: 3 additions & 0 deletions benchmarks/perf-tool/okpt/io/config/schemas/test.yml
@@ -9,6 +9,9 @@
 endpoint:
   type: string
   default: "localhost"
+port:
+  type: integer
+  default: 9200
 test_name:
   type: string
 test_id:
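The `port` key follows the same pattern as `endpoint`: the schema supplies a default (9200) and the parser copies the value into the configuration shared by every step. The sketch below illustrates that flow in isolation. `SCHEMA_DEFAULTS`, `apply_defaults`, and `implicit_step_config` are hypothetical names used only for illustration; the tool's real schema validation is not shown in this diff.

```python
# Hypothetical illustration of default handling; the real perf-tool
# validates the config against okpt/io/config/schemas/test.yml instead.
SCHEMA_DEFAULTS = {"endpoint": "localhost", "port": 9200}


def apply_defaults(config_obj: dict) -> dict:
    """Return a copy of config_obj with missing keys filled from the defaults."""
    merged = dict(SCHEMA_DEFAULTS)
    merged.update(config_obj)
    return merged


def implicit_step_config(config_obj: dict) -> dict:
    """Collect the connection settings that every step inherits."""
    implicit = {}
    for key in ("endpoint", "port"):
        if key in config_obj:
            implicit[key] = config_obj[key]
    return implicit


# A config that omits `port` picks up the schema default.
config = apply_defaults({"endpoint": "my-cluster.example.com", "test_name": "demo"})
step_config = implicit_step_config(config)
```

Because the default is applied before parsing, a lookup such as `config_obj['port']` would not fail for configs that omit the key, assuming validation runs first.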
11 changes: 5 additions & 6 deletions benchmarks/perf-tool/okpt/test/steps/steps.py
@@ -5,7 +5,7 @@
 # compatible open source license.
 """Provides steps for OpenSearch tests.
-Some of the OpenSearch operations return a `took` field in the response body,
+Some OpenSearch operations return a `took` field in the response body,
 so the profiling decorators aren't needed for some functions.
 """
 import json
@@ -454,8 +454,7 @@ def _action(self):
         results['took'] = [
             float(query_response['took']) for query_response in query_responses
         ]
-        port = 9200 if self.endpoint == 'localhost' else 80
-        results['memory_kb'] = get_cache_size_in_kb(self.endpoint, port)
+        results['memory_kb'] = get_cache_size_in_kb(self.endpoint, self.port)

         if self.calculate_recall:
             ids = [[int(hit['_id'])
@@ -614,7 +613,6 @@ def _action(self):
         num_of_search_segments = 0;
         for shard_key in shards.keys():
             for segment in shards[shard_key]:
-
                 num_of_committed_segments += segment["num_committed_segments"]
                 num_of_search_segments += segment["num_search_segments"]

@@ -689,12 +687,13 @@ def delete_model(endpoint, port, model_id):
     return response.json()


-def get_opensearch_client(endpoint: str, port: int):
+def get_opensearch_client(endpoint: str, port: int, timeout=60):
     """
     Get an opensearch client from an endpoint and port
     Args:
         endpoint: Endpoint OpenSearch is running on
         port: Port OpenSearch is running on
+        timeout: timeout for OpenSearch client, default value 60
     Returns:
         OpenSearch client
@@ -708,7 +707,7 @@ def get_opensearch_client(endpoint: str, port: int):
         use_ssl=False,
         verify_certs=False,
         connection_class=RequestsHttpConnection,
-        timeout=60,
+        timeout=timeout,
     )


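Two behaviors change in `steps.py`: the port is taken from the test config instead of being inferred from the endpoint (9200 for `localhost`, 80 otherwise), and the client timeout becomes a parameter with the previous value of 60 as its default. The sketch below shows how such a client might be assembled. `client_kwargs` and `make_client` are hypothetical helpers, and the `hosts` argument is an assumption because that part of the function is not visible in the diff.

```python
def client_kwargs(endpoint: str, port: int, timeout: int = 60) -> dict:
    """Build keyword arguments for an OpenSearch client (hypothetical helper)."""
    return {
        # Assumption: the hosts argument is not visible in the diff.
        "hosts": [{"host": endpoint, "port": port}],
        "use_ssl": False,
        "verify_certs": False,
        "timeout": timeout,
    }


def make_client(endpoint: str, port: int, timeout: int = 60):
    """Create the client. Requires the opensearch-py package to be installed."""
    from opensearchpy import OpenSearch, RequestsHttpConnection

    return OpenSearch(connection_class=RequestsHttpConnection,
                      **client_kwargs(endpoint, port, timeout))


# Callers that do not pass a timeout keep the old behavior.
default_kwargs = client_kwargs("localhost", 9200)
# Long-running operations can ask for more time.
slow_kwargs = client_kwargs("localhost", 9200, timeout=300)
```

Keeping 60 as the default means existing call sites behave as before, while individual steps can opt into a longer timeout.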
@@ -0,0 +1,26 @@
+{
+  "settings": {
+    "index": {
+      "knn": true,
+      "number_of_shards": 24,
+      "number_of_replicas": 1
+    }
+  },
+  "mappings": {
+    "properties": {
+      "target_field": {
+        "type": "knn_vector",
+        "dimension": 128,
+        "method": {
+          "name": "hnsw",
+          "space_type": "l2",
+          "engine": "faiss",
+          "parameters": {
+            "ef_construction": 256,
+            "m": 16
+          }
+        }
+      }
+    }
+  }
+}
@@ -0,0 +1,42 @@
+{
+  "bool":
+  {
+    "should":
+    [
+      {
+        "range":
+        {
+          "age":
+          {
+            "gte": 30,
+            "lte": 70
+          }
+        }
+      },
+      {
+        "term":
+        {
+          "color": "green"
+        }
+      },
+      {
+        "term":
+        {
+          "color": "blue"
+        }
+      },
+      {
+        "term":
+        {
+          "color": "yellow"
+        }
+      },
+      {
+        "term":
+        {
+          "color": "sweet"
+        }
+      }
+    ]
+  }
+}
@@ -0,0 +1,34 @@
+endpoint: [ENDPOINT]
+test_name: "Faiss HNSW Relaxed Filter Test"
+test_id: "Faiss HNSW Relaxed Filter Test"
+num_runs: 10
+show_runs: false
+steps:
+  - name: delete_index
+    index_name: target_index
+  - name: create_index
+    index_name: target_index
+    index_spec: [INDEX_SPEC_PATH]/relaxed-filter/index.json
+  - name: ingest_multi_field
+    index_name: target_index
+    field_name: target_field
+    bulk_size: 500
+    dataset_format: hdf5
+    dataset_path: [DATASET_PATH]/sift-128-euclidean-with-attr.hdf5
+    attributes_dataset_name: attributes
+    attribute_spec: [ { name: 'color', type: 'str' }, { name: 'taste', type: 'str' }, { name: 'age', type: 'int' } ]
+  - name: refresh_index
+    index_name: target_index
+  - name: query_with_filter
+    k: 100
+    r: 1
+    calculate_recall: true
+    index_name: target_index
+    field_name: target_field
+    dataset_format: hdf5
+    dataset_path: [DATASET_PATH]/sift-128-euclidean-with-attr.hdf5
+    neighbors_format: hdf5
+    neighbors_path: [DATASET_PATH]/sift-128-euclidean-with-filters.hdf5
+    neighbors_dataset: neighbors_filter_5
+    filter_spec: [INDEX_SPEC_PATH]/relaxed-filter-spec.json
+    filter_type: FILTER
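The `query_with_filter` step pairs each query vector with the JSON filter named by `filter_spec`. The sketch below shows one way the request body could be assembled, assuming `filter_type: FILTER` maps to a `filter` clause nested inside the `knn` query, as in the OpenSearch k-NN filtering syntax. The step's actual implementation is not part of this diff, and `build_filtered_knn_query` is a hypothetical name.

```python
import json


def build_filtered_knn_query(field_name, vector, k, filter_spec):
    """Wrap a filter spec in a k-NN query body (hypothetical helper)."""
    return {
        "size": k,
        "query": {
            "knn": {
                field_name: {
                    "vector": vector,
                    "k": k,
                    # The filter is applied as part of the k-NN search.
                    "filter": filter_spec,
                }
            }
        },
    }


# Inline stand-in for the JSON file that filter_spec points to.
filter_spec = json.loads('{"bool": {"should": [{"term": {"color": "green"}}]}}')
# 128 dimensions matches the index mapping; the zero vector is a placeholder.
body = build_filtered_knn_query("target_field", [0.0] * 128, 100, filter_spec)
```

The resulting `body` is what a search request against `target_index` would carry for a single query vector.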
@@ -0,0 +1,26 @@
+{
+  "settings": {
+    "index": {
+      "knn": true,
+      "number_of_shards": 24,
+      "number_of_replicas": 1
+    }
+  },
+  "mappings": {
+    "properties": {
+      "target_field": {
+        "type": "knn_vector",
+        "dimension": 128,
+        "method": {
+          "name": "hnsw",
+          "space_type": "l2",
+          "engine": "faiss",
+          "parameters": {
+            "ef_construction": 256,
+            "m": 16
+          }
+        }
+      }
+    }
+  }
+}
@@ -0,0 +1,44 @@
+{
+  "bool":
+  {
+    "must":
+    [
+      {
+        "range":
+        {
+          "age":
+          {
+            "gte": 30,
+            "lte": 60
+          }
+        }
+      },
+      {
+        "term":
+        {
+          "taste": "bitter"
+        }
+      },
+      {
+        "bool":
+        {
+          "should":
+          [
+            {
+              "term":
+              {
+                "color": "blue"
+              }
+            },
+            {
+              "term":
+              {
+                "color": "green"
+              }
+            }
+          ]
+        }
+      }
+    ]
+  }
+}
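The two filter specs in this commit differ in how their clauses combine. The relaxed spec uses a top-level `should`, so a document passes if any one clause matches. The restrictive spec uses `must`, so every clause has to match, with a nested `should` for the color choice. The sketch below restates both as plain Python predicates to make the difference concrete. It assumes exact-match `term` semantics and inclusive range bounds, and the function names are hypothetical.

```python
def matches_relaxed(doc: dict) -> bool:
    """Relaxed spec: any single clause is enough (top-level `should`)."""
    return (
        30 <= doc["age"] <= 70
        # The last term is reproduced as written in the spec:
        # it compares the `color` field against "sweet".
        or doc["color"] in ("green", "blue", "yellow", "sweet")
    )


def matches_restrictive(doc: dict) -> bool:
    """Restrictive spec: all `must` clauses have to hold."""
    return (
        30 <= doc["age"] <= 60
        and doc["taste"] == "bitter"
        and doc["color"] in ("blue", "green")
    )


docs = [
    {"age": 45, "taste": "bitter", "color": "blue"},
    {"age": 20, "taste": "sweet", "color": "yellow"},
    {"age": 80, "taste": "sweet", "color": "red"},
]
relaxed_hits = [matches_relaxed(d) for d in docs]
restrictive_hits = [matches_restrictive(d) for d in docs]
```

Only the first sample document satisfies the restrictive predicate, while the relaxed predicate also accepts the second one through its color clause.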
@@ -0,0 +1,37 @@
+endpoint: [ENDPOINT]
+test_name: "Faiss HNSW Restrictive Filter Test"
+test_id: "Faiss HNSW Restrictive Filter Test"
+num_runs: 10
+show_runs: false
+steps:
+  - name: delete_index
+    index_name: target_index
+  - name: create_index
+    index_name: target_index
+    index_spec: [INDEX_SPEC_PATH]/index.json
+  - name: ingest_multi_field
+    index_name: target_index
+    field_name: target_field
+    bulk_size: 500
+    dataset_format: hdf5
+    dataset_path: [DATASET_PATH]/sift-128-euclidean-with-attr.hdf5
+    attributes_dataset_name: attributes
+    attribute_spec: [ { name: 'color', type: 'str' }, { name: 'taste', type: 'str' }, { name: 'age', type: 'int' } ]
+  - name: refresh_index
+    index_name: target_index
+  - name: force_merge
+    index_name: target_index
+    max_num_segments: 1
+  - name: query_with_filter
+    k: 100
+    r: 1
+    calculate_recall: true
+    index_name: target_index
+    field_name: target_field
+    dataset_format: hdf5
+    dataset_path: [DATASET_PATH]/sift-128-euclidean-with-attr.hdf5
+    neighbors_format: hdf5
+    neighbors_path: [DATASET_PATH]/sift-128-euclidean-with-filters.hdf5
+    neighbors_dataset: neighbors_filter_4
+    filter_spec: [INDEX_SPEC_PATH]/restrictive-filter-spec.json
+    filter_type: FILTER
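With `calculate_recall: true`, the query step compares the returned document IDs against the ground-truth neighbors stored in `neighbors_dataset`. The sketch below uses one common definition of recall: the fraction of the top `r` true neighbors that appear among the top `k` returned results. This is an assumption about the metric, not the perf-tool's own code, and the `recall` function is hypothetical.

```python
def recall(results, ground_truth, k, r):
    """Fraction of the top-r true neighbors found among the top-k results.

    results: one list of returned IDs per query.
    ground_truth: one list of true neighbor IDs per query, nearest first.
    """
    hits = 0
    total = 0
    for returned_ids, true_ids in zip(results, ground_truth):
        top_true = true_ids[:r]
        returned = set(returned_ids[:k])
        hits += sum(1 for neighbor in top_true if neighbor in returned)
        total += len(top_true)
    return hits / total if total else 0.0


# Two toy queries with three results each.
results = [[5, 3, 9], [1, 2, 3]]
ground_truth = [[3, 5], [7, 1]]
recall_r1 = recall(results, ground_truth, k=3, r=1)
recall_r2 = recall(results, ground_truth, k=3, r=2)
```

With `k: 100` and `r: 1` as in the configs above, this definition would check whether the single nearest filtered neighbor appears anywhere in the 100 returned results.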
26 changes: 26 additions & 0 deletions benchmarks/perf-tool/release-configs/faiss-hnsw/index.json
@@ -0,0 +1,26 @@
+{
+  "settings": {
+    "index": {
+      "knn": true,
+      "number_of_shards": 24,
+      "number_of_replicas": 1
+    }
+  },
+  "mappings": {
+    "properties": {
+      "target_field": {
+        "type": "knn_vector",
+        "dimension": 128,
+        "method": {
+          "name": "hnsw",
+          "space_type": "l2",
+          "engine": "faiss",
+          "parameters": {
+            "ef_construction": 256,
+            "m": 16
+          }
+        }
+      }
+    }
+  }
+}