Fixed the ef_search default value for faiss HNSW with filters and updated the perf-tool to include Faiss HNSW tests #926

Merged
47 changes: 33 additions & 14 deletions benchmarks/perf-tool/README.md
@@ -13,18 +13,36 @@ file.

## Install Prerequisites

### Python
### Setup

Python 3.7 or above is required.
K-NN perf requires Python 3.8 or greater to be installed. One of
the easier ways to do this is through Conda, a package and environment
management system for Python.

### Pip
First, follow the
[installation instructions](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html)
to install Conda on your system.

Use pip to install the necessary requirements:
Next, create a Python 3.8 environment:
```
conda create -n knn-perf python=3.8
```

After the environment is created, activate it:
```
source activate knn-perf
```

Lastly, clone the k-NN repo and install all required Python packages:
```
git clone https://github.com/opensearch-project/k-NN.git
cd k-NN/benchmarks/perf-tool
pip install -r requirements.txt
```

After all of this completes, you should be ready to run your first performance benchmarks!


## Usage

### Quick Start
@@ -72,16 +90,17 @@ The output will be the delta between the two metrics.

### Test Parameters

| Parameter Name | Description | Default |
| ----------- | ----------- | ----------- |
| endpoint | Endpoint OpenSearch cluster is running on | localhost |
| test_name | Name of test | No default |
| test_id | String ID of test | No default |
| num_runs | Number of runs to execute steps | 1 |
| show_runs | Whether to output each run in addition to the total summary | false |
| setup | List of steps to run once before metric collection starts | [] |
| steps | List of steps that make up one test run. Metrics will be collected on these steps. | No default |
| cleanup | List of steps to run after each test run | [] |
| Parameter Name | Description | Default |
|----------------|------------------------------------------------------------------------------------|------------|
| endpoint | Endpoint OpenSearch cluster is running on | localhost |
| port | Port the OpenSearch cluster is running on | 9200 |
| test_name | Name of test | No default |
| test_id | String ID of test | No default |
| num_runs | Number of runs to execute steps | 1 |
| show_runs | Whether to output each run in addition to the total summary | false |
| setup | List of steps to run once before metric collection starts | [] |
| steps | List of steps that make up one test run. Metrics will be collected on these steps. | No default |
| cleanup | List of steps to run after each test run | [] |
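
As a rough illustration of how the defaults in the table could be applied, here is a minimal sketch. The helper name `resolve_test_params` is hypothetical and not part of the perf-tool; the sketch only assumes that parameters listed with "No default" must be supplied by the user.

```python
from typing import Any, Dict

# Defaults copied from the parameter table above. Parameters without an
# entry here ("test_name", "test_id", "steps") have no default.
DEFAULTS: Dict[str, Any] = {
    "endpoint": "localhost",
    "port": 9200,
    "num_runs": 1,
    "show_runs": False,
    "setup": [],
    "cleanup": [],
}
REQUIRED = ("test_name", "test_id", "steps")


def resolve_test_params(user_config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: overlay user values on the documented defaults."""
    missing = [key for key in REQUIRED if key not in user_config]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    # Copy list defaults so callers never share one mutable list.
    resolved = {
        key: (list(value) if isinstance(value, list) else value)
        for key, value in DEFAULTS.items()
    }
    resolved.update(user_config)
    return resolved


config = resolve_test_params({
    "test_name": "example",
    "test_id": "example",
    "steps": [{"name": "refresh_index", "index_name": "target_index"}],
})
print(config["endpoint"], config["port"])  # localhost 9200
```

A config that omits `port` therefore resolves to 9200, which matches the default added in this change.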

### Steps

5 changes: 5 additions & 0 deletions benchmarks/perf-tool/okpt/io/config/parsers/test.py
@@ -23,6 +23,7 @@ class TestConfig:
    test_name: str
    test_id: str
    endpoint: str
    port: int
Review comment (Member): If `port` is configurable, we need to update the description of the test parameters in benchmarks/perf-tool/README.

Reply (Collaborator Author): ack.

    num_runs: int
    show_runs: bool
    setup: List[Step]
@@ -48,6 +49,9 @@ def parse(self, file_obj: TextIOWrapper) -> TestConfig:
        if 'endpoint' in config_obj:
            implicit_step_config['endpoint'] = config_obj['endpoint']

        if 'port' in config_obj:
            implicit_step_config['port'] = config_obj['port']

        # Each step should have its own parse - take the config object and check if its valid
        setup = []
        if 'setup' in config_obj:
@@ -62,6 +66,7 @@ def parse(self, file_obj: TextIOWrapper) -> TestConfig:

        test_config = TestConfig(
            endpoint=config_obj['endpoint'],
            port=config_obj['port'],
            test_name=config_obj['test_name'],
            test_id=config_obj['test_id'],
            num_runs=config_obj['num_runs'],
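
The parser change above copies the test-level `port` into `implicit_step_config`, next to `endpoint`. A minimal sketch of that idea follows. The function name `apply_implicit_config` is hypothetical, and the precedence (a step's own value wins over the test-level value) is an assumption, since the merge logic itself is not visible in this diff.

```python
from typing import Any, Dict, List


def apply_implicit_config(test_config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: hand test-level connection settings to every step."""
    implicit = {
        key: test_config[key]
        for key in ("endpoint", "port")
        if key in test_config
    }
    # Assumption: values set on a step override the test-level values.
    return [{**implicit, **step} for step in test_config.get("steps", [])]


test_config = {
    "endpoint": "localhost",
    "port": 9200,
    "steps": [
        {"name": "delete_index", "index_name": "target_index"},
        {"name": "refresh_index", "index_name": "target_index"},
    ],
}
steps = apply_implicit_config(test_config)
print(steps[0])
```

Each step then carries its own `endpoint` and `port`, so a step such as the query step can read `self.port` instead of guessing a port from the endpoint.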
3 changes: 3 additions & 0 deletions benchmarks/perf-tool/okpt/io/config/schemas/test.yml
@@ -9,6 +9,9 @@
endpoint:
  type: string
  default: "localhost"
port:
  type: integer
  default: 9200
test_name:
  type: string
test_id:
11 changes: 5 additions & 6 deletions benchmarks/perf-tool/okpt/test/steps/steps.py
@@ -5,7 +5,7 @@
 # compatible open source license.
 """Provides steps for OpenSearch tests.

-Some of the OpenSearch operations return a `took` field in the response body,
+Some OpenSearch operations return a `took` field in the response body,
 so the profiling decorators aren't needed for some functions.
 """
 import json
@@ -454,8 +454,7 @@ def _action(self):
         results['took'] = [
             float(query_response['took']) for query_response in query_responses
         ]
-        port = 9200 if self.endpoint == 'localhost' else 80
-        results['memory_kb'] = get_cache_size_in_kb(self.endpoint, port)
+        results['memory_kb'] = get_cache_size_in_kb(self.endpoint, self.port)

         if self.calculate_recall:
             ids = [[int(hit['_id'])
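
The hunk above replaces a port guessed from the endpoint with the configured `self.port`. The sketch below contrasts the two behaviors. Both function names are hypothetical, and the 9200 fallback in `configured_port` mirrors the schema default added in this PR.

```python
def legacy_port(endpoint: str) -> int:
    """Removed behavior: the port was inferred from the endpoint name."""
    return 9200 if endpoint == "localhost" else 80


def configured_port(config: dict) -> int:
    """New behavior (sketch): read the port from config, falling back to 9200."""
    return int(config.get("port", 9200))


# A cluster reached by IP address on 9200 was sent to port 80 by the old rule.
print(legacy_port("127.0.0.1"))                       # 80
print(configured_port({"endpoint": "127.0.0.1"}))     # 9200
print(configured_port({"endpoint": "127.0.0.1", "port": 9201}))  # 9201
```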
@@ -614,7 +613,6 @@ def _action(self):
         num_of_search_segments = 0;
         for shard_key in shards.keys():
             for segment in shards[shard_key]:
-
                 num_of_committed_segments += segment["num_committed_segments"]
                 num_of_search_segments += segment["num_search_segments"]
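
For readers who want to try the counting loop above in isolation, here is a self-contained sketch. The function name `count_segments` and the sample data are made up for illustration; the sketch assumes `shards` maps each shard key to a list of dicts carrying `num_committed_segments` and `num_search_segments`, as the loop implies.

```python
from typing import Dict, List, Tuple


def count_segments(shards: Dict[str, List[dict]]) -> Tuple[int, int]:
    """Sum committed and search segment counts over every entry of every shard."""
    num_of_committed_segments = 0
    num_of_search_segments = 0
    for shard_entries in shards.values():
        for entry in shard_entries:
            num_of_committed_segments += entry["num_committed_segments"]
            num_of_search_segments += entry["num_search_segments"]
    return num_of_committed_segments, num_of_search_segments


# Made-up sample: shard "0" has one entry, shard "1" has two.
sample_shards = {
    "0": [{"num_committed_segments": 2, "num_search_segments": 3}],
    "1": [
        {"num_committed_segments": 1, "num_search_segments": 1},
        {"num_committed_segments": 1, "num_search_segments": 2},
    ],
}
print(count_segments(sample_shards))  # (4, 6)
```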

@@ -689,12 +687,13 @@ def delete_model(endpoint, port, model_id):
     return response.json()


-def get_opensearch_client(endpoint: str, port: int):
+def get_opensearch_client(endpoint: str, port: int, timeout=60):
     """
     Get an opensearch client from an endpoint and port
     Args:
         endpoint: Endpoint OpenSearch is running on
         port: Port OpenSearch is running on
+        timeout: timeout for OpenSearch client, default value 60
     Returns:
         OpenSearch client

@@ -708,7 +707,7 @@ def get_opensearch_client(endpoint: str, port: int):
         use_ssl=False,
         verify_certs=False,
         connection_class=RequestsHttpConnection,
-        timeout=60,
+        timeout=timeout,
     )
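
The change above turns the hard-coded client timeout into a parameter that still defaults to 60. The sketch below shows the same pattern without importing the OpenSearch client library. The helper `client_kwargs` is hypothetical, and the shape of `hosts` is an assumption because that line falls outside the hunk shown here.

```python
from typing import Any, Dict


def client_kwargs(endpoint: str, port: int, timeout: int = 60) -> Dict[str, Any]:
    """Hypothetical helper: collect keyword arguments for an OpenSearch client."""
    return {
        # Assumption: the hosts entry is not visible in the diff above.
        "hosts": [{"host": endpoint, "port": port}],
        "use_ssl": False,
        "verify_certs": False,
        "timeout": timeout,
    }


default_kwargs = client_kwargs("localhost", 9200)
slow_kwargs = client_kwargs("localhost", 9200, timeout=300)
print(default_kwargs["timeout"], slow_kwargs["timeout"])  # 60 300
```

Existing callers keep the previous behavior, while a caller that expects a slow operation can pass a larger value.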


@@ -0,0 +1,26 @@
{
  "settings": {
    "index": {
      "knn": true,
      "number_of_shards": 24,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "properties": {
      "target_field": {
        "type": "knn_vector",
        "dimension": 128,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "faiss",
          "parameters": {
            "ef_construction": 256,
            "m": 16
          }
        }
      }
    }
  }
}
@@ -0,0 +1,42 @@
{
    "bool":
    {
        "should":
        [
            {
                "range":
                {
                    "age":
                    {
                        "gte": 30,
                        "lte": 70
                    }
                }
            },
            {
                "term":
                {
                    "color": "green"
                }
            },
            {
                "term":
                {
                    "color": "blue"
                }
            },
            {
                "term":
                {
                    "color": "yellow"
                }
            },
            {
                "term":
                {
                    "color": "sweet"
                }
            }
        ]
    }
}
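
The test configs in this PR set `filter_type: FILTER`. One plausible reading, assuming the perf-tool follows the OpenSearch k-NN query syntax in which a `filter` clause sits inside the `knn` field clause, is that a spec like the one above ends up embedded in each query as sketched below. The helper name `build_filtered_knn_query` is hypothetical, and the perf-tool's actual request-building code is not shown in this diff.

```python
import json
from typing import Any, Dict, List


def build_filtered_knn_query(field_name: str, vector: List[float], k: int,
                             filter_spec: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: wrap a filter spec inside a k-NN query body."""
    return {
        "size": k,
        "query": {
            "knn": {
                field_name: {
                    "vector": list(vector),
                    "k": k,
                    "filter": filter_spec,
                }
            }
        },
    }


# A shortened filter with the same shape as the spec above.
filter_spec = {
    "bool": {
        "should": [
            {"range": {"age": {"gte": 30, "lte": 70}}},
            {"term": {"color": "green"}},
        ]
    }
}
body = build_filtered_knn_query("target_field", [0.0] * 4, 100, filter_spec)
print(json.dumps(body, indent=2))
```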
@@ -0,0 +1,34 @@
endpoint: [ENDPOINT]
test_name: "Faiss HNSW Relaxed Filter Test"
test_id: "Faiss HNSW Relaxed Filter Test"
num_runs: 10
show_runs: false
steps:
  - name: delete_index
    index_name: target_index
  - name: create_index
    index_name: target_index
    index_spec: [INDEX_SPEC_PATH]/relaxed-filter/index.json
  - name: ingest_multi_field
    index_name: target_index
    field_name: target_field
    bulk_size: 500
    dataset_format: hdf5
    dataset_path: [DATASET_PATH]/sift-128-euclidean-with-attr.hdf5
    attributes_dataset_name: attributes
    attribute_spec: [ { name: 'color', type: 'str' }, { name: 'taste', type: 'str' }, { name: 'age', type: 'int' } ]
  - name: refresh_index
    index_name: target_index
  - name: query_with_filter
    k: 100
    r: 1
    calculate_recall: true
    index_name: target_index
    field_name: target_field
    dataset_format: hdf5
    dataset_path: [DATASET_PATH]/sift-128-euclidean-with-attr.hdf5
    neighbors_format: hdf5
    neighbors_path: [DATASET_PATH]/sift-128-euclidean-with-filters.hdf5
    neighbors_dataset: neighbors_filter_5
    filter_spec: [INDEX_SPEC_PATH]/relaxed-filter-spec.json
    filter_type: FILTER
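
The config above is a template: `[ENDPOINT]`, `[INDEX_SPEC_PATH]`, and `[DATASET_PATH]` have to be replaced before the file is usable (left as-is, `[ENDPOINT]` would be read as a one-element YAML list). A minimal sketch of one way to fill them in follows. The helper `fill_placeholders` and the example paths are hypothetical, and the repository may ship its own mechanism for this.

```python
from typing import Dict


def fill_placeholders(template: str, values: Dict[str, str]) -> str:
    """Hypothetical helper: replace each [NAME] token with its value."""
    for name, value in values.items():
        template = template.replace(f"[{name}]", value)
    return template


# Three lines taken from the config above.
template = "\n".join([
    "endpoint: [ENDPOINT]",
    "index_spec: [INDEX_SPEC_PATH]/relaxed-filter/index.json",
    "dataset_path: [DATASET_PATH]/sift-128-euclidean-with-attr.hdf5",
])
rendered = fill_placeholders(template, {
    "ENDPOINT": "localhost",
    "INDEX_SPEC_PATH": "/tmp/specs",  # example path, not from the PR
    "DATASET_PATH": "/tmp/data",      # example path, not from the PR
})
print(rendered)
```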
@@ -0,0 +1,26 @@
{
  "settings": {
    "index": {
      "knn": true,
      "number_of_shards": 24,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "properties": {
      "target_field": {
        "type": "knn_vector",
        "dimension": 128,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "faiss",
          "parameters": {
            "ef_construction": 256,
            "m": 16
          }
        }
      }
    }
  }
}
@@ -0,0 +1,44 @@
{
    "bool":
    {
        "must":
        [
            {
                "range":
                {
                    "age":
                    {
                        "gte": 30,
                        "lte": 60
                    }
                }
            },
            {
                "term":
                {
                    "taste": "bitter"
                }
            },
            {
                "bool":
                {
                    "should":
                    [
                        {
                            "term":
                            {
                                "color": "blue"
                            }
                        },
                        {
                            "term":
                            {
                                "color": "green"
                            }
                        }
                    ]
                }
            }
        ]
    }
}
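
To make the semantics of this restrictive filter concrete, the sketch below evaluates a `bool` filter against plain Python dicts. It is a simplified stand-in for illustration, not how OpenSearch evaluates queries: `term` is treated as exact equality, only `gte`/`lte` bounds are handled, and `should` is required only when no `must` clauses are present. The function name `matches` and the sample documents are made up.

```python
from typing import Any, Dict


def matches(doc: Dict[str, Any], clause: Dict[str, Any]) -> bool:
    """Simplified evaluation of term, range, and bool (must/should) clauses."""
    if "term" in clause:
        field, value = next(iter(clause["term"].items()))
        return doc.get(field) == value
    if "range" in clause:
        field, bounds = next(iter(clause["range"].items()))
        value = doc.get(field)
        if value is None:
            return False
        low = bounds.get("gte", float("-inf"))
        high = bounds.get("lte", float("inf"))
        return low <= value <= high
    if "bool" in clause:
        body = clause["bool"]
        must = body.get("must", [])
        should = body.get("should", [])
        if not all(matches(doc, sub) for sub in must):
            return False
        # Without "must" clauses, at least one "should" clause has to match.
        if should and not must:
            return any(matches(doc, sub) for sub in should)
        return True
    raise ValueError(f"unsupported clause: {clause}")


restrictive = {
    "bool": {
        "must": [
            {"range": {"age": {"gte": 30, "lte": 60}}},
            {"term": {"taste": "bitter"}},
            {"bool": {"should": [
                {"term": {"color": "blue"}},
                {"term": {"color": "green"}},
            ]}},
        ]
    }
}

print(matches({"age": 45, "taste": "bitter", "color": "blue"}, restrictive))   # True
print(matches({"age": 45, "taste": "sweet", "color": "blue"}, restrictive))    # False
print(matches({"age": 65, "taste": "bitter", "color": "green"}, restrictive))  # False
```

Every `must` clause has to hold at once, which is why this spec admits far fewer documents than a spec built only from `should` clauses.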
@@ -0,0 +1,37 @@
endpoint: [ENDPOINT]
test_name: "Faiss HNSW Restrictive Filter Test"
Review comment (Member): We need to add `port:` here and in the other .yml files as well. Otherwise it will pick the default port 80 instead of 9200 for localhost and will fail.

Reply (Collaborator Author): Making the default value 9200.

test_id: "Faiss HNSW Restrictive Filter Test"
num_runs: 10
show_runs: false
steps:
  - name: delete_index
    index_name: target_index
  - name: create_index
    index_name: target_index
    index_spec: [INDEX_SPEC_PATH]/index.json
  - name: ingest_multi_field
    index_name: target_index
    field_name: target_field
    bulk_size: 500
    dataset_format: hdf5
    dataset_path: [DATASET_PATH]/sift-128-euclidean-with-attr.hdf5
    attributes_dataset_name: attributes
    attribute_spec: [ { name: 'color', type: 'str' }, { name: 'taste', type: 'str' }, { name: 'age', type: 'int' } ]
  - name: refresh_index
    index_name: target_index
  - name: force_merge
    index_name: target_index
    max_num_segments: 1
  - name: query_with_filter
    k: 100
    r: 1
    calculate_recall: true
    index_name: target_index
    field_name: target_field
    dataset_format: hdf5
    dataset_path: [DATASET_PATH]/sift-128-euclidean-with-attr.hdf5
    neighbors_format: hdf5
    neighbors_path: [DATASET_PATH]/sift-128-euclidean-with-filters.hdf5
    neighbors_dataset: neighbors_filter_4
    filter_spec: [INDEX_SPEC_PATH]/restrictive-filter-spec.json
    filter_type: FILTER
26 changes: 26 additions & 0 deletions benchmarks/perf-tool/release-configs/faiss-hnsw/index.json
@@ -0,0 +1,26 @@
{
  "settings": {
    "index": {
      "knn": true,
      "number_of_shards": 24,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "properties": {
      "target_field": {
        "type": "knn_vector",
        "dimension": 128,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "faiss",
          "parameters": {
            "ef_construction": 256,
            "m": 16
          }
        }
      }
    }
  }
}