Adding bool query post-filtering option
Signed-off-by: Martin Gaievski <[email protected]>
martin-gaievski committed Oct 28, 2022
1 parent 3064f07 commit 2d7440e
Showing 2 changed files with 35 additions and 16 deletions.
32 changes: 16 additions & 16 deletions benchmarks/perf-tool/README.md
@@ -286,22 +286,22 @@ Runs a set of queries with filter against an index.

##### Parameters

-| Parameter Name | Description | Default |
-| ----------- |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------|
-| k | Number of neighbors to return on search | 100 |
-| r | r value in Recall@R | 1 |
-| index_name | Name of index to search | No default |
-| field_name | Name field to search | No default |
-| calculate_recall | Whether to calculate recall values | False |
-| dataset_format | Format the dataset is in. Currently hdf5 and bigann is supported. The hdf5 file must be organized in the same way that the ann-benchmarks organizes theirs. | 'hdf5' |
-| dataset_path | Path to dataset | No default |
-| neighbors_format | Format the neighbors dataset is in. Currently hdf5 and bigann is supported. The hdf5 file must be organized in the same way that the ann-benchmarks organizes theirs. | 'hdf5' |
-| neighbors_path | Path to neighbors dataset | No default |
-| neighbors_dataset | Name of filter dataset inside the neighbors dataset | No default |
-| filter_spec | Path to filter specification | No default |
-| filter_type | Type of filter format, we do support following types: <br/>FILTER inner filter format for approximate k-NN search<br/>SCRIPT score scripting style with exact k-NN search | SCRIPT |
-| score_script_similarity | Similarity function that has been used to index dataset. Used for SCRIPT filter type and ignored for others | l2 |
-| query_count | Number of queries to create from data-set | Size of the data-set |
+| Parameter Name | Description | Default |
+| ----------- |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------|
+| k | Number of neighbors to return on search | 100 |
+| r | r value in Recall@R | 1 |
+| index_name | Name of index to search | No default |
+| field_name | Name field to search | No default |
+| calculate_recall | Whether to calculate recall values | False |
+| dataset_format | Format the dataset is in. Currently hdf5 and bigann is supported. The hdf5 file must be organized in the same way that the ann-benchmarks organizes theirs. | 'hdf5' |
+| dataset_path | Path to dataset | No default |
+| neighbors_format | Format the neighbors dataset is in. Currently hdf5 and bigann is supported. The hdf5 file must be organized in the same way that the ann-benchmarks organizes theirs. | 'hdf5' |
+| neighbors_path | Path to neighbors dataset | No default |
+| neighbors_dataset | Name of filter dataset inside the neighbors dataset | No default |
+| filter_spec | Path to filter specification | No default |
+| filter_type | Type of filter format, we do support following types: <br/>FILTER inner filter format for approximate k-NN search<br/>SCRIPT score scripting with exact k-NN search and pre-filtering<br/>BOOL_POST_FILTER Bool query with post-filtering | SCRIPT |
+| score_script_similarity | Similarity function that has been used to index dataset. Used for SCRIPT filter type and ignored for others | l2 |
+| query_count | Number of queries to create from data-set | Size of the data-set |
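
Reviewer note: the practical difference between the SCRIPT (pre-filtering) and BOOL_POST_FILTER rows can be illustrated with a toy, pure-Python sketch. This is not perf-tool or OpenSearch code; the documents, the `color` field, and the helper names are all hypothetical. It assumes post-filtering picks the k nearest vectors first and then drops those that fail the filter, so it may return fewer than k hits, whereas pre-filtering narrows the candidates before the exact search.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pre_filter_search(docs, query, k, predicate):
    # Filter first, then take the k nearest among the survivors.
    candidates = [d for d in docs if predicate(d)]
    return sorted(candidates, key=lambda d: l2(d["vector"], query))[:k]


def post_filter_search(docs, query, k, predicate):
    # Take the k nearest overall, then drop those failing the filter.
    nearest = sorted(docs, key=lambda d: l2(d["vector"], query))[:k]
    return [d for d in nearest if predicate(d)]


# Hypothetical toy data: two "red" documents, two "blue" ones.
docs = [
    {"id": 1, "vector": [0.0, 0.0], "color": "red"},
    {"id": 2, "vector": [0.1, 0.0], "color": "blue"},
    {"id": 3, "vector": [0.2, 0.0], "color": "blue"},
    {"id": 4, "vector": [5.0, 5.0], "color": "red"},
]


def is_red(doc):
    return doc["color"] == "red"


pre = pre_filter_search(docs, [0.0, 0.0], 2, is_red)
post = post_filter_search(docs, [0.0, 0.0], 2, is_red)
print([d["id"] for d in pre])   # ids returned with pre-filtering
print([d["id"] for d in post])  # ids returned with post-filtering
```

With k = 2, the pre-filtered search still finds two matching documents, while the post-filtered search keeps only the matches among the two nearest neighbors. This is worth keeping in mind when comparing recall numbers across filter types.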

##### Metrics

19 changes: 19 additions & 0 deletions benchmarks/perf-tool/okpt/test/steps/steps.py
@@ -555,6 +555,25 @@ def get_body_filter(vec):
                     }
                 }
             }
+        elif self.filter_type == 'BOOL_POST_FILTER':
+            return {
+                'size': self.k,
+                'query': {
+                    'bool': {
+                        'filter': filter_json,
+                        'must': [
+                            {
+                                'knn': {
+                                    self.field_name: {
+                                        'vector': vec,
+                                        'k': self.k
+                                    }
+                                }
+                            }
+                        ]
+                    }
+                }
+            }
         else:
             raise ConfigurationError('Not supported filter type {}'.format(self.filter_type))
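
Reviewer note: to see the request body this new branch produces, the dictionary can be built outside the step class. The sketch below is a minimal standalone version. The function name `build_bool_post_filter_body`, the field name `target_field`, and the `term` filter are hypothetical stand-ins for `self.field_name` and the contents of the `filter_spec` file, which this commit does not show.

```python
import json


def build_bool_post_filter_body(field_name, vec, k, filter_json):
    """Mirror the BOOL_POST_FILTER branch: a bool query whose `must`
    clause holds the k-NN query and whose `filter` clause holds the filter."""
    return {
        'size': k,
        'query': {
            'bool': {
                'filter': filter_json,
                'must': [
                    {
                        'knn': {
                            field_name: {
                                'vector': vec,
                                'k': k
                            }
                        }
                    }
                ]
            }
        }
    }


# Hypothetical filter; in the perf tool it is loaded from the file
# that `filter_spec` points to.
example_filter = {'term': {'color': 'red'}}
body = build_bool_post_filter_body('target_field', [0.1, 0.2, 0.3], 10, example_filter)
print(json.dumps(body, indent=2))
```

Note that `size` and the k-NN `k` are both set to the same value, as in the diff, so the filter can only shrink the result set below `k`.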
