Integrate BEIR #2333

Merged
merged 16 commits into from
Mar 21, 2022
127 changes: 127 additions & 0 deletions docs/_src/api/api/document_store.md
@@ -250,6 +250,25 @@ When set to None (default) all available eval documents are used.
same question might be found in different contexts.
- `headers`: Custom HTTP headers to pass to document store client if supported (e.g. {'Authorization': 'Basic YWRtaW46cm9vdA=='} for basic authentication)

<a id="base.BaseDocumentStore.delete_index"></a>

#### delete\_index

```python
@abstractmethod
def delete_index(index: str)
```

Delete an existing index. The index, including all of its data, will be removed.

**Arguments**:

- `index`: The name of the index to delete.

**Returns**:

None
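Because `delete_index` is declared with `@abstractmethod` on the base class, every concrete document store must override it, or Python refuses to instantiate the subclass. The following minimal sketch illustrates that contract with hypothetical classes (`DocumentStoreSketch`, `IncompleteStore`, `CompleteStore`); they are stand-ins, not Haystack classes.

```python
from abc import ABC, abstractmethod


class DocumentStoreSketch(ABC):
    """Hypothetical stand-in for an abstract document store base class."""

    @abstractmethod
    def delete_index(self, index: str) -> None:
        """Delete an existing index, including all of its data."""


class IncompleteStore(DocumentStoreSketch):
    """Does not implement delete_index, so it stays abstract."""


class CompleteStore(DocumentStoreSketch):
    def __init__(self) -> None:
        self.indexes = {"document": {}}

    def delete_index(self, index: str) -> None:
        del self.indexes[index]


# Instantiating a subclass that skips the abstract method raises TypeError.
instantiation_failed = False
try:
    IncompleteStore()
except TypeError:
    instantiation_failed = True

store = CompleteStore()
store.delete_index("document")
print(instantiation_failed, store.indexes)  # True {}
```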

<a id="base.BaseDocumentStore.run"></a>

#### run
@@ -1842,6 +1861,24 @@ Example:

None

<a id="memory.InMemoryDocumentStore.delete_index"></a>

#### delete\_index

```python
def delete_index(index: str)
```

Delete an existing index. The index, including all of its data, will be removed.

**Arguments**:

- `index`: The name of the index to delete.

**Returns**:

None
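The in-memory implementation in this PR removes the entry for `index` from its dict of indexes and logs a warning when the deleted index is the store's default index. A minimal sketch of that logic, assuming a hypothetical `InMemoryStoreSketch` that keeps documents as plain dicts keyed by `id`:

```python
import logging
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)


class InMemoryStoreSketch:
    """Hypothetical dict-backed store: index name -> {doc_id: document}."""

    def __init__(self, index: str = "document") -> None:
        self.index = index  # default index used when no index is passed
        self.indexes: Dict[str, Dict[str, dict]] = {index: {}}

    def write_documents(self, docs: List[dict], index: Optional[str] = None) -> None:
        target = self.indexes.setdefault(index or self.index, {})
        for doc in docs:
            target[doc["id"]] = doc

    def delete_index(self, index: str) -> None:
        if index == self.index:
            logger.warning(
                "Deletion of default index '%s' detected. Reinstantiate the store "
                "before using this index again.",
                index,
            )
        # Raises KeyError if the index does not exist.
        del self.indexes[index]


store = InMemoryStoreSketch()
store.write_documents([{"id": "1", "content": "hello"}])
store.write_documents([{"id": "2", "content": "eval"}], index="eval_docs")
store.delete_index("eval_docs")
print(sorted(store.indexes))  # ['document']
```

In this sketch, deleting a missing index raises `KeyError`, whereas the Elasticsearch implementation in this PR passes `ignore=[400, 404]` to the client and so tolerates a missing index.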

<a id="memory.InMemoryDocumentStore.delete_labels"></a>

#### delete\_labels
@@ -2127,6 +2164,24 @@ have their ID in the list).

None

<a id="sql.SQLDocumentStore.delete_index"></a>

#### delete\_index

```python
def delete_index(index: str)
```

Delete an existing index. The index, including all of its data, will be removed.

**Arguments**:

- `index`: The name of the index to delete.

**Returns**:

None
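For a SQL-backed store, one way to model deleting an index is to delete every row tagged with that index name. The sketch below uses Python's built-in `sqlite3` module; the `document` table and its `"index"` column are assumptions for illustration, not the actual `SQLDocumentStore` schema.

```python
import sqlite3

# Hypothetical schema: every row records the index it belongs to.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE document (id TEXT, "index" TEXT, content TEXT)')
conn.executemany(
    "INSERT INTO document VALUES (?, ?, ?)",
    [("1", "document", "kept"), ("2", "eval", "removed"), ("3", "eval", "removed")],
)
conn.commit()


def delete_index(conn: sqlite3.Connection, index: str) -> int:
    """Delete all rows of `index` and return how many rows were removed."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute('DELETE FROM document WHERE "index" = ?', (index,))
    return cursor.rowcount


removed = delete_index(conn, "eval")
remaining = conn.execute(
    'SELECT "index", COUNT(*) FROM document GROUP BY "index"'
).fetchall()
print(removed, remaining)  # 2 [('document', 1)]
```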

<a id="sql.SQLDocumentStore.delete_labels"></a>

#### delete\_labels
@@ -2371,6 +2426,24 @@ have their ID in the list).

None

<a id="faiss.FAISSDocumentStore.delete_index"></a>

#### delete\_index

```python
def delete_index(index: str)
```

Delete an existing index. The index, including all of its data, will be removed.

**Arguments**:

- `index`: The name of the index to delete.

**Returns**:

None

<a id="faiss.FAISSDocumentStore.query_by_embedding"></a>

#### query\_by\_embedding
@@ -2641,6 +2714,24 @@ have their ID in the list).

None

<a id="milvus1.Milvus1DocumentStore.delete_index"></a>

#### delete\_index

```python
def delete_index(index: str)
```

Delete an existing index. The index, including all of its data, will be removed.

**Arguments**:

- `index`: The name of the index to delete.

**Returns**:

None

<a id="milvus1.Milvus1DocumentStore.get_all_documents_generator"></a>

#### get\_all\_documents\_generator
@@ -2932,6 +3023,24 @@ Example: {"name": ["some", "more"], "category": ["only_one"]}

None

<a id="milvus2.Milvus2DocumentStore.delete_index"></a>

#### delete\_index

```python
def delete_index(index: str)
```

Delete an existing index. The index, including all of its data, will be removed.

**Arguments**:

- `index`: The name of the index to delete.

**Returns**:

None

<a id="milvus2.Milvus2DocumentStore.get_all_documents_generator"></a>

#### get\_all\_documents\_generator
@@ -3565,6 +3674,24 @@ operation.

None

<a id="weaviate.WeaviateDocumentStore.delete_index"></a>

#### delete\_index

```python
def delete_index(index: str)
```

Delete an existing index. The index, including all of its data, will be removed.

**Arguments**:

- `index`: The name of the index to delete.

**Returns**:

None

<a id="weaviate.WeaviateDocumentStore.delete_labels"></a>

#### delete\_labels
59 changes: 58 additions & 1 deletion docs/_src/api/api/pipelines.md
@@ -413,7 +413,7 @@ Set the component for a node in the Pipeline.
#### run

```python
def run(query: Optional[str] = None, file_paths: Optional[List[str]] = None, labels: Optional[MultiLabel] = None, documents: Optional[List[Document]] = None, meta: Optional[dict] = None, params: Optional[dict] = None, debug: Optional[bool] = None)
def run(query: Optional[str] = None, file_paths: Optional[List[str]] = None, labels: Optional[MultiLabel] = None, documents: Optional[List[Document]] = None, meta: Optional[Union[dict, List[dict]]] = None, params: Optional[dict] = None, debug: Optional[bool] = None)
```

Runs the pipeline, one node at a time.
@@ -434,6 +434,35 @@ about their execution. By default these include the input parameters
they received and the output they generated. All debug information can
then be found in the dict returned by this method under the key "_debug".

<a id="base.Pipeline.eval_beir"></a>

#### eval\_beir

```python
@classmethod
def eval_beir(cls, index_pipeline: Pipeline, query_pipeline: Pipeline, index_params: dict = {}, query_params: dict = {}, dataset: str = "scifact", dataset_dir: Path = Path("."), top_k_values: List[int] = [1, 3, 5, 10, 100, 1000], keep_index: bool = False) -> Tuple[Dict[str, float], Dict[str, float], Dict[str, float], Dict[str, float]]
```

Runs an information retrieval evaluation of a pipeline on the specified BEIR dataset using BEIR.

See https://github.com/beir-cellar/beir for more information.

**Arguments**:

- `index_pipeline`: The indexing pipeline to use.
- `query_pipeline`: The query pipeline to evaluate.
- `index_params`: The params to use during indexing (see pipeline.run's params).
- `query_params`: The params to use during querying (see pipeline.run's params).
- `dataset`: The BEIR dataset to use.
- `dataset_dir`: The directory in which to store the dataset.
- `top_k_values`: The top_k values for which each metric is calculated.
- `keep_index`: Whether to keep the index after evaluation.
If True, the index is kept after the BEIR evaluation; otherwise, it is deleted immediately afterwards.
Defaults to False.

Returns a tuple containing the nDCG, MAP, recall, and precision scores.
Each metric is represented by a dictionary containing the scores for each top_k value.
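To make the shape of the returned dictionaries concrete, the sketch below computes two of the four metrics, Recall@k and precision@k, from BEIR-style inputs: `qrels` maps each query ID to its judged documents, and `results` maps each query ID to retrieved document IDs with scores. BEIR itself delegates scoring to `pytrec_eval` and also reports nDCG@k and MAP@k, which this sketch omits. The helper `recall_precision_at_k` is hypothetical, and the `Recall@k`/`P@k` key names are assumed to follow BEIR's convention.

```python
from typing import Dict, List, Tuple


def recall_precision_at_k(
    qrels: Dict[str, Dict[str, int]],
    results: Dict[str, Dict[str, float]],
    top_k_values: List[int],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Mean Recall@k and P@k over all queries in `qrels`, one entry per k.

    Assumes every query in `qrels` has at least one relevant document.
    """
    recall = {f"Recall@{k}": 0.0 for k in top_k_values}
    precision = {f"P@{k}": 0.0 for k in top_k_values}
    for qid, judgments in qrels.items():
        relevant = {doc_id for doc_id, rel in judgments.items() if rel > 0}
        # Rank the retrieved documents by descending score.
        ranked = sorted(results.get(qid, {}).items(), key=lambda kv: kv[1], reverse=True)
        for k in top_k_values:
            hits = sum(1 for doc_id, _ in ranked[:k] if doc_id in relevant)
            recall[f"Recall@{k}"] += hits / len(relevant)
            precision[f"P@{k}"] += hits / k
    n = len(qrels)
    return (
        {name: total / n for name, total in recall.items()},
        {name: total / n for name, total in precision.items()},
    )


qrels = {"q1": {"d1": 1, "d3": 1}, "q2": {"d2": 1}}
results = {
    "q1": {"d1": 0.9, "d2": 0.5, "d3": 0.4},
    "q2": {"d1": 0.8, "d2": 0.7, "d3": 0.1},
}
recall, precision = recall_precision_at_k(qrels, results, top_k_values=[1, 2])
print(recall)     # {'Recall@1': 0.25, 'Recall@2': 0.75}
print(precision)  # {'P@1': 0.5, 'P@2': 0.5}
```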

<a id="base.Pipeline.eval"></a>

#### eval
@@ -835,6 +864,34 @@ def __call__(*args, **kwargs)

Ray calls this method, which then redirects to the corresponding component's run().

<a id="base._HaystackBeirRetrieverAdapter"></a>

## \_HaystackBeirRetrieverAdapter

```python
class _HaystackBeirRetrieverAdapter()
```

<a id="base._HaystackBeirRetrieverAdapter.__init__"></a>

#### \_\_init\_\_

```python
def __init__(index_pipeline: Pipeline, query_pipeline: Pipeline, index_params: dict, query_params: dict)
```

Adapter mimicking a BEIR retriever used by BEIR's EvaluateRetrieval class to run BEIR evaluations on Haystack Pipelines.

It is unrelated to Haystack's own retriever classes.
See https://github.com/beir-cellar/beir/blob/main/beir/retrieval/evaluation.py.

**Arguments**:

- `index_pipeline`: The indexing pipeline to use.
- `query_pipeline`: The query pipeline to evaluate.
- `index_params`: The params to use during indexing (see pipeline.run's params).
- `query_params`: The params to use during querying (see pipeline.run's params).
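BEIR's `EvaluateRetrieval` class is assumed here to call a `search(corpus, queries, top_k, score_function, **kwargs)` method on its retriever and to expect a `{query_id: {doc_id: score}}` dict back; the adapter documented above takes an indexing and a query pipeline for that purpose. The sketch below shows only that interface, using a hypothetical `ToyBeirRetrieverAdapter` that scores documents by word overlap instead of running Haystack pipelines.

```python
from typing import Dict, Optional


class ToyBeirRetrieverAdapter:
    """Hypothetical adapter exposing the search() interface BEIR's evaluator calls.

    Scores documents by word overlap instead of running Haystack pipelines.
    """

    def search(
        self,
        corpus: Dict[str, Dict[str, str]],
        queries: Dict[str, str],
        top_k: int,
        score_function: Optional[str] = None,
        **kwargs,
    ) -> Dict[str, Dict[str, float]]:
        results: Dict[str, Dict[str, float]] = {}
        for qid, query in queries.items():
            query_terms = set(query.lower().split())
            scores = {}
            for doc_id, doc in corpus.items():
                doc_terms = set(f"{doc.get('title', '')} {doc['text']}".lower().split())
                scores[doc_id] = float(len(query_terms & doc_terms))
            # Keep only the top_k highest-scoring documents for this query.
            best = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
            results[qid] = dict(best)
        return results


corpus = {
    "d1": {"title": "Vitamin D", "text": "vitamin d supports bone health"},
    "d2": {"title": "Sleep", "text": "sleep improves memory"},
}
queries = {"q1": "does vitamin d help bones"}
print(ToyBeirRetrieverAdapter().search(corpus, queries, top_k=1))
# {'q1': {'d1': 2.0}}
```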

<a id="standard_pipelines"></a>

# Module standard\_pipelines
10 changes: 10 additions & 0 deletions haystack/document_stores/base.py
@@ -512,6 +512,16 @@ def delete_labels(
):
pass

@abstractmethod
def delete_index(self, index: str):
"""
Delete an existing index. The index, including all of its data, will be removed.

:param index: The name of the index to delete.
:return: None
"""
pass

@abstractmethod
def _create_document_field_map(self) -> Dict:
pass
3 changes: 3 additions & 0 deletions haystack/document_stores/deepsetcloud.py
@@ -485,3 +485,6 @@ def delete_labels(
headers: Optional[Dict[str, str]] = None,
):
raise NotImplementedError("DeepsetCloudDocumentStore currently does not support labels.")

def delete_index(self, index: str):
raise NotImplementedError("DeepsetCloudDocumentStore currently does not support deleting indexes.")
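Since `DeepsetCloudDocumentStore.delete_index` raises `NotImplementedError`, code that works with arbitrary document stores may want to treat index deletion as best-effort. A minimal sketch with a hypothetical `try_delete_index` helper and stand-in store classes (none of these names are Haystack APIs):

```python
import logging

logger = logging.getLogger(__name__)


class CloudStoreSketch:
    """Hypothetical store that, like DeepsetCloudDocumentStore, cannot delete indexes."""

    def delete_index(self, index: str) -> None:
        raise NotImplementedError("This store does not support deleting indexes.")


class LocalStoreSketch:
    """Hypothetical store that supports deleting indexes."""

    def __init__(self) -> None:
        self.indexes = {"eval": {}}

    def delete_index(self, index: str) -> None:
        del self.indexes[index]


def try_delete_index(store, index: str) -> bool:
    """Return True if the index was deleted, False if the store cannot delete indexes."""
    try:
        store.delete_index(index)
    except NotImplementedError as err:
        logger.warning("Skipping deletion of index '%s': %s", index, err)
        return False
    return True


print(try_delete_index(LocalStoreSketch(), "eval"))  # True
print(try_delete_index(CloudStoreSketch(), "eval"))  # False
```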
19 changes: 11 additions & 8 deletions haystack/document_stores/elasticsearch.py
@@ -318,12 +318,11 @@ def _create_document_index(self, index_name: str, headers: Optional[Dict[str, st
if self.search_fields:
for search_field in self.search_fields:
if search_field in mapping["properties"] and mapping["properties"][search_field]["type"] != "text":
host_data = self.client.transport.hosts[0]
raise Exception(
f"The search_field '{search_field}' of index '{index_name}' with type '{mapping['properties'][search_field]['type']}' "
f"does not have the right type 'text' to be queried in fulltext search. Please use only 'text' type properties as search_fields. "
f"This error might occur if you are trying to use haystack 1.0 and above with an existing elasticsearch index created with a previous version of haystack."
f"In this case deleting the index with `curl -X DELETE \"{host_data['host']}:{host_data['port']}/{index_name}\"` will fix your environment. "
f"does not have the right type 'text' to be queried in fulltext search. Please use only 'text' type properties as search_fields or use another index. "
f"This error might occur if you are trying to use haystack 1.0 and above with an existing elasticsearch index created with a previous version of haystack. "
f'In this case deleting the index with `delete_index(index="{index_name}")` will fix your environment. '
f"Note that all data stored in the index will be lost!"
)
if self.embedding_field:
@@ -1571,6 +1570,11 @@ def delete_index(self, index: str):
:param index: The name of the index to delete.
:return: None
"""
if index == self.index:
logger.warning(
f"Deletion of default index '{index}' detected. "
f"If you plan to use this index again, please reinstantiate '{self.__class__.__name__}' in order to avoid side-effects."
)
self.client.indices.delete(index=index, ignore=[400, 404])
logger.debug(f"deleted elasticsearch index {index}")

@@ -1790,12 +1794,11 @@ def _create_document_index(self, index_name: str, headers: Optional[Dict[str, st
search_field in mappings["properties"]
and mappings["properties"][search_field]["type"] != "text"
):
host_data = self.client.transport.hosts[0]
raise Exception(
f"The search_field '{search_field}' of index '{index_name}' with type '{mappings['properties'][search_field]['type']}' "
f"does not have the right type 'text' to be queried in fulltext search. Please use only 'text' type properties as search_fields. "
f"This error might occur if you are trying to use haystack 1.0 and above with an existing elasticsearch index created with a previous version of haystack."
f"In this case deleting the index with `curl -X DELETE \"{host_data['host']}:{host_data['port']}/{index_name}\"` will fix your environment. "
f"does not have the right type 'text' to be queried in fulltext search. Please use only 'text' type properties as search_fields or use another index. "
f"This error might occur if you are trying to use haystack 1.0 and above with an existing elasticsearch index created with a previous version of haystack. "
f'In this case deleting the index with `delete_index(index="{index_name}")` will fix your environment. '
f"Note that all data stored in the index will be lost!"
)
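The check that raises this error compares each entry in `search_fields` with the index mapping and rejects any field whose mapped type is not `text`; fields absent from the mapping are skipped. Pulled out into a standalone function (the function name and the sample mapping, in which `content` is mapped as `keyword`, are hypothetical), the check looks roughly like this:

```python
from typing import Dict, List


def non_text_search_fields(mapping: Dict, search_fields: List[str]) -> Dict[str, str]:
    """Return {field: mapped_type} for search fields not mapped as 'text'."""
    properties = mapping.get("properties", {})
    return {
        field: properties[field]["type"]
        for field in search_fields
        if field in properties and properties[field]["type"] != "text"
    }


# Hypothetical mapping of an existing index; "title" is not mapped at all.
mapping = {"properties": {"content": {"type": "keyword"}, "name": {"type": "text"}}}
bad = non_text_search_fields(mapping, ["content", "name", "title"])
if bad:
    # prints: Fields that cannot be used for fulltext search: {'content': 'keyword'}
    print(f"Fields that cannot be used for fulltext search: {bad}")
```

In the document store itself, a non-empty result corresponds to the exception above, whose message now points to `delete_index(index=...)` instead of a `curl -X DELETE` command.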

15 changes: 15 additions & 0 deletions haystack/document_stores/faiss.py
@@ -525,6 +525,21 @@ def delete_documents(

super().delete_documents(index=index, ids=ids, filters=filters)

def delete_index(self, index: str):
"""
Delete an existing index. The index, including all of its data, will be removed.

:param index: The name of the index to delete.
:return: None
"""
if index == self.index:
logger.warning(
f"Deletion of default index '{index}' detected. "
f"If you plan to use this index again, please reinstantiate '{self.__class__.__name__}' in order to avoid side-effects."
)
del self.faiss_indexes[index]
super().delete_index(index)

def query_by_embedding(
self,
query_emb: np.ndarray,
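`FAISSDocumentStore` and `Milvus1DocumentStore` keep vectors in a separate engine and store documents through their SQL parent class. That is why their `delete_index` first removes the vector index (`del self.faiss_indexes[index]` or `self.milvus_server.drop_collection(index)`) and then calls `super().delete_index(index)`. A minimal sketch of that two-step pattern with hypothetical classes, using plain Python lists in place of FAISS or Milvus:

```python
from typing import Dict, List


class SQLStoreSketch:
    """Hypothetical parent store holding document rows per index."""

    def __init__(self) -> None:
        self.rows: Dict[str, List[dict]] = {}

    def delete_index(self, index: str) -> None:
        self.rows.pop(index, None)


class VectorStoreSketch(SQLStoreSketch):
    """Hypothetical child store that keeps vectors separately from the rows."""

    def __init__(self) -> None:
        super().__init__()
        self.vector_indexes: Dict[str, List[List[float]]] = {}

    def write(self, index: str, doc: dict, vector: List[float]) -> None:
        self.rows.setdefault(index, []).append(doc)
        self.vector_indexes.setdefault(index, []).append(vector)

    def delete_index(self, index: str) -> None:
        # Drop the vector structure first, then let the parent remove the rows,
        # so neither storage layer keeps data for the deleted index.
        del self.vector_indexes[index]
        super().delete_index(index)


store = VectorStoreSketch()
store.write("document", {"id": "1"}, [0.1, 0.2])
store.write("eval", {"id": "2"}, [0.3, 0.4])
store.delete_index("eval")
print(store.rows, store.vector_indexes)
# {'document': [{'id': '1'}]} {'document': [[0.1, 0.2]]}
```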
14 changes: 14 additions & 0 deletions haystack/document_stores/memory.py
@@ -743,6 +743,20 @@ def delete_documents(
for doc in docs_to_delete:
del self.indexes[index][doc.id]

def delete_index(self, index: str):
"""
Delete an existing index. The index, including all of its data, will be removed.

:param index: The name of the index to delete.
:return: None
"""
if index == self.index:
logger.warning(
f"Deletion of default index '{index}' detected. "
f"If you plan to use this index again, please reinstantiate '{self.__class__.__name__}' in order to avoid side-effects."
)
del self.indexes[index]

def delete_labels(
self,
index: Optional[str] = None,
15 changes: 15 additions & 0 deletions haystack/document_stores/milvus1.py
@@ -483,6 +483,21 @@ def delete_documents(
# Delete from SQL at the end to allow the above .get_all_documents() to work properly
super().delete_documents(index=index, ids=ids, filters=filters)

def delete_index(self, index: str):
"""
Delete an existing index. The index including all data will be removed.

:param index: The name of the index to delete.
:return: None
"""
if index == self.index:
logger.warning(
f"Deletion of default index '{index}' detected. "
f"If you plan to use this index again, please reinstantiate '{self.__class__.__name__}' in order to avoid side-effects."
)
self.milvus_server.drop_collection(index)
super().delete_index(index)

def get_all_documents_generator(
self,
index: Optional[str] = None,