refactor: Adapt running benchmarks (#5007)
* Generate eval result in separate method
* Adapt benchmarking utils
* Adapt running retriever benchmarks
* Adapt error message
* Adapt running reader benchmarks
* Adapt retriever reader benchmark script
* Adapt running benchmarks script
* Adapt README.md
* Raise error if file doesn't exist
* Raise error if path doesn't exist or is a directory
* minor readme update
* Create separate methods for checking if pipeline contains reader or retriever
* Fix reader pipeline case

Co-authored-by: Darja Fokina <[email protected]>
1 parent 2ede4d1 · commit b8ff105
Showing 3 changed files with 179 additions and 84 deletions.
**README.md**

# Benchmarks

The tooling provided in this directory allows running benchmarks on reader pipelines, retriever pipelines,
and retriever-reader pipelines.

## Defining configuration

To run a benchmark, you need to create a configuration file first. This file should be a Pipeline YAML file that
contains both the querying pipeline and, optionally, the indexing pipeline, in case the querying pipeline includes a retriever.

The configuration file should also have a **`benchmark_config`** section that includes the following information:

- **`labels_file`**: The path to a SQuAD-formatted JSON or CSV file that contains the labels to be benchmarked on
  (see the sketch after this list for the expected format).
- **`documents_directory`**: The path to a directory containing the files to be indexed into the document store.
  This is only necessary for retriever and retriever-reader pipelines.
- **`data_url`**: Optional. If provided, the benchmarking script downloads the data from this URL and
  saves it in the **`data/`** directory.
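
For orientation, here is a minimal sketch of what a SQuAD-style labels file contains, generated from Python. The document title, context, question, answer, and output file name below are made up for illustration; in practice the labels usually come from an existing SQuAD-formatted dataset.

```python
import json

# Minimal SQuAD-style structure: a list of documents, each with paragraphs,
# each paragraph carrying a context string and question/answer pairs.
labels = {
    "data": [
        {
            "title": "Example document",
            "paragraphs": [
                {
                    "context": "Haystack is an open-source framework for building search systems.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "What is Haystack?",
                            "answers": [{"text": "an open-source framework", "answer_start": 12}],
                        }
                    ],
                }
            ],
        }
    ]
}

with open("labels.json", "w") as f:
    json.dump(labels, f, indent=2)
```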

Here is an example of what a configuration file for a retriever-reader pipeline might look like:

```yaml
components:
  - name: DocumentStore
    type: ElasticsearchDocumentStore
  - name: TextConverter
    type: TextConverter
  - name: Reader
    type: FARMReader
    params:
      model_name_or_path: deepset/roberta-base-squad2-distilled
  - name: Retriever
    type: BM25Retriever
    params:
      document_store: DocumentStore
      top_k: 10

pipelines:
  - name: indexing
    nodes:
      - name: TextConverter
        inputs: [File]
      - name: Retriever
        inputs: [TextConverter]
      - name: DocumentStore
        inputs: [Retriever]
  - name: querying
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: Reader
        inputs: [Retriever]

benchmark_config:
  data_url: http://example.com/data.tar.gz
  documents_directory: /path/to/documents
  labels_file: /path/to/labels.csv
```
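
For reference, `run.py` (shown further down in this commit) separates the `benchmark_config` section from the pipeline definition before loading the pipelines. Stripped down, that logic looks roughly like this; the config file name is a placeholder:

```python
from pathlib import Path

from haystack import Pipeline
from haystack.pipelines.config import read_pipeline_config_from_yaml

# Read the whole YAML file, then split off the benchmark settings.
pipeline_config = read_pipeline_config_from_yaml(Path("retriever_reader_config.yml"))
benchmark_config = pipeline_config.pop("benchmark_config", {})

# What remains is a plain Haystack pipeline definition.
querying_pipeline = Pipeline.load_from_config(pipeline_config, pipeline_name="querying")
indexing_pipeline = Pipeline.load_from_config(pipeline_config, pipeline_name="indexing")
```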

## Running benchmarks

Once you have your configuration file, you can run benchmarks by using the **`run.py`** script.

```bash
python run.py [--output OUTPUT] config
```

The script takes the following arguments:

- `config`: The path to your configuration file.
- `--output`: An optional path where the benchmark results should be saved. If not provided, the script creates a
  JSON file with the same name as the specified config file.
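
The benchmark can also be triggered from Python by importing the script's entry point. A small sketch, assuming you run it from this directory and substitute your own config file name:

```python
from pathlib import Path
import json

from run import run_benchmark  # run.py in this directory

# "retriever_reader_config.yml" is a placeholder for your own configuration file
results = run_benchmark(Path("retriever_reader_config.yml"))

# Mirror the script's default naming scheme: <config stem>_results.json
with open("retriever_reader_config_results.json", "w") as f:
    json.dump(results, f, indent=2)
```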

## Metrics

The benchmarks yield the following metrics:

- Reader pipelines:
  - Exact match score
  - F1 score
  - Total querying time
  - Seconds/query
- Retriever pipelines:
  - Recall
  - Mean average precision
  - Total querying time
  - Seconds/query
  - Queries/second
  - Total indexing time
  - Number of indexed Documents/second
- Retriever-Reader pipelines:
  - Exact match score
  - F1 score
  - Total querying time
  - Seconds/query
  - Total indexing time
  - Number of indexed Documents/second

You can find more details about the performance metrics in our [evaluation guide](https://docs.haystack.deepset.ai/docs/evaluation).
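
As a rough intuition for the answer-quality metrics, exact match and F1 are commonly computed per prediction along the following lines. This is a generic sketch of the standard SQuAD-style definitions, not necessarily the exact implementation used by Haystack's evaluation code:

```python
def exact_match(prediction: str, gold: str) -> float:
    # 1.0 if the normalized strings are identical, else 0.0
    return float(prediction.strip().lower() == gold.strip().lower())


def f1(prediction: str, gold: str) -> float:
    # Token-level overlap between predicted and gold answer
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = set(pred_tokens) & set(gold_tokens)
    num_same = sum(min(pred_tokens.count(t), gold_tokens.count(t)) for t in common)
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```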

**run.py**

```python
# The benchmarks use
# - a variant of the Natural Questions Dataset (https://ai.google.com/research/NaturalQuestions) from Google Research
#   licensed under CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0/)
# - the SQuAD 2.0 Dataset (https://rajpurkar.github.io/SQuAD-explorer/) from Rajpurkar et al.
#   licensed under CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/legalcode)
from pathlib import Path
from typing import Dict
import argparse
import json

from haystack import Pipeline
from haystack.nodes import BaseRetriever, BaseReader
from haystack.pipelines.config import read_pipeline_config_from_yaml

from utils import prepare_environment, contains_reader, contains_retriever
from reader import benchmark_reader
from retriever import benchmark_retriever
from retriever_reader import benchmark_retriever_reader


def run_benchmark(pipeline_yaml: Path) -> Dict:
    """
    Run benchmarking on a given pipeline. Pipeline can be a retriever, reader, or retriever-reader pipeline.
    In case of retriever or retriever-reader pipelines, indexing is also benchmarked, so the config file must
    contain an indexing pipeline as well.

    :param pipeline_yaml: Path to pipeline YAML config. The config file should contain a benchmark_config section where
                          the following parameters are specified:
                          - documents_directory: Directory containing files to index.
                          - labels_file: Path to evaluation set.
                          - data_url (optional): URL to download the data from. Downloaded data will be stored in
                            the directory `data/`.
    """
    pipeline_config = read_pipeline_config_from_yaml(pipeline_yaml)
    benchmark_config = pipeline_config.pop("benchmark_config", {})

    # Prepare environment
    prepare_environment(pipeline_config, benchmark_config)
    labels_file = Path(benchmark_config["labels_file"])

    querying_pipeline = Pipeline.load_from_config(pipeline_config, pipeline_name="querying")
    pipeline_contains_reader = contains_reader(querying_pipeline)
    pipeline_contains_retriever = contains_retriever(querying_pipeline)

    # Retriever-Reader pipeline
    if pipeline_contains_retriever and pipeline_contains_reader:
        documents_dir = Path(benchmark_config["documents_directory"])
        indexing_pipeline = Pipeline.load_from_config(pipeline_config, pipeline_name="indexing")

        results = benchmark_retriever_reader(indexing_pipeline, querying_pipeline, documents_dir, labels_file)

    # Retriever pipeline
    elif pipeline_contains_retriever:
        documents_dir = Path(benchmark_config["documents_directory"])
        indexing_pipeline = Pipeline.load_from_config(pipeline_config, pipeline_name="indexing")

        results = benchmark_retriever(indexing_pipeline, querying_pipeline, documents_dir, labels_file)

    # Reader pipeline
    elif pipeline_contains_reader:
        results = benchmark_reader(querying_pipeline, labels_file)

    # Unsupported pipeline type
    else:
        raise ValueError("Pipeline must be a retriever, reader, or retriever-reader pipeline.")

    results["config_file"] = pipeline_config
    return results


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("config", type=str, help="Path to pipeline YAML config.")
    parser.add_argument("--output", type=str, help="Path to output file.")
    args = parser.parse_args()

    config_file = Path(args.config)
    output_file = f"{config_file.stem}_results.json" if args.output is None else args.output

    results = run_benchmark(config_file)
    with open(output_file, "w") as f:
        json.dump(results, f, indent=2)
```
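
The `contains_reader` and `contains_retriever` helpers are imported from `utils.py`, which is not part of this diff. Based on the commit description ("Create separate methods for checking if pipeline contains reader or retriever"), a plausible sketch is that they look for nodes of the corresponding base class in the loaded pipeline, roughly as follows (assuming Haystack 1.x's `Pipeline.get_nodes_by_class`):

```python
from haystack import Pipeline
from haystack.nodes import BaseReader, BaseRetriever


def contains_reader(pipeline: Pipeline) -> bool:
    # True if any node in the pipeline is a reader (subclass of BaseReader)
    return len(pipeline.get_nodes_by_class(BaseReader)) > 0


def contains_retriever(pipeline: Pipeline) -> bool:
    # True if any node in the pipeline is a retriever (subclass of BaseRetriever)
    return len(pipeline.get_nodes_by_class(BaseRetriever)) > 0
```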