feat: add batch evaluation method for pipelines #2942

Merged
merged 29 commits into from
Aug 25, 2022

Conversation


@julian-risch julian-risch commented Aug 1, 2022

Related Issue(s):
closes #2636

Proposed changes:

  • Add a pipeline.eval_batch method
  • Add a _build_eval_dataframe_from_batches method that calls _build_eval_dataframe internally
    • I went for this solution to keep code duplication to a minimum. _build_eval_dataframe is very complex already (the method comprises >300 lines of code and would need some refactoring to be simplified)
  • Group some code in _add_sas_to_eval_result to avoid code duplication
  • Copy most eval tests into test/pipelines/test_eval_batch.py and make them use pipeline.eval_batch
  • Add a use_batch_mode option to execute_eval_run with the default set to False until pipeline.eval_batch is consistently faster than pipeline.eval

Limitations:

  • I faced multiprocessing issues, as discussed offline; the current workaround is setting num_processes or max_processes to 1.
  • Up for discussion: Should standard_pipelines.eval and standard_pipelines.eval_batch have a documents parameter that they pass on? We decided no, it's not needed at the moment.
  • run_batch does not support different filters (or, more generally, any params that differ per query), so eval_batch cannot support filters that differ per query and its label. Labels therefore must not have filters; for example, the test case test_extractive_qa_labels_with_filters won’t work with eval_batch.
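The last limitation can be illustrated with a toy sketch (this is not the actual Haystack API; run_batch here is a stand-in): a batch run receives one shared params dict that applies to every query, so there is simply no slot for per-query filters.

```python
# Toy sketch of why per-query filters cannot be expressed in batch mode:
# a batch run takes ONE shared params dict applied to every query.
def run_batch(queries, params=None):
    params = {} if params is None else params.copy()
    filters = params.get("filters")  # shared across ALL queries
    return [{"query": q, "filters": filters} for q in queries]

results = run_batch(["q1", "q2"], params={"filters": {"year": "2020"}})
# Both queries receive identical filters; labels that carry their own
# per-query filters cannot be honored in this mode.
```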

Currently the following tests are commented out because they are expected to fail due to other issues:

Pre-flight checklist

@julian-risch julian-risch marked this pull request as ready for review August 8, 2022 13:36
@julian-risch julian-risch requested review from a team as code owners August 8, 2022 13:36
@julian-risch julian-risch requested a review from tstadel August 8, 2022 13:49
@julian-risch
Member Author

@agnieszka-m Thank you for the detailed feedback. All the change requests are addressed now.

tstadel
tstadel previously requested changes Aug 9, 2022
Member

@tstadel tstadel left a comment

This already looks quite nice! I left some comments that I think need to be fixed before merging:

  • there seems to be a small problem with add_doc_meta_data_to_answer within the reader implementation
  • adding sas values and sorting the dataframe should not both be done in a method called _add_sas_to_eval_result
  • there are some duplicate tests that can be deleted (if I didn't overlook something)

@tstadel tstadel commented Aug 9, 2022

  • Up for discussion: Should standard_pipelines.eval and standard_pipelines.eval_batch have a documents parameter that they pass on?

I think it's fine not to have them here, as so far only non-standard pipelines need them.

@julian-risch julian-risch requested a review from vblagoje August 17, 2022 15:00
@vblagoje vblagoje left a comment

Left some minor comments, @julian-risch. Not sure if they warrant any changes, but I'll leave the state as "Request changes".

documents: Union[List[Document], List[List[Document]]],
top_k: Optional[int] = None,
batch_size: Optional[int] = None,
labels: Optional[List[MultiLabel]] = None,
add_isolated_node_eval: bool = False,
Member

Do we need this parameter add_isolated_node_eval? As a user of this API, it wasn't immediately clear to me what it is about and why we need it.

Member Author

Yes, we need it. It's the same parameter as in the standard run(). If it is set to True, the evaluation is executed with labels as node inputs in addition to the integrated evaluation, where the node inputs are the outputs of the previous node in the pipeline.
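The distinction can be sketched with a hypothetical two-node pipeline (the names below are illustrative, not Haystack internals): with add_isolated_node_eval the reader is evaluated once on the previous node's output (integrated) and once directly on the gold labels (isolated), which separates the reader's own errors from errors it inherits.

```python
# Toy two-node pipeline illustrating integrated vs. isolated node eval.
def retriever(query):
    return ["doc_b"]  # imperfect retrieval: misses the gold document

def reader(docs):
    return docs[0]

def eval_pipeline(query, gold_docs, add_isolated_node_eval=False):
    results = {}
    retrieved = retriever(query)
    # Integrated: the reader's input is the previous node's output.
    results["reader_integrated"] = reader(retrieved)
    if add_isolated_node_eval:
        # Isolated: the reader's input is the gold labels instead.
        results["reader_isolated"] = reader(gold_docs)
    return results

out = eval_pipeline("q", gold_docs=["doc_a"], add_isolated_node_eval=True)
# The integrated result inherits the retriever's mistake; the isolated
# result measures the reader on its own.
```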

params: Optional[dict] = None,
sas_model_name_or_path: Optional[str] = None,
sas_batch_size: int = 32,
sas_use_gpu: bool = True,
Member

I traced the sas_use_gpu parameter being passed to CrossEncoder via the semantic_answer_similarity method. Let's keep in mind that we should soon replace all use_gpu parameters with a devices parameter (as per #3062 and #2826). Just a todo item to keep in mind.

context_matching_boost_split_overlaps=context_matching_boost_split_overlaps,
context_matching_min_length=context_matching_min_length,
context_matching_threshold=context_matching_threshold,
)
Member

These are the same parameters for both function calls. Not sure if it makes sense to create a dict and then unpack it in the two method calls; maybe that's bad practice, but it would make the code more compact.

Member Author

Interesting thought. In this case I would leave it as is: it's part of the user-facing function pipeline.eval_batch that users can call directly, and the duplication occurs only twice here. Listing all the parameters explicitly is more intuitive and easier to understand for users than a dictionary or custom data structure they would first need to learn. If it were an internal-only function, or if the pattern occurred much more often, I agree we could make it more compact with your suggestion.

Member

I agree, good points.
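For reference, the alternative discussed above is plain keyword-argument unpacking. In this sketch, build_eval_result is an illustrative stand-in, not Haystack code:

```python
# Collect the shared keyword arguments in one dict and unpack it into
# both calls, so the parameter list is written only once.
def build_eval_result(context_matching_threshold=65.0, context_matching_min_length=100):
    return {
        "threshold": context_matching_threshold,
        "min_length": context_matching_min_length,
    }

shared_kwargs = {
    "context_matching_threshold": 70.0,
    "context_matching_min_length": 50,
}
first = build_eval_result(**shared_kwargs)
second = build_eval_result(**shared_kwargs)  # same kwargs, no repetition
```

The trade-off is exactly the one in the thread: the code is more compact, but a reader must look up the dict to see which arguments each call receives.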

if params is None:
params = {}
else:
params = params.copy()
Member

How about one-liner params = {} if params is None else params.copy()

Member Author

done 👍
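The agreed one-liner does two things at once: it defaults to an empty dict and takes a copy so that modifying the local variable does not mutate the caller's dict. A minimal demonstration (normalize_params is an illustrative name):

```python
# Default to an empty dict; otherwise copy, so the caller's dict is
# not mutated by later local modifications.
def normalize_params(params=None):
    return {} if params is None else params.copy()

caller_params = {"Retriever": {"top_k": 5}}
local_params = normalize_params(caller_params)
local_params["Reader"] = {"top_k": 3}  # add a key locally
# caller_params still has only the "Retriever" entry. Note the copy is
# shallow, so nested dicts are still shared with the caller.
```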

@vblagoje vblagoje left a comment

Rebase and it LGTM @julian-risch

@julian-risch julian-risch dismissed tstadel’s stale review August 25, 2022 15:47

Change requests are addressed. Thank you for your feedback! @tstadel

@julian-risch julian-risch merged commit 3e3ff33 into main Aug 25, 2022
@julian-risch julian-risch deleted the batch-eval branch August 25, 2022 15:50
Successfully merging this pull request may close these issues.

Pipeline eval should use batch processing for queries
4 participants