feat: add batch evaluation method for pipelines #2942
Conversation
Co-authored-by: Agnieszka Marzec <[email protected]>
@agnieszka-m Thank you for the detailed feedback. All the change requests are addressed now.
Looks already quite nice! I left some comments that I think need to be fixed before merging:
- there seems to be a small problem with add_doc_meta_data_to_answer within the reader implementation
- adding SAS values and sorting the dataframe should not both be done in a method called _add_sas_to_eval_result
- there are some duplicate tests that can be deleted (if I didn't overlook something)
I think it's fine to not have them here, as so far they are only needed by non-standard pipelines.
Left some minor comments, @julian-risch. Not sure if they warrant any changes, but I'll leave the state as "Request changes".
documents: Union[List[Document], List[List[Document]]],
top_k: Optional[int] = None,
batch_size: Optional[int] = None,
labels: Optional[List[MultiLabel]] = None,
add_isolated_node_eval: bool = False,
Do we need this parameter add_isolated_node_eval? As a user of this API, it wasn't immediately clear to me what it is about and why we need it.
Yes, we need it. It's the same parameter as in the standard run(). If it is set to True, the evaluation is executed with labels as node inputs in addition to the integrated evaluation, where the node inputs are the outputs of the previous node in the pipeline.
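To make this more concrete, here is a minimal, hedged sketch of how the parameter could be used. The objects `pipeline` and `eval_labels`, and the node name "Reader", are assumptions for illustration (a standard extractive QA pipeline with prepared MultiLabel objects), not taken from this PR:

```python
# Sketch only: assumes an existing extractive QA `pipeline` and a list of
# MultiLabel objects `eval_labels` prepared for evaluation.
eval_result = pipeline.eval_batch(labels=eval_labels, add_isolated_node_eval=True)

# With add_isolated_node_eval=True, each node is evaluated twice:
# "integrated" rows use the previous node's output as input,
# "isolated" rows use the gold labels as input instead.
reader_df = eval_result["Reader"]
print(reader_df["eval_mode"].unique())  # expected to contain "integrated" and "isolated"
```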
params: Optional[dict] = None,
sas_model_name_or_path: Optional[str] = None,
sas_batch_size: int = 32,
sas_use_gpu: bool = True,
context_matching_boost_split_overlaps=context_matching_boost_split_overlaps,
context_matching_min_length=context_matching_min_length,
context_matching_threshold=context_matching_threshold,
)
These are the same parameters for both function calls. Not sure if it makes sense to create a dict and then unpack it in the two method calls. Maybe that's bad practice, but it makes the code more compact.
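For illustration, a minimal, self-contained sketch of that pattern; the function and parameter names below are made up for the example and are not the actual Haystack ones:

```python
# Illustrative only: function and parameter names are invented, not Haystack code.
def build_predictions(*, matching_threshold: float, matching_min_length: int) -> str:
    return f"predictions built with {matching_threshold}, {matching_min_length}"

def build_eval_result(*, matching_threshold: float, matching_min_length: int) -> str:
    return f"eval result built with {matching_threshold}, {matching_min_length}"

# Collect the shared keyword arguments once ...
shared_kwargs = {"matching_threshold": 0.65, "matching_min_length": 100}

# ... and unpack them into both calls instead of listing each parameter twice.
print(build_predictions(**shared_kwargs))
print(build_eval_result(**shared_kwargs))
```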
Interesting thought. In this case here, I would leave it as is because it's part of the user-facing function pipeline.eval_batch that users can call directly, and it occurs only twice here. I think listing all the parameters is more intuitive and easier to understand for users than having some dictionary/custom data structure that they first need to understand. If it was a function that is used internally only, or if it occurred much more often, we could make it more compact with your suggestion, yes, I agree.
I agree, good points.
haystack/pipelines/base.py (Outdated)
if params is None:
    params = {}
else:
    params = params.copy()
How about a one-liner: params = {} if params is None else params.copy()?
done 👍
Rebase and it LGTM @julian-risch
Change requests are addressed. Thank you for your feedback! @tstadel
Related Issue(s):
closes #2636
Proposed changes:
- new `pipeline.eval_batch` method (a usage sketch follows after this list)
- new `_build_eval_dataframe_from_batches` method that calls `_build_eval_dataframe` internally (`_build_eval_dataframe` is very complex already: the method comprises >300 lines of code and would need some refactoring to be simplified)
- new `_add_sas_to_eval_result` method to avoid code duplication
- tests copied to `test/pipelines/test_eval_batch.py` and made to use `pipeline.eval_batch`
- new `use_batch_mode` option for `execute_eval_run` with default set to `False` until `pipeline.eval_batch` is always faster than `pipeline.eval`
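A hedged usage sketch of the new method: `pipeline`, `eval_labels`, the `top_k` value, and the SAS model name are example assumptions for illustration, not values taken from this PR.

```python
# Sketch only: `pipeline` is an existing QA pipeline and `eval_labels` a list of
# MultiLabel objects; the SAS model below is just an example choice.
eval_result = pipeline.eval_batch(
    labels=eval_labels,
    params={"Retriever": {"top_k": 5}},
    sas_model_name_or_path="cross-encoder/stsb-roberta-large",  # enables semantic answer similarity
    add_isolated_node_eval=False,
)

# Aggregate per-node metrics from the evaluation result.
metrics = eval_result.calculate_metrics()
print(metrics["Reader"]["exact_match"], metrics["Reader"]["f1"], metrics["Reader"]["sas"])
```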
Limitations:
- `num_processes` or `max_processes` needs to be set to 1.
- Up for discussion: Should `standard_pipelines.eval` and `standard_pipelines.eval_batch` have a documents parameter that they pass on? We decided no, it's not needed at the moment.
- `run_batch` does not support different filters (or, more generally speaking, any different params per query) and thus `eval_batch` cannot support filters that differ per query and its label. Thus, labels must not have filters; for example, the test case `test_extractive_qa_labels_with_filters` won't work with `eval_batch`.
- Currently the following tests are commented out because they are expected to fail due to other issues:
  - `test_extractive_qa_eval_translation` because of "TranslationWrapperPipeline returns wrong query format in debug mode" #2964
  - `test_qa_multi_retriever_pipeline_eval` because of "Incomplete debug results for pipeline with two retrievers" #2962
  - `test_multi_retriever_pipeline_eval` because of "Incomplete debug results for pipeline with two retrievers" #2962
  - `test_multi_retriever_pipeline_with_asymmetric_qa_eval` because of "Incomplete debug results for pipeline with two retrievers" #2962

Pre-flight checklist
- I have enabled actions on my fork