Evaluating a pipeline consisting only of a reader node #2132
I'm sure I'm missing something. But doesn't this do exactly the same as |
The effect is the same, yes. The difference is that the PR here also works with a pipeline that consists only of a reader node. For some use cases you might not have a retriever. Thus, it would be unintuitive to create an ExtractiveQAPipeline and then run |
def run(self, query: str, documents: List[Document] = None, top_k: Optional[int] = None, labels: Optional[MultiLabel] = None, add_isolated_node_eval: bool = False):  # type: ignore
    self.query_count += 1
    if documents:
        predict = self.timing(self.predict, "query_time")
        results = predict(query=query, documents=documents, top_k=top_k)
    else:
        results = {"answers": []}

    # Add corresponding document_name and more meta data, if an answer contains the document_id
    results["answers"] = [
        BaseReader.add_doc_meta_data_to_answer(documents=documents, answer=answer) for answer in results["answers"]
    ]

    # run evaluation with labels as node inputs
    if add_isolated_node_eval and labels is not None:
        predict = self.timing(self.predict, "query_time")
        unique_docs = {label.document.id: label.document for label in labels.labels}
        relevant_documents = unique_docs.values()
        results_label_input = predict(query=query, documents=relevant_documents, top_k=top_k)

        # Add corresponding document_name and more meta data, if an answer contains the document_id
        results["answers_isolated"] = [
            BaseReader.add_doc_meta_data_to_answer(documents=relevant_documents, answer=answer)
            for answer in results_label_input["answers"]
        ]
    return results, "output_1"
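For illustration, the isolated-evaluation branch above relies on collecting the gold documents from the labels and de-duplicating them by id. Below is a minimal, self-contained sketch of just that step. The `Document`, `Label`, and `MultiLabel` classes are simplified stand-ins written for this example (they are not the library's real classes), and `gold_documents` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Document:
    # Hypothetical stand-in for the library's Document class.
    id: str
    content: str


@dataclass(frozen=True)
class Label:
    # Hypothetical stand-in: one annotation tied to its gold document.
    query: str
    document: Document


@dataclass(frozen=True)
class MultiLabel:
    # Hypothetical stand-in: all labels that belong to one query.
    labels: List[Label]


def gold_documents(multi_label: MultiLabel) -> List[Document]:
    """Return the documents referenced by the labels, de-duplicated by id."""
    unique_docs: Dict[str, Document] = {}
    for label in multi_label.labels:
        # Re-assigning an existing key keeps its original position,
        # so the result preserves first-seen order.
        unique_docs[label.document.id] = label.document
    # Materialize the dict view as a list so callers get a real List[Document].
    return list(unique_docs.values())


doc_a = Document(id="a", content="Berlin is the capital of Germany.")
doc_b = Document(id="b", content="Paris is the capital of France.")
labels = MultiLabel(
    labels=[
        Label("Which capital?", doc_a),
        Label("Which capital?", doc_b),
        Label("Which capital?", doc_a),  # same gold document labeled twice
    ]
)
docs = gold_documents(labels)
```

Here `docs` contains each gold document once, even though `doc_a` is labeled twice. Converting the dict view to a list is a choice made in this sketch so the result matches a `List[Document]` signature; the snippet above passes the view directly.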
@ju-gu and I just found out that we can achieve the same with
The only advantage I can see for |
LGTM!
Proposed changes:
pipeline.eval() gets an additional parameter, pass_documents_as_input, that enables passing the gold documents specified in the labels to the first node in the pipeline as input. It's an alternative way to evaluate the reader node only, with the advantage that no retriever needs to be specified in the pipeline.
Status (please check what you already did):