Add support for building custom Search Pipelines #596
Very nice!
This will be the basis for many exciting changes in the future :)
Just adding a small usage example here for completeness:
from haystack.pipeline import DocumentSearchPipeline, ExtractiveQAPipeline, Pipeline, JoinDocuments
from haystack.document_store.elasticsearch import ElasticsearchDocumentStore
from haystack.reader.farm import FARMReader
from haystack.retriever.sparse import ElasticsearchRetriever
# Building blocks (= Nodes)
document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")
retriever = ElasticsearchRetriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=False)
# Combine via default pipeline (here: QA)
qa_pipe = ExtractiveQAPipeline(reader=reader, retriever=retriever)
res = qa_pipe.run(question="Who is the father of Sansa Stark?", top_k_retriever=2, top_k_reader=5)
print(res)
qa_pipe.draw()
# Combine via default pipeline (here: Document Retrieval)
doc_pipe = DocumentSearchPipeline(retriever=retriever)
res = doc_pipe.run(question="Who is the father of Sansa Stark?", top_k_retriever=2)
print(res)
# Or build your own custom pipeline to model complex search routes.
# Choose existing components or build your own, then connect them into a DAG.
p = Pipeline()
p.add_node(component=retriever, name="ESRetriever1", inputs=["Query"])
p.add_node(component=retriever, name="ESRetriever2", inputs=["Query"])
p.add_node(component=JoinDocuments(join_mode="concatenate"), name="JoinResults", inputs=["ESRetriever1", "ESRetriever2"])
In future PRs, we should improve the execution under the hood, provide more standard "components", and add utility methods to import from and export to YAML.
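To make the DAG idea from the custom pipeline concrete, here is a rough, dependency-free sketch of how nodes wired together through `inputs` could be executed. It is a toy illustration and not how Haystack implements `Pipeline`: `ToyPipeline`, `keyword_retriever`, and `join_concatenate` are hypothetical names, the retriever is a naive keyword match, and I am assuming "concatenate" simply means merging the branch results while dropping duplicates.

```python
# Toy illustration only: NOT Haystack's implementation. All names are hypothetical.


class ToyPipeline:
    """Runs plain callables as nodes of a DAG, in the order they were added."""

    def __init__(self):
        self.nodes = {}   # node name -> callable
        self.inputs = {}  # node name -> list of upstream node names

    def add_node(self, component, name, inputs):
        # Each input must be the root ("Query") or an already-added node.
        # This keeps the graph acyclic and makes insertion order a valid
        # topological order.
        for upstream in inputs:
            if upstream != "Query" and upstream not in self.nodes:
                raise ValueError(f"Unknown input node: {upstream}")
        self.nodes[name] = component
        self.inputs[name] = list(inputs)

    def run(self, query):
        outputs = {"Query": query}
        last = "Query"
        for name, component in self.nodes.items():
            # Feed each node the outputs of all of its upstream nodes.
            upstream_outputs = [outputs[u] for u in self.inputs[name]]
            outputs[name] = component(*upstream_outputs)
            last = name
        # The output of the last node added is the pipeline result.
        return outputs[last]


def keyword_retriever(corpus):
    """Return a retriever that keeps documents sharing a word with the query."""
    def retrieve(query):
        words = set(query.lower().split())
        return [doc for doc in corpus if words & set(doc.lower().split())]
    return retrieve


def join_concatenate(*doc_lists):
    """Merge results from several branches, dropping duplicates."""
    merged = []
    for docs in doc_lists:
        for doc in docs:
            if doc not in merged:
                merged.append(doc)
    return merged


toy = ToyPipeline()
toy.add_node(
    keyword_retriever(["Ned Stark is the father of Sansa", "Arya is a Stark"]),
    name="Retriever1",
    inputs=["Query"],
)
toy.add_node(
    keyword_retriever(["Sansa Stark lives in Winterfell"]),
    name="Retriever2",
    inputs=["Query"],
)
toy.add_node(join_concatenate, name="JoinResults", inputs=["Retriever1", "Retriever2"])

result = toy.run("father of Sansa")
print(result)
```

Requiring every input to name an existing node is one simple way to keep the graph acyclic; a real implementation would more likely build an explicit graph and sort it topologically.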
Related issue: #544