Add support for building custom Search Pipelines #596

tanaysoni · 2020-11-17T15:23:59Z

Related issue: #544

tholor

Very nice!
This will be the basis for many exciting changes in the future :)

tholor · 2020-11-20T16:44:54Z

Just adding a small usage example here for completeness:

from haystack.pipeline import DocumentSearchPipeline, ExtractiveQAPipeline, Pipeline, JoinDocuments
from haystack.document_store.elasticsearch import ElasticsearchDocumentStore
from haystack.reader.farm import FARMReader
from haystack.retriever.sparse import ElasticsearchRetriever

# Building blocks (= Nodes)
document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")
retriever = ElasticsearchRetriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=False)

# Combine via default pipeline (here: QA)
qa_pipe = ExtractiveQAPipeline(reader=reader, retriever=retriever)
res = qa_pipe.run(question="Who is the father of Sansa Stark?", top_k_retriever=2, top_k_reader=5)
print(res)
qa_pipe.draw()

# Combine via default pipeline (here: Document Retrieval)
doc_pipe = DocumentSearchPipeline(retriever=retriever)
res = doc_pipe.run(question="Who is the father of Sansa Stark?", top_k_retriever=2)
print(res)

# Or build your own custom pipeline to model more complex search routes.
# Choose existing components or write your own, then connect them into a DAG.
# Note: the same retriever instance feeds both branches here purely for illustration.
p = Pipeline()
p.add_node(component=retriever, name="ESRetriever1", inputs=["Query"])
p.add_node(component=retriever, name="ESRetriever2", inputs=["Query"])
p.add_node(component=JoinDocuments(join_mode="concatenate"), name="JoinResults", inputs=["ESRetriever1", "ESRetriever2"])

In future PRs, we should improve the execution under the hood, provide more standard "components", and add utility methods to import from and export to YAML.
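To make the DAG idea behind the custom pipeline above more concrete, here is a minimal, dependency-free sketch of how two retriever branches might be fanned out from a query and merged by a "concatenate" join. It is not Haystack's implementation: `MiniPipeline`, `make_retriever`, and `join_concatenate` are hypothetical names, the "retrievers" are simple word-overlap filters over in-memory strings, and dropping exact duplicates in the join is an assumption about what a concatenating join would reasonably do.

```python
from typing import Callable, Dict, List


class MiniPipeline:
    """Toy DAG runner (hypothetical, not Haystack's Pipeline)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Callable[[List[dict]], dict]] = {}
        self.inputs: Dict[str, List[str]] = {}

    def add_node(self, component, name: str, inputs: List[str]) -> None:
        # Inputs must already exist, so insertion order is a valid topological order.
        for upstream in inputs:
            if upstream != "Query" and upstream not in self.nodes:
                raise ValueError(f"Unknown input node: {upstream}")
        self.nodes[name] = component
        self.inputs[name] = inputs

    def run(self, query: str) -> dict:
        # "Query" is the implicit root node that every branch can read from.
        outputs = {"Query": {"query": query, "documents": []}}
        last = "Query"
        for name, component in self.nodes.items():
            outputs[name] = component([outputs[i] for i in self.inputs[name]])
            last = name
        # Return the output of the most recently added node.
        return outputs[last]


def make_retriever(corpus: List[str]):
    """Build a node that keeps corpus entries sharing at least one word with the query."""
    def retrieve(upstream: List[dict]) -> dict:
        query = upstream[0]["query"]
        words = set(query.lower().split())
        hits = [doc for doc in corpus if words & set(doc.lower().split())]
        return {"query": query, "documents": hits}
    return retrieve


def join_concatenate(upstream: List[dict]) -> dict:
    """Merge documents from all upstream nodes, dropping exact duplicates."""
    merged: List[str] = []
    for result in upstream:
        for doc in result["documents"]:
            if doc not in merged:
                merged.append(doc)
    return {"query": upstream[0]["query"], "documents": merged}


p = MiniPipeline()
p.add_node(make_retriever(["Ned Stark is the father of Sansa", "Winter is coming"]),
           "Retriever1", ["Query"])
p.add_node(make_retriever(["Sansa Stark grew up in Winterfell", "Ned Stark is the father of Sansa"]),
           "Retriever2", ["Query"])
p.add_node(join_concatenate, "JoinResults", ["Retriever1", "Retriever2"])
result = p.run("father of Sansa")
print(result["documents"])
```

The sketch mirrors the `add_node(component, name, inputs)` shape from the example above only so the two are easy to compare. Requiring every input to be registered before it is referenced keeps the toy runner simple, because nodes can then be executed in the order they were added.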

tanaysoni and others added 16 commits November 17, 2020 16:18

Add Pipeline class

fbc77e7

Refactor pipeline

eccc4c4

Add filters for retriever

13d7778

Add Finder

e775128

Add top_k parameters for the retriever pipeline

0c837f6

Update requirements

15bf8ae

Add docstrings

17df0c0

add str representation for schema objects

6fd996e

Change stream_id to strings

1e3cf17

replace kwargs. add exception for draw

913dde8

merge

156f4a2

Add draw() & add_node() to standard pipelines

84497a5

Change JoinRetrievers to JoinDocuments

3f9ccee

Change kwargs to explicit arguments

a354579

Add tests

0631b1d

Add metadata for Reader.run()

a008b0e

tanaysoni requested a review from tholor November 20, 2020 15:37

tholor changed the title ~~WIP: Add support for building custom Search Pipeline~~ Add support for building custom Search Pipelines Nov 20, 2020

tholor approved these changes Nov 20, 2020

tanaysoni merged commit e3a68ae into master Nov 20, 2020

tanaysoni deleted the pipeline branch November 20, 2020 16:41

This was referenced Nov 23, 2020

Introduce QueryClassifier #611

Closed

Refactor Finder to allow more flexibility (multiple retrievers, pure document search ...) #544

Closed

tanaysoni mentioned this pull request Nov 26, 2020

Rename question parameter to query #614

Merged
