Combine FAQ and extractive QA Pipelines #902

Closed
F95GIT opened this issue Mar 18, 2021 · 5 comments

@F95GIT

F95GIT commented Mar 18, 2021

Hi,

I understand from your docs that it's possible to combine several retrievers and join their results.
Is it also possible to combine the extractive QA pipeline and the FAQ pipeline?
I would like to achieve something like this: if a certain term "XY" is included in the query, use the FAQ pipeline; if not, use the extractive QA pipeline.
Or even better: run the query through both pipelines, join the results, and return the most relevant ones.
If this is possible, could you please give me a pointer on how to do it?
Thank you!

@tholor
Member

tholor commented Mar 29, 2021

Hey @F95GIT ,

Sure, that's possible. It should work pretty similarly to the "multiple retriever examples" in the docs.
Let me try to give you some pointers:

  1. Route Query (Run only one branch)
    class QueryClassifier():
        outgoing_edges = 2

        def run(self, **kwargs):
            # you can put here some rule or fast classification model
            if "?" in kwargs["query"]:
                return (kwargs, "output_1")
            else:
                return (kwargs, "output_2")

    pipe = Pipeline()
    pipe.add_node(component=QueryClassifier(), name="QueryClassifier", inputs=["Query"])
    pipe.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["QueryClassifier.output_1"])
    pipe.add_node(component=reader, name="QAReader", inputs=["DPRRetriever"])
    pipe.add_node(component=embedding_retriever, name="FAQRetriever", inputs=["QueryClassifier.output_2"])
    res = pipe.run(query="What did Einstein work on?", top_k_retriever=1)
  2. Combine results (Run both branches)
    Ideally we should be able to define some pipeline like this:

    p = Pipeline()
    # extractive branch
    p.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["Query"])
    p.add_node(component=reader, name="QAReader", inputs=["DPRRetriever"])
    # faq branch
    p.add_node(component=embedding_retriever, name="EmbeddingRetriever", inputs=["Query"])
    # TODO add conversion node "Docs2Answers"
    p.add_node(component=JoinAnswers(join_mode="concatenate"), name="JoinResults", inputs=["EmbeddingRetriever", "QAReader"])
    res = p.run(query="What did Einstein work on?", top_k_retriever=1)

However, I just realized that the FAQPipeline needs one extra step to convert the retrieved documents into proper "answers" (see code here). This is necessary because both branches of the pipeline must pass lists of Answer to the JoinAnswers node. So we would need a new node (e.g. Document2Answer) that converts the output of EmbeddingRetriever into the required format.
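
A minimal sketch of what such a conversion node could look like, for readers following along. This is not code from Haystack: `Doc` is a stand-in for the library's document class, and both the assumption that each FAQ document stores its answer under `meta["answer"]` and the key names of the answer dicts are illustrative. The return shape mirrors the `QueryClassifier` example above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Stand-in for a retrieved FAQ document (not Haystack's own class)."""
    text: str    # the stored FAQ question
    score: float  # similarity score assigned by the retriever
    meta: Dict[str, Any] = field(default_factory=dict)  # assumed to hold "answer"


class Document2Answer:
    """Hypothetical node that turns FAQ documents into answer dicts."""
    outgoing_edges = 1

    def run(self, **kwargs):
        documents: List[Doc] = kwargs["documents"]
        answers = []
        for doc in documents:
            answer_text = doc.meta["answer"]  # assumption: answer lives in meta
            answers.append({
                "answer": answer_text,
                "context": answer_text,  # an FAQ answer is its own context
                "score": doc.score,
                "offset_start": 0,
                "offset_end": len(answer_text),
                "meta": doc.meta,
            })
        output = {"query": kwargs.get("query"), "answers": answers}
        return (output, "output_1")
```

With a node like this placed after the EmbeddingRetriever, both branches would hand answer lists to the join step.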

I guess you already saw those, but maybe it's helpful for others who read this: here are the example code snippets for the retriever pipelines in our docs: https://haystack.deepset.ai/docs/latest/pipelinesmd#Multiple-retrievers

Let me know if you have any further questions on this!

@F95GIT
Author

F95GIT commented Apr 7, 2021

@tholor thank you for your detailed answer.
I managed to run the first option (route query), which works well for my use case. If I find the time, I will experiment a bit with the second option.
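
For anyone with the same use case, the keyword rule from the original question could be sketched as a variant of the `QueryClassifier` node shown earlier. The class name `KeywordQueryClassifier` and the case-insensitive substring match are assumptions for illustration; the edge assignment follows the wiring above, where `output_1` feeds the extractive branch and `output_2` feeds the FAQ retriever.

```python
class KeywordQueryClassifier:
    """Hypothetical routing node: FAQ branch if the keyword occurs in the query."""
    outgoing_edges = 2

    def __init__(self, keyword: str):
        # Lowercase once so the check in run() is case-insensitive.
        self.keyword = keyword.lower()

    def run(self, **kwargs):
        if self.keyword in kwargs["query"].lower():
            return (kwargs, "output_2")  # FAQ branch
        return (kwargs, "output_1")      # extractive QA branch


classifier = KeywordQueryClassifier("XY")
_, edge = classifier.run(query="How do I configure XY?")
print(edge)
```

Note that a plain substring check also fires when the term appears inside a longer word, so a stricter rule (for example a word-boundary regex) may be preferable for short keywords.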

@SasikiranJ

@tholor is a Document2Answer class available?

@tholor
Member

tholor commented May 21, 2021

Nope. You are right, this would still be necessary for your case in #1081.

Would you be interested in raising a pull request for it? I can help you with the implementation if needed. It should be rather straightforward...

@SasikiranJ

@tholor I will try to open a PR for that. You mentioned a JoinAnswers class as well. Both Document2Answer and JoinAnswers would need to be implemented, right?
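
A concatenating join, as hinted at by `join_mode="concatenate"` in the pipeline sketch earlier in the thread, might look roughly like the following. This is only an illustration under stated assumptions: the function name `join_answers`, the `top_k` parameter, and the use of a `"score"` key on each answer dict are not taken from Haystack.

```python
from typing import Any, Dict, List


def join_answers(branches: List[List[Dict[str, Any]]], top_k: int = 5) -> List[Dict[str, Any]]:
    """Concatenate answer lists from several branches and keep the best top_k."""
    # Flatten the per-branch lists into one list of answers.
    merged = [answer for branch in branches for answer in branch]
    # Highest score first; assumes every answer dict has a numeric "score".
    merged.sort(key=lambda answer: answer["score"], reverse=True)
    return merged[:top_k]


faq_answers = [{"answer": "See the FAQ entry.", "score": 0.71}]
reader_answers = [
    {"answer": "relativity", "score": 0.93},
    {"answer": "physics", "score": 0.40},
]
print(join_answers([faq_answers, reader_answers], top_k=2))
```

One caveat: scores produced by a reader model and by an embedding retriever are not necessarily on a comparable scale, so sorting a merged list by raw score may favor one branch unless the scores are normalized first.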
