Refactor Finder to allow more flexibility (multiple retrievers, pure document search ...) #544
Comments
We have a use case here at @etalab with @psorianom. We have already talked about it (#125), but here is a summary: our need is to combine at least 2 retrievers:
Why: mostly because BM25 is very efficient but does not retrieve synonyms, which dense retrievers could add. Note: we are very much looking forward to having this available through the FastAPI Swagger UI.
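To make the "combine at least 2 retrievers" idea concrete, here is a minimal sketch of one possible way to merge a sparse (BM25-style) ranking with a dense ranking. It uses reciprocal rank fusion, which is a technique swapped in for illustration and not something proposed in this thread. The `Retriever` type, `fuse_rankings`, `retrieve_combined`, and the toy retrievers are all hypothetical names, not Haystack APIs.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Assumption: a retriever is just a function "query -> ranked list of doc ids".
# This is a stand-in for illustration, not Haystack's retriever interface.
Retriever = Callable[[str], List[str]]


def fuse_rankings(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked id lists with reciprocal rank fusion (RRF)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents ranked highly by several retrievers accumulate more score.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id to stay deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def retrieve_combined(query: str, retrievers: List[Retriever], top_k: int = 10) -> List[str]:
    """Run every retriever on the query and return the fused top_k ids."""
    rankings = [retrieve(query) for retrieve in retrievers]
    return fuse_rankings(rankings)[:top_k]


# Toy stand-ins: the sparse retriever only matches the literal term,
# the dense one also surfaces a synonym document.
def toy_sparse(query: str) -> List[str]:
    return ["doc_car_1", "doc_car_2"]


def toy_dense(query: str) -> List[str]:
    return ["doc_automobile_1", "doc_car_1"]


combined = retrieve_combined("car", [toy_sparse, toy_dense], top_k=3)
print(combined)
```

A document found by both retrievers ends up first, while the synonym hit that only the dense retriever found still makes it into the results.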
@guillim Yes, this case will definitely be covered!
For the above, we could check this out: https://github.com/facebookresearch/hydra
Quick update: we are currently thinking bigger here. With Haystack, we already have many nice "lego building blocks". However, we are missing a flexible, powerful way of sticking them together. Instead of having rather rigid [...] You could add "tasks" as nodes (Retriever, Reader, Generator ...) and route your query via edges. This could cover not only all of the above use cases, but would also allow many other, more complex search pipelines that we have in mind for the future. Happy to hear your feedback on this direction!
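The "tasks as nodes, query routed via edges" idea could be sketched roughly as below. This is only an illustration under stated assumptions: the `Pipeline` class, its `add_node` and `run` methods, the `"Query"` root name, and the toy nodes are all hypothetical and are not taken from the draft mentioned in this thread.

```python
from typing import Any, Callable, Dict, List

Output = Dict[str, Any]
# Assumption: a node is a function that receives the outputs of its parent nodes.
Node = Callable[[List[Output]], Output]


class Pipeline:
    """Tiny directed acyclic graph of nodes (hypothetical sketch)."""

    ROOT = "Query"

    def __init__(self) -> None:
        self._nodes: Dict[str, Node] = {}
        self._inputs: Dict[str, List[str]] = {}

    def add_node(self, name: str, node: Node, inputs: List[str]) -> None:
        if name == self.ROOT or name in self._nodes:
            raise ValueError(f"duplicate node name: {name}")
        missing = [i for i in inputs if i != self.ROOT and i not in self._nodes]
        if not inputs or missing:
            raise ValueError(f"unknown or missing inputs for {name}: {missing}")
        # Inputs must already exist, so insertion order is a valid execution order.
        self._nodes[name] = node
        self._inputs[name] = list(inputs)

    def run(self, query: str) -> Output:
        if not self._nodes:
            raise ValueError("pipeline has no nodes")
        outputs: Dict[str, Output] = {self.ROOT: {"query": query}}
        last = self.ROOT
        for name, node in self._nodes.items():
            outputs[name] = node([outputs[parent] for parent in self._inputs[name]])
            last = name
        return outputs[last]  # output of the most recently added node


# Toy nodes standing in for real retrievers and a reader.
def sparse_retriever(inputs: List[Output]) -> Output:
    return {"query": inputs[0]["query"], "documents": ["doc-a", "doc-b"]}


def dense_retriever(inputs: List[Output]) -> Output:
    return {"query": inputs[0]["query"], "documents": ["doc-b", "doc-c"]}


def join_documents(inputs: List[Output]) -> Output:
    merged: List[str] = []
    for output in inputs:
        for doc in output["documents"]:
            if doc not in merged:  # drop duplicates, keep first-seen order
                merged.append(doc)
    return {"query": inputs[0]["query"], "documents": merged}


def reader(inputs: List[Output]) -> Output:
    docs = inputs[0]["documents"]
    return {"query": inputs[0]["query"], "documents": docs,
            "answer": f"read {len(docs)} documents"}


pipeline = Pipeline()
pipeline.add_node("Sparse", sparse_retriever, inputs=["Query"])
pipeline.add_node("Dense", dense_retriever, inputs=["Query"])
pipeline.add_node("Join", join_documents, inputs=["Sparse", "Dense"])
pipeline.add_node("Reader", reader, inputs=["Join"])
result = pipeline.run("who maintains the project?")
print(result["answer"])
```

The same graph without the `Reader` node would give a pure document search, and swapping the `Join` node changes how multiple retrievers are combined.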
Sounds great to me. It would indeed fit many more complex situations, and be helpful for testing combos.
Nice idea, @tholor. Not directly related to this, but to make it more extensible: how about adding remote API call and callback support? I am mainly thinking along the lines of the Jina framework, especially to make Haystack highly distributed by adding cloud-native support. The Generator, Retriever, Docs Cleaner, APIs, etc. would each run in their own env/container/machine and communicate via RPC (gRPC or HTTP).
Yes, absolutely. Our idea is to start with a "local" Pipeline and get the API / usage straight, and then later on enable a "distributed" pipeline where single nodes are executed on different containers/machines.
Awesome, let me know if I can contribute to it.
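One way to keep the "local first, distributed later" path open is to give local and remote nodes the same `run` interface. The sketch below is an assumption-heavy illustration: `LocalNode`, `RemoteNode`, and the injected `transport` callable are hypothetical, and the transport is faked in-process here instead of using real gRPC or HTTP.

```python
import json
from typing import Any, Callable, Dict

Payload = Dict[str, Any]


class LocalNode:
    """Runs the wrapped function in the current process."""

    def __init__(self, fn: Callable[[Payload], Payload]) -> None:
        self.fn = fn

    def run(self, payload: Payload) -> Payload:
        return self.fn(payload)


class RemoteNode:
    """Sends the payload as JSON bytes through an injected transport.

    Assumption: `transport` maps request bytes to response bytes. In a real
    deployment it could wrap an HTTP or gRPC client; that part is not shown.
    """

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        self.transport = transport

    def run(self, payload: Payload) -> Payload:
        request = json.dumps(payload).encode("utf-8")
        response = self.transport(request)
        return json.loads(response.decode("utf-8"))


def toy_retrieve(payload: Payload) -> Payload:
    return {"query": payload["query"], "documents": ["doc-1"]}


def fake_server(request: bytes) -> bytes:
    # Stands in for a retriever running in another container.
    payload = json.loads(request.decode("utf-8"))
    return json.dumps(toy_retrieve(payload)).encode("utf-8")


local_result = LocalNode(toy_retrieve).run({"query": "hello"})
remote_result = RemoteNode(fake_server).run({"query": "hello"})
print(local_result == remote_result)
```

Because both node types expose the same `run` method, a pipeline would not need to know where a node actually executes.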
We implemented a first basic draft with #596.
We will tackle those steps in individual PRs & Issues ...
Is your feature request related to a problem? Please describe.
The Finder was initially designed to wrap a single retriever and a single reader to do extractive QA.
As Haystack is growing and covering more use cases, we need to rethink the design to allow:
To consider:
Some thoughts to get started...
1) Single vs. multiple Finder classes?
Option B) Single Finder
get_answers()
get_documents()
Option A) Multiple Finders
a) splitting into faq, generative, extractive
b) DocFinder QAFinder => we don't gain much except clearer init
2) One get_answers() vs. splitting it up (generate_answer, faq_answer ...)?
3) Passing List of retrievers vs. Finder.add(retriever) + Finder.add(retriever) + Finder.add(reader)
4) Redesign API endpoints (e.g. doc-qa still right naming?)
I'd lean towards...
=> Single Finder
=> get_answers() get_documents()
=> API: documents object (ids+meta)
=> remove get_answers_via_similar...()
Open question: FAQ via get_documents() or via get_answers()
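As a rough illustration of the direction leaned towards above (a single Finder exposing get_documents() and get_answers(), taking a list of retrievers), consider the following sketch. Everything in it is an assumption for discussion: the constructor signature, the `Document` shape with ids and meta, and the plain-function retrievers and reader do not reflect the existing Finder code.

```python
from typing import Any, Callable, Dict, List, Optional

# Assumed document shape: {"id": ..., "text": ..., "meta": {...}}
Document = Dict[str, Any]
Retriever = Callable[[str, int], List[Document]]
Reader = Callable[[str, List[Document]], List[str]]


class Finder:
    """Single Finder with optional reader (hypothetical sketch)."""

    def __init__(self, retrievers: List[Retriever], reader: Optional[Reader] = None) -> None:
        if not retrievers:
            raise ValueError("at least one retriever is required")
        self.retrievers = list(retrievers)
        self.reader = reader

    def get_documents(self, query: str, top_k: int = 10) -> List[Document]:
        """Pure document search: concatenate retriever results, dropping duplicate ids."""
        seen = set()
        documents: List[Document] = []
        for retrieve in self.retrievers:
            for doc in retrieve(query, top_k):
                if doc["id"] not in seen:
                    seen.add(doc["id"])
                    documents.append(doc)
        return documents[:top_k]

    def get_answers(self, query: str, top_k_retriever: int = 10) -> List[str]:
        """Retrieve documents, then let the reader extract answers from them."""
        if self.reader is None:
            raise ValueError("get_answers() needs a reader")
        return self.reader(query, self.get_documents(query, top_k_retriever))


# Toy components for demonstration only.
def toy_bm25(query: str, top_k: int) -> List[Document]:
    return [{"id": "1", "text": "Paris is the capital of France.", "meta": {}}][:top_k]


def toy_dense(query: str, top_k: int) -> List[Document]:
    return [{"id": "1", "text": "Paris is the capital of France.", "meta": {}},
            {"id": "2", "text": "France's main city is Paris.", "meta": {}}][:top_k]


def toy_reader(query: str, documents: List[Document]) -> List[str]:
    return ["Paris" for doc in documents if "Paris" in doc["text"]]


finder = Finder(retrievers=[toy_bm25, toy_dense], reader=toy_reader)
doc_ids = [doc["id"] for doc in finder.get_documents("capital of France")]
answers = finder.get_answers("capital of France")
print(doc_ids, answers)
```

In this shape, a Finder without a reader still supports get_documents(), which would cover the pure document search case. Where FAQ-style matching belongs is left open, as in the question above.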