Rethink Document Stores #1897
Comments
Related: #1990
Another idea: during the dependencies refactoring (#1994), it became rather evident that not many users would ever want more than one or two document store backends. For the time being, having different extras_require groups could be sufficient, but in some cases, such as Milvus1/2 and FAISS CPU/GPU, separate groups are not really enough. An even cleaner solution would be to have each document store as a separate package. The base Haystack package could come with [...] An external package would have full control over its dependencies, so a single [...] Another advantage of this external-packages approach is that it would be easier to share the maintenance burden with external contributors, and third parties could easily extend, fork, or develop their own document store implementations without having to wait for us.
Nice, I like the idea. How are we going to solve dependency conflicts? In Obsei, I am using [...] I think most of the time people are going to use Haystack as a service (rather than as an SDK or a sub-dependency of another package). It is fine to pin dependencies.
That's a good question indeed. I still have to collect some feedback from the rest of the team about dependency pinning; I've heard different opinions on the topic. I personally lean towards "weak" pinning, that is, fixing only a minimum version or a range of versions to allow for some flexibility, but that's still up for debate right now.
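To make the "weak" pinning idea concrete, here is a minimal sketch of how separate extras groups with version ranges might be declared. The group names and version bounds are placeholders chosen for illustration, not Haystack's actual configuration, and `is_weakly_pinned` is a hypothetical helper, not part of any packaging tool.

```python
# Hypothetical extras groups. The package names are real PyPI projects, but the
# group names and version bounds are placeholders chosen only for illustration.
EXTRAS_REQUIRE = {
    "faiss": ["faiss-cpu>=1.6,<2"],
    "faiss-gpu": ["faiss-gpu>=1.6,<2"],
    "milvus1": ["pymilvus<2"],
    "milvus2": ["pymilvus>=2,<3"],
}


def is_weakly_pinned(requirement: str) -> bool:
    """Return True if the requirement uses a bound or range instead of an exact '==' pin."""
    return "==" not in requirement


if __name__ == "__main__":
    # Report, per group, whether every requirement leaves room for flexibility.
    for group, requirements in EXTRAS_REQUIRE.items():
        print(group, all(is_weakly_pinned(r) for r in requirements))
```

A dictionary like this could be passed as `extras_require` to `setuptools.setup()`. Groups such as the two FAISS ones cannot sensibly be installed together, which is the limitation that motivates the separate-package idea above.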
Related: #2224
The current implementation of Haystack's document stores seems unable to accommodate the large differences that exist between the individual document store implementations.
This can be observed in several places:
BaseDocumentStore pushes onto the actual implementations some abstractions that hardly make sense for some of them (like the concept of index).
Thanks to #1861 I came to the conclusion that a better architecture for doc stores is possible and would solve basically all of the above. In addition, it could be made backwards compatible until Haystack 2.0 (or any later major version) is released.
The suggested architecture is the following (the code is entirely runnable):
Classes
Testing & Mocking:
Does it work?
Make it backwards compatible
These small classes are the full implementation. I expect them to be sufficient. Later, once we release a new major version, we can consider removing them, as they add no other value than backward-compatibility.
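As a rough illustration of what such thin compatibility classes could look like, the sketch below assumes a single generic store that delegates to a pluggable backend, plus a small subclass that keeps an old class name alive. All class names (`Backend`, `InMemoryBackend`, `DocumentStore`, `InMemoryDocumentStore`) and the dict-based document format are hypothetical simplifications, not the actual Haystack code.

```python
from typing import Dict, List, Optional, Protocol


class Backend(Protocol):
    """Hypothetical minimal interface that each storage backend would implement."""

    def write(self, docs: List[dict]) -> None: ...
    def get(self, doc_id: str) -> Optional[dict]: ...
    def count(self) -> int: ...


class InMemoryBackend:
    """Toy backend keeping documents in a dict keyed by their 'id' field."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def write(self, docs: List[dict]) -> None:
        for doc in docs:
            self._docs[doc["id"]] = doc

    def get(self, doc_id: str) -> Optional[dict]:
        return self._docs.get(doc_id)

    def count(self) -> int:
        return len(self._docs)


class DocumentStore:
    """Generic store: holds no storage logic itself, only delegates to the backend."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend

    def write_documents(self, documents: List[dict]) -> None:
        self.backend.write(documents)

    def get_document_by_id(self, doc_id: str) -> Optional[dict]:
        return self.backend.get(doc_id)

    def get_document_count(self) -> int:
        return self.backend.count()


class InMemoryDocumentStore(DocumentStore):
    """Backwards-compatibility shim: old class name, new backend-based internals."""

    def __init__(self) -> None:
        super().__init__(backend=InMemoryBackend())


if __name__ == "__main__":
    store = InMemoryDocumentStore()
    store.write_documents([{"id": "1", "content": "hello"}])
    print(store.get_document_count())
```

With this shape, existing code that instantiates `InMemoryDocumentStore()` keeps working, while tests can pass any object satisfying the `Backend` interface (for example a mock) directly to `DocumentStore`.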
Time estimate
I want to be optimistic and believe that this could take me a few days (i.e. 2 to 5) to put in place, once we agree on its final form. I won't tackle the test simplification in the same PR. It can be done later in a separate one if the resulting architecture is really backwards compatible (that is, if all existing tests pass on the new code without touching them).
I'm very open to better suggestions regarding the naming. In general, please highlight all the issues you see in this implementation, so we can iterate. I see it mostly as a cleanup operation: the functionality will be moved into the backends, and not a lot of new code will need to be written.