-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Milvus2DocumentStore compatible with pymilvus>=2.0.0 #2126
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far seems good 🙂 However I think the test are still running on Milvus1, is it? We should make them run on Milvus2 before merging.
Running the tests on Milvus2 is going to be a bit problematic because now Milvus can't simply run in a container like before. I've reached out to @lalitpagaria a while ago on this PR: #2077 but he couldn't help back then.
haystack/document_stores/milvus2x.py
Outdated
@@ -45,7 +45,7 @@ class Milvus2DocumentStore(SQLDocumentStore): | |||
|
|||
Usage: | |||
1. Start a Milvus service via docker (see https://milvus.io/docs/v2.0.0/install_standalone-docker.md) | |||
2. Run pip install pymilvus===2.0.0rc6 | |||
2. Run pip install pymilvus>=2.0.0 | |||
3. Init a Milvus2DocumentStore() in Haystack |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should change the name of this class too.
In addition, the line above could also change from 2. Run pip install pymilvus>=2.0.0
to 2. Run pip install farm-haystack[milvus]
, and I would put a similar docstring into the Milvus1DocumentStore
class as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, done
In general, I think tests for Milvus2 should definitely be enabled. However, I think we should do that when we make Milvus2 the default because the dependencies for testing Milvus1DocumentStore and Milvus2DocumentStore differ. At the moment, I would think it's more important to check Milvus1DocumentStore because a stable version Milvus2 has not been released yet (should happen in the next few months, however). |
…into fix-milvus2
Michel Bartels seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
The tests should now also work with Milvus 2. They will be run when MILVUS2_ENABLED is set to true. It would probably be important to discuss what consistency level we want to use as default. Currently, this is the strongest consistency level which is necessary for the tests to run, but which also very slow. You can already specify different consistency levels, but I'm not sure if users would do that or just disregard Milvus as slow. Setting it to something different could, however, lead to unexpected behaviour. |
@ZanSara Thanks for the review. I think I should be able to do those three steps. If this assessment turns out to be wrong, I'll get back to you. |
haystack/document_stores/milvus2x.py
Outdated
if ids: | ||
self._delete_vector_ids_from_milvus(ids=ids, index=index) | ||
elif filters: | ||
existing_docs = super().get_all_documents(filters=filters, index=index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to use Generator?
If is it loose filter then it going to consume a lot of memory as well processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a good idea. The feature is now added.
test/conftest.py
Outdated
keywords = [] | ||
|
||
if "milvus" in document_store_types_to_run and os.getenv("MILVUS2_ENABLED"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't we are making Milvus2 as default? If so then how about making it MILVUS1_ENABLED :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is also what @ZanSara was referring to. I have now updated those environment variables.
@MichelBartels @ZanSara I was a bit off some time. My worry about Milvus2 is that they changed a lot of constructs since I worked on it. Mainly pymilvus client function signature. If they keep doing that in the future then we need to update the code again. I am just trying to understand the reason for the separate test-milvus2 step in the CI job. Can we move Milvus1 out or one small CI job which only runs milvus1 tests, but use same pip cache (except removing pymilvus2 and adding pymilvus1) |
Hey @lalitpagaria, thanks for the review! Yeah I hope they won't make such drastic changes at every major release,.. For the CI, I admit it's a bit messy right now. We are still considering how to deal with this Milvus situation, whether to maintain both docstores or just the 2.x. If we decide to keep both, I'll move Milvus 1.x in a separate task as you suggest, but for now I didn't do it because we might just as well drop it and I didn't want to waste too much time on it. Do you think the current setup can cause any issues? Taking this decision on which version of Milvus to keep is not really a priority for now, so I believe it's going to stay like this for a while 😅 |
@ZanSara Yeah I agree it is taking a lot of time which I can see from timeline. |
@ZanSara I have now updated the environment variables and the launch_milvus utility function as you suggested. However, I didn't change setup.cfg as it seems to me as if milvus 2 is the default case there already. Please let me know if I'm missing something. |
Proposed changes:
This PR fixes incompatibilities of Milvus2DocumentStore with pymilvus>=2.0.0 and fixes the extra dependencies for milvus2 as described in #2121.
Status (please check what you already did):