Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Milvus2DocumentStore compatible with pymilvus>=2.0.0 #2126

Merged
merged 56 commits into from
Feb 24, 2022
Merged

Conversation

MichelBartels
Copy link
Contributor

Proposed changes:
This PR fixes incompatibilities of Milvus2DocumentStore with pymilvus>=2.0.0 and fixes the extra dependencies for milvus2 as described in #2121.

Status (please check what you already did):

  • First draft (up for discussions & feedback)
  • Final code
  • Added tests
  • Updated documentation

@MichelBartels MichelBartels requested a review from ZanSara February 4, 2022 11:45
Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far seems good 🙂 However I think the test are still running on Milvus1, is it? We should make them run on Milvus2 before merging.

Running the tests on Milvus2 is going to be a bit problematic because now Milvus can't simply run in a container like before. I've reached out to @lalitpagaria a while ago on this PR: #2077 but he couldn't help back then.

@@ -45,7 +45,7 @@ class Milvus2DocumentStore(SQLDocumentStore):

Usage:
1. Start a Milvus service via docker (see https://milvus.io/docs/v2.0.0/install_standalone-docker.md)
2. Run pip install pymilvus===2.0.0rc6
2. Run pip install pymilvus>=2.0.0
3. Init a Milvus2DocumentStore() in Haystack
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should change the name of this class too.

In addition, the line above could also change from 2. Run pip install pymilvus>=2.0.0 to 2. Run pip install farm-haystack[milvus], and I would put a similar docstring into the Milvus1DocumentStore class as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, done

@MichelBartels
Copy link
Contributor Author

In general, I think tests for Milvus2 should definitely be enabled. However, I think we should do that when we make Milvus2 the default because the dependencies for testing Milvus1DocumentStore and Milvus2DocumentStore differ.

At the moment, I would think it's more important to check Milvus1DocumentStore because a stable version Milvus2 has not been released yet (should happen in the next few months, however).

This was referenced Feb 10, 2022
@ZanSara ZanSara linked an issue Feb 11, 2022 that may be closed by this pull request
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Michel Bartels seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@MichelBartels
Copy link
Contributor Author

The tests should now also work with Milvus 2. They will be run when MILVUS2_ENABLED is set to true.

It would probably be important to discuss what consistency level we want to use as default. Currently, this is the strongest consistency level which is necessary for the tests to run, but which also very slow. You can already specify different consistency levels, but I'm not sure if users would do that or just disregard Milvus as slow. Setting it to something different could, however, lead to unexpected behaviour.

@MichelBartels
Copy link
Contributor Author

@ZanSara Thanks for the review. I think I should be able to do those three steps. If this assessment turns out to be wrong, I'll get back to you.

if ids:
self._delete_vector_ids_from_milvus(ids=ids, index=index)
elif filters:
existing_docs = super().get_all_documents(filters=filters, index=index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to use Generator?
If is it loose filter then it going to consume a lot of memory as well processing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a good idea. The feature is now added.

test/conftest.py Outdated
keywords = []

if "milvus" in document_store_types_to_run and os.getenv("MILVUS2_ENABLED"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't we are making Milvus2 as default? If so then how about making it MILVUS1_ENABLED :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is also what @ZanSara was referring to. I have now updated those environment variables.

@lalitpagaria
Copy link
Contributor

@MichelBartels @ZanSara I was a bit off some time.
Reviewed the PR. It looks good to me.

My worry about Milvus2 is that they changed a lot of constructs since I worked on it. Mainly pymilvus client function signature. If they keep doing that in the future then we need to update the code again.

I am just trying to understand the reason for the separate test-milvus2 step in the CI job. Can we move Milvus1 out or one small CI job which only runs milvus1 tests, but use same pip cache (except removing pymilvus2 and adding pymilvus1)
WDYT?

@ZanSara
Copy link
Contributor

ZanSara commented Feb 21, 2022

Hey @lalitpagaria, thanks for the review! Yeah I hope they won't make such drastic changes at every major release,..

For the CI, I admit it's a bit messy right now. We are still considering how to deal with this Milvus situation, whether to maintain both docstores or just the 2.x. If we decide to keep both, I'll move Milvus 1.x in a separate task as you suggest, but for now I didn't do it because we might just as well drop it and I didn't want to waste too much time on it.

Do you think the current setup can cause any issues? Taking this decision on which version of Milvus to keep is not really a priority for now, so I believe it's going to stay like this for a while 😅

@lalitpagaria
Copy link
Contributor

@ZanSara Yeah I agree it is taking a lot of time which I can see from timeline.
Yeah, we create another issue to clean CI in different PR.
In this PR we can migrate to milvus2 and have related tests.

@MichelBartels
Copy link
Contributor Author

@ZanSara I have now updated the environment variables and the launch_milvus utility function as you suggested. However, I didn't change setup.cfg as it seems to me as if milvus 2 is the default case there already. Please let me know if I'm missing something.

@ZanSara ZanSara self-requested a review February 24, 2022 10:41
@ZanSara ZanSara merged commit 2c423ba into master Feb 24, 2022
@ZanSara ZanSara deleted the fix-milvus2 branch February 24, 2022 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update tutorials, utilities, tests and dependencies to run on Milvus2
4 participants