Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for multiple spaces #457

Merged
merged 5 commits into from
May 4, 2023

Conversation

HammadB
Copy link
Collaborator

@HammadB HammadB commented May 3, 2023

Description of changes

Add support for multiple distance metrics in tests.

  1. We coin-flip and sometimes add a space when using hnsw_params
  2. Added the distance functions to the invariant and use them when needed.

In the process of writing this test I discovered a bug with our implementation of update that was revealed by the inner product space. Since the inner product is not a true metric, a point may not be a neighbor to itself. Our update code was strictly appending to the index due to the a bug with how we manage string UUID vs UUID objects. In l2 and cosine spaces, this usually was fine in the eyes of tests since the results returned were correct with the updated data. But IP exacerbated the issue by making the results not always be the same point.

Test plan

These are tests!

Documentation Changes

None required!

@HammadB HammadB requested a review from atroyn May 3, 2023 18:00
@@ -128,7 +132,7 @@ def add(self, ids, embeddings, update=False):

labels = []
for id in ids:
if id in self._id_to_label:
if hexid(id) in self._id_to_label:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the bugfix

@@ -141,7 +145,9 @@ def add(self, ids, embeddings, update=False):
labels.append(next_label)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is weird that sometimes we store the UUID vs the id but leaving this for now to minimize the bugs i might introduce.

@@ -71,7 +75,7 @@ class Hnswlib(Index):
_index_metadata: IndexMetadata
_params: HnswParams
_id_to_label: Dict[str, int]
_label_to_id: Dict[int, str]
_label_to_id: Dict[int, UUID]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See below comment on how its odd that we store the string and the uuid in opposite direction. Updating the typing.

@@ -172,7 +176,7 @@ def persist_generated_data_with_old_version(
coll = api.create_collection(
name=collection_strategy.name,
metadata=collection_strategy.metadata,
embedding_function=collection_strategy.embedding_function,
embedding_function=lambda x: None,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the tests faster because we pass none across the process boundary because we can't pickle a lambda

@HammadB HammadB enabled auto-merge (squash) May 4, 2023 21:10
@HammadB HammadB merged commit 000a9d3 into team/hypothesis-tests May 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants