Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for collections property tests #328

Merged
merged 7 commits into from
Apr 13, 2023

Conversation

levand
Copy link
Contributor

@levand levand commented Apr 11, 2023

Description of changes

  • Fix collection name validation
  • Don't reinitialize the default function every time a Collection model object is constructed (1000x perf improvement, thanks @HammadB !)
  • When doing an "upsert" on a collection (get_or_create), update the metadata if it's different from the previous value.

Test plan

These changes are to satisfy the new property-based tests for Collections, which now pass.

Documentation Changes

No user facing changes; documented behavior was not changed.

@levand levand requested review from jeffchuber, atroyn and HammadB April 11, 2023 14:24
@@ -143,6 +143,9 @@ def create_collection(

if len(dupe_check) > 0:
if get_or_create:
if dupe_check[0][2] != metadata:
self.update_collection(name, new_name=name, new_metadata=metadata)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to update the documentation with this behavior? (get_or_create will update metadata)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is strictly more intuitive than the current behavior of just dropping it, so I am not sure we need it in the tutorials or overview docs.

I just pushed a change to add it ito the docstrings so it will be present in generated API docs when we have those.

Copy link
Collaborator

@HammadB HammadB Apr 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to add to this https://docs.trychroma.com/usage-guide#creating-inspecting-and-deleting-collections a note on the behavior of metadata. Better than leaving it implicit IMO - wdyt? We already have an example for modify() at the bottom of that section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will add that before merging this.

@@ -29,6 +29,9 @@
from chromadb.api import API


default_embedding_function = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be cleaner to push the singleton down to the EF level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean have a separate singleton for each possible EF? We certainly could: but I'm not sure it's applicable in all cases. For example, in the multi-tenant hosted version, users might entrust us with their OpenAI API keys and we would definitely not want those in a singleton. This is probably only appropriate for the default embedding function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - just for the default we'd change the class so it only has a singleton model, rather than making the singleton live at this level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, make sense, will do.

@@ -82,6 +82,10 @@ def create_collection(
dupe_check = self.get_collection(name)
if len(dupe_check) > 0:
if get_or_create is True:
if dupe_check[0][2] != metadata:
self.update_collection(name, new_name=name, new_metadata=metadata)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally - will this be the same behavior as upsert with embedding metadata?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope so. That won't get merged until after this, so I think we should argue that "replace" makes more sense than "merge" across the board.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok great. Let's make sure that what we do.

levand added 3 commits April 11, 2023 14:46
This reverts commit f48a07a. Going to
    use a different approach.
Saves a lot of time during testing by not re-constructing them all the time.
@levand
Copy link
Contributor Author

levand commented Apr 12, 2023

@HammadB I think this is ready to merge.

The current documentation does not talk about collection-level metadata, at all, so IMO adding that to the usage-guide is out of scope for a bugfix PR like this one. I created a separate ticket so we don't lose track of that (#336 )

@levand levand requested a review from HammadB April 12, 2023 14:12
@HIMANSHUSINGHYANIA

This comment was marked as off-topic.

@HIMANSHUSINGHYANIA

This comment was marked as off-topic.

Copy link
Collaborator

@HammadB HammadB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I think @HIMANSHUSINGHYANIA is referencing above some of the areas we reference collection.modify and create, my suggestion was to add the metadata to those calls as a quick clarification. Documentation should always be in scope if we are modifying code and it is undocumented - otherwise we'll rarely get to it.

@levand levand merged commit 2558481 into lukev/collection-state-machine-tests Apr 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants