feat: Add contraints and embedding directives #3405
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Relevant issue(s)
Resolves #3350
Resolves #3351
Description
Sorry for having 3 technically separate features as part of 1 PR. The reason for this is that I started working on embeddings and the size contraint was initially part of the embedding. Since we discussed that it could be applied to all array fields (and later even to String and Blob), I extracted it into a contraints directive that has a size parameter (more contraints can be added in the future). Furthermore, embeddings returned from ML models are arrays of float32. This caused some precision issues because we only supported float64. When saving the float32 array, querying it would return an float64 array with slight precision issues. I decided to add the float32 type.
You can review the first commit for contraint and embedding related code and the second commit for the float related changes. Some float stuff might have leaked in the first commit. Sorry for this. I tried hard to separate the float32 related changes.
Note that the
gql.Float
type is nowFloat64
internally.is the same as
The embedding generation relies on a 3rd party package called
chromem-go
to call the model provider API. As long as one of the supported provider API is configured and accessible, the embeddings will be generated when adding new documents. I've added a step in the test workflow that will run the embedding specific tests on linux only (this is because installation on windows and mac is less straight forward) using Ollama (because it runs locally).If you're interested in running it locally, install Ollama and define a schema like so
Next steps:
_similarity
operation to calculate the cosine similarity between two arrays.Tasks
How has this been tested?
make test and manual testing
Specify the platform(s) on which this was tested: