Initial support for vector search. #22

Merged: 46 commits, Mar 23, 2024
46 commits
2a82076
Starting work on Vector support.
tombatron Feb 3, 2024
01d5801
Getting started on vector support by defining the vector schema field.
tombatron Feb 3, 2024
3785670
Starting on HNSW index definition.
tombatron Feb 3, 2024
5d40d06
Added in a bunch of documentation. No real implementation yet.
tombatron Feb 3, 2024
906c230
Added in a class to encapsulate the flat index spec.
tombatron Feb 4, 2024
7dfbff9
Roughed in the constructor here.
tombatron Feb 4, 2024
875f448
Added in some documentation.
tombatron Feb 4, 2024
2665777
Added in a bunch of tests to make pass.
tombatron Feb 4, 2024
faef305
Initial implementation of the flat vector index algo.
tombatron Feb 4, 2024
19a3346
Added in initial implementation for defining hnsw indexes.
tombatron Feb 4, 2024
d4b0629
Added in a vector schema field builder method.
tombatron Feb 12, 2024
f45d7ce
Wired up the schema field for hnsw vector including aliases.
tombatron Feb 12, 2024
c098255
I think I'm finished with the hash based index definitions here.
tombatron Feb 13, 2024
57fbf34
Wired up the alias builder as well as the JSON builder for vector
tombatron Feb 13, 2024
b505469
Fixed documentation here.
tombatron Feb 13, 2024
2aef62e
Outlined some integration tests.
tombatron Feb 15, 2024
f40cbf4
Finished sketching out the initial integration tests.
tombatron Feb 16, 2024
03751aa
Fixed a few more issues with the tests and field generation logic.
tombatron Feb 16, 2024
2d16da2
Minor formating here.
tombatron Feb 16, 2024
71140ea
Updated the documentation here.
tombatron Feb 16, 2024
3925c93
Replaced a bunch of redundant stuff with test theory.
tombatron Feb 16, 2024
2c8ebc7
Added in some sample data to use with vector query integration tests.
tombatron Feb 17, 2024
3d8e7f2
Fixed deprecation issue.
tombatron Feb 17, 2024
77ca822
Now writing sample vector data.
tombatron Feb 17, 2024
b71a188
Now adding test vectors and creating test indexes.
tombatron Feb 17, 2024
7f7fa2d
Moved the sample vector data stuff into the SampleData static class just
tombatron Feb 17, 2024
9b25fc9
Getting started building the vector query builders and what not.
tombatron Feb 17, 2024
6711ad8
Roughed in the surface of the fluent interface.
tombatron Feb 17, 2024
7016d32
I think that I've sketched in all of the builder methods that I need for
tombatron Feb 17, 2024
0f14cac
Created failing unit test for building a KNN query.
tombatron Feb 17, 2024
1be825b
I think the initial tests are done now.
tombatron Feb 17, 2024
c796771
Initial build method is implemented.
tombatron Feb 17, 2024
08ac235
Well... all the unit tests are passing now... I wonder if this stuff
tombatron Feb 18, 2024
b3c6d3d
A little bit of cleanup and a slight expansion of the vector query API..
tombatron Feb 18, 2024
9a44823
Added in the ability to specify return fields and their aliases for the
tombatron Feb 18, 2024
8d5c611
I think we're good for basic knn vector query.
tombatron Feb 18, 2024
ece3d04
A little bit of clean up here.
tombatron Feb 18, 2024
17a44f2
Added in some tests to demonstrate simple vector querying and sorting.
tombatron Mar 2, 2024
f278da6
Tried to make the API for KNN and Range queries the same-ish.
tombatron Mar 2, 2024
bbe5feb
Added in ability to specify additional filters for range queries.
tombatron Mar 2, 2024
5cbdf48
Upgraded test projects to .NET 8.
tombatron Mar 22, 2024
ce34b3e
Updated the Github Actions configuration here.
tombatron Mar 22, 2024
b7ccf0e
Upped the version here.
tombatron Mar 23, 2024
9225682
Reference to new documentation.
tombatron Mar 23, 2024
853e253
Updated the Redis docker image used for testing.
tombatron Mar 23, 2024
4b713b9
Fixed some integration tests here.
tombatron Mar 23, 2024
4 changes: 2 additions & 2 deletions .github/workflows/dotnet.yml
@@ -13,7 +13,7 @@ jobs:
 
     services:
       redis:
-        image: redislabs/redismod:preview
+        image: redis/redis-stack-server:latest
         ports:
           - 6379:6379
 
@@ -22,7 +22,7 @@ jobs:
     - name: Setup .NET
       uses: actions/setup-dotnet@v1
       with:
-        dotnet-version: 6.0.x
+        dotnet-version: 8.0.x
     - name: Restore dependencies
       run: dotnet restore
     - name: Build
2 changes: 1 addition & 1 deletion .github/workflows/publish.yml
@@ -18,7 +18,7 @@ jobs:
     - name: Setup .NET
       uses: actions/setup-dotnet@v1
       with:
-        dotnet-version: 6.0.x
+        dotnet-version: 8.0.x
     - name: Clean
       run: dotnet clean ./RediSearchClient.sln --configuration Release && dotnet nuget locals all --clear
     - name: Restore dependencies
41 changes: 41 additions & 0 deletions README.md
@@ -28,6 +28,7 @@ So here we are.
- [Custom Dictionaries](#custom-dictionaries)
* [Tag Values](#tag-values)
* [Synonyms](#synonyms)
* [Vector Support **\*NEW\***](https://github.com/tombatron/RediSearchClient/wiki/VectorSearchSupport)

## Installation

@@ -108,6 +109,46 @@ var indexDefintion = RediSearchIndex
.Build();
```

If you're using RediSearch as a vector database, you can now define a `VECTOR` field on hash-based and JSON-based indexes.

The following is an example of defining a `VECTOR` field on a hash-based index:

```csharp
var indexDefinition = RediSearchIndex
    .OnHash()
    .WithSchema(
        x => x.Text("Name"),
        x => x.Vector("Embedding",
            VectorIndexAlgorithm.FLAT(
                type: VectorType.FLOAT32,
                dimensions: 32,
                distanceMetric: DistanceMetric.L2,
                initialCap: 30,
                blockSize: 20))
    )
    .Build();
```

Next is an example of defining a `VECTOR` field on a JSON-based index:

```csharp
var indexDefinition = RediSearchIndex
    .OnJson()
    .WithSchema(
        x => x.Text("$.Id", "Id"),
        x => x.Vector("$.Embedding", alias: "Embedded",
            VectorIndexAlgorithm.FLAT(
                type: VectorType.FLOAT64,
                dimensions: 33,
                distanceMetric: DistanceMetric.L2,
                initialCap: 23,
                blockSize: 22))
    )
    .Build();
```

Note that while the above examples specify a "FLAT" vector index, the `VectorIndexAlgorithm` factory class also lets you define an HNSW index.
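
As a rough sketch of what an HNSW definition could look like, the following mirrors the `VectorIndexAlgorithm.HNSW(...)` call used in this PR's integration tests (`BaseIntegrationTest.cs`). The field names `Name` and `Embedding` are carried over from the hash example above, the `using` directives are an assumption about where the types live, and only the three arguments that appear in the tests are used; any further HNSW tuning parameters are not covered here.

```csharp
// Assumed namespaces; adjust if the types live elsewhere in the library.
using RediSearchClient;
using RediSearchClient.Indexes;

var indexDefinition = RediSearchIndex
    .OnHash()
    .WithSchema(
        x => x.Text("Name"),
        x => x.Vector("Embedding",
            VectorIndexAlgorithm.HNSW(
                type: VectorType.FLOAT32,
                dimensions: 512, // should match the length of the stored embeddings
                distanceMetric: DistanceMetric.COSINE))
    )
    .Build();
```

The resulting definition would be passed to `CreateIndex` in the same way as the FLAT examples.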

#### Dates and Times

The only field types that RediSearch supports are "Text", "Tag", "Numeric", and "Geo". No dates and times.
165 changes: 117 additions & 48 deletions RediSearchClient.IntegrationTests/BaseIntegrationTest.cs
@@ -1,87 +1,156 @@
-using System;
+using NReJSON;
+using RediSearchClient.Indexes;
 using StackExchange.Redis;
+using System;
+using System.Text.Json;
+using System.Threading;
 using static RediSearchClient.IntegrationTests.SampleData;
 
-namespace RediSearchClient.IntegrationTests
+namespace RediSearchClient.IntegrationTests;
+
+public abstract class BaseIntegrationTest : IDisposable
 {
-    public abstract class BaseIntegrationTest : IDisposable
+    private static bool HasIndexCleanupRun = false;
+
+    protected const string MovieDataPrefix = "movie::";
+
+    private ConnectionMultiplexer _muxr;
+    protected IDatabase _db;
+    protected string _indexName;
+    protected string _recordPrefix;
+    protected string _dictionaryName;
+    protected string _hashVectorIndexName;
+    protected string _jsonVectorIndexName;
+
+    protected virtual void Setup()
     {
-        private static bool HasIndexCleanupRun = false;
+        NReJSONSerializer.SerializerProxy = new SystemTextJsonSerializer();
 
-        protected const string MovieDataPrefix = "movie::";
+        _muxr = ConnectionMultiplexer.Connect("localhost");
 
-        private ConnectionMultiplexer _muxr;
-        protected IDatabase _db;
-        protected string _indexName;
-        protected string _recordPrefix;
-        protected string _dictionaryName;
+        _db = _muxr.GetDatabase(0);
 
-        protected virtual void Setup()
-        {
-            _muxr = ConnectionMultiplexer.Connect("localhost");
+        _indexName = Guid.NewGuid().ToString("n");
+        _recordPrefix = Guid.NewGuid().ToString("n");
+        _dictionaryName = Guid.NewGuid().ToString("n");
+        _hashVectorIndexName = Guid.NewGuid().ToString("n");
+        _jsonVectorIndexName = Guid.NewGuid().ToString("n");
 
-            _db = _muxr.GetDatabase(0);
+        SetupDemoMovieData();
+        SetupTestVectorData();
+    }
 
-            _indexName = Guid.NewGuid().ToString("n");
-            _recordPrefix = Guid.NewGuid().ToString("n");
-            _dictionaryName = Guid.NewGuid().ToString("n");
+    public virtual void TearDown()
+    {
+        //CleanupIndexes();
 
-            SetupDemoMovieData();
+        _muxr.Dispose();
+    }
 
-            CleanupIndexes();
-        }
+    public BaseIntegrationTest()
+    {
+        Setup();
+    }
 
-        public virtual void TearDown()
-        {
-            _muxr.Dispose();
-        }
+    public void Dispose()
+    {
+        TearDown();
+    }
 
-        public BaseIntegrationTest()
+    private void SetupDemoMovieData()
+    {
+        if (_db.KeyExists($"{MovieDataPrefix}1"))
         {
-            Setup();
+            // Movies are already in the database, bail.
+            return;
         }
 
-        public void Dispose()
+        for (var i = 0; i < Movies.Length; i++)
         {
-            TearDown();
+            _db.HashSet($"{MovieDataPrefix}{i + 1}", Movies[i]);
         }
+    }
 
-        private void SetupDemoMovieData()
+    private void SetupTestVectorData()
+    {
+        foreach (var vec in SampleData.SampleVectorData)
         {
-            if (_db.KeyExists($"{MovieDataPrefix}1"))
+            _db.HashSet($"test_hash_vector:{vec.Name}", new[]
             {
-                // Movies are already in the database, bail.
-                return;
-            }
+                new HashEntry("name", vec.Name),
+                new HashEntry("feature_embeddings", vec.FileBytes)
+            });
 
-            for (var i = 0; i < Movies.Length; i++)
+            _db.JsonSet($"test_json_vector:{vec.Name}", new
             {
-                _db.HashSet($"{MovieDataPrefix}{i + 1}", Movies[i]);
-            }
+                name = vec.Name,
+                feature_embeddings = vec.FileFloats
+            });
         }
 
-        private static object locker = new object();
+        // Create the test indexes.
+        var hashIndex = RediSearchIndex
+            .OnHash()
+            .ForKeysWithPrefix("test_hash_vector:")
+            .WithSchema(
+                s => s.Text("name"),
+                s => s.Vector("feature_embeddings",
+                    VectorIndexAlgorithm.HNSW(
+                        type: VectorType.FLOAT32,
+                        dimensions: 512, // Used ResNet34 to generate feature embeddings...
+                        distanceMetric: DistanceMetric.COSINE
+                    ))
+            ).Build();
+
+        var jsonIndex = RediSearchIndex
+            .OnJson()
+            .ForKeysWithPrefix("test_json_vector:") // prefix of the JSON keys written above
+            .WithSchema(
+                s => s.Text("$.name", "name"),
+                s => s.Vector("$.feature_embeddings", "feature_embeddings",
+                    VectorIndexAlgorithm.HNSW(
+                        type: VectorType.FLOAT32,
+                        dimensions: 512, // Used ResNet34 to generate feature embeddings...
+                        distanceMetric: DistanceMetric.COSINE
+                    ))
+            ).Build();
+
+        _db.CreateIndex(_hashVectorIndexName, hashIndex);
+        _db.CreateIndex(_jsonVectorIndexName, jsonIndex);
+
+        Thread.Sleep(500);
+    }
 
-        private void CleanupIndexes()
+    private static object locker = new object();
+
+    private void CleanupIndexes()
+    {
+        if (!HasIndexCleanupRun)
         {
-            if (!HasIndexCleanupRun)
+            lock (locker)
             {
-                lock (locker)
+                if (!HasIndexCleanupRun)
                 {
-                    if (!HasIndexCleanupRun)
+                    foreach (var index in _db.ListIndexes())
                     {
-                        foreach (var index in _db.ListIndexes())
+                        if(Guid.TryParse(index, out var _))
                         {
-                            if(Guid.TryParse(index, out var _))
-                            {
-                                _db.DropIndex(index);
-                            }
+                            _db.DropIndex(index);
                         }
-
-                        HasIndexCleanupRun = true;
                     }
+
+                    HasIndexCleanupRun = true;
                 }
             }
         }
     }
 }
+
+public sealed class SystemTextJsonSerializer : ISerializerProxy
+{
+    public TResult Deserialize<TResult>(RedisResult serializedValue) =>
+        JsonSerializer.Deserialize<TResult>(serializedValue.ToString());
+
+    public string Serialize<TObjectType>(TObjectType obj) =>
+        JsonSerializer.Serialize(obj);
+}