Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSHARP-5202: BSON Binary Vector Subtype Support #1581

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

BorisDog
Copy link
Contributor

No description provided.

@BorisDog BorisDog requested a review from a team as a code owner December 27, 2024 00:28
@BorisDog BorisDog requested review from JamesKovacs, papafe and sanych-sun and removed request for a team and JamesKovacs December 27, 2024 00:28
Copy link
Contributor

@papafe papafe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it makes sense, I just got some notes/questions

src/MongoDB.Bson/ObjectModel/BsonVector.cs Outdated Show resolved Hide resolved
src/MongoDB.Bson/ObjectModel/BsonVector.cs Outdated Show resolved Hide resolved
src/MongoDB.Bson/Serialization/BsonVectorReader.cs Outdated Show resolved Hide resolved
src/MongoDB.Bson/Serialization/BsonVectorReader.cs Outdated Show resolved Hide resolved
src/MongoDB.Bson/Serialization/BsonVectorReader.cs Outdated Show resolved Hide resolved
@BorisDog BorisDog requested a review from papafe January 2, 2025 23:00
src/MongoDB.Bson/ObjectModel/VectorDataType.cs Outdated Show resolved Hide resolved
{
var itemType = type.GetElementType();
var vectorSerializerType = typeof(BsonVectorArraySerializer<>).MakeGenericType(itemType);
var vectorSerializer = (IBsonSerializer)Activator.CreateInstance(vectorSerializerType, DataType);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to move this into BsonVectorArraySerializer.Create(Type itemType) static factory method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and more similar below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not possible as BsonVectorArraySerializer and the rest are generic classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we can create non-generic static class just to have this factory methods. We are doing it in other places. (far example, NullableSerializer)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/MongoDB.Bson/ObjectModel/VectorDataType.cs Outdated Show resolved Hide resolved
/// </summary>
/// <typeparam name="TItemCollection">The concrete type derived from <see cref="BsonVector{T}"/>.</typeparam>
/// <typeparam name="TItem">The .NET data type.</typeparam>
public sealed class BsonVectorSerializer<TItemCollection, TItem> : BsonVectorSerializerBase<TItemCollection, TItem>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we override Equals for the Serializer to make proper comparison of serializers? It might be important if Vestors could be used in LINQ statements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss if it's safe to have comparison implemented in base class only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't override Equals if no private data was introduced.

public static byte[] WriteVectorData<T>(ReadOnlySpan<T> vectorData, BsonVectorDataType bsonVectorDataType, byte padding)
where T : struct
{
var vectorDataBytes = MemoryMarshal.Cast<T, byte>(vectorData);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we should ensure we are allowed to do that and throw in case wrong endian-platform.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch. Done.

{
internal static class BsonVectorWriter
{
public static byte[] WriteBsonVector<T>(BsonVector<T> bsonVector)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: The name of the method is a little confusing to me. If it's WriteXXX I expect to pass target into the method, like a stream of buffer where I want it to be written to. In this case, I think FormatXXX and ParseXXX would work better instead of Write and Read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Done.

Copy link
Contributor Author

@BorisDog BorisDog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add support for VectorSearch.
Test cluster setup is WIP.

src/MongoDB.Bson/ObjectModel/VectorDataType.cs Outdated Show resolved Hide resolved
src/MongoDB.Bson/ObjectModel/VectorDataType.cs Outdated Show resolved Hide resolved
{
var itemType = type.GetElementType();
var vectorSerializerType = typeof(BsonVectorArraySerializer<>).MakeGenericType(itemType);
var vectorSerializer = (IBsonSerializer)Activator.CreateInstance(vectorSerializerType, DataType);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not possible as BsonVectorArraySerializer and the rest are generic classes.

{
internal static class BsonVectorWriter
{
public static byte[] WriteBsonVector<T>(BsonVector<T> bsonVector)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Done.

public static byte[] WriteVectorData<T>(ReadOnlySpan<T> vectorData, BsonVectorDataType bsonVectorDataType, byte padding)
where T : struct
{
var vectorDataBytes = MemoryMarshal.Cast<T, byte>(vectorData);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch. Done.

/// </summary>
/// <typeparam name="TItemCollection">The concrete type derived from <see cref="BsonVector{T}"/>.</typeparam>
/// <typeparam name="TItem">The .NET data type.</typeparam>
public sealed class BsonVectorSerializer<TItemCollection, TItem> : BsonVectorSerializerBase<TItemCollection, TItem>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done.

@BorisDog BorisDog requested a review from sanych-sun January 27, 2025 20:01
/// </summary>
/// <typeparam name="TItemCollection">The collection type.</typeparam>
/// <typeparam name="TItem">The .NET data type.</typeparam>
public abstract class BsonVectorSerializerBase<TItemCollection, TItem> : SerializerBase<TItemCollection>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move each serializer into it's own file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nested classes are very simple just a few lines long. I find it more readable to keep all in the same file in this case. But we can discuss further.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have often bundled very tightly related classes together in a single file. I'm fine with this.

/// </summary>
/// <typeparam name="TItemCollection">The concrete type derived from <see cref="BsonVector{T}"/>.</typeparam>
/// <typeparam name="TItem">The .NET data type.</typeparam>
public sealed class BsonVectorSerializer<TItemCollection, TItem> : BsonVectorSerializerBase<TItemCollection, TItem>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss if it's safe to have comparison implemented in base class only.

Copy link
Contributor

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. But I have a lot of naming suggestions and a few refactoring suggestions. Let me know what you think.

/// <summary>
/// Represents a BSON vector.
/// </summary>
public abstract class BsonVectorBase<T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a better name for this class is BsonVector (no Base needed in the name).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's what I've started with, but then I realized that there is some ambiguity with BsonVectorAttribute:
When trying to define the following poco without including using MongoDB.Bson.Serialization.Serializers; namespace, user unclear error without clear suggested resolution.

class POCO
{
     [BsonVector(...)]
     public int[] Vector { get; set; }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think [BsonVector(...)] should be [BsonVectorRepresentation(...)] which would eliminate this conflict.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A similar class is BsonValue which is the abstract base class for multiple types of BSON values.

Here BsonVector is the base class for multiple types of BSON vectors.

Note that we did not name BsonValue as BsonValueBase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that without considering BsonVectorAttribute, BsonVector is more natural name.
But I still think it's preferable to reserve BsonVector for BsonVectorAttribute, as it's expected to appear in most of the use cases, while direct usages of BsonVectorBase are expected to be rare.

/// <summary>
/// Gets the vector.
/// </summary>
public ReadOnlyMemory<T> Vector { get; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec calls this property Data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

/// Represents the data type of BSON Vector.
/// </summary>
/// <seealso cref="BsonVectorBase{T}"/>
public enum BsonVectorDataType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename file to match enum name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done.


namespace MongoDB.Bson.Serialization.Serializers
{
internal static class BsonVectorSerializerBase
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this static class should be named just BsonVectorSerializer (no Base in the name).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return (elements, padding, vectorDataType);
}

public static (ReadOnlyMemory<byte> VectorDataBytes, byte Padding, BsonVectorDataType VectorDataType) ReadBsonVectorAsBytes(ReadOnlyMemory<byte> vectorData)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is As redundant in the name? Could be ReadBsonVectorBytes or ReadBsonVectorData.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VectorDataBytes could be shortened to Bytes if you want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

var vectorDataSpan = vectorData.Span;
var vectorDataType = (BsonVectorDataType)vectorDataSpan[0];

var paddingSizeBits = vectorDataSpan[1];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable could be named just padding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return (vectorData.Slice(2), paddingSizeBits, vectorDataType);
}

private static BsonVectorBase<T> CreateBsonVector<T>(T[] elements, byte padding, BsonVectorDataType vectorDataType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a static factory method in the BsonVector<T> class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so: I didn't see a need for a such a factory outside this class. Also I think BsonVector itself should be just a data holder, decoupled from its factories.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename T to TItem for consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

{
internal static class BsonVectorWriter
{
public static byte[] BsonVectorToBytes<T>(BsonVectorBase<T> bsonVector)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider naming this method WriteBytes or WriteToBytes.

The "from" is implied by the parameter types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return VectorDataToBytes(bsonVector.Vector.Span, bsonVector.DataType, padding);
}

public static byte[] VectorDataToBytes<T>(ReadOnlySpan<T> vectorData, BsonVectorDataType bsonVectorDataType, byte padding)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider naming WriteToBytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@BorisDog BorisDog requested a review from rstam January 30, 2025 00:31
Copy link
Contributor

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I have so many requested naming changes.

But specially where things are entering the public API it is important to get the names right because we can't change them later.

/// <summary>
/// Represents a BSON vector.
/// </summary>
public abstract class BsonVectorBase<T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think [BsonVector(...)] should be [BsonVectorRepresentation(...)] which would eliminate this conflict.

/// <summary>
/// Initializes a new instance of the BsonVector class.
/// </summary>
public BsonVectorBase(ReadOnlyMemory<T> data, BsonVectorDataType dataType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

protected internal?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think <T> should be <TItem> for consistency and as a hint of what T actually is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// </summary>
/// <typeparam name="TItemCollection">The collection type.</typeparam>
/// <typeparam name="TItem">The .NET data type.</typeparam>
internal abstract class BsonVectorSerializerBase<TItemCollection, TItem> : SerializerBase<TItemCollection>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused by "collection" in the name TItemCollection because it implies that the type implements ICollection or at least IEnumerable but that's only true sometimes.

Can we use a more neutral term like "container" and name this TItemContainer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

/// <summary>
/// Represents a serializer for BSON vector to/from a given collection.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// Represents a serializer for TItemContainer values represented as a BsonVector.

The part immediately after for should be what is being serializer, not how it is represented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
}

public static void ValidateDataType<T>(BsonVectorDataType bsonVectorDataType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename T to TItem for consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

_ => throw new ArgumentOutOfRangeException(nameof(bsonVectorDataType), bsonVectorDataType, "Unsupported vector datatype.")
};

if (supportedType != typeof(T))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would read more logically swapping the sides:

if (typeof(TItem) != supportedType)

That's how you would say it in English.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I look at it from a different perspective "What is the value of my variable", similar to if variable == constant.
Also typeof(T) is in a way constant here.

{
internal static class BsonVectorWriter
{
public static byte[] WriteToBytes<T>(BsonVectorBase<T> bsonVector)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename T to TItem for consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return WriteToBytes(bsonVector.Data.Span, bsonVector.DataType, padding);
}

public static byte[] WriteToBytes<T>(ReadOnlySpan<T> vectorData, BsonVectorDataType bsonVectorDataType, byte padding)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename T to TItem for consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// <summary>
/// Represents a BSON vector.
/// </summary>
public abstract class BsonVectorBase<T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A similar class is BsonValue which is the abstract base class for multiple types of BSON values.

Here BsonVector is the base class for multiple types of BSON vectors.

Note that we did not name BsonValue as BsonValueBase.

Copy link
Contributor Author

@BorisDog BorisDog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough review. I think it's much more polished now.

/// <summary>
/// Represents a BSON vector.
/// </summary>
public abstract class BsonVectorBase<T>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that without considering BsonVectorAttribute, BsonVector is more natural name.
But I still think it's preferable to reserve BsonVector for BsonVectorAttribute, as it's expected to appear in most of the use cases, while direct usages of BsonVectorBase are expected to be rare.

/// <summary>
/// Initializes a new instance of the BsonVector class.
/// </summary>
public BsonVectorBase(ReadOnlyMemory<T> data, BsonVectorDataType dataType)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// Initializes a new instance of the <see cref="BsonVectorSerializerBase{TItemCollection, TItem}"/> class.
/// </summary>
/// <param name="bsonVectorDataType">Type of the bson vector data.</param>
public BsonVectorSerializerBase(BsonVectorDataType bsonVectorDataType)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

context.Writer.WriteBinaryData(binaryData);
}

private static BsonVectorDataType GetDataType() =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find, it's a leftover, thanks!

}

/// <inheritdoc/>
public override sealed void Serialize(BsonSerializationContext context, BsonSerializationArgs args, TItemCollection bsonVector)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// <summary>
/// Initializes a new instance of the <see cref="BsonVectorToCollectionSerializer{TItemCollection, TItem}" /> class.
/// </summary>
public BsonVectorToCollectionSerializer(BsonVectorDataType bsonVectorDataType) :
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

/// <inheritdoc/>
public override sealed TItemCollection Deserialize(BsonDeserializationContext context, BsonDeserializationArgs args)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

{
internal static class BsonVectorWriter
{
public static byte[] WriteToBytes<T>(BsonVectorBase<T> bsonVector)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return WriteToBytes(bsonVector.Data.Span, bsonVector.DataType, padding);
}

public static byte[] WriteToBytes<T>(ReadOnlySpan<T> vectorData, BsonVectorDataType bsonVectorDataType, byte padding)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@BorisDog BorisDog requested a review from rstam January 31, 2025 01:06
where TItem : struct
{
/// <summary>
/// Initializes a new instance of the BsonVector class.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    /// Initializes a new instance of the BsonVectorBase class.

Although I wish the summary as it stood was correct!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

/// <summary>
/// Initializes a new instance of the BsonVector class.
/// </summary>
protected BsonVectorBase(ReadOnlyMemory<TItem> data, BsonVectorDataType dataType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be internal to prevent users from attempting to create new subclasses?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Done.

/// </summary>
public BsonVectorPackedBit(ReadOnlyMemory<byte> data, byte padding) : base(data, BsonVectorDataType.PackedBit)
{
if (padding < 0 || padding > 7)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rider complains that padding can never be less than 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's time to try Rider, done.

}

/// <summary>
/// Represents a serializer for BSON vector to/from <see cref="Memory{TItem}"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// Represents a serializer for <see cref="Memory{TItem}"/> represented as a BsonVector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

private protected override ReadOnlySpan<TItem> GetItemsContainerSpan(TItem[] data) =>
data;

private protected override TItem[] CreateResult(TItem[] elements) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    private protected override TItem[] CreateResult(TItem[] items) => items;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

private protected override ReadOnlySpan<TItem> GetItemsContainerSpan(Memory<TItem> data) =>
data.Span;

private protected override Memory<TItem> CreateResult(TItem[] elements) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    private protected override Memory<TItem> CreateResult(TItem[] items) =>
        new(items);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

/// <summary>
/// Represents a serializer for <see cref="ReadOnlyMemory{TItem}"/>.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// Represents a serializer for <see cref="ReadOnlyMemory{TItem}"/> represented as a BsonVector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

data.Span;

private protected override ReadOnlyMemory<TItem> CreateResult(TItem[] elements) =>
new(elements);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    private protected override ReadOnlyMemory<TItem> CreateResult(TItem[] items) =>
        new(items);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@BorisDog BorisDog requested a review from rstam February 1, 2025 00:27
Copy link
Contributor

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few typos remain.

/// </summary>
/// <param name="data">The vector data.</param>
/// <param name="dataType">Type of the vector data.</param>
internal protected BsonVectorBase(ReadOnlyMemory<TItem> data, BsonVectorDataType dataType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

protected internal was my first thought also, but I looked it up and it's not what I first thought it was.

It should probably just be internal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right it's always confusing. It's needs to be private internal.

/// <summary>
/// Represents the data type of a BSON Vector.
/// </summary>
/// <seealso cref="BsonVectorBase{T}"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// <seealso cref="BsonVectorBase{TItem}"/>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/MongoDB.Bson/ObjectModel/BsonVectorDataType.cs Outdated Show resolved Hide resolved
src/MongoDB.Bson/ObjectModel/BsonVectorDataType.cs Outdated Show resolved Hide resolved
/// </summary>
/// <param name="binaryData">The binary data.</param>
/// <returns>Vector data, padding and datatype</returns>
public static (TItem[] Items, byte Padding, BsonVectorDataType vectorDataType) ToBsonVectorAsArray<TItem>(this BsonBinaryData binaryData)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't vectorDataType start with an upper case V?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should, done.

return (items, padding, vectorDataType);
}

public static (ReadOnlyMemory<byte> Byte, byte Padding, BsonVectorDataType VectorDataType) ReadBsonVectorAsBytes(ReadOnlyMemory<byte> vectorData)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't Byte be `Bytes``? (plural)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

switch (vectorDataType)
{
case BsonVectorDataType.Float32:
return new BsonVectorFloat32(AsTypedArrayOrThrow<float>()) as BsonVectorBase<TItem>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix indentation in these three cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/MongoDB.Bson/Serialization/BsonVectorReader.cs Outdated Show resolved Hide resolved
_ => throw new ArgumentException(nameof(bsonVectorDataType), "Unsupported vector datatype.")
};

if (supportedType != typeof(TItem))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree in with variables on the left and constants on the right.

But two things are different here:

  1. typeof(TItem) is not a constant, it's an expression involving the type variable TItem
  2. we also like to standardize on if (actualXyz != expectedXyz)

Technically typeof(TItem) will become a constant at runtime but ONLY after the generic method is Jitted for a particular TItem. But for us as programmers TItem is a type variable.

Copy link
Contributor

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added new comments related to changes required to allow users to configure serialization in code rather than using attributes.

/// </summary>
/// <typeparam name="TItemContainer">The items container, for example <see cref="BsonVectorBase{TItem}"/> or <see cref="Memory{TItem}"/>.</typeparam>
/// <typeparam name="TItem">The .NET data type.</typeparam>
internal abstract class BsonVectorSerializerBase<TItemContainer, TItem> : SerializerBase<TItemContainer>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class and all of its subclasses need to be public so that users can configure class maps in code.

Some users don't want to (or can't) put attributes on their model classes and either prefer or are forced to configure their serialization separately in code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


[BsonVector(BsonVectorDataType.Float32)]
public float[] ValuesFloat { get; set; }
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After making the serializers for values represented as vectors public the same configuration can be achieved in code as follows:

        var cm = new BsonClassMap<ArrayHolder>();
        cm.AutoMap();

        cm.MapMember(x => x.ValuesByte).SetSerializer(new ArrayAsBsonVectorSerializer<byte>(BsonVectorDataType.Int8));
        cm.MapMember(x => x.ValuesPackedBit).SetSerializer(new ArrayAsBsonVectorSerializer<byte>(BsonVectorDataType.PackedBit));
        cm.MapMember(x => x.ValuesFloat).SetSerializer(new ArrayAsBsonVectorSerializer<float>(BsonVectorDataType.Float32));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But... we can make the user's life easier by defining the following three subclasses:

    public class ByteArrayAsInt8VectorSerializer() : ArrayAsBsonVectorSerializer<byte>(BsonVectorDataType.Int8);
    public class ByteArrayAsPackedBitVectorSerializer() : ArrayAsBsonVectorSerializer<byte>(BsonVectorDataType.PackedBit);
    public class SingleArrayAsFloat32VectorSerializer() : ArrayAsBsonVectorSerializer<byte>(BsonVectorDataType.Float32);

which allow the user to configure the serializers without using the more complicated constructor call requiring a generic type parameter and a vector data type:

        cm.MapMember(x => x.ValuesByte).SetSerializer(new ByteArrayAsInt8VectorSerializer());
        cm.MapMember(x => x.ValuesPackedBit).SetSerializer(new ByteArrayAsPackedBitVectorSerializer());
        cm.MapMember(x => x.ValuesFloat).SetSerializer(new SingleArrayAsFloat32VectorSerializer());

Besides being more compact, it is less error prone (less likely to use invalid generic type parameters or mis-matched vector data types).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly for the Memory and ReadOnlyMemory serializers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's potentially a good suggestion.
But I'd like to consider it later, hopefully after collecting some user feedback.

BTW we already have some future potential additions to BSON Vector functionality, like BigEndian, other primitive types which are not part of this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. This is probably the ONLY thing that can't be configured in code now.

Keep in mind that some users don't want to pollute their model classes with MongoDB attributes. Or in some cases they can't because they don't control the code. For those users configuring serialization in code is important.

Attributes are also worthless if you want to configure the same class to be configured two different ways depending on where they are used. Admittedly a very rare use case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have misinterreted "I'd like to consider it later".

I thought you meant make the serializers public later. But looks like you did make them public.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I just meant considering the shortcuts later.
Maybe achieving the same by making static BsonVectorSerializer public and extending it.

@BorisDog BorisDog requested a review from rstam February 3, 2025 19:15

[BsonVector(BsonVectorDataType.Float32)]
public float[] ValuesFloat { get; set; }
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. This is probably the ONLY thing that can't be configured in code now.

Keep in mind that some users don't want to pollute their model classes with MongoDB attributes. Or in some cases they can't because they don't control the code. For those users configuring serialization in code is important.

Attributes are also worthless if you want to configure the same class to be configured two different ways depending on where they are used. Admittedly a very rare use case.

/// <summary>
/// Represents a BSON vector.
/// </summary>
public abstract class BsonVectorBase<TItem>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class should really be called BsonVector.

I realize there is a pseudo-conflict with [BsonVector] attribute and that's why you chose this unappealing name.

If we can't rename the attribute, then let's just live with the pseudo-conflict. It's not really a conflict because the attribute class name is actually BsonVectorAttribute.

/// Represents a serializer for <see cref="ReadOnlyMemory{TItem}"/> represented as a BsonVector.
/// </summary>
/// <typeparam name="TItem">The .NET data type.</typeparam>
internal sealed class ReadOnlyMemoryAsBsonVectorSerializer<TItem> : ItemContainerAsBsonVectorSerializer<ReadOnlyMemory<TItem>, TItem>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably meant for this one to be public too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Done.

src/MongoDB.Bson/Serialization/BsonVectorReader.cs Outdated Show resolved Hide resolved

[BsonVector(BsonVectorDataType.Float32)]
public float[] ValuesFloat { get; set; }
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have misinterreted "I'd like to consider it later".

I thought you meant make the serializers public later. But looks like you did make them public.

@BorisDog BorisDog requested a review from rstam February 3, 2025 22:44
Copy link
Contributor

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with reluctance due to the names BsonVectorBase and BsonValueAttribute.

The implementation is solid though. It's only the names which give me pause.

/// Sets the representation for this field or property as a BSON Vector with the specified data type.
/// </summary>
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)]
public sealed class BsonVectorAttribute : Attribute, IBsonMemberMapAttribute
Copy link
Member

@sanych-sun sanych-sun Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have an alternative idea: what if we do not introduce BsonVectorAttribute at all, but instead instruct users to specify required serializer via BsonSerializerAttribute. The main motivation for this suggestion is the fact that all other attributes we have are really adjusting the mapped serializer, when in this case we really want to use DIFFERENT serializer then default for this Clr data type. And if we want different - then let's say that aloud =).

Let's discuss it offline may be.

Copy link
Contributor Author

@BorisDog BorisDog Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline.
BsonSerializerAttribute could also work, but it will require awkward configuration of the additional parameter (and adding support of serializer parameters).
Also it is not as straight forward as BsonVector.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you are consistent in your arguments. You keep saying that:

[BsonVector(BsonVectorDataType.Int8)]
public byte[] A;

is SHORT and STRAIGHTFORWARD and yet you reject even more compact and straightforward suggestions like:

[AsInt8Vector]
public byte[] A;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I am :)
IMO having multiple attributes instead of a single simple BsonVector is less straightforward. Try imaging the user experience of having just a single entry point of BsonVector, where the rest of the params are self explanatory, versus the need of traversing our docs each time to find the right attribute for the right type.

Also I don't think the AsSomething naming pattern for attributes is ideal for us. This also collides with the potential idea of having BsonDoubleAttribute and similar in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flip side of your BsonDoubleAttribute argument is that we should be able to use BsonVector instead of BsonVectorBase as the type name for the BSON vector class.

If you plan on having BsonDouble and [BsonDouble] coexist there is no reason that BsonVector and [BsonVector] can't coexist.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the AsSomething pattern is a good one and one we should have adopted from the very beginning. If we are looking to change how our attributes are defined in the future this would be a good direction to move in.

[AsDouble]
public int X;

is very simple and straightforward. It reads very well: X is an int that is represented AS a double in the database.

Copy link
Member

@sanych-sun sanych-sun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@papafe papafe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small note, but otherwise LGTM

/// <summary>
/// Converts <see cref="BsonVectorBase{TItem}"/> to <see cref="BsonBinaryData"/>.
/// </summary>
/// <typeparam name="TItem"></typeparam>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TItem is missing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Done.

Copy link
Contributor

@papafe papafe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small note, but otherwise LGTM

@rstam rstam self-requested a review February 5, 2025 21:45
Copy link
Contributor

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like all the renaming from BsonVector to BinaryVector was done consistently.

To me the last remaining question is whether the <Type>Attribute naming pattern is viable.

I've started a thread in #csharp-driver-devs to discuss as a team before committing to this public API.

@rstam rstam self-requested a review February 5, 2025 22:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants