-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSHARP-5202: BSON Binary Vector Subtype Support #1581
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it makes sense, I just got some notes/questions
src/MongoDB.Bson/Serialization/Serializers/BsonVectorSerializer.cs
Outdated
Show resolved
Hide resolved
src/MongoDB.Bson/Serialization/Serializers/BsonVectorSerializer.cs
Outdated
Show resolved
Hide resolved
{ | ||
var itemType = type.GetElementType(); | ||
var vectorSerializerType = typeof(BsonVectorArraySerializer<>).MakeGenericType(itemType); | ||
var vectorSerializer = (IBsonSerializer)Activator.CreateInstance(vectorSerializerType, DataType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to move this into BsonVectorArraySerializer.Create(Type itemType)
static factory method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and more similar below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not possible as BsonVectorArraySerializer
and the rest are generic classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we can create non-generic static class just to have this factory methods. We are doing it in other places. (far example, NullableSerializer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
/// </summary> | ||
/// <typeparam name="TItemCollection">The concrete type derived from <see cref="BsonVector{T}"/>.</typeparam> | ||
/// <typeparam name="TItem">The .NET data type.</typeparam> | ||
public sealed class BsonVectorSerializer<TItemCollection, TItem> : BsonVectorSerializerBase<TItemCollection, TItem> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we override Equals for the Serializer to make proper comparison of serializers? It might be important if Vestors could be used in LINQ statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss if it's safe to have comparison implemented in base class only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't override Equals
if no private data was introduced.
public static byte[] WriteVectorData<T>(ReadOnlySpan<T> vectorData, BsonVectorDataType bsonVectorDataType, byte padding) | ||
where T : struct | ||
{ | ||
var vectorDataBytes = MemoryMarshal.Cast<T, byte>(vectorData); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we should ensure we are allowed to do that and throw in case wrong endian-platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch. Done.
{ | ||
internal static class BsonVectorWriter | ||
{ | ||
public static byte[] WriteBsonVector<T>(BsonVector<T> bsonVector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: The name of the method is a little confusing to me. If it's WriteXXX I expect to pass target into the method, like a stream of buffer where I want it to be written to. In this case, I think FormatXXX and ParseXXX would work better instead of Write and Read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add support for VectorSearch.
Test cluster setup is WIP.
{ | ||
var itemType = type.GetElementType(); | ||
var vectorSerializerType = typeof(BsonVectorArraySerializer<>).MakeGenericType(itemType); | ||
var vectorSerializer = (IBsonSerializer)Activator.CreateInstance(vectorSerializerType, DataType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not possible as BsonVectorArraySerializer
and the rest are generic classes.
{ | ||
internal static class BsonVectorWriter | ||
{ | ||
public static byte[] WriteBsonVector<T>(BsonVector<T> bsonVector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Done.
public static byte[] WriteVectorData<T>(ReadOnlySpan<T> vectorData, BsonVectorDataType bsonVectorDataType, byte padding) | ||
where T : struct | ||
{ | ||
var vectorDataBytes = MemoryMarshal.Cast<T, byte>(vectorData); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch. Done.
/// </summary> | ||
/// <typeparam name="TItemCollection">The concrete type derived from <see cref="BsonVector{T}"/>.</typeparam> | ||
/// <typeparam name="TItem">The .NET data type.</typeparam> | ||
public sealed class BsonVectorSerializer<TItemCollection, TItem> : BsonVectorSerializerBase<TItemCollection, TItem> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done.
/// </summary> | ||
/// <typeparam name="TItemCollection">The collection type.</typeparam> | ||
/// <typeparam name="TItem">The .NET data type.</typeparam> | ||
public abstract class BsonVectorSerializerBase<TItemCollection, TItem> : SerializerBase<TItemCollection> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move each serializer into it's own file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nested classes are very simple just a few lines long. I find it more readable to keep all in the same file in this case. But we can discuss further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have often bundled very tightly related classes together in a single file. I'm fine with this.
/// </summary> | ||
/// <typeparam name="TItemCollection">The concrete type derived from <see cref="BsonVector{T}"/>.</typeparam> | ||
/// <typeparam name="TItem">The .NET data type.</typeparam> | ||
public sealed class BsonVectorSerializer<TItemCollection, TItem> : BsonVectorSerializerBase<TItemCollection, TItem> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss if it's safe to have comparison implemented in base class only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. But I have a lot of naming suggestions and a few refactoring suggestions. Let me know what you think.
/// <summary> | ||
/// Represents a BSON vector. | ||
/// </summary> | ||
public abstract class BsonVectorBase<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a better name for this class is BsonVector
(no Base
needed in the name).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's what I've started with, but then I realized that there is some ambiguity with BsonVectorAttribute
:
When trying to define the following poco without including using MongoDB.Bson.Serialization.Serializers; namespace, user unclear error without clear suggested resolution.
class POCO
{
[BsonVector(...)]
public int[] Vector { get; set; }
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think [BsonVector(...)]
should be [BsonVectorRepresentation(...)]
which would eliminate this conflict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A similar class is BsonValue
which is the abstract base class for multiple types of BSON values.
Here BsonVector
is the base class for multiple types of BSON vectors.
Note that we did not name BsonValue
as BsonValueBase
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that without considering BsonVectorAttribute
, BsonVector
is more natural name.
But I still think it's preferable to reserve BsonVector
for BsonVectorAttribute
, as it's expected to appear in most of the use cases, while direct usages of BsonVectorBase
are expected to be rare.
/// <summary> | ||
/// Gets the vector. | ||
/// </summary> | ||
public ReadOnlyMemory<T> Vector { get; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spec calls this property Data
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
/// Represents the data type of BSON Vector. | ||
/// </summary> | ||
/// <seealso cref="BsonVectorBase{T}"/> | ||
public enum BsonVectorDataType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename file to match enum name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done.
src/MongoDB.Bson/Serialization/Attributes/BsonVectorAttribute.cs
Outdated
Show resolved
Hide resolved
|
||
namespace MongoDB.Bson.Serialization.Serializers | ||
{ | ||
internal static class BsonVectorSerializerBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this static class should be named just BsonVectorSerializer
(no Base
in the name).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return (elements, padding, vectorDataType); | ||
} | ||
|
||
public static (ReadOnlyMemory<byte> VectorDataBytes, byte Padding, BsonVectorDataType VectorDataType) ReadBsonVectorAsBytes(ReadOnlyMemory<byte> vectorData) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is As
redundant in the name? Could be ReadBsonVectorBytes
or ReadBsonVectorData
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VectorDataBytes
could be shortened to Bytes
if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
var vectorDataSpan = vectorData.Span; | ||
var vectorDataType = (BsonVectorDataType)vectorDataSpan[0]; | ||
|
||
var paddingSizeBits = vectorDataSpan[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable could be named just padding
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return (vectorData.Slice(2), paddingSizeBits, vectorDataType); | ||
} | ||
|
||
private static BsonVectorBase<T> CreateBsonVector<T>(T[] elements, byte padding, BsonVectorDataType vectorDataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a static factory method in the BsonVector<T>
class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so: I didn't see a need for a such a factory outside this class. Also I think BsonVector itself should be just a data holder, decoupled from its factories.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename T
to TItem
for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
{ | ||
internal static class BsonVectorWriter | ||
{ | ||
public static byte[] BsonVectorToBytes<T>(BsonVectorBase<T> bsonVector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider naming this method WriteBytes
or WriteToBytes
.
The "from" is implied by the parameter types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return VectorDataToBytes(bsonVector.Vector.Span, bsonVector.DataType, padding); | ||
} | ||
|
||
public static byte[] VectorDataToBytes<T>(ReadOnlySpan<T> vectorData, BsonVectorDataType bsonVectorDataType, byte padding) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider naming WriteToBytes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I have so many requested naming changes.
But specially where things are entering the public API it is important to get the names right because we can't change them later.
/// <summary> | ||
/// Represents a BSON vector. | ||
/// </summary> | ||
public abstract class BsonVectorBase<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think [BsonVector(...)]
should be [BsonVectorRepresentation(...)]
which would eliminate this conflict.
/// <summary> | ||
/// Initializes a new instance of the BsonVector class. | ||
/// </summary> | ||
public BsonVectorBase(ReadOnlyMemory<T> data, BsonVectorDataType dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
protected internal
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think <T>
should be <TItem>
for consistency and as a hint of what T
actually is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/MongoDB.Bson/Serialization/Attributes/BsonVectorAttribute.cs
Outdated
Show resolved
Hide resolved
/// </summary> | ||
/// <typeparam name="TItemCollection">The collection type.</typeparam> | ||
/// <typeparam name="TItem">The .NET data type.</typeparam> | ||
internal abstract class BsonVectorSerializerBase<TItemCollection, TItem> : SerializerBase<TItemCollection> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused by "collection" in the name TItemCollection
because it implies that the type implements ICollection
or at least IEnumerable
but that's only true sometimes.
Can we use a more neutral term like "container" and name this TItemContainer
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
/// <summary> | ||
/// Represents a serializer for BSON vector to/from a given collection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Represents a serializer for TItemContainer values represented as a BsonVector.
The part immediately after for should be what is being serializer, not how it is represented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
} | ||
|
||
public static void ValidateDataType<T>(BsonVectorDataType bsonVectorDataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename T
to TItem
for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
_ => throw new ArgumentOutOfRangeException(nameof(bsonVectorDataType), bsonVectorDataType, "Unsupported vector datatype.") | ||
}; | ||
|
||
if (supportedType != typeof(T)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would read more logically swapping the sides:
if (typeof(TItem) != supportedType)
That's how you would say it in English.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I look at it from a different perspective "What is the value of my variable", similar to if variable == constant
.
Also typeof(T)
is in a way constant here.
{ | ||
internal static class BsonVectorWriter | ||
{ | ||
public static byte[] WriteToBytes<T>(BsonVectorBase<T> bsonVector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename T
to TItem
for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return WriteToBytes(bsonVector.Data.Span, bsonVector.DataType, padding); | ||
} | ||
|
||
public static byte[] WriteToBytes<T>(ReadOnlySpan<T> vectorData, BsonVectorDataType bsonVectorDataType, byte padding) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename T
to TItem
for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
/// <summary> | ||
/// Represents a BSON vector. | ||
/// </summary> | ||
public abstract class BsonVectorBase<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A similar class is BsonValue
which is the abstract base class for multiple types of BSON values.
Here BsonVector
is the base class for multiple types of BSON vectors.
Note that we did not name BsonValue
as BsonValueBase
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the thorough review. I think it's much more polished now.
/// <summary> | ||
/// Represents a BSON vector. | ||
/// </summary> | ||
public abstract class BsonVectorBase<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that without considering BsonVectorAttribute
, BsonVector
is more natural name.
But I still think it's preferable to reserve BsonVector
for BsonVectorAttribute
, as it's expected to appear in most of the use cases, while direct usages of BsonVectorBase
are expected to be rare.
/// <summary> | ||
/// Initializes a new instance of the BsonVector class. | ||
/// </summary> | ||
public BsonVectorBase(ReadOnlyMemory<T> data, BsonVectorDataType dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/MongoDB.Bson/Serialization/Attributes/BsonVectorAttribute.cs
Outdated
Show resolved
Hide resolved
/// Initializes a new instance of the <see cref="BsonVectorSerializerBase{TItemCollection, TItem}"/> class. | ||
/// </summary> | ||
/// <param name="bsonVectorDataType">Type of the bson vector data.</param> | ||
public BsonVectorSerializerBase(BsonVectorDataType bsonVectorDataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
context.Writer.WriteBinaryData(binaryData); | ||
} | ||
|
||
private static BsonVectorDataType GetDataType() => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good find, it's a leftover, thanks!
} | ||
|
||
/// <inheritdoc/> | ||
public override sealed void Serialize(BsonSerializationContext context, BsonSerializationArgs args, TItemCollection bsonVector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
/// <summary> | ||
/// Initializes a new instance of the <see cref="BsonVectorToCollectionSerializer{TItemCollection, TItem}" /> class. | ||
/// </summary> | ||
public BsonVectorToCollectionSerializer(BsonVectorDataType bsonVectorDataType) : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
/// <inheritdoc/> | ||
public override sealed TItemCollection Deserialize(BsonDeserializationContext context, BsonDeserializationArgs args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
{ | ||
internal static class BsonVectorWriter | ||
{ | ||
public static byte[] WriteToBytes<T>(BsonVectorBase<T> bsonVector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return WriteToBytes(bsonVector.Data.Span, bsonVector.DataType, padding); | ||
} | ||
|
||
public static byte[] WriteToBytes<T>(ReadOnlySpan<T> vectorData, BsonVectorDataType bsonVectorDataType, byte padding) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/MongoDB.Bson/Serialization/Attributes/BsonVectorAttribute.cs
Outdated
Show resolved
Hide resolved
src/MongoDB.Bson/Serialization/Attributes/BsonVectorAttribute.cs
Outdated
Show resolved
Hide resolved
where TItem : struct | ||
{ | ||
/// <summary> | ||
/// Initializes a new instance of the BsonVector class. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Initializes a new instance of the BsonVectorBase class.
Although I wish the summary as it stood was correct!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
/// <summary> | ||
/// Initializes a new instance of the BsonVector class. | ||
/// </summary> | ||
protected BsonVectorBase(ReadOnlyMemory<TItem> data, BsonVectorDataType dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be internal
to prevent users from attempting to create new subclasses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Done.
/// </summary> | ||
public BsonVectorPackedBit(ReadOnlyMemory<byte> data, byte padding) : base(data, BsonVectorDataType.PackedBit) | ||
{ | ||
if (padding < 0 || padding > 7) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rider complains that padding can never be less than 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's time to try Rider, done.
} | ||
|
||
/// <summary> | ||
/// Represents a serializer for BSON vector to/from <see cref="Memory{TItem}"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Represents a serializer for <see cref="Memory{TItem}"/> represented as a BsonVector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
private protected override ReadOnlySpan<TItem> GetItemsContainerSpan(TItem[] data) => | ||
data; | ||
|
||
private protected override TItem[] CreateResult(TItem[] elements) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private protected override TItem[] CreateResult(TItem[] items) => items;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
private protected override ReadOnlySpan<TItem> GetItemsContainerSpan(Memory<TItem> data) => | ||
data.Span; | ||
|
||
private protected override Memory<TItem> CreateResult(TItem[] elements) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private protected override Memory<TItem> CreateResult(TItem[] items) =>
new(items);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
/// <summary> | ||
/// Represents a serializer for <see cref="ReadOnlyMemory{TItem}"/>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Represents a serializer for <see cref="ReadOnlyMemory{TItem}"/> represented as a BsonVector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
data.Span; | ||
|
||
private protected override ReadOnlyMemory<TItem> CreateResult(TItem[] elements) => | ||
new(elements); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private protected override ReadOnlyMemory<TItem> CreateResult(TItem[] items) =>
new(items);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few typos remain.
/// </summary> | ||
/// <param name="data">The vector data.</param> | ||
/// <param name="dataType">Type of the vector data.</param> | ||
internal protected BsonVectorBase(ReadOnlyMemory<TItem> data, BsonVectorDataType dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
protected internal
was my first thought also, but I looked it up and it's not what I first thought it was.
It should probably just be internal
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right it's always confusing. It's needs to be private internal
.
/// <summary> | ||
/// Represents the data type of a BSON Vector. | ||
/// </summary> | ||
/// <seealso cref="BsonVectorBase{T}"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// <seealso cref="BsonVectorBase{TItem}"/>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/MongoDB.Bson/Serialization/Attributes/BsonVectorAttribute.cs
Outdated
Show resolved
Hide resolved
/// </summary> | ||
/// <param name="binaryData">The binary data.</param> | ||
/// <returns>Vector data, padding and datatype</returns> | ||
public static (TItem[] Items, byte Padding, BsonVectorDataType vectorDataType) ToBsonVectorAsArray<TItem>(this BsonBinaryData binaryData) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't vectorDataType
start with an upper case V
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should, done.
return (items, padding, vectorDataType); | ||
} | ||
|
||
public static (ReadOnlyMemory<byte> Byte, byte Padding, BsonVectorDataType VectorDataType) ReadBsonVectorAsBytes(ReadOnlyMemory<byte> vectorData) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't Byte
be `Bytes``? (plural)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
switch (vectorDataType) | ||
{ | ||
case BsonVectorDataType.Float32: | ||
return new BsonVectorFloat32(AsTypedArrayOrThrow<float>()) as BsonVectorBase<TItem>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix indentation in these three cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
_ => throw new ArgumentException(nameof(bsonVectorDataType), "Unsupported vector datatype.") | ||
}; | ||
|
||
if (supportedType != typeof(TItem)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree in with variables on the left and constants on the right.
But two things are different here:
- typeof(TItem) is not a constant, it's an expression involving the type variable
TItem
- we also like to standardize on
if (actualXyz != expectedXyz)
Technically typeof(TItem)
will become a constant at runtime but ONLY after the generic method is Jitted for a particular TItem
. But for us as programmers TItem
is a type variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added new comments related to changes required to allow users to configure serialization in code rather than using attributes.
/// </summary> | ||
/// <typeparam name="TItemContainer">The items container, for example <see cref="BsonVectorBase{TItem}"/> or <see cref="Memory{TItem}"/>.</typeparam> | ||
/// <typeparam name="TItem">The .NET data type.</typeparam> | ||
internal abstract class BsonVectorSerializerBase<TItemContainer, TItem> : SerializerBase<TItemContainer> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class and all of its subclasses need to be public
so that users can configure class maps in code.
Some users don't want to (or can't) put attributes on their model classes and either prefer or are forced to configure their serialization separately in code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
[BsonVector(BsonVectorDataType.Float32)] | ||
public float[] ValuesFloat { get; set; } | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After making the serializers for values represented as vectors public the same configuration can be achieved in code as follows:
var cm = new BsonClassMap<ArrayHolder>();
cm.AutoMap();
cm.MapMember(x => x.ValuesByte).SetSerializer(new ArrayAsBsonVectorSerializer<byte>(BsonVectorDataType.Int8));
cm.MapMember(x => x.ValuesPackedBit).SetSerializer(new ArrayAsBsonVectorSerializer<byte>(BsonVectorDataType.PackedBit));
cm.MapMember(x => x.ValuesFloat).SetSerializer(new ArrayAsBsonVectorSerializer<float>(BsonVectorDataType.Float32));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But... we can make the user's life easier by defining the following three subclasses:
public class ByteArrayAsInt8VectorSerializer() : ArrayAsBsonVectorSerializer<byte>(BsonVectorDataType.Int8);
public class ByteArrayAsPackedBitVectorSerializer() : ArrayAsBsonVectorSerializer<byte>(BsonVectorDataType.PackedBit);
public class SingleArrayAsFloat32VectorSerializer() : ArrayAsBsonVectorSerializer<byte>(BsonVectorDataType.Float32);
which allow the user to configure the serializers without using the more complicated constructor call requiring a generic type parameter and a vector data type:
cm.MapMember(x => x.ValuesByte).SetSerializer(new ByteArrayAsInt8VectorSerializer());
cm.MapMember(x => x.ValuesPackedBit).SetSerializer(new ByteArrayAsPackedBitVectorSerializer());
cm.MapMember(x => x.ValuesFloat).SetSerializer(new SingleArrayAsFloat32VectorSerializer());
Besides being more compact, it is less error prone (less likely to use invalid generic type parameters or mis-matched vector data types).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly for the Memory
and ReadOnlyMemory
serializers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's potentially a good suggestion.
But I'd like to consider it later, hopefully after collecting some user feedback.
BTW we already have some future potential additions to BSON Vector functionality, like BigEndian, other primitive types which are not part of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. This is probably the ONLY thing that can't be configured in code now.
Keep in mind that some users don't want to pollute their model classes with MongoDB attributes. Or in some cases they can't because they don't control the code. For those users configuring serialization in code is important.
Attributes are also worthless if you want to configure the same class to be configured two different ways depending on where they are used. Admittedly a very rare use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have misinterreted "I'd like to consider it later".
I thought you meant make the serializers public later. But looks like you did make them public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I just meant considering the shortcuts later.
Maybe achieving the same by making static BsonVectorSerializer
public and extending it.
|
||
[BsonVector(BsonVectorDataType.Float32)] | ||
public float[] ValuesFloat { get; set; } | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. This is probably the ONLY thing that can't be configured in code now.
Keep in mind that some users don't want to pollute their model classes with MongoDB attributes. Or in some cases they can't because they don't control the code. For those users configuring serialization in code is important.
Attributes are also worthless if you want to configure the same class to be configured two different ways depending on where they are used. Admittedly a very rare use case.
/// <summary> | ||
/// Represents a BSON vector. | ||
/// </summary> | ||
public abstract class BsonVectorBase<TItem> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class should really be called BsonVector
.
I realize there is a pseudo-conflict with [BsonVector]
attribute and that's why you chose this unappealing name.
If we can't rename the attribute, then let's just live with the pseudo-conflict. It's not really a conflict because the attribute class name is actually BsonVectorAttribute
.
/// Represents a serializer for <see cref="ReadOnlyMemory{TItem}"/> represented as a BsonVector. | ||
/// </summary> | ||
/// <typeparam name="TItem">The .NET data type.</typeparam> | ||
internal sealed class ReadOnlyMemoryAsBsonVectorSerializer<TItem> : ItemContainerAsBsonVectorSerializer<ReadOnlyMemory<TItem>, TItem> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably meant for this one to be public too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Done.
src/MongoDB.Bson/Serialization/Attributes/BsonVectorAttribute.cs
Outdated
Show resolved
Hide resolved
|
||
[BsonVector(BsonVectorDataType.Float32)] | ||
public float[] ValuesFloat { get; set; } | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have misinterreted "I'd like to consider it later".
I thought you meant make the serializers public later. But looks like you did make them public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with reluctance due to the names BsonVectorBase
and BsonValueAttribute
.
The implementation is solid though. It's only the names which give me pause.
/// Sets the representation for this field or property as a BSON Vector with the specified data type. | ||
/// </summary> | ||
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)] | ||
public sealed class BsonVectorAttribute : Attribute, IBsonMemberMapAttribute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have an alternative idea: what if we do not introduce BsonVectorAttribute at all, but instead instruct users to specify required serializer via BsonSerializerAttribute. The main motivation for this suggestion is the fact that all other attributes we have are really adjusting the mapped serializer, when in this case we really want to use DIFFERENT serializer then default for this Clr data type. And if we want different - then let's say that aloud =).
Let's discuss it offline may be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline.
BsonSerializerAttribute
could also work, but it will require awkward configuration of the additional parameter (and adding support of serializer parameters).
Also it is not as straight forward as BsonVector
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you are consistent in your arguments. You keep saying that:
[BsonVector(BsonVectorDataType.Int8)]
public byte[] A;
is SHORT and STRAIGHTFORWARD and yet you reject even more compact and straightforward suggestions like:
[AsInt8Vector]
public byte[] A;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I am :)
IMO having multiple attributes instead of a single simple BsonVector
is less straightforward. Try imaging the user experience of having just a single entry point of BsonVector
, where the rest of the params are self explanatory, versus the need of traversing our docs each time to find the right attribute for the right type.
Also I don't think the AsSomething
naming pattern for attributes is ideal for us. This also collides with the potential idea of having BsonDoubleAttribute
and similar in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The flip side of your BsonDoubleAttribute
argument is that we should be able to use BsonVector
instead of BsonVectorBase
as the type name for the BSON vector class.
If you plan on having BsonDouble
and [BsonDouble]
coexist there is no reason that BsonVector
and [BsonVector]
can't coexist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the AsSomething
pattern is a good one and one we should have adopted from the very beginning. If we are looking to change how our attributes are defined in the future this would be a good direction to move in.
[AsDouble]
public int X;
is very simple and straightforward. It reads very well: X is an int that is represented AS a double in the database.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small note, but otherwise LGTM
/// <summary> | ||
/// Converts <see cref="BsonVectorBase{TItem}"/> to <see cref="BsonBinaryData"/>. | ||
/// </summary> | ||
/// <typeparam name="TItem"></typeparam> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TItem is missing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small note, but otherwise LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like all the renaming from BsonVector
to BinaryVector
was done consistently.
To me the last remaining question is whether the <Type>Attribute
naming pattern is viable.
I've started a thread in #csharp-driver-devs to discuss as a team before committing to this public API.
No description provided.