[API Proposal]: Vectorized span-processing helpers #93217
Tagging subscribers to this area: @dotnet/area-system-numerics-tensors

This is more of a tracking issue right now than a proper API proposal.

In `TensorPrimitives`, we now have ~40 routines, most of which are vectorized across `Vector128`, `Vector256`, and `Vector512`, plus a scalar fallback. Rather than duplicate all of the boilerplate across them, we employ several helper functions that are parameterized via generic method parameters constrained to specific interfaces, with a caller supplying the kernel of the processing via a struct that implements the relevant interfaces. For example, `TensorPrimitives.Abs` (which computes the absolute value of each input element and stores the result into the corresponding destination slot) is implemented as:

```csharp
public static void Abs(ReadOnlySpan<float> x, Span<float> destination) =>
    InvokeSpanIntoSpan<AbsoluteOperator>(x, destination);
```

and `TensorPrimitives.Add` (which adds each pair of elements and stores the result for each into the corresponding destination slot) is implemented as:

```csharp
public static void Add(ReadOnlySpan<float> x, ReadOnlySpan<float> y, Span<float> destination) =>
    InvokeSpanSpanIntoSpan<AddOperator>(x, y, destination);
```

There are helpers for various shapes of these element-wise operations, as well as helpers for performing various forms of aggregations. For example, `TensorPrimitives.ProductOfSums` (which sums each element-wise pair from the two inputs and multiplies together all of those sums) is implemented as:

```csharp
public static float ProductOfSums(ReadOnlySpan<float> x, ReadOnlySpan<float> y)
{
    ...
    return Aggregate<AddOperator, MultiplyOperator>(x, y);
}
```

These "Operator"s each implement various interfaces, such as `IUnaryOperator`, `IBinaryOperator`, etc. For example, the `AddMultiplyOperator` (which implements `ITernaryOperator`) is defined as:

```csharp
readonly struct AddMultiplyOperator : ITernaryOperator
{
    public static float Invoke(float x, float y, float z) => (x + y) * z;
    public static Vector128<float> Invoke(Vector128<float> x, Vector128<float> y, Vector128<float> z) => (x + y) * z;
    public static Vector256<float> Invoke(Vector256<float> x, Vector256<float> y, Vector256<float> z) => (x + y) * z;
    public static Vector512<float> Invoke(Vector512<float> x, Vector512<float> y, Vector512<float> z) => (x + y) * z;
}
```

We should consider exposing a set of such helpers and interfaces, to enable developers to express efficient vectorized implementations across various shapes without having to write all the boilerplate themselves. With an `IVector` as proposed in #76244 and as prototyped internally in #90764, and if we exposed a `Scalar<T>` that also implemented this interface, this could be consolidated down to just a single method, which would also be more future-proof for vector types that might be added later, e.g.:

```csharp
readonly struct AddMultiplyOperator<T> : ITernaryOperator<T>
{
    public static TVector Invoke<TVector>(TVector x, TVector y, TVector z) where TVector : ISimdVector<TVector, T> =>
        (x + y) * z;
}
```
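For context, here is a minimal sketch of the kind of boilerplate these helpers encapsulate: a simplified `InvokeSpanIntoSpan` that dispatches between a `Vector128` path and a scalar fallback. The interface shape is deliberately reduced (scalar plus `Vector128` only), and all names here are illustrative, not the actual internal API; the real helpers also cover `Vector256`/`Vector512` and use more elaborate load/store code.

```csharp
using System;
using System.Runtime.Intrinsics;

// Reduced stand-in for the internal operator interface shape (scalar + Vector128 only).
interface IUnaryOperator
{
    static abstract float Invoke(float x);
    static abstract Vector128<float> Invoke(Vector128<float> x);
}

// An example kernel: negation, defined once for both scalar and vector paths.
readonly struct NegateOperator : IUnaryOperator
{
    public static float Invoke(float x) => -x;
    public static Vector128<float> Invoke(Vector128<float> x) => -x;
}

static class SpanHelpers
{
    public static void InvokeSpanIntoSpan<TOperator>(ReadOnlySpan<float> x, Span<float> destination)
        where TOperator : struct, IUnaryOperator
    {
        if (x.Length > destination.Length)
            throw new ArgumentException("Destination is too short.", nameof(destination));

        int i = 0;
        if (Vector128.IsHardwareAccelerated)
        {
            // Vectorized path: four floats per iteration.
            for (; i <= x.Length - Vector128<float>.Count; i += Vector128<float>.Count)
            {
                Vector128<float> v = Vector128.Create(x.Slice(i, Vector128<float>.Count));
                TOperator.Invoke(v).CopyTo(destination.Slice(i));
            }
        }

        // Scalar fallback for trailing elements (and for non-accelerated hardware).
        for (; i < x.Length; i++)
        {
            destination[i] = TOperator.Invoke(x[i]);
        }
    }
}
```

A caller then supplies only the kernel, e.g. `SpanHelpers.InvokeSpanIntoSpan<NegateOperator>(input, output);`, and the loop structure is shared across all operators.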
It would be very useful to have extended versions of helpers like `InvokeSpanSpanIntoSpan`, etc. These methods should make it possible to work efficiently with spans of different data types:

```csharp
static void InvokeSpanSpanIntoSpan<TResult, TBinaryOperator, T, TConverter>(ReadOnlySpan<TResult> x, ReadOnlySpan<T> y, Span<TResult> destination)
    where TBinaryOperator : struct, IBinaryOperator<TResult>
    where TConverter : struct, IConverter<TResult, T>

static void InvokeSpanSpanIntoSpan<TResult, TBinaryOperator, T, TConverter>(ReadOnlySpan<T> x, ReadOnlySpan<TResult> y, Span<TResult> destination)
    where TBinaryOperator : struct, IBinaryOperator<TResult>
    where TConverter : struct, IConverter<TResult, T>
```

and

```csharp
static void InvokeSpanSpanIntoSpan<TResult, TBinaryOperator, TX, TY, TConverterX, TConverterY>(ReadOnlySpan<TX> x, ReadOnlySpan<TY> y, Span<TResult> destination)
    where TBinaryOperator : struct, IBinaryOperator<TResult>
    where TConverterX : struct, IConverter<TResult, TX>
    where TConverterY : struct, IConverter<TResult, TY>
```

where

```csharp
interface IConverter<TResult, T>
    where TResult : struct
    where T : struct
{
    /// <summary>
    /// True if the conversion is suitable for low-level optimization using SIMD instructions.
    /// </summary>
    static abstract bool SupportVectorization { get; }

    /// <summary>
    /// True if the vectorized conversion requires widening.
    /// </summary>
    static abstract bool RequiresWidening { get; }

    /// <summary>
    /// Performs a non-vectorized conversion.
    /// </summary>
    static abstract TResult Convert(T value);

    /// <summary>
    /// Performs a vectorized conversion.
    /// </summary>
    static abstract ISimdVector<TResult> Convert(ISimdVector<T> vector);

    /// <summary>
    /// Performs a vectorized conversion that requires widening.
    /// </summary>
    static abstract (ISimdVector<TResult> Lower, ISimdVector<TResult> Upper) ConvertWithWidening(ISimdVector<T> vector);
}
```

This functionality is required for the DataFrame generic math: dotnet/machinelearning#6905
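To make the composition concrete, here is a scalar-only sketch of how such a converter could plug into a same-type binary operator. The interfaces below are simplified stand-ins for the proposal (scalar members only, no `ISimdVector`), and the converter/operator names are illustrative:

```csharp
using System;

// Simplified stand-ins for the proposed interfaces (scalar members only).
interface IBinaryOperator<T> where T : struct
{
    static abstract T Invoke(T x, T y);
}

interface IConverter<TResult, T>
    where TResult : struct
    where T : struct
{
    static abstract TResult Convert(T value);
}

static class MixedTypeHelpers
{
    // Converts each y element to TResult, then applies the same-type operator,
    // so the operator itself never needs to know about mixed input types.
    public static void InvokeSpanSpanIntoSpan<TResult, TBinaryOperator, T, TConverter>(
        ReadOnlySpan<TResult> x, ReadOnlySpan<T> y, Span<TResult> destination)
        where TResult : struct
        where T : struct
        where TBinaryOperator : struct, IBinaryOperator<TResult>
        where TConverter : struct, IConverter<TResult, T>
    {
        for (int i = 0; i < x.Length; i++)
        {
            destination[i] = TBinaryOperator.Invoke(x[i], TConverter.Convert(y[i]));
        }
    }
}

// Example converter and operator pair: double[] + int[] -> double[].
readonly struct IntToDoubleConverter : IConverter<double, int>
{
    public static double Convert(int value) => value;
}

readonly struct AddOperator : IBinaryOperator<double>
{
    public static double Invoke(double x, double y) => x + y;
}
```

The vectorized paths would follow the same pattern, with the converter's vector `Convert`/`ConvertWithWidening` feeding the operator's vector `Invoke`.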
Why would the …
Sorry, I don't get your idea; it seems we may be talking about different things. Let me try to explain the issue I faced and would like to solve with the new TensorPrimitives API. The main idea of my proposal was to reuse the existing generic binary operators and helper methods like `InvokeSpanSpanIntoSpan` for vectorized operations on arguments of different types. As DataFrame allows users to do math operations with every column inside it, different combinations of input types have to be supported, like:

and so on. I just don't see how this conversion can be defined in `TBinaryOperator` using generic math, especially for vectors. The only way I see is to have concrete implementations of each operator for all existing type combinations (like `AddOperator_IntLong`, `AddOperator_DoubleFloat`, etc.), and the resulting number of operators would be huge. My thought was to have generic operators that operate on arguments of the same type, as we have now, plus separate converters for each type. In that case every conversion is defined only once. Converters know the type details: for example, converting float to double or int to double requires vector widening, vectorized conversion of long or ulong to double can be performed directly, and byte to double or byte to long can't be vectorized. For conversions that require widening, `InvokeSpanSpanIntoSpan` of course needs a different implementation, as it has to read a pair of vectors on one side for each vector from the other side, but this can be hidden from the public API. Am I missing something?
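The widening case described here can be illustrated with the existing `Vector128` API: one vector of floats widens to two vectors of doubles, so the loop has to produce two result vectors per input vector. This is a standalone sketch (illustrative names, assuming .NET 8's span-based `Vector128.Create`), not code from the proposal:

```csharp
using System;
using System.Runtime.Intrinsics;

static class WideningDemo
{
    // Widens a span of floats into a span of doubles, vector by vector:
    // each Vector128<float> (4 elements) yields two Vector128<double> (2 elements each).
    public static void ConvertToDouble(ReadOnlySpan<float> source, Span<double> destination)
    {
        int i = 0;
        if (Vector128.IsHardwareAccelerated)
        {
            for (; i <= source.Length - Vector128<float>.Count; i += Vector128<float>.Count)
            {
                Vector128<float> v = Vector128.Create(source.Slice(i, Vector128<float>.Count));
                (Vector128<double> lower, Vector128<double> upper) = Vector128.Widen(v);
                lower.CopyTo(destination.Slice(i));
                upper.CopyTo(destination.Slice(i + Vector128<double>.Count));
            }
        }

        // Scalar fallback for trailing elements.
        for (; i < source.Length; i++)
        {
            destination[i] = source[i];
        }
    }
}
```

A helper that pairs such a widening conversion with a same-type operator would consume both halves of the widened result before advancing the narrower input.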
I've been working on an open-source geometry library that relies on generic math principles, aiming to enhance performance using SIMD technology. One specific area I've been focusing on is optimizing the processing of collections of geometric entities. After experimenting with various concept prototypes and consulting with @tannergooding, I delved into exploring `TensorPrimitives`.

A significant difference lies in the fact that my implementation handles all the same operations with fewer special cases, resulting in a more concise interface surface. I've implemented all operators using the following interfaces:

```csharp
public interface IOperator
{
    static virtual bool IsVectorizable => true;
}

public interface IUnaryOperator<T, TResult>
    : IOperator
    where T : struct
    where TResult : struct
{
    static abstract TResult Invoke(T x);
    static abstract Vector<TResult> Invoke(ref readonly Vector<T> x);
}

public interface IBinaryOperator<T1, T2, TResult>
    : IOperator
    where T1 : struct
    where T2 : struct
    where TResult : struct
{
    static abstract TResult Invoke(T1 x, T2 y);
    static abstract Vector<TResult> Invoke(ref readonly Vector<T1> x, ref readonly Vector<T2> y);
}

public interface IBinaryScalarOperator<T1, T2, TResult>
    : IOperator
    where T1 : struct
    where TResult : struct
{
    static abstract TResult Invoke(T1 x, T2 y);
    static abstract Vector<TResult> Invoke(ref readonly Vector<T1> x, T2 y);
}

public interface ITernaryOperator<T1, T2, T3, TResult>
    : IOperator
    where T1 : struct
    where T2 : struct
    where T3 : struct
    where TResult : struct
{
    static abstract TResult Invoke(T1 x, T2 y, T3 z);
    static abstract Vector<TResult> Invoke(ref readonly Vector<T1> x, ref readonly Vector<T2> y, ref readonly Vector<T3> z);
}

public interface IAggregationOperator<T, TResult>
    : IBinaryOperator<TResult, T, TResult>
    where T : struct
    where TResult : struct
{
    static virtual TResult Seed => Throw.NotSupportedException<TResult>();
    static abstract TResult Invoke(TResult x, TResult y);
}
```

The only operations I haven't implemented yet, which will require another interface, are the … .

It's worth mentioning that each parameter in these interfaces employs a different generic type, enabling broader coverage of cases and potentially greater adaptability in the future. This also solves the issues mentioned above by @asmirnov82.

To simplify usage, I've included multiple overloads for just two methods, … . An exception to this approach is … . Additionally, I've introduced … . Finally, I've introduced the aggregation methods … . Lastly, I've organized my implementation into smaller files for easier navigation. 😅

Comprehensive documentation accompanies my implementation, providing detailed insights into these concepts. Benchmarking data within the documentation suggests that … . In summary, I pursued this alternative because … .
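To show how these interface shapes are consumed, here is a sketch of a generic add operator written against the `IBinaryOperator<T1, T2, TResult>` shape above. The interfaces are repeated so the example is self-contained; the operator name is illustrative and assumes `T` is one of the primitive types `Vector<T>` supports (requires C# 12 for `ref readonly` parameters):

```csharp
using System.Numerics;

// The comment's interface shapes, repeated here so the example stands alone.
public interface IOperator
{
    static virtual bool IsVectorizable => true;
}

public interface IBinaryOperator<T1, T2, TResult> : IOperator
    where T1 : struct
    where T2 : struct
    where TResult : struct
{
    static abstract TResult Invoke(T1 x, T2 y);
    static abstract Vector<TResult> Invoke(ref readonly Vector<T1> x, ref readonly Vector<T2> y);
}

// A generic add operator: generic math supplies the scalar '+',
// and Vector<T> already defines '+' element-wise for supported primitives.
public readonly struct AddOperator<T> : IBinaryOperator<T, T, T>
    where T : struct, IAdditionOperators<T, T, T>
{
    public static T Invoke(T x, T y) => x + y;
    public static Vector<T> Invoke(ref readonly Vector<T> x, ref readonly Vector<T> y) => x + y;
}
```

Because all three type parameters are independent, the same shape also admits mixed-type operators (e.g. `IBinaryOperator<long, int, long>`) without any converter machinery, which is the adaptability point made above.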