-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of BigInteger.Multiply(large, small) #92208
Conversation
Tagging subscribers to this area: @dotnet/area-system-numerics Issue DetailsBigInteger.Multiply is based on Karatsuba algorithm. If implemented correctly, the computational complexity of multiply is However, in the current implementation, it is not. This is because it the half of the smaller value is used when the larger one should be. runtime/src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.SquMul.cs Line 214 in 353d5ea
In this PR, the larger one is used. The reason for using ceiling value is to ensure that Benchmark
public class Benchmark
{
[Params(100000, 500000, 1000000)]
public int LargeLength { get; set; }
[Params(1000, 10000, 100000)]
public int SmallLength { get; set; }
byte[] bytes1, bytes2;
[GlobalSetup]
public void Setup()
{
var random = new Random(918);
bytes1 = new byte[LargeLength];
bytes2 = new byte[SmallLength];
random.NextBytes(bytes1);
random.NextBytes(bytes2);
}
[Benchmark]
public PrBigInteger PR_Multiply()
{
return (new PrBigInteger(bytes1) * new PrBigInteger(bytes2));
}
[Benchmark]
public BigInteger Multiply()
{
return (new BigInteger(bytes1) * new BigInteger(bytes2));
}
}
|
396fa2d
to
20c5d77
Compare
87bc8fc
to
68e9567
Compare
dc4f0b6
to
bc1aa19
Compare
Could you add the benchmark to https://github.com/dotnet/performance/blob/main/src/benchmarks/micro/libraries/System.Runtime.Numerics/Perf.BigInteger.cs (or ensure the existing Changes in general LGTM, just want to ensure we have some perf numbers before we merge so it can be correctly tracked in our historical data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes LGTM.
I've used the benchmarks provided by @kzrnm in dotnet/performance#3361 and run them on my PC. For large inputs, where right is half of the left size the gains are up to 60%. For other test cases the difference is within the range of error.
BenchmarkDotNet v0.13.10-nightly.20231019.90, Windows 11 (10.0.22621.2428/22H2/2022Update/SunValley2)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK 9.0.100-alpha.1.23531.2
[Host] : .NET 8.0.0 (8.0.23.47906), X64 RyuJIT AVX2
PR : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
main : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
Method | Job | arguments | Mean | Ratio |
---|---|---|---|---|
Multiply | PR | 1024,1024 bits | 879.885 ns | 1.01 |
Multiply | main | 1024,1024 bits | 868.360 ns | 1.00 |
Multiply | PR | 1024,512 bits | 496.243 ns | 1.02 |
Multiply | main | 1024,512 bits | 484.839 ns | 1.00 |
Multiply | PR | 16,16 bits | 8.754 ns | 0.99 |
Multiply | main | 16,16 bits | 9.107 ns | 1.00 |
Multiply | PR | 16,8 bits | 8.457 ns | 0.88 |
Multiply | main | 16,8 bits | 9.658 ns | 1.00 |
Multiply | PR | 65536,32768 bits | 517,364.993 ns | 0.38 |
Multiply | main | 65536,32768 bits | 1,364,740.916 ns | 1.00 |
Multiply | PR | 65536,65536 bits | 776,079.607 ns | 1.01 |
Multiply | main | 65536,65536 bits | 771,191.496 ns | 1.00 |
Thank you for your contribution @kzrnm !
ulong carry = 0UL; | ||
for (int j = 0; j < left.Length; j++) | ||
{ | ||
ref uint elementPtr = ref Unsafe.Add(ref resultPtr, i + j); | ||
ulong digits = elementPtr + carry + (ulong)left[j] * right[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we could most likely get minor improvement by hoisting the result of right[i]
(however I am not 100% sure that JIT does not perform this optimization already)
ulong carry = 0UL; | |
for (int j = 0; j < left.Length; j++) | |
{ | |
ref uint elementPtr = ref Unsafe.Add(ref resultPtr, i + j); | |
ulong digits = elementPtr + carry + (ulong)left[j] * right[i]; | |
ulong carry = 0UL; | |
uint right_i = right[i]; | |
for (int j = 0; j < left.Length; j++) | |
{ | |
ref uint elementPtr = ref Unsafe.Add(ref resultPtr, i + j); | |
ulong digits = elementPtr + carry + (ulong)left[j] * right_i; |
upperRight.Clear(); | ||
|
||
Multiply(left, rightHigh, upperRight); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiply does not use upperRight
as an input and it's going to overwrite all values starting from 0 to left.Length
:
runtime/src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.SquMul.cs
Lines 143 to 149 in e2ce987
for ( ; i < left.Length; i++) | |
{ | |
ulong digits = (ulong)left[i] * right + carry; | |
bits[i] = unchecked((uint)digits); | |
carry = digits >> 32; | |
} | |
bits[i] = (uint)carry; |
So we can reduce the clear to only last element (this span has left.Length + 1
elements)
upperRight.Clear(); | |
Multiply(left, rightHigh, upperRight); | |
// Multiply has set 0..left.Length elements, the size is left.Length+1 | |
// We need to zero the last element to make sure it does not contain any garbage. | |
Multiply(left, rightHigh, upperRight); | |
upperRight[^1] = 0; |
BigInteger.Multiply is based on Karatsuba algorithm. If implemented correctly, the computational complexity of multiply is$\Theta(n^{\log_2 3})$ where n is number of digits.
However, in the current implementation, it is not. This is because it the half of the smaller value is used when the larger one should be.
runtime/src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.SquMul.cs
Line 214 in 353d5ea
Benchmark