Vectorize Convert.ToBase64CharArray and TryToBase64Chars #73320

stephentoub · 2022-08-03T19:08:41Z

#71795 vectorized Convert.ToBase64String for larger inputs by using Base64.EncodeToUtf8 and then widening the resulting UTF-8 bytes into a UTF-16 string. It did not, however, touch Convert.ToBase64CharArray or Convert.TryToBase64Chars. The ToBase64String change uses a temporary array rented from the array pool. The expectation is that this will rarely allocate, and when it does, the allocation happens in a method that already allocates the result string, so the impact is presumed to be small. ToBase64CharArray and TryToBase64Chars, on the other hand, are intended to be entirely non-allocating, so even renting from the array pool could be problematic: if the pool has no suitable buffer, the rent itself allocates.
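
For contrast, the rent-then-widen pattern described above can be sketched roughly as follows. This is illustrative only and is not the code from #71795: the helper name `ToBase64StringViaPool` is hypothetical, and `Encoding.ASCII.GetString` is simply one way to widen ASCII bytes to chars. The comment on `Rent` marks the potential allocation that the non-allocating APIs cannot afford.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;

Console.WriteLine(ToBase64StringViaPool(new byte[] { 1, 2, 3, 4, 5 })); // AQIDBAU=

// Hypothetical helper illustrating the pattern: encode to UTF-8 in a pooled buffer, then widen.
static string ToBase64StringViaPool(byte[] data)
{
    // Base64 emits 4 output units per 3 input bytes, rounded up.
    int encodedLength = Base64.GetMaxEncodedToUtf8Length(data.Length);

    // Rent allocates a fresh array if the pool has no suitable buffer available;
    // that hidden allocation is what the non-allocating APIs must avoid.
    byte[] rented = ArrayPool<byte>.Shared.Rent(encodedLength);
    try
    {
        OperationStatus status = Base64.EncodeToUtf8(data, rented, out _, out int written);
        if (status != OperationStatus.Done)
        {
            throw new InvalidOperationException("Unexpected Base64 encoding result.");
        }

        // The Base64 alphabet is pure ASCII, so decoding the bytes as ASCII widens each one to a char.
        return Encoding.ASCII.GetString(rented, 0, written);
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(rented);
    }
}
```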

This PR changes the non-allocating variants to use Base64.EncodeToUtf8 as well. Instead of renting a temporary buffer, though, it relies on the fact that Base64 output is guaranteed to be ASCII, so the encoded UTF-8 bytes occupy exactly half the memory of the resulting UTF-16 chars (one byte per character instead of two). It can therefore use the destination char buffer itself as scratch space for the encoded UTF-8 bytes and then widen them to chars in place. That removes the need for a separate temporary buffer, which makes the approach suitable for the non-allocating APIs. Once that helper exists, it can also replace the code previously added to ToBase64String, making that method non-allocating as well (apart from the result string it necessarily allocates) and therefore more predictable.
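
A minimal C# sketch of the in-place idea, under stated assumptions: `EncodeToCharsInPlace` and `ToBase64StringNoTemp` are hypothetical names, and the widening step is a plain scalar loop rather than the vectorized widening the PR relies on for its speedups. It encodes into the destination reinterpreted as bytes via `MemoryMarshal.AsBytes`, then widens back to front so that no byte is overwritten before it has been read. The `string.Create` wrapper shows how a ToBase64String-style method could reuse the same helper with no allocation beyond the result string.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Runtime.InteropServices;

byte[] input = { 1, 2, 3, 4, 5 };
char[] dest = new char[16];
int charsWritten = EncodeToCharsInPlace(input, dest);
Console.WriteLine(new string(dest, 0, charsWritten)); // AQIDBAU=
Console.WriteLine(ToBase64StringNoTemp(input));       // AQIDBAU=

// Hypothetical helper: Base64-encodes `bytes` into `chars`, using the destination itself
// as scratch space. Returns the number of chars written, or -1 if `chars` is too small.
static int EncodeToCharsInPlace(ReadOnlySpan<byte> bytes, Span<char> chars)
{
    // Base64 emits 4 output units per 3 input bytes, rounded up: ((n + 2) / 3) * 4.
    int required = Base64.GetMaxEncodedToUtf8Length(bytes.Length);
    if (chars.Length < required)
    {
        return -1;
    }

    // Reinterpret the first `required` chars as 2 * `required` raw bytes. The ASCII
    // output needs only `required` bytes, so it fits in the first half.
    Span<byte> scratch = MemoryMarshal.AsBytes(chars.Slice(0, required));
    OperationStatus status = Base64.EncodeToUtf8(bytes, scratch, out _, out int written);
    if (status != OperationStatus.Done || written != required)
    {
        throw new InvalidOperationException("Unexpected Base64 encoding result.");
    }

    // Widen in place, back to front. Char i occupies bytes 2i and 2i + 1. Every one of
    // those indices other than i itself is greater than i, so it was already read in an
    // earlier iteration (or lies past the encoded data), and byte i is read before the store.
    for (int i = written - 1; i >= 0; i--)
    {
        chars[i] = (char)scratch[i];
    }

    return written;
}

// Hypothetical ToBase64String-style wrapper: string.Create passes the new string's own
// buffer to the callback, so the result string is the only allocation.
static string ToBase64StringNoTemp(byte[] data) =>
    string.Create(Base64.GetMaxEncodedToUtf8Length(data.Length), data, (span, state) =>
    {
        EncodeToCharsInPlace(state, span);
    });
```

Walking backward is the essential design choice here: a forward loop would overwrite byte 1 when storing char 0, before byte 1 had been read.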

Overall, this removes the possible extra allocation in ToBase64String and fixes a performance inversion in which the allocating ToBase64String could be significantly faster (thanks to vectorization) than ToBase64CharArray and TryToBase64Chars, the APIs intended to be the faster options.

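```csharp
using System;
using BenchmarkDotNet.Attributes;

// Class name is illustrative; the original snippet showed only the members.
// [MemoryDiagnoser] produces the "Allocated" column in the results that follow.
[MemoryDiagnoser]
public class Base64Benchmarks
{
    [Params(16, 64, 256, 1024)]
    public int Length { get; set; }

    private byte[] _data;
    private char[] _scratch;

    [GlobalSetup]
    public void Setup()
    {
        _data = new byte[Length];
        _scratch = new char[Length * 4]; // more than the ((Length + 2) / 3) * 4 chars Base64 needs
        var r = new Random(42);
        r.NextBytes(_data);
    }

    [Benchmark]
    public string ToBase64String() => Convert.ToBase64String(_data);

    [Benchmark]
    public void ToBase64CharArray() => Convert.ToBase64CharArray(_data, 0, _data.Length, _scratch, 0);

    [Benchmark]
    public void ToBase64Chars() => Convert.TryToBase64Chars(_data, _scratch, out _);
}
```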

| Method | Toolchain | Length | Mean | Ratio | Allocated |
|---|---|---:|---:|---:|---:|
| ToBase64String | \main\corerun.exe | 16 | 34.86 ns | 1.00 | 72 B |
| ToBase64String | \pr\corerun.exe | 16 | 34.82 ns | 1.00 | 72 B |
| ToBase64CharArray | \main\corerun.exe | 16 | 26.08 ns | 1.00 | - |
| ToBase64CharArray | \pr\corerun.exe | 16 | 27.12 ns | 1.04 | - |
| ToBase64Chars | \main\corerun.exe | 16 | 25.67 ns | 1.00 | - |
| ToBase64Chars | \pr\corerun.exe | 16 | 26.55 ns | 1.03 | - |
| ToBase64String | \main\corerun.exe | 64 | 50.12 ns | 1.00 | 200 B |
| ToBase64String | \pr\corerun.exe | 64 | 49.22 ns | 0.98 | 200 B |
| ToBase64CharArray | \main\corerun.exe | 64 | 79.72 ns | 1.00 | - |
| ToBase64CharArray | \pr\corerun.exe | 64 | 31.39 ns | 0.39 | - |
| ToBase64Chars | \main\corerun.exe | 64 | 78.80 ns | 1.00 | - |
| ToBase64Chars | \pr\corerun.exe | 64 | 31.49 ns | 0.40 | - |
| ToBase64String | \main\corerun.exe | 256 | 137.63 ns | 1.00 | 712 B |
| ToBase64String | \pr\corerun.exe | 256 | 108.71 ns | 0.79 | 712 B |
| ToBase64CharArray | \main\corerun.exe | 256 | 300.65 ns | 1.00 | - |
| ToBase64CharArray | \pr\corerun.exe | 256 | 47.43 ns | 0.16 | - |
| ToBase64Chars | \main\corerun.exe | 256 | 299.34 ns | 1.00 | - |
| ToBase64Chars | \pr\corerun.exe | 256 | 46.80 ns | 0.16 | - |
| ToBase64String | \main\corerun.exe | 1024 | 392.78 ns | 1.00 | 2760 B |
| ToBase64String | \pr\corerun.exe | 1024 | 346.42 ns | 0.88 | 2760 B |
| ToBase64CharArray | \main\corerun.exe | 1024 | 1,174.50 ns | 1.00 | - |
| ToBase64CharArray | \pr\corerun.exe | 1024 | 116.84 ns | 0.10 | - |
| ToBase64Chars | \main\corerun.exe | 1024 | 1,162.84 ns | 1.00 | - |
| ToBase64Chars | \pr\corerun.exe | 1024 | 116.44 ns | 0.10 | - |

ghost · 2022-08-03T19:09:13Z

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Author:	stephentoub
Assignees:	-
Labels:	`area-System.Runtime`, `tenet-performance`
Milestone:	7.0.0

src/libraries/System.Private.CoreLib/src/System/Convert.cs

src/libraries/System.Runtime.Extensions/tests/System/Convert.cs

stephentoub · 2022-08-04T13:36:55Z

```csharp
using System;
using BenchmarkDotNet.Attributes;

// Class name is illustrative; the original snippet showed only the members.
public class Base64Benchmarks
{
    [Params(8, 16, 22, 32, 46, 64, 70, 128, 256, 1024)]
    public int Length { get; set; }

    private byte[] _data;
    private char[] _scratch;

    [GlobalSetup]
    public void Setup()
    {
        _data = new byte[Length];
        _scratch = new char[Length * 4];
        var r = new Random(42);
        r.NextBytes(_data);
    }

    [Benchmark]
    public string ToBase64String() => Convert.ToBase64String(_data);

    [Benchmark]
    public void ToBase64CharArray() => Convert.ToBase64CharArray(_data, 0, _data.Length, _scratch, 0);
}
```

| Method | Toolchain | Length | Mean | Ratio |
|---|---|---:|---:|---:|
| ToBase64String | \main\corerun.exe | 8 | 22.60 ns | 1.00 |
| ToBase64String | \pr\corerun.exe | 8 | 22.39 ns | 0.99 |
| ToBase64CharArray | \main\corerun.exe | 8 | 17.14 ns | 1.00 |
| ToBase64CharArray | \pr\corerun.exe | 8 | 18.06 ns | 1.05 |
| ToBase64String | \main\corerun.exe | 16 | 34.52 ns | 1.00 |
| ToBase64String | \pr\corerun.exe | 16 | 34.39 ns | 1.00 |
| ToBase64CharArray | \main\corerun.exe | 16 | 26.15 ns | 1.00 |
| ToBase64CharArray | \pr\corerun.exe | 16 | 25.62 ns | 0.98 |
| ToBase64String | \main\corerun.exe | 22 | 44.58 ns | 1.00 |
| ToBase64String | \pr\corerun.exe | 22 | 36.16 ns | 0.81 |
| ToBase64CharArray | \main\corerun.exe | 22 | 32.45 ns | 1.00 |
| ToBase64CharArray | \pr\corerun.exe | 22 | 29.72 ns | 0.92 |
| ToBase64String | \main\corerun.exe | 32 | 55.07 ns | 1.00 |
| ToBase64String | \pr\corerun.exe | 32 | 40.58 ns | 0.74 |
| ToBase64CharArray | \main\corerun.exe | 32 | 43.26 ns | 1.00 |
| ToBase64CharArray | \pr\corerun.exe | 32 | 30.63 ns | 0.71 |
| ToBase64String | \main\corerun.exe | 46 | 74.12 ns | 1.00 |
| ToBase64String | \pr\corerun.exe | 46 | 48.29 ns | 0.65 |
| ToBase64CharArray | \main\corerun.exe | 46 | 63.22 ns | 1.00 |
| ToBase64CharArray | \pr\corerun.exe | 46 | 32.52 ns | 0.51 |
| ToBase64String | \main\corerun.exe | 64 | 48.63 ns | 1.00 |
| ToBase64String | \pr\corerun.exe | 64 | 48.90 ns | 1.01 |
| ToBase64CharArray | \main\corerun.exe | 64 | 79.96 ns | 1.00 |
| ToBase64CharArray | \pr\corerun.exe | 64 | 31.80 ns | 0.40 |
| ToBase64String | \main\corerun.exe | 70 | 53.71 ns | 1.00 |
| ToBase64String | \pr\corerun.exe | 70 | 53.74 ns | 1.00 |
| ToBase64CharArray | \main\corerun.exe | 70 | 85.77 ns | 1.00 |
| ToBase64CharArray | \pr\corerun.exe | 70 | 34.40 ns | 0.40 |
| ToBase64String | \main\corerun.exe | 128 | 70.16 ns | 1.00 |
| ToBase64String | \pr\corerun.exe | 128 | 69.40 ns | 0.99 |
| ToBase64CharArray | \main\corerun.exe | 128 | 149.68 ns | 1.00 |
| ToBase64CharArray | \pr\corerun.exe | 128 | 37.28 ns | 0.25 |
| ToBase64String | \main\corerun.exe | 256 | 129.82 ns | 1.00 |
| ToBase64String | \pr\corerun.exe | 256 | 108.09 ns | 0.83 |
| ToBase64CharArray | \main\corerun.exe | 256 | 296.20 ns | 1.00 |
| ToBase64CharArray | \pr\corerun.exe | 256 | 46.06 ns | 0.16 |
| ToBase64String | \main\corerun.exe | 1024 | 387.87 ns | 1.00 |
| ToBase64String | \pr\corerun.exe | 1024 | 344.63 ns | 0.89 |
| ToBase64CharArray | \main\corerun.exe | 1024 | 1,154.97 ns | 1.00 |
| ToBase64CharArray | \pr\corerun.exe | 1024 | 114.74 ns | 0.10 |

stephentoub · 2022-08-05T14:28:29Z

Failure is #73247

kunalspathak · 2022-08-11T16:48:48Z

linux/arm64 improvements dotnet/perf-autofiling-issues#7250

kunalspathak · 2022-08-11T16:50:56Z

windows/arm64 improvements dotnet/perf-autofiling-issues#7244

stephentoub added the area-System.Runtime and tenet-performance labels Aug 3, 2022

stephentoub added this to the 7.0.0 milestone Aug 3, 2022

stephentoub requested review from EgorBo, GrabYourPitchforks, adamsitnik and tannergooding August 3, 2022 19:08

ghost assigned stephentoub Aug 3, 2022

EgorBo reviewed Aug 3, 2022

src/libraries/System.Private.CoreLib/src/System/Convert.cs (outdated, resolved)

EgorBo reviewed Aug 3, 2022

src/libraries/System.Private.CoreLib/src/System/Convert.cs (outdated, resolved)

adamsitnik reviewed Aug 4, 2022

EgorBo approved these changes Aug 4, 2022

Address PR feedback and a bit of additional cleanup

3ce3977

Merge branch 'dotnet:main' into vectorizebase64

9de14dd

stephentoub merged commit 053fb58 into dotnet:main Aug 5, 2022

stephentoub deleted the vectorizebase64 branch August 5, 2022 14:28

This was referenced Aug 5, 2022

Infra improvements for Helix #68176

Closed

GC/API/GC/GetGCMemoryInfo/GetGCMemoryInfo.sh test failing intermittently on CoreCLR Linux ARM32 #73247

Closed

kunalspathak mentioned this pull request Aug 11, 2022

[Perf] Improvement on 8/5/2022 7:33:33 PM dotnet/perf-autofiling-issues#7259

Closed

ghost locked as resolved and limited conversation to collaborators Sep 10, 2022
