[Removed] #1588

adamsitnik · 2020-11-10T15:57:51Z

The study is not finished yet, I am going to create this issue now and add a comment for every benchmark that has a significant difference and afterward update the description of this issue and provide a single summary. There are simply too many benchmarks to put them in one GH comment ;)

BDN change: dotnet/BenchmarkDotNet#1587
Since this change requires us to call [GlobalSetup] and [GlobalCleanup] more than once I had to:

add missing cleanups: add missing cleanups #1586
move fields initialization from ctors to [GlobalSetup] methods, split big setups into smaller targeted ones, and fix some bugs that occurred when invoking them more than once: Refactor initialization logic to allow for enabling Memory Randomization #1587

List of benchmarks dependent on memory alignment for which enabling the feature helps:

System.Collections.CopyTo
System.Memory.Span.IndexOfAnyThreeValues
System.Collections.ContainsTrue<Int32>.Span

List of benchmarks where it does not help but changes the reported time:

System.Collections.CreateAddAndClear.Stack

Notes: we might need more than 20 iterations and we should not be removing upper outliers
Note: we should not enable it by default for all benchmarks

The text was updated successfully, but these errors were encountered:

adamsitnik · 2020-11-10T18:22:02Z

As expected, reallocating the arrays after every iteration helps with CopyTo benchmarks. We are now getting non-flat distribution with a single run, while previously it was always flat depending on the initial alignment.

Sample distribution:

Before:
-------------------- Histogram --------------------
[87.431 ns ; 89.140 ns) | @@@@@@@@@@@@@
---------------------------------------------------

After:
-------------------- Histogram --------------------
[ 79.009 ns ; 183.239 ns) | @@@@@@@@@@
[183.239 ns ; 287.047 ns) | @
[287.047 ns ; 391.277 ns) | @@@@@@@@
[391.277 ns ; 501.821 ns) | @
---------------------------------------------------

Note: we might need to run more than 20 iterations to get the full distribution and we can't exclude outliers. Otherwise for the distribution below:

-------------------- Histogram --------------------
[ 70.333 ns ; 151.954 ns) | @@
[151.954 ns ; 224.021 ns) | @@@@@@@@@@@@@@
[224.021 ns ; 262.438 ns) | @@@
[262.438 ns ; 334.505 ns) | 
[334.505 ns ; 421.098 ns) | 
[421.098 ns ; 493.165 ns) | @
---------------------------------------------------

The 421.098 ns ; 493.165 ns bucket with just a single result might get recognized as an outlier and removed.

Type	Method	OutlierMode	Random	Size	Mean	Median	Min	Max	Ratio
CopyTo<Int32>	Array	DontRemove	True	2048	195.86 ns	188.54 ns	106.37 ns	457.13 ns	1.23
CopyTo<Int32>	Array	RemoveUpper	False	2048	160.13 ns	158.17 ns	156.13 ns	167.71 ns	1.00

CopyTo<String>	Array	DontRemove	True	2048	497.14 ns	493.98 ns	473.03 ns	521.19 ns	1.09
CopyTo<String>	Array	RemoveUpper	False	2048	457.89 ns	458.05 ns	455.23 ns	460.50 ns	1.00

CopyTo<Int32>	Span	DontRemove	True	2048	240.38 ns	187.61 ns	106.56 ns	449.71 ns	2.82
CopyTo<Int32>	Span	RemoveUpper	False	2048	88.11 ns	88.06 ns	87.62 ns	88.95 ns	1.00

CopyTo<String>	Span	DontRemove	True	2048	551.28 ns	553.19 ns	525.40 ns	561.59 ns	1.01
CopyTo<String>	Span	RemoveUpper	False	2048	547.60 ns	548.00 ns	544.06 ns	550.11 ns	1.00

CopyTo<Int32>	ReadOnlySpan	DontRemove	True	2048	304.46 ns	243.03 ns	107.91 ns	512.46 ns	3.52
CopyTo<Int32>	ReadOnlySpan	RemoveUpper	False	2048	88.72 ns	88.83 ns	87.90 ns	90.13 ns	1.00

CopyTo<String>	ReadOnlySpan	DontRemove	True	2048	542.40 ns	540.23 ns	529.51 ns	570.88 ns	1.06
CopyTo<String>	ReadOnlySpan	RemoveUpper	False	2048	516.74 ns	517.69 ns	510.92 ns	522.16 ns	1.00

CopyTo<Int32>	Memory	DontRemove	True	2048	320.58 ns	316.07 ns	110.51 ns	518.71 ns	3.41
CopyTo<Int32>	Memory	RemoveUpper	False	2048	93.92 ns	92.91 ns	91.43 ns	100.19 ns	1.00

CopyTo<String>	Memory	DontRemove	True	2048	564.98 ns	562.26 ns	546.27 ns	588.31 ns	1.00
CopyTo<String>	Memory	RemoveUpper	False	2048	564.37 ns	563.69 ns	558.38 ns	572.91 ns	1.00

CopyTo<Int32>	ReadOnlyMemory	DontRemove	True	2048	277.53 ns	285.12 ns	108.95 ns	415.08 ns	3.06
CopyTo<Int32>	ReadOnlyMemory	RemoveUpper	False	2048	91.81 ns	91.81 ns	91.33 ns	92.37 ns	1.00

CopyTo<String>	ReadOnlyMemory	DontRemove	True	2048	584.96 ns	584.08 ns	560.89 ns	600.57 ns	1.12
CopyTo<String>	ReadOnlyMemory	RemoveUpper	False	2048	521.95 ns	522.85 ns	513.46 ns	526.16 ns	1.00

CopyTo<Int32>	List	DontRemove	True	2048	394.03 ns	416.93 ns	223.23 ns	467.90 ns	0.84
CopyTo<Int32>	List	RemoveUpper	False	2048	470.78 ns	471.84 ns	465.79 ns	474.09 ns	1.00

CopyTo<String>	List	DontRemove	True	2048	503.69 ns	493.76 ns	477.75 ns	561.17 ns	1.02
CopyTo<String>	List	RemoveUpper	False	2048	486.84 ns	486.59 ns	481.98 ns	491.72 ns	1.00

CopyTo<Int32>	ImmutableArray	DontRemove	True	2048	338.59 ns	373.97 ns	220.69 ns	505.24 ns	1.57
CopyTo<Int32>	ImmutableArray	RemoveUpper	False	2048	222.62 ns	222.42 ns	221.79 ns	224.19 ns	1.00

CopyTo<String>	ImmutableArray	DontRemove	True	2048	494.27 ns	489.13 ns	471.58 ns	526.61 ns	1.04
CopyTo<String>	ImmutableArray	RemoveUpper	False	2048	478.20 ns	477.09 ns	473.28 ns	484.69 ns	1.00

adamsitnik · 2020-11-10T18:40:43Z

The solution is not a silver bullet, there are some unstable benchmarks like System.Collections.CreateAddAndClear.Stack
where randomizing memory does not help. We still have a flat distribution, but the reported time slightly changes (most probably due to not having a perfectly warmed up memory):

Before:

-------------------- Histogram --------------------
[1.846 us ; 1.960 us) | @@@@@@@@@@@@@@@
---------------------------------------------------

After:

-------------------- Histogram --------------------
[2.087 us ; 2.204 us) | @@@@@@@@@@@@@@@
---------------------------------------------------

Method	OutlierMode	MemoryRandomization	Size	Mean	Median	Min	Max	Ratio	Allocated
Stack	DontRemove	True	512	2.130 us	2.122 us	2.093 us	2.188 us	1.13	4 KB
Stack	RemoveUpper	False	512	1.883 us	1.874 us	1.850 us	1.944 us	1.00	4 KB

While in the Reporting System we can see:

adamsitnik · 2020-11-10T19:04:37Z

There are also benchmarks that were quite stable so far and enabling this feature turns them into unstable benchmarks.

Example:

Before:

-------------------- Histogram --------------------
[136.962 ns ; 144.806 ns) | @@@@@@@@@@@@@@
---------------------------------------------------

After:

-------------------- Histogram --------------------
[137.882 ns ; 153.224 ns) | @@@@@@@@@@@@@@@@
[153.224 ns ; 160.511 ns) |
[160.511 ns ; 175.853 ns) | @@@
[175.853 ns ; 195.154 ns) |
[195.154 ns ; 210.496 ns) | @
---------------------------------------------------

Historical Data shows that it's a stable benchmark:

However, we can't be realy sure if consider the historical outliers:

adamsitnik · 2020-11-10T19:17:01Z

For System.Memory.Span.IndexOfAnyThreeValues which is quite unstable when looking at historical data:

The randomization brings the desired effect:

Before:

-------------------- Histogram --------------------
[19.563 ns ; 19.919 ns) | @@@@@@@@@@@@@@
---------------------------------------------------

After:

-------------------- Histogram --------------------
[18.258 ns ; 20.703 ns) | @@@@@@@@@@@@@@@@
[20.703 ns ; 22.180 ns) | @
[22.180 ns ; 25.094 ns) |
[25.094 ns ; 28.005 ns) | @@@
---------------------------------------------------

adamsitnik · 2020-11-10T19:44:51Z

The same goes for System.Collections.ContainsTrue<Int32>.Span:

Before:

-------------------- Histogram --------------------
[36.393 us ; 37.218 us) | @@@@@@@@@@@@@@@
---------------------------------------------------

After:

-------------------- Histogram --------------------
[36.313 us ; 37.255 us) | @@@@@@@@@@@@@@@@@@
[37.255 us ; 38.198 us) |
[38.198 us ; 39.683 us) | @
[39.683 us ; 40.931 us) | @
---------------------------------------------------

danmoseley · 2020-11-10T19:54:31Z

@adamsitnik I'm following along as an observer, but I'm confused by your last two histograms above. Those seem to be more multimodal after your changes, not less?

adamsitnik · 2020-11-12T17:59:29Z

Those seem to be more multimodal after your changes, not less?

@danmosemsft You are right and this is a very good question.

Some benchmarks seem to be stable when you run them just once. Example:

-------------------- Histogram --------------------
[19.563 ns ; 19.919 ns) | @@@@@@@@@@@@@@
--------------------------------------------------

But when you run them many times, they turn to be unstable or multimodal. Example:

Very often it's caused by code|memory alignment changes. When it comes to memory, we recommend allocating whatever is needed for the benchmark run in a [GlobalSetup] method. In the case of this particular benchmark, we allocate the array once and use it for a given benchmark run. If it's aligned, then we get one mode, when it's not we get another.

performance/src/benchmarks/micro/libraries/System.Collections/Contains/ContainsTrue.cs

Lines 38 to 42 in ca80d8e

    
           [GlobalSetup] 
        
           public void Setup() 
        
           { 
        
               _found = ValuesGenerator.ArrayOfUniqueValues<T>(Size); 
        
               _array = _found.ToArray();

With the "memory randomization", BDN calls [GlobalSetup] after every benchmark iteration. By doing that we test the same code for different inputs (aligned and not aligned). We don't get a nice flat distribution, but the distribution that shows all possible scenarios (at least this is what I hope for).

The problem that we have encountered while working on the new bot is benchmarks that... tend to stay in one mode for a while (typically weeks) after they switch to 2nd mode for another few weeks.

With this "randomization" I hope that we can switch from using averages in the historical data to "Median" and by doing that flatten the charts.

So for example for this histogram:

-------------------- Histogram --------------------
[18.258 ns ; 20.703 ns) | @@@@@@@@@@@@@@@@
[20.703 ns ; 22.180 ns) | @
[22.180 ns ; 25.094 ns) |
[25.094 ns ; 28.005 ns) | @@@
---------------------------------------------------

we would show a value from the first bucket (18.258 ns ; 20.703) and most probably never anything from 25.094 ns ; 28.005 ns

danmoseley · 2020-11-12T18:52:12Z

Got it - thanks for the explanation.

adamsitnik · 2020-11-18T15:05:58Z

I've decided to create a new issue with a final report: #1602

adamsitnik changed the title ~~Memory Randomization experiment study~~ [Removed] Nov 18, 2020

adamsitnik closed this as completed Nov 18, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Removed] #1588

[Removed] #1588

adamsitnik commented Nov 10, 2020 •

edited

Loading

adamsitnik commented Nov 10, 2020

adamsitnik commented Nov 10, 2020

adamsitnik commented Nov 10, 2020

adamsitnik commented Nov 10, 2020 •

edited

Loading

adamsitnik commented Nov 10, 2020

danmoseley commented Nov 10, 2020

adamsitnik commented Nov 12, 2020

danmoseley commented Nov 12, 2020

adamsitnik commented Nov 18, 2020

[Removed] #1588

[Removed] #1588

Comments

adamsitnik commented Nov 10, 2020 • edited Loading

adamsitnik commented Nov 10, 2020

adamsitnik commented Nov 10, 2020

adamsitnik commented Nov 10, 2020

adamsitnik commented Nov 10, 2020 • edited Loading

adamsitnik commented Nov 10, 2020

danmoseley commented Nov 10, 2020

adamsitnik commented Nov 12, 2020

danmoseley commented Nov 12, 2020

adamsitnik commented Nov 18, 2020

adamsitnik commented Nov 10, 2020 •

edited

Loading

adamsitnik commented Nov 10, 2020 •

edited

Loading