Make memory alignment more random #1513

adamsitnik · 2020-08-11T17:46:24Z

While working on a new bot for auto-filing performance regressions in dotnet/runtime repository (sample issue) we have found out that quite a few microbenchmarks from dotnet/performance repository are bimodal and the modality tends to be stable for a few days before it switches to the other mode

Example:

So it's very often something like:

_____~~~~~~_____~~~~~~______~~~~~~

A while ago @AndyAyersMS has mentioned stabilizer which performs full randomization.

.NET does not allow for full control of memory alignment, but we could at least try to make it more random.

In dotnet/runtime#37814 @jkotas has provided a small repro that shows "the many modal nature of memory copying":

using System;
using System.Diagnostics;

class Program
{
    static void Work(ReadOnlySpan<string> from, Span<string> to)
    {
        for (int i = 0; i < 1000000; i++) from.CopyTo(to);
    }

    static void Main(string[] args)
    {
        Random r = new Random();
        for (;;)
        {
            var sw = new Stopwatch();
            GC.KeepAlive(new byte[r.Next(32)]); // the trick
            var from = new string[2048];
            GC.KeepAlive(new byte[r.Next(32)]);
            var to = new string[2048];
            if (r.Next(10) == 0) GC.Collect();
            sw.Start();
            Work(from, to);       
            sw.Stop();
            Console.WriteLine(sw.ElapsedMilliseconds);
        }
    }
}

So the first step could be to allocate a variable-size byte array between iterations to have more randomized memory alignment of the objects allocated during benchmarking.

The problem is that very often the input is allocated in [GlobalSetup] method (to exclude the cost of allocation from the benchmark which is good) which we promise to call only once during benchmarking ;) Perhaps we could add an optional config flag to invoke it once per every iteration? (but somehow avoid the [IterationSetup] hell)
Some benchmarks are initialized in constructors, so we might also consider allocating a new instance of benchmarked type for every iteration.

@AndreyAkinshin what do you think?

@DrewScoggins is there any chance you could provide a list of such benchmarks to use them for experimenting?

/cc @billwert @tannergooding @kunalspathak

The text was updated successfully, but these errors were encountered:

DrewScoggins · 2020-08-12T23:30:46Z

I can put together a list of places where we are seeing this and link it here.

AndreyAkinshin · 2020-08-13T08:15:19Z

@adamsitnik what do you think about customizing the LaunchCount that controls the number of started benchmark processes? I designed this property exactly for such cases. As a bonus, we can add some additional memory randomization before the GlobalSetup.

I do not like the idea of "GlobalSetup-like" initialization between iterations because it can destroy benchmark stability in the case of microbenchmarks.

The LaunchCount approach may be expensive from the total benchmarking time point of view, but it should be a reliable and stable way that solves the problem (at least, I don't know other acceptable solutions).

P.S. In the good old days, the default value for LaunchCount was 3, so it was easier to detect such problems. However, it didn't provide benefits in most "simple" cases, so I changed it to 1 in order to reduce the total benchmarking time.

AndreyAkinshin · 2020-08-13T08:20:47Z

Also, I have a more "adaptive" idea. We can introduce an additional [MemoryRandomization] attribute that will do the following:

Set initial LaucnhCount to 5 and perform 5 launches with additional random allocations before GlobalSetup.
If all launches provide similar results, stop the benchmark. Otherwise, continue to launch new benchmarking processes until we get a stable result across all launches.

@adamsitnik what do you think?

danmoseley · 2020-09-16T18:38:49Z

The example at the top (stable over several days) is interesting, but the most common case I see (which I assume is the general case of the above issue) is more like this

Of course CopyTo is pure memcpy so it's very alignment sensitive, but I see this all over benchmarks for collections that are backed by arrays eg

I wonder if this is something my team can help with (possibly @adamsitnik, or not) since it apparently affects us quite a bit.

danmoseley · 2020-09-16T18:41:09Z

This may be worse on ARM64, Another example dotnet/runtime#41741

DrewScoggins · 2020-09-16T18:41:09Z

I have also been working on some filters, that I think should do a good job of filtering out this kind of test. So that should at least help from a reporting side.

danmoseley · 2020-09-16T18:42:42Z

@DrewScoggins could you say more? I wouldn't want us to filter out these tests - they're core tests. It seems to me this is a problem ideally fixed in BDN or else possibly with workarounds in the benchmarks.

DrewScoggins · 2020-09-16T18:45:34Z

I mean that when we go to report regressions in the auto-filing we would not report a jump between bimodal points as a regression unless we see change in the character of the bimodal behavior. We won't be getting rid of them or stop running them, and certainly it is better to fix these to not have this behavior, but in the meantime I don't want tests like this to take up our triaging time.

danmoseley · 2020-09-16T18:50:13Z

Ah right - yeah, but we still need to detect this kind of change, even if it's a manual business (faked picture)

Anyway - this is the BDN repo and this issue is about potentially improving BDN so we should focus on that :)

AndyAyersMS · 2020-09-16T19:35:15Z

We've talked about this a fair bit in perf triage -- basically if the underlying distribution shifts, than that is a significant event, even if the distributions themselves are broad (noisy) or bimodal.

The main questions are how to determine what data points logically belong to the "same" distribution and how much data you need to accurately characterize the distributions (especially when bimodal or multimodal).

As for fixing this behavior, we can either try and regularize alignments or randomize them -- long term we should perhaps pursue both.

Regularization helps us and also our customers, who will experience the same sort of uncontrolled perf fluctuations in their code.
Luckily most customers don't depend heavily on microbenchmarks, and in larger bodies of code these things tend to average out.

For regularization of code alignment we've started a little ways down this path but need to go further, and consider controlling loop alignment. But it is a tricky thing to get right.

Randomization helps benchmarking by ensuring that each set of runs visits a wide variety of possible behaviors, so we more quickly can get a sense of the true distribution. Currently in benchmarks with broad / complex distributions we can only effectively detect regressions after some time has passed and the new behavior becomes clear. Randomization would speed up this process and perhaps get us back to the point where we could reliably detect regressions with just one set of base/diff runs in most tests.

danmoseley · 2020-09-16T19:45:48Z

@AndyAyersMS by regularize/randomize - are you referring to something the runtime could do (eg an opt in mode where the GC adds a little random padding before array allocations) or something that BDN could potentially do (eg @AndreyAkinshin suggestion of increasing launch counts and adding some random allocation on each launch)

AndyAyersMS · 2020-09-16T19:58:02Z

The runtime would regularize (or the jit would). BDN would randomize.

danmoseley · 2020-09-16T20:02:19Z

Is there an issue tracking the runtime/JIT work @AndyAyersMS , or would you mind creating one? It might be agood candidate for work before 6.0 features.

AndyAyersMS · 2020-09-16T20:09:11Z

On the jit side we need to implement loop head alignment (dotnet/runtime#8108); as a prerequisite we need to do method entry alignment more broadly (dotnet/runtime#9912).

x64 alignment is now 32 bytes for Tier1 methods with loops, see dotnet/runtime#2249. I had hoped this would reduce/eliminate some of the bimodal behavior but our benchmark results seemingly say otherwise (though I haven't done a systematic search...)

[edit: fixed links]

JulieLeeMSFT · 2020-09-24T18:36:33Z

@kunalspathak will look into this with Andy and others.

kunalspathak · 2020-10-30T08:24:18Z

Few benchmarks that are suspected to be affected because of memory alignment:

DrewScoggins/performance-2#2290 (comment)
DrewScoggins/performance-2#2271 (comment)

adamsitnik · 2021-01-20T18:57:20Z

Fixed by #1587

adamsitnik added the discussion label Aug 11, 2020

danmoseley mentioned this issue Sep 16, 2020

[ARM64] Performance regression: DictionarySequentialKeys.ContainsKey dotnet/runtime#41743

Closed

kunalspathak mentioned this issue Oct 9, 2020

Stabilize performance measurement dotnet/runtime#43227

Open

20 tasks

This was referenced Oct 23, 2020

Hight Volatility & Strange4 Disasm #1124

Closed

Performance changes when I use WithId #1342

Closed

adamsitnik mentioned this issue Nov 6, 2020

Memory Randomization #1587

Merged

adamsitnik closed this as completed Jan 20, 2021

adamsitnik mentioned this issue Jun 28, 2021

unknown Bug/NotBug about span benchmarks #1733

Closed

eiriktsarpalis mentioned this issue Oct 29, 2021

[Perf -16%] System.Collections.CopyTo<String> (6) dotnet/runtime#37814

Closed

AndyAyersMS mentioned this issue Aug 14, 2023

.net 7.0 perf regression with stopwatch dotnet/runtime#90498

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make memory alignment more random #1513

Make memory alignment more random #1513

adamsitnik commented Aug 11, 2020

DrewScoggins commented Aug 12, 2020

AndreyAkinshin commented Aug 13, 2020

AndreyAkinshin commented Aug 13, 2020

danmoseley commented Sep 16, 2020

danmoseley commented Sep 16, 2020

DrewScoggins commented Sep 16, 2020

danmoseley commented Sep 16, 2020

DrewScoggins commented Sep 16, 2020

danmoseley commented Sep 16, 2020

AndyAyersMS commented Sep 16, 2020

danmoseley commented Sep 16, 2020

AndyAyersMS commented Sep 16, 2020

danmoseley commented Sep 16, 2020

AndyAyersMS commented Sep 16, 2020 •

edited

Loading

JulieLeeMSFT commented Sep 24, 2020

kunalspathak commented Oct 30, 2020

adamsitnik commented Jan 20, 2021

Make memory alignment more random #1513

Make memory alignment more random #1513

Comments

adamsitnik commented Aug 11, 2020

DrewScoggins commented Aug 12, 2020

AndreyAkinshin commented Aug 13, 2020

AndreyAkinshin commented Aug 13, 2020

danmoseley commented Sep 16, 2020

danmoseley commented Sep 16, 2020

DrewScoggins commented Sep 16, 2020

danmoseley commented Sep 16, 2020

DrewScoggins commented Sep 16, 2020

danmoseley commented Sep 16, 2020

AndyAyersMS commented Sep 16, 2020

danmoseley commented Sep 16, 2020

AndyAyersMS commented Sep 16, 2020

danmoseley commented Sep 16, 2020

AndyAyersMS commented Sep 16, 2020 • edited Loading

JulieLeeMSFT commented Sep 24, 2020

kunalspathak commented Oct 30, 2020

adamsitnik commented Jan 20, 2021

AndyAyersMS commented Sep 16, 2020 •

edited

Loading