Never ignore JitFramed flag #105850

EgorBo · 2024-08-02T03:25:09Z

Contributes to #105690

dotnet-policy-service · 2024-08-02T03:25:38Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

jkotas · 2024-08-02T03:51:21Z

Do you understand why this produces better stacktraces?

EgorBot · 2024-08-02T04:03:58Z

Benchmark results on Intel

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 8 logical and 4 physical cores
  Job-XMRLBT : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-GHOVFD : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Method	Toolchain	Mean	Error	Ratio
JsonStatham	Main	103.0 μs	0.15 μs	1.00
JsonStatham	PR	102.8 μs	0.13 μs	1.00

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

EgorBot · 2024-08-02T04:04:01Z

Benchmark results on Intel

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 8 logical and 4 physical cores
  Job-JAZZDA : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-HMQMQM : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Method	Toolchain	Mean	Error	Ratio
JsonStatham	Main	105.8 μs	0.12 μs	1.00
JsonStatham	PR	103.5 μs	0.21 μs	0.98

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

EgorBot · 2024-08-02T04:05:13Z

Benchmark results on Amd

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
AMD EPYC 7763, 1 CPU, 8 logical and 4 physical cores
  Job-SUUSBW : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-NMIAFJ : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

Method	Toolchain	Mean	Error	Ratio
JsonStatham	Main	120.2 μs	0.15 μs	1.00
JsonStatham	PR	117.6 μs	0.21 μs	0.98

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

EgorBot · 2024-08-02T04:06:20Z

Benchmark results on Amd

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
AMD EPYC 7763, 1 CPU, 8 logical and 4 physical cores
  Job-YSLYSP : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-VGGPOY : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

Method	Toolchain	Mean	Error	Ratio
JsonStatham	Main	118.3 μs	0.12 μs	1.00
JsonStatham	PR	115.1 μs	0.16 μs	0.97

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

EgorBo · 2024-08-02T04:17:51Z

> Do you understand why this produces better stacktraces?

It just seemed odd to me that the JIT ignores DOTNET_JitFramed=1. My bot ran the benchmark twice on each of two configs with DOTNET_JitNoInline=1 (so the code is optimized, but nothing is inlined in either base or diff). I can already see a difference, e.g. in Grisu::TryRunShortest.

It reproduces on both Intel and Amd and in both runs, so there are essentially four identical "diffs" for this part.

Related: https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html

jkotas · 2024-08-02T04:22:49Z

src/coreclr/jit/compiler.cpp

        // The VM sets JitFlags::JIT_FLAG_FRAMED for two reasons: (1) the DOTNET_JitFramed variable is set, or
        // (2) the function is marked "noinline". The reason for #2 is that people mark functions
-        // noinline to ensure the show up on in a stack walk. But for AMD64, we don't need a frame


The deleted comment is still true on Windows. I guess it was written before we cared about non-Windows.

jkotas · 2024-08-02T04:29:54Z

> It just seemed odd to me that JIT just ignores DOTNET_JitFramed=1

It makes sense on Windows x64. The RBP-frames are useless on Windows x64.

I would like to understand why this helps. Our strategy for omitting the RBP-frames should not mess with RBP-based stackwalking on Windows x86 or Linux x64: We do not use EBP/RBP as a general purpose register, so the effect of methods without the frame should be similar to inlining or tailcalling. It should not break the stackwalking.
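
To make the point above concrete, here is a minimal C++ sketch of an RBP-chain walk over a simulated call chain. It is a toy model under stated assumptions, not the runtime's or perf's actual unwinder: `Method`, `FrameRecord` and `WalkRbpChain` are hypothetical names, and the only thing modeled is that a framed prologue saves the caller's RBP directly below the return address, while a frameless method leaves RBP untouched.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one method in a call chain (outermost first).
struct Method {
    std::string name;
    bool framed; // true if the prologue does "push rbp; mov rbp, rsp"
};

// What a framed prologue leaves on the stack: the caller's RBP and,
// just above it, the return address pushed by the call instruction.
struct FrameRecord {
    int savedRbp;            // index of the previous record, -1 ends the chain
    std::string returnsInto; // method holding the return address; empty for the outermost
};

// Simulates chain[0] calling chain[1] ... calling chain.back(), then walks
// the RBP chain as a sampling profiler would while chain.back() is running.
std::vector<std::string> WalkRbpChain(const std::vector<Method>& chain)
{
    std::vector<std::string> seen;
    if (chain.empty())
        return seen;

    std::vector<FrameRecord> records;
    int rbp = -1; // simulated RBP register: index into records

    for (std::size_t i = 0; i < chain.size(); i++) {
        if (chain[i].framed) {
            // Framed method: link a new record to the previous RBP value.
            records.push_back({rbp, i == 0 ? std::string() : chain[i - 1].name});
            rbp = static_cast<int>(records.size()) - 1;
        }
        // Frameless method: RBP is not touched, so the chain is unchanged.
    }

    // The sampled instruction pointer identifies the running method.
    seen.push_back(chain.back().name);

    // Follow saved-RBP links; each record yields one return address.
    for (int r = rbp; r != -1; r = records[r].savedRbp) {
        if (!records[r].returnsInto.empty())
            seen.push_back(records[r].returnsInto);
    }
    return seen;
}
```

For a chain Main, Outer, Middle, Leaf in which only Middle omits its frame, this model yields Leaf, Middle, Main: the walk still reaches the outermost method, and one frame (Outer here) drops out of the view, much as a frame disappears after inlining. With all four methods framed, the walk yields every method.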

EgorBot · 2024-08-02T05:38:58Z

Benchmark results on Intel

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 8 logical and 4 physical cores
  Job-YZSLNG : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-JLRZYO : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Method	Toolchain	Mean	Error	Ratio
JsonStatham	Main	101.2 μs	0.52 μs	1.00
JsonStatham	PR	100.4 μs	0.23 μs	0.99

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

EgorBot · 2024-08-02T05:41:56Z

Benchmark results on Amd

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
AMD EPYC 7763, 1 CPU, 8 logical and 4 physical cores
  Job-WBNCOD : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-ESOWHE : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

Method	Toolchain	Mean	Error	Ratio
JsonStatham	Main	118.2 μs	0.12 μs	1.00
JsonStatham	PR	119.0 μs	0.07 μs	1.01

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

EgorBot · 2024-08-02T06:33:59Z

Benchmark results on Intel

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 8 logical and 4 physical cores
  Job-FSTTJQ : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-OGLWUT : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Method	Toolchain	Mean	Error	Ratio
JsonStatham	Main	102.1 μs	0.36 μs	1.00
JsonStatham	PR	103.2 μs	0.32 μs	1.01

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

EgorBot · 2024-08-02T06:35:43Z

Benchmark results on Amd

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
AMD EPYC 7763, 1 CPU, 8 logical and 4 physical cores
  Job-QGOVFI : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-WHTEKQ : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

Method	Toolchain	Mean	Error	Ratio
JsonStatham	Main	119.3 μs	0.28 μs	1.00
JsonStatham	PR	115.7 μs	0.12 μs	0.97

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

EgorBo · 2024-08-02T06:56:23Z

Here are the runs with completely default settings (no attempts to stop any inlining)

[Flame graphs: Intel (Main), Intel (this PR), Amd (Main), Amd (this PR)]

I don't think I can confirm that my change "fixes" the flame graphs: e.g. the Amd (this PR) graph still looks random (while Main is OK). I guess I need to look for the problem elsewhere, e.g. collect traces and compare inlining/tailcall decisions.

I still have the impression that arm64 is more stable.

EgorBot · 2024-08-02T07:29:50Z

Benchmark results on Arm64

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-RIIJTD : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-LRGWEA : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD

Method	Toolchain	Mean	Error	Ratio
JsonStatham	Main	121.9 μs	0.24 μs	1.00
JsonStatham	PR	120.7 μs	0.26 μs	0.99

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

EgorBot · 2024-08-02T07:33:10Z

Benchmark results on Arm64

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-ASKJND : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-ILUEYY : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD

Method	Toolchain	Mean	Error	Ratio
JsonStatham	Main	123.5 μs	0.23 μs	1.00
JsonStatham	PR	123.0 μs	0.13 μs	1.00

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

EgorBo · 2024-08-02T07:33:59Z

4 flamegraphs for arm64:

the last one has differences 🤔

EgorBo · 2024-08-02T07:34:38Z

I wonder if it's possible to add the tier name to symbols...
UPD: ah, there is PerfMapShowOptimizationTiers

EgorBo · 2024-08-02T08:00:56Z

Test run with DOTNET_PerfMapShowOptimizationTiers=1

@EgorBot -arm64 -profiler

using System;
using System.Text.Json;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<MyBench>(args: args);

public class MyBench
{
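    // TS must be declared before Data: static initializers run in textual order, and TestData() reads TS.
    static TimeSpan TS = TimeSpan.FromDays(42);
    static MyObj[] Data = TestData();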

    public static MyObj[] TestData()
    {
        MyObj[] testData = new MyObj[100];
        for (int i = 0; i < testData.Length; i++)
        {
            MyObj obj1 = new("Some long ASCII text bla bla bla", 42, Guid.NewGuid(), 3.14, TS, null);
            MyObj obj2 = new("'')((*&&^%$@#$%$^&*())''';(*&^%$E##^%$&%^*(", i, Guid.NewGuid(), 3.14, TS, obj1);
            testData[i] = obj2;
        }
        return testData;
    }

    [Benchmark]
    public object JsonStatham() => JsonSerializer.Serialize(Data);
}

public record MyObj(string Name, int Age, Guid Id, double SomeFloat, TimeSpan Ts, MyObj? InnerObj);

EgorBot · 2024-08-02T08:31:49Z

Benchmark results on Arm64

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-REKULD : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-MMYPOS : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD

Method	Toolchain	Mean	Error	Ratio
JsonStatham	Main	124.1 μs	0.39 μs	1.00
JsonStatham	PR	123.0 μs	0.34 μs	0.99

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

EgorBo · 2024-08-07T13:42:00Z

@EgorBot -amd -intel -profiler

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Bench
{
    [Benchmark]
    public void WB()
    {
        Foo foo = new Foo();
        for (long i = 0; i < 200000000; i++)
            foo.x = foo;
    }
}

internal class Foo
{
    public volatile Foo x;
}

EgorBot · 2024-08-07T14:00:03Z

Benchmark results on Intel

BenchmarkDotNet v0.14.0, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 16 logical and 8 physical cores
  Job-LVMPHD : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-UNZSHV : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Method	Toolchain	Mean	Error	Ratio
WB	Main	286.8 ms	0.03 ms	1.00
WB	PR	286.8 ms	0.03 ms	1.00

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

EgorBot · 2024-08-07T14:01:23Z

Benchmark results on Amd

BenchmarkDotNet v0.14.0, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
AMD EPYC 7763, 1 CPU, 16 logical and 8 physical cores
  Job-BZIGNZ : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-XVSRMV : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

Method	Toolchain	Mean	Error	Ratio
WB	Main	370.8 ms	0.13 ms	1.00
WB	PR	432.6 ms	0.05 ms	1.17

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

EgorBo · 2024-08-07T14:03:32Z

@EgorBot -arm64 -profiler

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Bench
{
    [Benchmark]
    public void WB()
    {
        Foo foo = new Foo();
        for (long i = 0; i < 200000000; i++)
            foo.x = foo;
    }
}

internal class Foo
{
    public volatile Foo x;
}

EgorBot · 2024-08-07T14:29:46Z

Benchmark results on Arm64

BenchmarkDotNet v0.14.0, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-UFZEOE : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-IUYDGS : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD

Method	Toolchain	Mean	Error	Ratio
WB	Main	470.1 ms	0.77 ms	1.00
WB	PR	469.2 ms	0.53 ms	1.00

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

EgorBo · 2024-08-07T22:01:32Z

@EgorBot -intel -amd -profiler

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Bench
{
    [Benchmark]
    public void WB()
    {
        Foo foo = new Foo();
        for (long i = 0; i < 200000000; i++)
            foo.x = foo;
    }
}

internal class Foo
{
    public volatile Foo x;
}

EgorBot · 2024-08-07T22:19:46Z

Benchmark results on Intel

BenchmarkDotNet v0.14.0, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 16 logical and 8 physical cores
  Job-ZWXGRL : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-AXOANX : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Method	Toolchain	Mean	Error	Ratio
WB	Main	229.8 ms	0.88 ms	1.00
WB	PR	229.6 ms	0.05 ms	1.00

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

EgorBot · 2024-08-07T22:20:42Z

Benchmark results on Amd

BenchmarkDotNet v0.14.0, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
AMD EPYC 7763, 1 CPU, 16 logical and 8 physical cores
  Job-MZBLJW : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-TQIWLQ : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

Method	Toolchain	Mean	Error	Ratio
WB	Main	432.6 ms	0.06 ms	1.00
WB	PR	432.5 ms	0.05 ms	1.00

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

EgorBo added 2 commits August 2, 2024 05:16

Always respect JitFramed on x64

68c12ac

enable for perfmap

71545b5

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Aug 2, 2024
