Never ignore JitFramed flag #105850

Closed: EgorBo wants to merge 2 commits

Member

@EgorBo EgorBo commented Aug 2, 2024

Contributes to #105690

@dotnet-issue-labeler (bot) added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Aug 2, 2024
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@dotnet dotnet deleted a comment from EgorBot Aug 2, 2024
@jkotas
Member

jkotas commented Aug 2, 2024

Do you understand why this produces better stacktraces?

@EgorBot

EgorBot commented Aug 2, 2024

Benchmark results on Intel
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 8 logical and 4 physical cores
  Job-XMRLBT : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-GHOVFD : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
| Method | Toolchain | Mean | Error | Ratio |
|---|---|---|---|---|
| JsonStatham | Main | 103.0 μs | 0.15 μs | 1.00 |
| JsonStatham | PR | 102.8 μs | 0.13 μs | 1.00 |

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

@EgorBot

EgorBot commented Aug 2, 2024

Benchmark results on Intel
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 8 logical and 4 physical cores
  Job-JAZZDA : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-HMQMQM : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
| Method | Toolchain | Mean | Error | Ratio |
|---|---|---|---|---|
| JsonStatham | Main | 105.8 μs | 0.12 μs | 1.00 |
| JsonStatham | PR | 103.5 μs | 0.21 μs | 0.98 |

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

@EgorBot

EgorBot commented Aug 2, 2024

Benchmark results on Amd
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
AMD EPYC 7763, 1 CPU, 8 logical and 4 physical cores
  Job-SUUSBW : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-NMIAFJ : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
| Method | Toolchain | Mean | Error | Ratio |
|---|---|---|---|---|
| JsonStatham | Main | 120.2 μs | 0.15 μs | 1.00 |
| JsonStatham | PR | 117.6 μs | 0.21 μs | 0.98 |

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

@EgorBot

EgorBot commented Aug 2, 2024

Benchmark results on Amd
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
AMD EPYC 7763, 1 CPU, 8 logical and 4 physical cores
  Job-YSLYSP : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-VGGPOY : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
| Method | Toolchain | Mean | Error | Ratio |
|---|---|---|---|---|
| JsonStatham | Main | 118.3 μs | 0.12 μs | 1.00 |
| JsonStatham | PR | 115.1 μs | 0.16 μs | 0.97 |

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

@EgorBo
Member Author

EgorBo commented Aug 2, 2024

> Do you understand why this produces better stacktraces?

It just seemed odd to me that the JIT simply ignores DOTNET_JitFramed=1. My bot ran the benchmark twice on each of two configs with DOTNET_JitNoInline=1 (so the code is optimized, but literally nothing is inlined in either base or diff). I can already see a difference, e.g. in Grisu::TryRonShortest:

image

It reproduces on both Intel and AMD and in both runs, so basically the same "diff" four times for this part.

Related: https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html
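
For readers who want to try a comparable setup locally, here is a minimal sketch of launching a child process with the two knobs named above set as environment variables. The class name `JitKnobs`, the method `WithJitKnobs`, and the `appPath` parameter are hypothetical; whether a particular runtime build honors each knob is an assumption not confirmed in this thread.

```csharp
using System.Diagnostics;

public static class JitKnobs
{
    // Builds a launch description for 'dotnet <appPath>' with both knobs set.
    // 'appPath' is a placeholder for the benchmark assembly (hypothetical).
    public static ProcessStartInfo WithJitKnobs(string appPath)
    {
        // Environment overrides require UseShellExecute = false.
        var psi = new ProcessStartInfo("dotnet") { UseShellExecute = false };
        psi.ArgumentList.Add(appPath);

        // Knob names are taken from the comment above.
        psi.Environment["DOTNET_JitFramed"] = "1";
        psi.Environment["DOTNET_JitNoInline"] = "1";
        return psi;
    }
}
```

Something like `Process.Start(JitKnobs.WithJitKnobs("MyBench.dll"))?.WaitForExit();` would then run the app with both variables applied only to that child process, leaving the parent environment untouched.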

// The VM sets JitFlags::JIT_FLAG_FRAMED for two reasons: (1) the DOTNET_JitFramed variable is set, or
// (2) the function is marked "noinline". The reason for #2 is that people mark functions
// noinline to ensure the show up on in a stack walk. But for AMD64, we don't need a frame
Member

The deleted comment is still true on Windows. I guess it was written before we cared about non-Windows.

@jkotas
Member

jkotas commented Aug 2, 2024

> It just seemed odd to me that JIT just ignores DOTNET_JitFramed=1

It makes sense on Windows x64. The RBP-frames are useless on Windows x64.

I would like to understand why this helps. Our strategy for omitting the RBP-frames should not mess with RBP-based stackwalking on Windows x86 or Linux x64: We do not use EBP/RBP as a general purpose register, so the effect of methods without the frame should be similar to inlining or tailcalling. It should not break the stackwalking.
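
The argument above can be illustrated with a toy model of RBP-chain walking. This is a simplified sketch, not the runtime's unwinder or any profiler's actual algorithm: `FramePointerModel`, `Call`, and `FrameRecord` are hypothetical names, and return addresses are modeled as method names rather than real addresses.

```csharp
#nullable enable
using System.Collections.Generic;

public static class FramePointerModel
{
    // One simulated call: the method name and whether its prolog sets up an RBP frame.
    public readonly record struct Call(string Method, bool HasFrame);

    // Simulated frame record: [rbp] holds the caller's RBP, [rbp+8] the return address.
    // The return address is modeled as the name of the method it points into.
    private sealed record FrameRecord(FrameRecord? SavedRbp, string? ReturnsInto);

    // 'chain' lists calls outermost first; the last entry is executing when sampled.
    // Returns the method names an RBP-chain walker would report, innermost first.
    public static List<string> Walk(IReadOnlyList<Call> chain)
    {
        FrameRecord? rbp = null; // current value of the RBP register
        string? caller = null;   // method containing the most recent call instruction

        foreach (Call call in chain)
        {
            if (call.HasFrame)
            {
                // push rbp; mov rbp, rsp: links a new record to the previous one.
                rbp = new FrameRecord(rbp, caller);
            }
            // A frameless method leaves RBP alone, so the return address pushed
            // by its caller is not reachable from any frame record.
            caller = call.Method;
        }

        var trace = new List<string>();
        if (caller is null) return trace; // empty chain: nothing was executing
        trace.Add(caller);                // the sampled instruction pointer

        for (FrameRecord? r = rbp; r is not null; r = r.SavedRbp)
        {
            if (r.ReturnsInto is not null) trace.Add(r.ReturnsInto);
        }
        return trace;
    }
}
```

In this model, walking `Main -> G -> F` with every method framed reports `F, G, Main`. If `F` is frameless, the walker reports `F, Main`: one method drops out of the trace, but the RBP chain itself stays intact because nothing ever repurposes RBP, which is the sense in which the walk is not broken.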

@EgorBot

EgorBot commented Aug 2, 2024

Benchmark results on Intel
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 8 logical and 4 physical cores
  Job-YZSLNG : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-JLRZYO : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
| Method | Toolchain | Mean | Error | Ratio |
|---|---|---|---|---|
| JsonStatham | Main | 101.2 μs | 0.52 μs | 1.00 |
| JsonStatham | PR | 100.4 μs | 0.23 μs | 0.99 |

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

@EgorBot

EgorBot commented Aug 2, 2024

Benchmark results on Amd
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
AMD EPYC 7763, 1 CPU, 8 logical and 4 physical cores
  Job-WBNCOD : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-ESOWHE : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
| Method | Toolchain | Mean | Error | Ratio |
|---|---|---|---|---|
| JsonStatham | Main | 118.2 μs | 0.12 μs | 1.00 |
| JsonStatham | PR | 119.0 μs | 0.07 μs | 1.01 |

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

@EgorBot

EgorBot commented Aug 2, 2024

Benchmark results on Intel
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 8 logical and 4 physical cores
  Job-FSTTJQ : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-OGLWUT : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
| Method | Toolchain | Mean | Error | Ratio |
|---|---|---|---|---|
| JsonStatham | Main | 102.1 μs | 0.36 μs | 1.00 |
| JsonStatham | PR | 103.2 μs | 0.32 μs | 1.01 |

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

@EgorBot

EgorBot commented Aug 2, 2024

Benchmark results on Amd
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
AMD EPYC 7763, 1 CPU, 8 logical and 4 physical cores
  Job-QGOVFI : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-WHTEKQ : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
| Method | Toolchain | Mean | Error | Ratio |
|---|---|---|---|---|
| JsonStatham | Main | 119.3 μs | 0.28 μs | 1.00 |
| JsonStatham | PR | 115.7 μs | 0.12 μs | 0.97 |

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

@EgorBo
Member Author

EgorBo commented Aug 2, 2024

Here are the runs with completely default settings (no attempts to stop any inlining)

Intel (Main):

Intel (this PR):

Amd (Main):

Amd (this PR):

I don't think I can confirm that my change "fixes" the flame graphs: e.g. the two AMD (this PR) ones still look random (while Main is OK). I guess I need to look for the problem elsewhere, e.g. collect traces and compare inlining/tailcall decisions.

I still have the impression that arm64 is more stable.

@EgorBo EgorBo closed this Aug 2, 2024
@EgorBot

EgorBot commented Aug 2, 2024

Benchmark results on Arm64
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-RIIJTD : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-LRGWEA : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
| Method | Toolchain | Mean | Error | Ratio |
|---|---|---|---|---|
| JsonStatham | Main | 121.9 μs | 0.24 μs | 1.00 |
| JsonStatham | PR | 120.7 μs | 0.26 μs | 0.99 |

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

@EgorBot

EgorBot commented Aug 2, 2024

Benchmark results on Arm64
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-ASKJND : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-ILUEYY : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
| Method | Toolchain | Mean | Error | Ratio |
|---|---|---|---|---|
| JsonStatham | Main | 123.5 μs | 0.23 μs | 1.00 |
| JsonStatham | PR | 123.0 μs | 0.13 μs | 1.00 |

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

@EgorBo
Member Author

EgorBo commented Aug 2, 2024

I wonder if it's possible to add the tier name to symbols...
UPD: ah, there is PerfMapShowOptimizationTiers

@EgorBo
Member Author

EgorBo commented Aug 2, 2024

Test run with DOTNET_PerfMapShowOptimizationTiers=1

@EgorBot -arm64 -profiler

using System;
using System.Text.Json;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<MyBench>(args: args);

public class MyBench
{
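    // TS is declared before Data on purpose: static field initializers run in
    // textual order, and TestData() reads TS while building the array.
    static TimeSpan TS = TimeSpan.FromDays(42);
    static MyObj[] Data = TestData();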

    public static MyObj[] TestData()
    {
        MyObj[] testData = new MyObj[100];
        for (int i = 0; i < testData.Length; i++)
        {
            MyObj obj1 = new("Some long ASCII text bla bla bla", 42, Guid.NewGuid(), 3.14, TS, null);
            MyObj obj2 = new("'')((*&&^%$@#$%$^&*())''';(*&^%$E##^%$&%^*(", i, Guid.NewGuid(), 3.14, TS, obj1);
            testData[i] = obj2;
        }
        return testData;
    }

    [Benchmark]
    public object JsonStatham() => JsonSerializer.Serialize(Data);
}

public record MyObj(string Name, int Age, Guid Id, double SomeFloat, TimeSpan Ts, MyObj? InnerObj);

@EgorBot

EgorBot commented Aug 2, 2024

Benchmark results on Arm64
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-REKULD : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-MMYPOS : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
| Method | Toolchain | Mean | Error | Ratio |
|---|---|---|---|---|
| JsonStatham | Main | 124.1 μs | 0.39 μs | 1.00 |
| JsonStatham | PR | 123.0 μs | 0.34 μs | 0.99 |

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

@dotnet dotnet deleted a comment from EgorBot Aug 7, 2024
@EgorBo
Member Author

EgorBo commented Aug 7, 2024

@EgorBot -amd -intel -profiler

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Bench
{
    [Benchmark]
    public void WB()
    {
        Foo foo = new Foo();
        for (long i = 0; i < 200000000; i++)
            foo.x = foo;
    }
}

internal class Foo
{
    public volatile Foo x;
}

@EgorBot

EgorBot commented Aug 7, 2024

Benchmark results on Intel
BenchmarkDotNet v0.14.0, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 16 logical and 8 physical cores
  Job-LVMPHD : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-UNZSHV : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
| Method | Toolchain | Mean | Error | Ratio |
|---|---|---|---|---|
| WB | Main | 286.8 ms | 0.03 ms | 1.00 |
| WB | PR | 286.8 ms | 0.03 ms | 1.00 |

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

@EgorBot

EgorBot commented Aug 7, 2024

Benchmark results on Amd
BenchmarkDotNet v0.14.0, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
AMD EPYC 7763, 1 CPU, 16 logical and 8 physical cores
  Job-BZIGNZ : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-XVSRMV : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
| Method | Toolchain | Mean | Error | Ratio |
|---|---|---|---|---|
| WB | Main | 370.8 ms | 0.13 ms | 1.00 |
| WB | PR | 432.6 ms | 0.05 ms | 1.17 |

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

@EgorBo
Member Author

EgorBo commented Aug 7, 2024

@EgorBot -arm64 -profiler

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Bench
{
    [Benchmark]
    public void WB()
    {
        Foo foo = new Foo();
        for (long i = 0; i < 200000000; i++)
            foo.x = foo;
    }
}

internal class Foo
{
    public volatile Foo x;
}

@EgorBot

EgorBot commented Aug 7, 2024

Benchmark results on Arm64
BenchmarkDotNet v0.14.0, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-UFZEOE : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-IUYDGS : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
| Method | Toolchain | Mean | Error | Ratio |
|---|---|---|---|---|
| WB | Main | 470.1 ms | 0.77 ms | 1.00 |
| WB | PR | 469.2 ms | 0.53 ms | 1.00 |

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

@EgorBo
Member Author

EgorBo commented Aug 7, 2024

@EgorBot -intel -amd -profiler

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Bench
{
    [Benchmark]
    public void WB()
    {
        Foo foo = new Foo();
        for (long i = 0; i < 200000000; i++)
            foo.x = foo;
    }
}

internal class Foo
{
    public volatile Foo x;
}

@EgorBot

EgorBot commented Aug 7, 2024

Benchmark results on Intel
BenchmarkDotNet v0.14.0, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 16 logical and 8 physical cores
  Job-ZWXGRL : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-AXOANX : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
| Method | Toolchain | Mean | Error | Ratio |
|---|---|---|---|---|
| WB | Main | 229.8 ms | 0.88 ms | 1.00 |
| WB | PR | 229.6 ms | 0.05 ms | 1.00 |

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.

@EgorBot

EgorBot commented Aug 7, 2024

Benchmark results on Amd
BenchmarkDotNet v0.14.0, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
AMD EPYC 7763, 1 CPU, 16 logical and 8 physical cores
  Job-MZBLJW : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-TQIWLQ : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
| Method | Toolchain | Mean | Error | Ratio |
|---|---|---|---|---|
| WB | Main | 432.6 ms | 0.06 ms | 1.00 |
| WB | PR | 432.5 ms | 0.05 ms | 1.00 |

BDN_Artifacts.zip

Flame graphs: Main vs PR 🔥
Hot asm: Main vs PR
Hot functions: Main vs PR

For clean perf results, make sure you have just one [Benchmark] in your app.
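
As a reading aid for the tables in this thread, the Ratio column is consistent with the PR mean divided by the Main mean, rounded to two decimals. The sketch below assumes that approximation (BenchmarkDotNet's own calculation may differ in detail); `RatioCheck` is a hypothetical helper, and the inputs in the usage note are the two AMD runs of the `WB` benchmark reported above.

```csharp
using System;

public static class RatioCheck
{
    // Approximates the Ratio column as (PR mean) / (Main mean), rounded to two decimals.
    // Both means must be in the same unit.
    public static double Ratio(double mainMean, double prMean) =>
        Math.Round(prMean / mainMean, 2);
}
```

`RatioCheck.Ratio(370.8, 432.6)` gives 1.17 for the first AMD run, while `RatioCheck.Ratio(432.6, 432.5)` gives 1 for the rerun. The Main mean itself moved from 370.8 ms to 432.6 ms between those two runs.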

@github-actions github-actions bot locked and limited conversation to collaborators Sep 7, 2024