Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PGO: test failure in VolatileTest_op_mul on linux arm32 #57219

Closed
AndyAyersMS opened this issue Aug 11, 2021 · 14 comments
Closed

PGO: test failure in VolatileTest_op_mul on linux arm32 #57219

AndyAyersMS opened this issue Aug 11, 2021 · 14 comments
Assignees
Labels
arch-arm32 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI os-linux Linux OS (any supported distro)
Milestone

Comments

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Aug 11, 2021

See CI Run -- this test has been failing on and off.

_Uarm_arm_exidx_step: ip and cfa unchanged; stopping here (ip=0xf4dae536)
_Uarm_arm_exidx_step: ip and cfa unchanged; stopping here (ip=0xf4dae536)
/root/helix/work/workitem/JIT/jit64/opt/cse/VolatileTest_op_mul/VolatileTest_op_mul.sh: line 348:  1535 Aborted                 (core dumped)

category:correctness
theme:profile-feedback

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Aug 11, 2021
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@AndyAyersMS AndyAyersMS self-assigned this Aug 11, 2021
@AndyAyersMS AndyAyersMS added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed untriaged New issue has not been triaged by the area owner labels Aug 11, 2021
@AndyAyersMS AndyAyersMS added this to the 6.0.0 milestone Aug 11, 2021
@AndyAyersMS AndyAyersMS added arch-arm32 os-linux Linux OS (any supported distro) labels Aug 11, 2021
@AndyAyersMS
Copy link
Member Author

This doesn't seem to be PGO specific, just that PGO seems more likely to trigger it somehow.

Running this slightly modified version on arm Linux, built release, with COMPlus_TC_QuickJitForLoops=1, and with additional CPU stress, on a checked runtime, I can get it to hang about 1 run out of every 3.

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Text;

internal class Program
{
    private static int Main(string[] args)
    {
        Console.WriteLine("this test is designed to hang if jit cse doesnt honor volatile");

        for (int i = 0; i < 10; i++)
        {
            if (!TestCSE.Test()) return 1;
        }
        if (TestCSE.Test()) return 100;
        return 1;
    }
}

public class TestCSE
{
    private const int VAL1 = 0x404;
    private const int VAL2 = 0x03;
    private static volatile bool s_timeUp = false;

    private volatile int _a;
    private volatile int _b;

    private static int[] s_objs;

    static TestCSE()
    {
        s_objs = new int[3];
        s_objs[0] = VAL1;
        s_objs[1] = VAL1;
        s_objs[2] = VAL2;
    }

    public TestCSE()
    {
        _a = s_objs[0];
        _b = s_objs[1];
    }

    public static bool Equal(int val1, int val2)
    {
        if (val1 == val2)
            return true;
        return false;
    }

    public static bool TestFailed(int result, int expected1, int expected2, string tname)
    {
        if (result == expected1)
            return false;
        if (result == expected2)
            return false;
        Console.WriteLine("this failure may not repro everytime");
        Console.WriteLine("ERROR FAILED:" + tname + ",got val1=" + result + " expected value is, either " + expected1 + " or " + expected2);
        throw new Exception("check failed for " + tname);
    }

    [MethodImplAttribute(MethodImplOptions.NoInlining)]
    public bool TestOp()
    {
        long i;

        Thread.Sleep(0);


        _a = VAL1;
        _b = VAL1;
        for (i = 0; ; i++)
        {
            if (!Equal(_a * _b, _a * _b)) break;
            if (!Equal(_a * _b, _a * _b)) break;
            i++;
        }
        Console.WriteLine("Test 1 passed after " + i + " tries");


        _a = VAL1;
        _b = VAL1;
        for (i = 0; ; i++)
        {
            if (!Equal(_a * _b, VAL1 * VAL2)) break;
            if (!Equal(_a * _b, VAL1 * VAL2)) break;
        }
        Console.WriteLine("Test 2 passed after " + i + " tries");


        bool passed = false;
        _a = VAL1;
        _b = VAL1;
        for (i = 0; ; i++)
        {
            int ans1 = _a * _b;
            int ans2 = _a * _b;
            if (TestFailed(ans1, VAL1 * VAL1, VAL1 * VAL2, "Test 3") || TestFailed(ans2, VAL1 * VAL1, VAL1 * VAL2, "Test 3"))
            {
                passed = false;
                break;
            }

            if (ans1 != ans2)
            {
                passed = true;
                break;
            }
        }
        Console.WriteLine("Test 3 " + (passed ? "passed" : "failed") + " after " + i + " tries");

        return passed;
    }


    private void Flip()
    {
        for (uint i = 0; !s_timeUp; i++)
        {
            _a = s_objs[i % 2];

            _b = s_objs[(i % 2) + 1];
        }
    }


    public static bool Test()
    {
        s_timeUp = false;
        TestCSE o = new TestCSE();
        Thread t = new Thread(new ThreadStart(o.Flip));

        t.Start();
        bool ans = o.TestOp();
        s_timeUp = true;
        t.Join();

        return ans;
    }
}

In runtime we eventually hit a debug break in thread suspend on the tiered compilation worker thread indicating that thread suspension is taking too long:

* thread #7, name = '.NET Tiered Com', stop reason = signal SIGTRAP
  * frame #0: 0xf7ac99b6 libcoreclr.so`DBG_DebugBreak + 2
    frame #1: 0xf7a6c916 libcoreclr.so`::DebugBreak() at debug.cpp:406
    frame #2: 0xf7865eaa libcoreclr.so`ThreadSuspend::SuspendRuntime(reason=<unavailable>) at threadsuspend.cpp:3569
    frame #3: 0xf786847c libcoreclr.so`ThreadSuspend::SuspendEE(reason=SUSPEND_OTHER) at threadsuspend.cpp:5700
    frame #4: 0xf761d9b8 libcoreclr.so`CallCountingManager::StopAndDeleteAllCallCountingStubs() at callcounting.cpp:961
    frame #5: 0xf7721248 libcoreclr.so`TieredCompilationManager::DoBackgroundWork(this=0x0045e414, workDurationTicksRef=<unavailable>, minWorkDurationTicks=<unavailable>, maxWorkDurationTicks=50000000) at tieredcompilation.cpp:861
    frame #6: 0xf7720270 libcoreclr.so`TieredCompilationManager::BackgroundWorkerStart(this=0x0045e414) at tieredcompilation.cpp:483
    frame #7: 0xf77200f2 libcoreclr.so`TieredCompilationManager::BackgroundWorkerBootstrapper1((null)=<unavailable>) at tieredcompilation.cpp:431
    frame #8: 0xf771b86c libcoreclr.so`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) [inlined] ManagedThreadBase_DispatchInner(pCallState=<unavailable>) at threads.cpp:7312
    frame #9: 0xf771b866 libcoreclr.so`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) at threads.cpp:7356
    frame #10: 0xf771b7b4 libcoreclr.so`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) [inlined] ManagedThreadBase_DispatchOuter(this=<unavailable>, pParam=<unavailable>)::$_6::operator()(ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::TryArgs*) const::'lambda'(Param*)::operator()(Param*) const at threads.cpp:7514
    frame #11: 0xf771b7ae libcoreclr.so`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) at threads.cpp:7516
    frame #12: 0xf771b798 libcoreclr.so`ManagedThreadBase_DispatchOuter(pCallState=<unavailable>) at threads.cpp:7540
    frame #13: 0xf771bdf4 libcoreclr.so`ManagedThreadBase::KickOff(void (*)(void*), void*) [inlined] ManagedThreadBase_FullTransition(pTarget=<unavailable>, args=<unavailable>, filterType=ManagedThread)(void*), void*, UnhandledExceptionLocation) at threads.cpp:7560
    frame #14: 0xf771bdea libcoreclr.so`ManagedThreadBase::KickOff(pTarget=<unavailable>, args=<unavailable>)(void*), void*) at threads.cpp:7595
    frame #15: 0xf7720046 libcoreclr.so`TieredCompilationManager::BackgroundWorkerBootstrapper0(args=0x0048eba8) at tieredcompilation.cpp:412
    frame #16: 0xf7ac6434 libcoreclr.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x0048e5d8) at thread.cpp:1862
    frame #17: 0xf7f93614 libpthread.so.0`start_thread(arg=0xc3af7410) at pthread_create.c:463
    frame #18: 0xf7d717fc libc.so.6 at clone.S:73

There are two threads in managed code, one in TestCSE.TestOp (tier0) and one in Test.Flip (tier0). TestCSE.Equal has just been jitted at Tier1 and has no GC safepoints:

; Assembly listing for method TestCSE:Equal(int,int):bool
; Emitting BLENDED_CODE for generic ARM CPU - Unix
; Tier-1 compilation
; optimized code
; sp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  3,  3   )     int  ->   r0         single-def
;  V01 arg1         [V01,T01] (  3,  3   )     int  ->   r1         single-def
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+00H]   "OutgoingArgSpace"
;
; Lcl frame size = 4

G_M17750_IG01:        ; gcVars=00000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, nogc <-- Prolog IG
000000  B508           push    {r3,lr}
                                                ;; bbWeight=1    PerfScore 1.00
G_M17750_IG02:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
000002  4288           cmp     r0, r1
000004  D101           bne     SHORT G_M17750_IG05
                                                ;; bbWeight=1    PerfScore 2.00
G_M17750_IG03:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
000006  2001           movs    r0, 1
                                                ;; bbWeight=0.50 PerfScore 0.50
G_M17750_IG04:        ; , epilog, nogc, extend
000008  BD08           pop     {r3,pc}
                                                ;; bbWeight=0.50 PerfScore 0.50
G_M17750_IG05:        ; gcVars=00000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
00000A  2000           movs    r0, 0
                                                ;; bbWeight=0.50 PerfScore 0.50
G_M17750_IG06:        ; , epilog, nogc, extend
00000C  BD08           pop     {r3,pc}
                                                ;; bbWeight=0.50 PerfScore 0.50

; Total bytes of code 14, prolog size 2, PerfScore 6.40, instruction count 7, allocated bytes for code 14 (MethodHash=977bbaa9) for method TestCSE:Equal(int,int):bool

The jit reports Equal is partially interruptible in the listing above, but doesn't define any interruptible ranges.

There's a seemingly relevant discussion over in #35274 -- seems like this fits the bill, a long running loop with short-lived calls.

@janvorli any ideas on why we seemingly can't suspend threads via hijacks in this case?
cc @dotnet/jit-contrib

@briansull
Copy link
Contributor

briansull commented Aug 11, 2021

Doesn't a method have to be fully-interruptible to support hijack-ing?

Since Equal is (incorrectly) marked partially interruptible, the hyjacking logic is not going to try to hijack when the PC is in that method. A partially interruptible method should always have a callsite in it.

@briansull
Copy link
Contributor

briansull commented Aug 11, 2021

I guess that for a method with straight-line code (no loops), we expect that the caller of the method will be partially interruptible, and that eventually the hi-jack retry logic will stop inside the caller method (and it will be partially interruptible)

@briansull
Copy link
Contributor

Also this statement/comment doesn't give me very much confidence that this test is reliable:

Console.WriteLine("this test is designed to hang if jit cse doesnt honor volatile");

@briansull
Copy link
Contributor

Also on (some?) Linix systems there used to be an issue with hi-jacking, such that we need to add GC-Polls to the method.

@AndyAyersMS
Copy link
Member Author

Yes, it seems a bit odd to mark a method as partially interruptible without having any places it can actually be interrupted. Still not clear to me if the jit is at fault here. We do something similar for win x64 codegen, and I can't get the test to hang there.

@AndyAyersMS
Copy link
Member Author

Looks like from the stress log LF_SYNC data that we never managed to stop in Equals:

Method Name:          TestCSE.Equal(Int32, Int32)
Class:                f15f0a98
MethodTable:          f4f38c28
mdToken:              06000005
Module:               f4fb962c
IsJitted:             yes
Current CodeAddr:     ef62ac88
Version History:
  ILCodeVersion:      00000000
  ReJIT ID:           0
  IL Addr:            f7fe72e3
     CodeAddr:           ef62ac88  (OptimizedTier1)
     NativeCodeVersion:  EE813DC8
     CodeAddr:           ef629f90  (QuickJitted)
     NativeCodeVersion:  00000000

grep  Stopped stress.txt | grep ef62ac   --> no hits

all the stopping points are in the caller, and (apparenly) none of them at a safepoint in the caller.

@JulieLeeMSFT
Copy link
Member

CC VM team @mangod9, @davidwrighton, and @janvorli as an early headsup because @AndyAyersMS suspects this is either a runtime problem or a bad test.

@JulieLeeMSFT
Copy link
Member

VM team, please take a look.

@davidwrighton
Copy link
Member

What's supposed to happen is that we are supposed to hijack the thread running TestCSE.Equal, and at some point we should suspend during the following sequence of instructions.

000002  4288           cmp     r0, r1
000004  D101           bne     SHORT G_M17750_IG05
                                                ;; bbWeight=1    PerfScore 2.00
G_M17750_IG03:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
000006  2001           movs    r0, 1

If we did that, then we would hijack the return address. Unfortunately, from my understanding of the way instructions are handled by the various CPU's we support, I suspect the window for that happening is pretty small, as those are exceptionally cheap instructions. and the rest of the loop is heavy with a fair number of more expensive instructions such as CPU memory barriers, branches, and multiplies, etc.

@AndyAyersMS
Copy link
Member Author

David looked at the stress logs and doesn't see anything we can address for .NET 6. So moving this to future.

@AndyAyersMS AndyAyersMS modified the milestones: 6.0.0, Future Aug 17, 2021
@echesakov
Copy link
Contributor

JIT/jit64/opt/cse/VolatileTest_op_xor also crashes on linux-arm with PGO - see console.log

@AndyAyersMS
Copy link
Member Author

Believe this should be fixed by #95565

@github-actions github-actions bot locked and limited conversation to collaborators Jul 20, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-arm32 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI os-linux Linux OS (any supported distro)
Projects
None yet
Development

No branches or pull requests

5 participants