PGO: test failure in VolatileTest_op_mul on linux arm32 #57219

AndyAyersMS · 2021-08-11T16:40:49Z

See CI Run -- this test has been failing on and off.

_Uarm_arm_exidx_step: ip and cfa unchanged; stopping here (ip=0xf4dae536)
_Uarm_arm_exidx_step: ip and cfa unchanged; stopping here (ip=0xf4dae536)
/root/helix/work/workitem/JIT/jit64/opt/cse/VolatileTest_op_mul/VolatileTest_op_mul.sh: line 348:  1535 Aborted                 (core dumped)

category:correctness
theme:profile-feedback

The text was updated successfully, but these errors were encountered:

dotnet-issue-labeler · 2021-08-11T16:40:52Z

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

AndyAyersMS · 2021-08-11T22:36:09Z

This doesn't seem to be PGO specific, just that PGO seems more likely to trigger it somehow.

Running this slightly modified version on arm Linux, built release, with COMPlus_TC_QuickJitForLoops=1, and with additional CPU stress, on a checked runtime, I can get it to hang about 1 run out of every 3.

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Text;

internal class Program
{
    private static int Main(string[] args)
    {
        Console.WriteLine("this test is designed to hang if jit cse doesnt honor volatile");

        for (int i = 0; i < 10; i++)
        {
            if (!TestCSE.Test()) return 1;
        }
        if (TestCSE.Test()) return 100;
        return 1;
    }
}

public class TestCSE
{
    private const int VAL1 = 0x404;
    private const int VAL2 = 0x03;
    private static volatile bool s_timeUp = false;

    private volatile int _a;
    private volatile int _b;

    private static int[] s_objs;

    static TestCSE()
    {
        s_objs = new int[3];
        s_objs[0] = VAL1;
        s_objs[1] = VAL1;
        s_objs[2] = VAL2;
    }

    public TestCSE()
    {
        _a = s_objs[0];
        _b = s_objs[1];
    }

    public static bool Equal(int val1, int val2)
    {
        if (val1 == val2)
            return true;
        return false;
    }

    public static bool TestFailed(int result, int expected1, int expected2, string tname)
    {
        if (result == expected1)
            return false;
        if (result == expected2)
            return false;
        Console.WriteLine("this failure may not repro everytime");
        Console.WriteLine("ERROR FAILED:" + tname + ",got val1=" + result + " expected value is, either " + expected1 + " or " + expected2);
        throw new Exception("check failed for " + tname);
    }

    [MethodImplAttribute(MethodImplOptions.NoInlining)]
    public bool TestOp()
    {
        long i;

        Thread.Sleep(0);


        _a = VAL1;
        _b = VAL1;
        for (i = 0; ; i++)
        {
            if (!Equal(_a * _b, _a * _b)) break;
            if (!Equal(_a * _b, _a * _b)) break;
            i++;
        }
        Console.WriteLine("Test 1 passed after " + i + " tries");


        _a = VAL1;
        _b = VAL1;
        for (i = 0; ; i++)
        {
            if (!Equal(_a * _b, VAL1 * VAL2)) break;
            if (!Equal(_a * _b, VAL1 * VAL2)) break;
        }
        Console.WriteLine("Test 2 passed after " + i + " tries");


        bool passed = false;
        _a = VAL1;
        _b = VAL1;
        for (i = 0; ; i++)
        {
            int ans1 = _a * _b;
            int ans2 = _a * _b;
            if (TestFailed(ans1, VAL1 * VAL1, VAL1 * VAL2, "Test 3") || TestFailed(ans2, VAL1 * VAL1, VAL1 * VAL2, "Test 3"))
            {
                passed = false;
                break;
            }

            if (ans1 != ans2)
            {
                passed = true;
                break;
            }
        }
        Console.WriteLine("Test 3 " + (passed ? "passed" : "failed") + " after " + i + " tries");

        return passed;
    }


    private void Flip()
    {
        for (uint i = 0; !s_timeUp; i++)
        {
            _a = s_objs[i % 2];

            _b = s_objs[(i % 2) + 1];
        }
    }


    public static bool Test()
    {
        s_timeUp = false;
        TestCSE o = new TestCSE();
        Thread t = new Thread(new ThreadStart(o.Flip));

        t.Start();
        bool ans = o.TestOp();
        s_timeUp = true;
        t.Join();

        return ans;
    }
}

In runtime we eventually hit a debug break in thread suspend on the tiered compilation worker thread indicating that thread suspension is taking too long:

* thread #7, name = '.NET Tiered Com', stop reason = signal SIGTRAP
  * frame #0: 0xf7ac99b6 libcoreclr.so`DBG_DebugBreak + 2
    frame #1: 0xf7a6c916 libcoreclr.so`::DebugBreak() at debug.cpp:406
    frame #2: 0xf7865eaa libcoreclr.so`ThreadSuspend::SuspendRuntime(reason=<unavailable>) at threadsuspend.cpp:3569
    frame #3: 0xf786847c libcoreclr.so`ThreadSuspend::SuspendEE(reason=SUSPEND_OTHER) at threadsuspend.cpp:5700
    frame #4: 0xf761d9b8 libcoreclr.so`CallCountingManager::StopAndDeleteAllCallCountingStubs() at callcounting.cpp:961
    frame #5: 0xf7721248 libcoreclr.so`TieredCompilationManager::DoBackgroundWork(this=0x0045e414, workDurationTicksRef=<unavailable>, minWorkDurationTicks=<unavailable>, maxWorkDurationTicks=50000000) at tieredcompilation.cpp:861
    frame #6: 0xf7720270 libcoreclr.so`TieredCompilationManager::BackgroundWorkerStart(this=0x0045e414) at tieredcompilation.cpp:483
    frame #7: 0xf77200f2 libcoreclr.so`TieredCompilationManager::BackgroundWorkerBootstrapper1((null)=<unavailable>) at tieredcompilation.cpp:431
    frame #8: 0xf771b86c libcoreclr.so`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) [inlined] ManagedThreadBase_DispatchInner(pCallState=<unavailable>) at threads.cpp:7312
    frame #9: 0xf771b866 libcoreclr.so`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) at threads.cpp:7356
    frame #10: 0xf771b7b4 libcoreclr.so`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) [inlined] ManagedThreadBase_DispatchOuter(this=<unavailable>, pParam=<unavailable>)::$_6::operator()(ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::TryArgs*) const::'lambda'(Param*)::operator()(Param*) const at threads.cpp:7514
    frame #11: 0xf771b7ae libcoreclr.so`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) at threads.cpp:7516
    frame #12: 0xf771b798 libcoreclr.so`ManagedThreadBase_DispatchOuter(pCallState=<unavailable>) at threads.cpp:7540
    frame #13: 0xf771bdf4 libcoreclr.so`ManagedThreadBase::KickOff(void (*)(void*), void*) [inlined] ManagedThreadBase_FullTransition(pTarget=<unavailable>, args=<unavailable>, filterType=ManagedThread)(void*), void*, UnhandledExceptionLocation) at threads.cpp:7560
    frame #14: 0xf771bdea libcoreclr.so`ManagedThreadBase::KickOff(pTarget=<unavailable>, args=<unavailable>)(void*), void*) at threads.cpp:7595
    frame #15: 0xf7720046 libcoreclr.so`TieredCompilationManager::BackgroundWorkerBootstrapper0(args=0x0048eba8) at tieredcompilation.cpp:412
    frame #16: 0xf7ac6434 libcoreclr.so`CorUnix::CPalThread::ThreadEntry(pvParam=0x0048e5d8) at thread.cpp:1862
    frame #17: 0xf7f93614 libpthread.so.0`start_thread(arg=0xc3af7410) at pthread_create.c:463
    frame #18: 0xf7d717fc libc.so.6 at clone.S:73

There are two threads in managed code, one in TestCSE.TestOp (tier0) and one in Test.Flip (tier0). TestCSE.Equal has just been jitted at Tier1 and has no GC safepoints:

; Assembly listing for method TestCSE:Equal(int,int):bool
; Emitting BLENDED_CODE for generic ARM CPU - Unix
; Tier-1 compilation
; optimized code
; sp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  3,  3   )     int  ->   r0         single-def
;  V01 arg1         [V01,T01] (  3,  3   )     int  ->   r1         single-def
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+00H]   "OutgoingArgSpace"
;
; Lcl frame size = 4

G_M17750_IG01:        ; gcVars=00000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, nogc <-- Prolog IG
000000  B508           push    {r3,lr}
                                                ;; bbWeight=1    PerfScore 1.00
G_M17750_IG02:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
000002  4288           cmp     r0, r1
000004  D101           bne     SHORT G_M17750_IG05
                                                ;; bbWeight=1    PerfScore 2.00
G_M17750_IG03:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
000006  2001           movs    r0, 1
                                                ;; bbWeight=0.50 PerfScore 0.50
G_M17750_IG04:        ; , epilog, nogc, extend
000008  BD08           pop     {r3,pc}
                                                ;; bbWeight=0.50 PerfScore 0.50
G_M17750_IG05:        ; gcVars=00000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
00000A  2000           movs    r0, 0
                                                ;; bbWeight=0.50 PerfScore 0.50
G_M17750_IG06:        ; , epilog, nogc, extend
00000C  BD08           pop     {r3,pc}
                                                ;; bbWeight=0.50 PerfScore 0.50

; Total bytes of code 14, prolog size 2, PerfScore 6.40, instruction count 7, allocated bytes for code 14 (MethodHash=977bbaa9) for method TestCSE:Equal(int,int):bool

The jit reports Equal is partially interruptible in the listing above, but doesn't define any interruptible ranges.

There's a seemingly relevant discussion over in #35274 -- seems like this fits the bill, a long running loop with short-lived calls.

@janvorli any ideas on why we seemingly can't suspend threads via hijacks in this case?
cc @dotnet/jit-contrib

briansull · 2021-08-11T22:57:24Z

Doesn't a method have to be fully-interruptible to support hijack-ing?

Since Equal is (incorrectly) marked partially interruptible, the hyjacking logic is not going to try to hijack when the PC is in that method. A partially interruptible method should always have a callsite in it.

briansull · 2021-08-11T23:05:22Z

I guess that for a method with straight-line code (no loops), we expect that the caller of the method will be partially interruptible, and that eventually the hi-jack retry logic will stop inside the caller method (and it will be partially interruptible)

briansull · 2021-08-11T23:10:28Z

Also this statement/comment doesn't give me very much confidence that this test is reliable:

Console.WriteLine("this test is designed to hang if jit cse doesnt honor volatile");

briansull · 2021-08-11T23:12:44Z

Also on (some?) Linix systems there used to be an issue with hi-jacking, such that we need to add GC-Polls to the method.

AndyAyersMS · 2021-08-13T01:36:05Z

Yes, it seems a bit odd to mark a method as partially interruptible without having any places it can actually be interrupted. Still not clear to me if the jit is at fault here. We do something similar for win x64 codegen, and I can't get the test to hang there.

AndyAyersMS · 2021-08-13T02:19:35Z

Looks like from the stress log LF_SYNC data that we never managed to stop in Equals:

Method Name:          TestCSE.Equal(Int32, Int32)
Class:                f15f0a98
MethodTable:          f4f38c28
mdToken:              06000005
Module:               f4fb962c
IsJitted:             yes
Current CodeAddr:     ef62ac88
Version History:
  ILCodeVersion:      00000000
  ReJIT ID:           0
  IL Addr:            f7fe72e3
     CodeAddr:           ef62ac88  (OptimizedTier1)
     NativeCodeVersion:  EE813DC8
     CodeAddr:           ef629f90  (QuickJitted)
     NativeCodeVersion:  00000000

grep  Stopped stress.txt | grep ef62ac   --> no hits

all the stopping points are in the caller, and (apparenly) none of them at a safepoint in the caller.

JulieLeeMSFT · 2021-08-13T21:51:34Z

CC VM team @mangod9, @davidwrighton, and @janvorli as an early headsup because @AndyAyersMS suspects this is either a runtime problem or a bad test.

JulieLeeMSFT · 2021-08-16T17:52:14Z

VM team, please take a look.

davidwrighton · 2021-08-16T19:19:20Z

What's supposed to happen is that we are supposed to hijack the thread running TestCSE.Equal, and at some point we should suspend during the following sequence of instructions.

000002  4288           cmp     r0, r1
000004  D101           bne     SHORT G_M17750_IG05
                                                ;; bbWeight=1    PerfScore 2.00
G_M17750_IG03:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
000006  2001           movs    r0, 1

If we did that, then we would hijack the return address. Unfortunately, from my understanding of the way instructions are handled by the various CPU's we support, I suspect the window for that happening is pretty small, as those are exceptionally cheap instructions. and the rest of the loop is heavy with a fair number of more expensive instructions such as CPU memory barriers, branches, and multiplies, etc.

AndyAyersMS · 2021-08-17T16:00:34Z

David looked at the stress logs and doesn't see anything we can address for .NET 6. So moving this to future.

echesakov · 2021-08-23T18:31:57Z

JIT/jit64/opt/cse/VolatileTest_op_xor also crashes on linux-arm with PGO - see console.log

AndyAyersMS · 2024-06-19T16:30:03Z

Believe this should be fixed by #95565

dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Aug 11, 2021

AndyAyersMS self-assigned this Aug 11, 2021

AndyAyersMS added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed untriaged New issue has not been triaged by the area owner labels Aug 11, 2021

AndyAyersMS added this to the 6.0.0 milestone Aug 11, 2021

AndyAyersMS added arch-arm32 os-linux Linux OS (any supported distro) labels Aug 11, 2021

JulieLeeMSFT assigned davidwrighton Aug 13, 2021

AndyAyersMS modified the milestones: 6.0.0, Future Aug 17, 2021

AndyAyersMS mentioned this issue Feb 11, 2022

AddRemoveFromDifferentThreads<string>.ConcurrentStack benchmark hangs on ARM64 #64980

Closed

AndyAyersMS closed this as completed Jun 19, 2024

github-actions bot locked and limited conversation to collaborators Jul 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PGO: test failure in VolatileTest_op_mul on linux arm32 #57219

PGO: test failure in VolatileTest_op_mul on linux arm32 #57219

AndyAyersMS commented Aug 11, 2021 •

edited by BruceForstall

Loading

dotnet-issue-labeler bot commented Aug 11, 2021

AndyAyersMS commented Aug 11, 2021

briansull commented Aug 11, 2021 •

edited

Loading

briansull commented Aug 11, 2021 •

edited

Loading

briansull commented Aug 11, 2021

briansull commented Aug 11, 2021

AndyAyersMS commented Aug 13, 2021

AndyAyersMS commented Aug 13, 2021

JulieLeeMSFT commented Aug 13, 2021

JulieLeeMSFT commented Aug 16, 2021

davidwrighton commented Aug 16, 2021

AndyAyersMS commented Aug 17, 2021

echesakov commented Aug 23, 2021

AndyAyersMS commented Jun 19, 2024

PGO: test failure in VolatileTest_op_mul on linux arm32 #57219

PGO: test failure in VolatileTest_op_mul on linux arm32 #57219

Comments

AndyAyersMS commented Aug 11, 2021 • edited by BruceForstall Loading

dotnet-issue-labeler bot commented Aug 11, 2021

AndyAyersMS commented Aug 11, 2021

briansull commented Aug 11, 2021 • edited Loading

briansull commented Aug 11, 2021 • edited Loading

briansull commented Aug 11, 2021

briansull commented Aug 11, 2021

AndyAyersMS commented Aug 13, 2021

AndyAyersMS commented Aug 13, 2021

JulieLeeMSFT commented Aug 13, 2021

JulieLeeMSFT commented Aug 16, 2021

davidwrighton commented Aug 16, 2021

AndyAyersMS commented Aug 17, 2021

echesakov commented Aug 23, 2021

AndyAyersMS commented Jun 19, 2024

AndyAyersMS commented Aug 11, 2021 •

edited by BruceForstall

Loading

briansull commented Aug 11, 2021 •

edited

Loading

briansull commented Aug 11, 2021 •

edited

Loading