paltest_pal_sxs_test1 failing on NetBSD #5158

Closed
krytarowski opened this issue Feb 21, 2016 · 44 comments
Labels
help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@krytarowski
Contributor

Currently paltest_pal_sxs_test1 is disabled on FreeBSD. I would like to know why.

Should I disable it on NetBSD as well? Is it testing crucial functionality?

@krytarowski
Contributor Author

@jkotas @janvorli

@ghost

ghost commented Feb 21, 2016

Related to https://github.com/dotnet/coreclr/issues/2090. That issue has the up-for-grabs label, so we can probably take a stab at it.

@krytarowski
Contributor Author

Thanks. For now I will push a patch to disable this test on NetBSD as well and try to work with @mikem8361 towards investigating it.

krytarowski referenced this issue in krytarowski/coreclr Feb 22, 2016
This test has already been disabled on FreeBSD; hardware exceptions
always seem to abort on NetBSD as well.

Related issues: dotnet/coreclr#2090 dotnet/coreclr#3287
@krytarowski
Contributor Author

This functionality is critical for CoreFX tests.

Part of dlltest1.cpp.i

extern "C"
int DllTest1()
{
    Trace("Starting pal_sxs test1 DllTest1\n");

    { void* __param = 0; auto tryBlock = [](void* unused) {
    {
        volatile int* p = (volatile int *)0x11;

        bTry = 1;
        *p = 1;

        Fail("ERROR: code was executed after the access violation.\n");
    }
    }; const bool isFinally = false; auto finallyBlock = []() {}; EXCEPTION_DISPOSITION disposition = -1; auto exceptionFilter = [&disposition, &__param](PAL_SEHException& ex) { disposition = 1; do { if (!(disposition != -1)) { PAL_fprintf ((PAL_get_stderr(0)), "ASSERT FAILED\n" "\tExpression: %s\n" "\tLocation:   line %d in %s\n" "\tFunction:   %s\n" "\tProcess:    %d\n", "disposition != EXCEPTION_CONTINUE_EXECUTION", 40, "/tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/src/pal/tests/palsuite/exception_handling/pal_sxs/test1/dlltest1.cpp", __FUNCTION__, GetCurrentProcessId()); DebugBreak(); } }while (0); return disposition; }; try { CatchHardwareExceptionHolder __catchHardwareException; auto __exceptionHolder = NativeExceptionHolderFactory::CreateHolder(&exceptionFilter); __exceptionHolder.Push(); tryBlock(__param); } catch (PAL_SEHException& ex) { if (disposition == -1) { exceptionFilter(ex); } if (disposition == 0) { throw; }
    {
        if (!bTry)
        {
            Fail("ERROR: PAL_EXCEPT was hit without PAL_TRY being hit.\n");
        }


        if (ex.ExceptionRecord.ExceptionInformation[1] != 0x11)
        {
            Fail("ERROR: PAL_EXCEPT ExceptionInformation[1] != 0x11\n");
        }

        bExcept = 1;
    }
    }; if (isFinally) { try { tryBlock(__param); } catch (...) { finallyBlock(); throw; } finallyBlock(); } };

    if (!bTry)
    {
        Trace("ERROR: the code in the PAL_TRY block was not executed.\n");
    }

    if (!bExcept)
    {
        Trace("ERROR: the code in the PAL_EXCEPT block was not executed.\n");
    }


    if(!bTry || !bExcept)
    {
        Fail("DllTest1 FAILED\n");
    }

    Trace("DLLTest1 PASSED\n");
    return PASS;
}

On NetBSD this test fails because an exception is thrown and never caught (it is a C++ exception raised with "throw").

@janvorli @jkotas do you have any pointers on what may be wrong or missing? The same issue likely exists on FreeBSD.

@krytarowski
Contributor Author

$ LD_LIBRARY_PATH=. ./paltest_pal_sxs_test1  
PAL_SXS test1 SIGSEGV handler 0x0
Starting pal_sxs test1 DllTest2
terminate called after throwing an instance of 'PAL_SEHException'
Abort (core dumped)

@krytarowski
Contributor Author

DllTest1 fails the same way -- it doesn't matter that there are two DllTests. Calling just one or the other results in the same termination.

@janvorli
Member

janvorli commented Mar 8, 2016

@krytarowski I would try to run it under a debugger with breakpoints set to __cxa_throw and __cxa_begin_catch. Then you would see exactly which throw was not being caught and where each throw is caught.

@krytarowski
Contributor Author

Neither GDB nor LLDB will set a breakpoint on __cxa_throw or __cxa_begin_catch. I checked that these symbols do exist in libstdc++.

@krytarowski
Contributor Author

C++ exceptions may be caught:
        catch throw               - all exceptions, when thrown
        catch catch               - all exceptions, when caught

That doesn't work either. I will try to investigate it.

@janvorli
Member

janvorli commented Mar 8, 2016

That's strange, it works fine on Linux; I've used it a lot for debugging in the past. Maybe the function names have a different number of underscores or something?

@krytarowski
Contributor Author

$ nm /usr/lib/libstdc++.so|grep -E 'cxa_throw|begin_catch'
0000000000081b17 T __cxa_begin_catch
0000000000080c2a T __cxa_throw

@krytarowski
Contributor Author

OK, I know what was going on. I first had to load the library so that GDB could see its symbols:

$ LD_LIBRARY_PATH=. gdb --args ./paltest_pal_sxs_test1  
GNU gdb (GDB) 7.10.1
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64--netbsd".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./paltest_pal_sxs_test1...done.
(gdb) b __cxa_throw
Function "__cxa_throw" not defined.
Make breakpoint pending on future shared library load? (y or [n]) n
(gdb) r
Starting program: /tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/bin/obj/NetBSD.x64.Debug/src/pal/tests/palsuite/exception_handling/pal_sxs/test1/paltest_pal_sxs_test1 
PAL_SXS test1 SIGSEGV handler 0x0
Starting pal_sxs test1 DllTest2

Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 1]
0x00007f7ff74074f4 in DllTest2::$_1::operator() (this=0x7f7fffffd9d8, unused=0x0)
    at /tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/src/pal/tests/palsuite/exception_handling/pal_sxs/test1/dlltest2.cpp:36
36              *p = 2;                                 // Causes an access violation exception
(gdb) b __cxa_throw
Breakpoint 1 at 0x7f7ff5c80c2a: file /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/eh_throw.cc, line 62.
(gdb) b __cxa_begin_catch
Breakpoint 2 at 0x7f7ff5c81b17: file /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/eh_catch.cc, line 41.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/bin/obj/NetBSD.x64.Debug/src/pal/tests/palsuite/exception_handling/pal_sxs/test1/paltest_pal_sxs_test1 
PAL_SXS test1 SIGSEGV handler 0x0
Starting pal_sxs test1 DllTest2

Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 1]
0x00007f7ff74074f4 in DllTest2::$_1::operator() (this=0x7f7fffffd9d8, unused=0x0)
    at /tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/src/pal/tests/palsuite/exception_handling/pal_sxs/test1/dlltest2.cpp:36
36              *p = 2;                                 // Causes an access violation exception
(gdb)

@krytarowski
Contributor Author

It looks like SIGSEGV isn't caught.

I know that GNU libsigsegv works on NetBSD. I'm not sure about GCJ.

@janvorli
Member

janvorli commented Mar 8, 2016

GDB always catches the SIGSEGV first; you should just type "c" to continue after that.

@krytarowski
Contributor Author

I see. You are right.

(gdb) c
Continuing.

Breakpoint 1, __cxxabiv1::__cxa_throw (obj=0x7f7ff7321880, tinfo=0x7f7ff772b0a0 <typeinfo for PAL_SEHException>, dest=0x0)
    at /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/eh_throw.cc:62
62      {
(gdb) n
65        __cxa_eh_globals *globals = __cxa_get_globals ();
(gdb) 
66        globals->uncaughtExceptions += 1;
(gdb) 
71        header->referenceCount = 1;
(gdb) 
72        header->exc.exceptionType = tinfo;
(gdb) 
73        header->exc.exceptionDestructor = dest;
(gdb) 
74        header->exc.unexpectedHandler = __unexpected_handler;
(gdb) 
75        header->exc.terminateHandler = __terminate_handler;
(gdb) 
76        __GXX_INIT_PRIMARY_EXCEPTION_CLASS(header->exc.unwindHeader.exception_class);
(gdb) 
77        header->exc.unwindHeader.exception_cleanup = __gxx_exception_cleanup;
(gdb) 
82        _Unwind_RaiseException (&header->exc.unwindHeader);
(gdb) 
86        __cxa_begin_catch (&header->exc.unwindHeader);
(gdb) 

Breakpoint 2, __cxxabiv1::__cxa_begin_catch (exc_obj_in=0x7f7ff7321860) at /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/eh_catch.cc:41
41      {
(gdb) 
44        __cxa_eh_globals *globals = __cxa_get_globals ();
(gdb) 
45        __cxa_exception *prev = globals->caughtExceptions;
(gdb) 
46        __cxa_exception *header = __get_exception_header_from_ue (exceptionObject);
(gdb) 
53        if (!__is_gxx_exception_class(header->unwindHeader.exception_class))
(gdb) 
66        int count = header->handlerCount;
(gdb) 
69        if (count < 0)
(gdb) 
72          count += 1;
(gdb) 
73        header->handlerCount = count;
(gdb) 
74        globals->uncaughtExceptions -= 1;
(gdb) 
76        if (header != prev)
(gdb) 
78            header->nextException = prev;
(gdb) 
79            globals->caughtExceptions = header;
(gdb) 
82        objectp = __gxx_caught_object(exceptionObject);
(gdb) 
90      }
(gdb) 
__cxxabiv1::__cxa_throw (obj=0x7f7ff7321880, tinfo=0x7f7ff772b0a0 <typeinfo for PAL_SEHException>, dest=0x0)
    at /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/eh_throw.cc:87
87        std::terminate ();
(gdb) 
terminate called after throwing an instance of 'PAL_SEHException'

Breakpoint 2, __cxxabiv1::__cxa_begin_catch (exc_obj_in=0x7f7ff7321860) at /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/eh_catch.cc:41
41      {
(gdb) 
44        __cxa_eh_globals *globals = __cxa_get_globals ();
(gdb) 
45        __cxa_exception *prev = globals->caughtExceptions;
(gdb) 
46        __cxa_exception *header = __get_exception_header_from_ue (exceptionObject);
(gdb) 
53        if (!__is_gxx_exception_class(header->unwindHeader.exception_class))
(gdb) 
66        int count = header->handlerCount;
(gdb) 
69        if (count < 0)
(gdb) 
70          count = -count + 1;
(gdb) 
73        header->handlerCount = count;
(gdb) 
74        globals->uncaughtExceptions -= 1;
(gdb) 
76        if (header != prev)
(gdb) 
82        objectp = __gxx_caught_object(exceptionObject);
(gdb) 
90      }
(gdb) 
__gnu_cxx::__verbose_terminate_handler () at /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/vterminate.cc:95
95          abort();
(gdb) 

Program received signal SIGABRT, Aborted.
0x00007f7ff590669a in _lwp_kill () from /usr/lib/libc.so.12
(gdb) 

@krytarowski
Contributor Author

How can I extract useful information from this point, and what should I look for?

@janvorli
Member

janvorli commented Mar 8, 2016

Can you also dump the stack when you hit each breakpoint?

@krytarowski
Contributor Author

(gdb) r
Starting program: /tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/bin/obj/NetBSD.x64.Debug/src/pal/tests/palsuite/exception_handling/pal_sxs/test1/paltest_pal_sxs_test1 
PAL_SXS test1 SIGSEGV handler 0x0
Starting pal_sxs test1 DllTest2

Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 1]
0x00007f7ff74074f4 in DllTest2::$_1::operator() (this=0x7f7fffffd9d8, unused=0x0)
    at /tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/src/pal/tests/palsuite/exception_handling/pal_sxs/test1/dlltest2.cpp:36
36              *p = 2;                                 // Causes an access violation exception
(gdb) c
Continuing.

Breakpoint 1, __cxxabiv1::__cxa_throw (obj=0x7f7ff7321880, tinfo=0x7f7ff772b0a0 <typeinfo for PAL_SEHException>, dest=0x0)
    at /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/eh_throw.cc:62
62      {
(gdb) bt
#0  __cxxabiv1::__cxa_throw (obj=0x7f7ff7321880, tinfo=0x7f7ff772b0a0 <typeinfo for PAL_SEHException>, dest=0x0)
    at /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/eh_throw.cc:62
#1  0x00007f7ff742a5e4 in SEHProcessException (pointers=0x7f7fffffd460) at /tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/src/pal/src/exception/seh.cpp:178
#2  0x00007f7ff742c98b in common_signal_handler (pointers=0x7f7fffffd460, code=11, ucontext=0x7f7fffffd5a0)
    at /tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/src/pal/src/exception/signal.cpp:657
#3  0x00007f7ff742beab in sigsegv_handler (code=11, siginfo=0x7f7fffffd520, context=0x7f7fffffd5a0)
    at /tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/src/pal/src/exception/signal.cpp:332
#4  0x00007f7ff589b0b0 in _opendir (name=<optimized out>) at /usr/src/lib/libc/gen/opendir.c:72
#5  0x000000010000000b in ?? ()
#6  0x0000000000000000 in ?? ()
(gdb) c
Continuing.

Breakpoint 2, __cxxabiv1::__cxa_begin_catch (exc_obj_in=0x7f7ff7321860) at /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/eh_catch.cc:41
41      {
(gdb) bt
#0  __cxxabiv1::__cxa_begin_catch (exc_obj_in=0x7f7ff7321860) at /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/eh_catch.cc:41
#1  0x00007f7ff5c80c98 in __cxxabiv1::__cxa_throw (obj=0x7f7ff7321880, tinfo=0x7f7ff772b0a0 <typeinfo for PAL_SEHException>, dest=0x0)
    at /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/eh_throw.cc:86
#2  0x00007f7ff742a5e4 in SEHProcessException (pointers=0x7f7fffffd460) at /tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/src/pal/src/exception/seh.cpp:178
#3  0x00007f7ff742c98b in common_signal_handler (pointers=0x7f7fffffd460, code=11, ucontext=0x7f7fffffd5a0)
    at /tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/src/pal/src/exception/signal.cpp:657
#4  0x00007f7ff742beab in sigsegv_handler (code=11, siginfo=0x7f7fffffd520, context=0x7f7fffffd5a0)
    at /tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/src/pal/src/exception/signal.cpp:332
#5  0x00007f7ff589b0b0 in _opendir (name=<optimized out>) at /usr/src/lib/libc/gen/opendir.c:72
#6  0x000000010000000b in ?? ()
#7  0x0000000000000000 in ?? ()
(gdb) c
Continuing.
terminate called after throwing an instance of 'PAL_SEHException'

Breakpoint 2, __cxxabiv1::__cxa_begin_catch (exc_obj_in=0x7f7ff7321860) at /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/eh_catch.cc:41
41      {
(gdb) bt
#0  __cxxabiv1::__cxa_begin_catch (exc_obj_in=0x7f7ff7321860) at /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/eh_catch.cc:41
#1  0x00007f7ff5c790f1 in __gnu_cxx::__verbose_terminate_handler () at /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/vterminate.cc:90
#2  0x00007f7ff5c80cf0 in __cxxabiv1::__terminate (handler=<optimized out>) at /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/eh_terminate.cc:38
#3  0x00007f7ff5c80d33 in std::terminate () at /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/eh_terminate.cc:48
#4  0x00007f7ff5c80c9d in __cxxabiv1::__cxa_throw (obj=0x7f7ff7321880, tinfo=0x7f7ff772b0a0 <typeinfo for PAL_SEHException>, dest=0x0)
    at /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/eh_throw.cc:87
#5  0x00007f7ff742a5e4 in SEHProcessException (pointers=0x7f7fffffd460) at /tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/src/pal/src/exception/seh.cpp:178
#6  0x00007f7ff742c98b in common_signal_handler (pointers=0x7f7fffffd460, code=11, ucontext=0x7f7fffffd5a0)
    at /tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/src/pal/src/exception/signal.cpp:657
#7  0x00007f7ff742beab in sigsegv_handler (code=11, siginfo=0x7f7fffffd520, context=0x7f7fffffd5a0)
    at /tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/src/pal/src/exception/signal.cpp:332
#8  0x00007f7ff589b0b0 in _opendir (name=<optimized out>) at /usr/src/lib/libc/gen/opendir.c:72
#9  0x000000010000000b in ?? ()
#10 0x0000000000000000 in ?? ()
(gdb) c
Continuing.

Program received signal SIGABRT, Aborted.
0x00007f7ff590669a in _lwp_kill () from /usr/lib/libc.so.12
(gdb) bt
#0  0x00007f7ff590669a in _lwp_kill () from /usr/lib/libc.so.12
#1  0x00007f7ff5906325 in abort () at /usr/src/lib/libc/stdlib/abort.c:74
#2  0x00007f7ff5c790b6 in __gnu_cxx::__verbose_terminate_handler () at /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007f7ff5c80cf0 in __cxxabiv1::__terminate (handler=<optimized out>) at /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/eh_terminate.cc:38
#4  0x00007f7ff5c80d33 in std::terminate () at /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/eh_terminate.cc:48
#5  0x00007f7ff5c80c9d in __cxxabiv1::__cxa_throw (obj=0x7f7ff7321880, tinfo=0x7f7ff772b0a0 <typeinfo for PAL_SEHException>, dest=0x0)
    at /usr/src/external/gpl3/gcc.old/dist/libstdc++-v3/libsupc++/eh_throw.cc:87
#6  0x00007f7ff742a5e4 in SEHProcessException (pointers=0x7f7fffffd460) at /tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/src/pal/src/exception/seh.cpp:178
#7  0x00007f7ff742c98b in common_signal_handler (pointers=0x7f7fffffd460, code=11, ucontext=0x7f7fffffd5a0)
    at /tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/src/pal/src/exception/signal.cpp:657
#8  0x00007f7ff742beab in sigsegv_handler (code=11, siginfo=0x7f7fffffd520, context=0x7f7fffffd5a0)
    at /tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/src/pal/src/exception/signal.cpp:332
#9  0x00007f7ff589b0b0 in _opendir (name=<optimized out>) at /usr/src/lib/libc/gen/opendir.c:72
#10 0x000000010000000b in ?? ()
#11 0x0000000000000000 in ?? ()
(gdb)

@krytarowski
Contributor Author

I will get back to it in the evening.

@janvorli
Member

janvorli commented Mar 8, 2016

@krytarowski My guess is that NetBSD is unable to propagate an exception through the signal trampoline, and probably cannot even unwind the stack through it, which would explain the nonsense frames 10 and 11.

@janvorli
Member

janvorli commented Mar 8, 2016

If I am right, then we will need to handle hardware errors differently, in a way similar to what we do on OSX. That means modifying the context passed in by the signal, redirecting it to an exception handling function, and then returning from the signal. See HijackFaultingThread in src/pal/src/exception/machexception.cpp.

@krytarowski
Contributor Author

Thanks! I will have a look at it.

@mikem8361
Member

@krytarowski, @janvorli, it looks like NetBSD’s C++ runtime isn’t allowing the thrown PAL_SEHException to be caught by the try/catch (which is wrapped in the PAL_TRY/PAL_EXCEPT/etc. macros) in the dlltest1/dlltest2 test code. In the dlltest1.cpp.i file I don’t actually see the “try” or “catch”. The CatchHardwareExceptionHolder count was non-zero (which is why the SEHProcessException code throws the exception), so the PAL_TRY/etc. macros must have gotten that far. The h/w exception holder “enable” is in the HardwareExceptionHolder macro inside the PAL_EXCEPT macro. I hope this helps.

@janvorli
Member

janvorli commented Mar 8, 2016

@mikem8361 - I think you may have missed the try/catch blocks in there, since they are all on one line. I have reformatted the code here so that they are clearly visible:

extern "C"
int DllTest1()
{
    Trace("Starting pal_sxs test1 DllTest1\n");

    { 
        void* __param = 0; 
        auto tryBlock = [](void* unused) 
        {
            {
                volatile int* p = (volatile int *)0x11;

                bTry = 1;
                *p = 1;

                Fail("ERROR: code was executed after the access violation.\n");
            }
        }; 
        const bool isFinally = false; 
        auto finallyBlock = []() {}; 
        EXCEPTION_DISPOSITION disposition = -1; 
        auto exceptionFilter = [&disposition, &__param](PAL_SEHException& ex) 
        { 
            disposition = 1; 
            do 
            { 
                if (!(disposition != -1)) 
                { 
                    PAL_fprintf ((PAL_get_stderr(0)), "ASSERT FAILED\n" "\tExpression: %s\n" "\tLocation:   line %d in %s\n" "\tFunction:   %s\n" "\tProcess:    %d\n", "disposition != EXCEPTION_CONTINUE_EXECUTION", 40, "/tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/src/pal/tests/palsuite/exception_handling/pal_sxs/test1/dlltest1.cpp", __FUNCTION__, GetCurrentProcessId()); DebugBreak(); 
                } 
            }while (0); 
            return disposition; 
        }; 
        try 
        {    
            CatchHardwareExceptionHolder __catchHardwareException; 
            auto __exceptionHolder = NativeExceptionHolderFactory::CreateHolder(&exceptionFilter); 
            __exceptionHolder.Push(); 
            tryBlock(__param); 
        } 
        catch (PAL_SEHException& ex) 
        { 
            if (disposition == -1) 
            { 
                exceptionFilter(ex); 
            } 
            if (disposition == 0) 
            { 
                throw; 
            }
            {
                if (!bTry)
                {
                    Fail("ERROR: PAL_EXCEPT was hit without PAL_TRY being hit.\n");
                }

                if (ex.ExceptionRecord.ExceptionInformation[1] != 0x11)
                {
                    Fail("ERROR: PAL_EXCEPT ExceptionInformation[1] != 0x11\n");
                }

                bExcept = 1;
            }
        }; 
        if (isFinally) 
        { 
            try 
            { 
                tryBlock(__param); 
            } 
            catch (...) 
            { 
                finallyBlock(); 
                throw; 
            } 
            finallyBlock(); 
        } 
    };

    if (!bTry)
    {
        Trace("ERROR: the code in the PAL_TRY block was not executed.\n");
    }

    if (!bExcept)
    {
        Trace("ERROR: the code in the PAL_EXCEPT block was not executed.\n");
    }


    if(!bTry || !bExcept)
    {
        Fail("DllTest1 FAILED\n");
    }

    Trace("DLLTest1 PASSED\n");
    return PASS;
}
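
Stripped of the PAL specifics, the expansion above boils down to a try block, a filter that picks a disposition, and a handler that runs only if the filter asks for it. The following is a minimal sketch of that shape, not the actual macro implementation: FakeSehException, try_except, kContinueSearch and kExecuteHandler are hypothetical names invented for illustration, and the exception is thrown directly rather than from a signal handler.

```cpp
#include <cstdint>

// Hypothetical stand-in for PAL_SEHException; only carries the fault address.
struct FakeSehException {
    std::uintptr_t fault_address;
};

// Hypothetical dispositions mirroring the 0 / 1 values seen in the expansion.
constexpr int kContinueSearch = 0;
constexpr int kExecuteHandler = 1;

// Models the PAL_TRY / PAL_EXCEPT shape: run body, ask the filter what to do
// with a caught exception, then either rethrow or run the handler.
template <typename Body, typename Filter, typename Handler>
void try_except(Body body, Filter filter, Handler handler)
{
    try {
        body();
    } catch (FakeSehException& ex) {
        if (filter(ex) == kContinueSearch) {
            throw; // let an outer handler deal with it
        }
        handler(ex);
    }
}

// Returns 0 when the body ran, the handler ran, and the address matched.
int run_demo()
{
    bool tried = false;
    bool handled = false;
    std::uintptr_t seen = 0;

    try_except(
        [&] { tried = true; throw FakeSehException{0x11}; },
        [](FakeSehException&) { return kExecuteHandler; },
        [&](FakeSehException& ex) { handled = true; seen = ex.fault_address; });

    return (tried && handled && seen == 0x11) ? 0 : 1;
}
```

In this sketch the throw and the catch sit on an ordinary call stack, so the C++ unwinder has nothing unusual to cross. The failing test differs in exactly that respect: the throw originates inside a signal handler.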

@mikem8361
Member

@krytarowski, @janvorli. Thanks. I see them now. But I don’t understand why on NetBSD they don’t catch the PAL_SEHException that is thrown. Your suggestion of looking at the mach exception hijacking may not help. On OSX that code sets up the faulting thread from the exception thread so that it ends up in the same SEHProcessException code that throws the software exception.

@janvorli
Member

janvorli commented Mar 8, 2016

@mikem8361 As I've said to @krytarowski, the problem is most likely that exception unwinding cannot unwind the stack across the signal handler trampoline. As you can see from the stack dumps, the frames displayed by GDB after the sigsegv_handler frame don't make sense, so GDB itself is not able to cross the signal trampoline either.
That's why I've suggested that we will need to do it in a way similar to what we do on OSX: create a helper frame that the stack walker can cross, modify the context to point to our helper frame, and return from the signal handler.

@krytarowski
Contributor Author

I asked @jsonn, and he confirmed that so far we have no support for unwinding this stack.

To my understanding (I may be incorrect), to add support for it in NetBSD we need to add __register_frame in libc(3).

For the current stable release (7.0) I will try the path recommended by @janvorli.

@krytarowski
Contributor Author

I was debugging it with help from Christos Zoulas (christos / netbsd.org).

It seems that the trick to switch context and throw from EH is to go for the following patch:

diff --git a/coreclr-git/patches/patch-src_pal_src_exception_signal.cpp b/coreclr-git/patches/patch-src_pal_src_exception_signal.cpp
new file mode 100644
index 0000000..54b4fa8
--- /dev/null
+++ b/coreclr-git/patches/patch-src_pal_src_exception_signal.cpp
@@ -0,0 +1,16 @@
+$NetBSD$
+
+--- src/pal/src/exception/signal.cpp.orig      2016-02-17 20:49:34.000000000 +0000
++++ src/pal/src/exception/signal.cpp
+@@ -654,7 +654,10 @@ static void common_signal_handler(PEXCEP
+         ASSERT("pthread_sigmask failed; error number is %d\n", sigmaskRet);
+     }
+ 
+-    SEHProcessException(pointers);
++    MCREG_Rip(ucontext->uc_mcontext) = reinterpret_cast<unsigned long>(SEHProcessException);
++    MCREG_Rdi(ucontext->uc_mcontext) = reinterpret_cast<unsigned long>(pointers);
++    setcontext(ucontext);
++//    SEHProcessException(pointers);
+ }
+ 
+ /*++

However it still requires stack restoration.

(gdb) bt
#0  SEHProcessException (pointers=0x7f7fffffd410) at /tmp/pkgsrc-tmp/wip/coreclr-git/work/coreclr/src/pal/src/exception/seh.cpp:167
#1  0x00007f7fffffd910 in ?? ()
#2  0x0000000000000022 in ?? ()
#3  0x0000000000000000 in ?? ()

@janvorli do you have any pointers on how to continue? Perhaps some use of LLVM libunwind?

Thanks!

@krytarowski
Contributor Author

Another question:

static void common_signal_handler(PEXCEPTION_POINTERS pointers, int code,
                                  native_context_t *ucontext)
{
    sigset_t signal_set;
    CONTEXT context;

    // Pre-populate context with data from current frame, because ucontext doesn't have some data (e.g. SS register)
    // which is required for restoring context
    RtlCaptureContext(&context);

    // Fill context record with required information. from pal.h:
    // On non-Win32 platforms, the CONTEXT pointer in the
    // PEXCEPTION_POINTERS will contain at least the CONTEXT_CONTROL registers.
    CONTEXTFromNativeContext(ucontext, &context, CONTEXT_CONTROL | CONTEXT_INTEGER | CONTEXT_FLOATING_POINT);

    pointers->ContextRecord = &context;

    /* Unmask signal so we can receive it again */
    sigemptyset(&signal_set);
    sigaddset(&signal_set, code);
    int sigmaskRet = pthread_sigmask(SIG_UNBLOCK, &signal_set, NULL);
    if (sigmaskRet != 0)
    {
        ASSERT("pthread_sigmask failed; error number is %d\n", sigmaskRet);
    }

    MCREG_Rip(ucontext->uc_mcontext) = reinterpret_cast<unsigned long>(SEHProcessException);
    MCREG_Rdi(ucontext->uc_mcontext) = reinterpret_cast<unsigned long>(pointers);
    setcontext(ucontext);
//    SEHProcessException(pointers);
}

In the above function we are saving a pointer to a stack object (CONTEXT context). Will that be garbage after another function is called? If not, why?

@krytarowski
Contributor Author

Could that logic be rewritten with kqueue(2) or sigwait(2)? The current approach looks very fragile.

I was told that mixing EH and program context leads to dangerous situations: calling non-signal-safe functions (like malloc(3) or printf(3)) may leave the program in an unpredictable state.

(Actually, I'm not volunteering for this redesign myself, since the remaining CoreFX bugs on NetBSD are much more important.)
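
For reference, the flag-only style of handler that avoids non-signal-safe calls can be sketched as below. This is an illustration under assumptions, not a proposal for the PAL: g_signal_seen, record_signal and deliver_and_check are hypothetical names, and SIGUSR1 is used because this pattern suits asynchronous signals. For a synchronous SIGSEGV, simply returning from the handler would re-execute the faulting instruction, so a flag alone would not be enough.

```cpp
#include <csignal>

// Hypothetical flag; sig_atomic_t is the type the standard guarantees can be
// written safely from a signal handler.
static volatile std::sig_atomic_t g_signal_seen = 0;

// The handler does nothing except record that the signal arrived.
extern "C" void record_signal(int)
{
    g_signal_seen = 1;
}

// Installs the handler, delivers SIGUSR1 to the current process, restores the
// previous handler, and reports whether the flag was set.
bool deliver_and_check()
{
    g_signal_seen = 0;
    auto previous = std::signal(SIGUSR1, record_signal);
    std::raise(SIGUSR1);
    std::signal(SIGUSR1, previous);
    return g_signal_seen == 1;
}
```

All real work (allocation, logging, exception dispatch) would then happen in normal program context after the flag is observed.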

@janvorli
Member

janvorli commented Mar 9, 2016

@krytarowski To implement it like we do on the OSX, it would be more involved than this. We would need to create a fake frame on the stack that would contain the context and exception record and allow the unwinder to keep the stack walkable from the exception handler to the actual code with the exception. Basically, it is a stack frame that has its return address on stack set to the SEHProcessException, below that is the RBP of the context where the exception happened, then the context and finally an address in the middle of a fake function that is never called, but provides unwind info for the stack walker - like the PAL_DispatchExceptionWrapper. Then you set the RIP in the ucontext like you do, set the RSP in it to the address on the stack where the address in the middle of a fake function was stored. And also RSI to point to the context and RDI to the exception record in that helper frame (the AMD64 calling convention passes the EXCEPTION_POINTERS in registers, that's what we form in the RDI/RSI this way).
Then you'd set the current context and the control should transfer to the SEHProcessException with the stack still being walkable.

I was originally thinking that we would just modify the context passed in by the signal handler and return from the handler, but that would have the problem you've mentioned - any function called between the return from the signal handler and the context restoration by the system would potentially overwrite our fake stack frame with the context. Moreover, we would not be able to put the fake frame right below the faulting frame.

However, it seems there is still a potential problem: the red zone, in case the function that caused the exception was a leaf one. We would probably need to write the fake function in assembly with manual CFI annotations to allow us to skip the red zone. Or, we can disable the red zone using the -mno-red-zone compiler option for all of our code. Since a hardware exception outside of our code leads to fail fast, we don't have to care about polluting the red zone of the platform library functions.

You are right that one has to be extremely careful what to call from the signal handler. But again, we try to handle only exceptions in our managed code (and some of our native code), exceptions at other places cause abort, so we just need to make sure we don't call any non-signal safe function before the point where we check where the RIP of the faulting instruction is located.

As for the idea of using kqueue and sigwait - I am not sure how we would do that. Did you have some specific ideas?

@krytarowski
Contributor Author

Thank you for your feedback! I need to process it and try to produce a working solution.

@krytarowski
Contributor Author

Surprisingly, I think the easiest solution is to write a kernel module to unwind the stack for the process. I'm researching the kernel internals for sendsig_siginfo().

@janvorli
Member

I'm not sure if a kernel module would be the best solution here. I'm not sure how that is viewed in the NetBSD world, but as far as I've understood on Linux, installing a kernel module is considered tainting the kernel.

@krytarowski
Contributor Author

In the Linux world, tainting the kernel means inserting a module whose license is not the GPL (or one of a few GPL-compatible alternatives).

The shortcoming is that one needs to be a superuser to insert it. It's not trivial given my current knowledge of unwinding on AMD64, so it will take time. Thank you for your support!

@krytarowski
Contributor Author

I'm still stuck with it. Is upstream considering redesigning the code to use a sig_atomic_t approach? So far all I can do is longjmp(3) out of a handler.
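
The longjmp(3)-style escape mentioned above might look roughly like this. It is a sketch assuming a POSIX system; g_recovery_point, g_fault_address, on_sigsegv and probe_write are hypothetical names and none of this is CoreCLR code. It recovers from the fault but does not perform C++ unwinding, so no destructors or catch blocks run between the fault and the recovery point, which is why it is not a drop-in replacement for throwing PAL_SEHException.

```cpp
#include <setjmp.h>
#include <signal.h>
#include <cstdint>

// Hypothetical globals: where to resume, and which address faulted.
static sigjmp_buf g_recovery_point;
static volatile std::uintptr_t g_fault_address = 0;

// SA_SIGINFO handler: remember the faulting address, then jump back to the
// recovery point instead of returning into the faulting instruction.
static void on_sigsegv(int, siginfo_t* info, void*)
{
    g_fault_address = reinterpret_cast<std::uintptr_t>(info->si_addr);
    siglongjmp(g_recovery_point, 1);
}

// Tries to store 1 through target. Returns 0 if the store succeeded, or the
// faulting address reported by the system if SIGSEGV was raised.
std::uintptr_t probe_write(volatile int* target)
{
    struct sigaction action = {};
    struct sigaction previous = {};
    action.sa_sigaction = on_sigsegv;
    action.sa_flags = SA_SIGINFO;
    sigemptyset(&action.sa_mask);
    sigaction(SIGSEGV, &action, &previous);

    // savemask = 1 so that siglongjmp also restores the signal mask and
    // SIGSEGV is not left blocked after the jump.
    if (sigsetjmp(g_recovery_point, 1) != 0) {
        sigaction(SIGSEGV, &previous, nullptr);
        return g_fault_address;
    }

    *target = 1; // may fault
    sigaction(SIGSEGV, &previous, nullptr);
    return 0;
}
```

Called with the same bogus address the PAL test uses, for example probe_write(reinterpret_cast<volatile int*>(0x11)), the function would be expected to report that address back on systems that fill in si_addr for SIGSEGV.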

@janvorli
Member

Actually, some time ago I got an idea that I believe could work. It is similar to the thing we do for unwinding native frames during exception handling. There is a "StartUnwindingNativeFrames" function that gets a context and the PAL_SEHException to throw. What it does is that it sets the current context right below the frame from which the exception should get unwound by the C++ exception handling and calls a helper C++ function to throw the passed in PAL_SEHException.
In our case, we would pass in the context of the exception (the context we got from the signal converted to the CONTEXT type).
The resulting effect would be that we would remove the signal trampoline from the stack and then the C++ unwinder would not need to cross that boundary.

@krytarowski
Contributor Author

I will happily test it. Sadly, I have too little spare time to redesign the code myself.

@janvorli
Member

@krytarowski I can check the core of the idea on my Ubuntu and if it works, then it should not be complicated to finalize it.

@janvorli
Member

@krytarowski I have made a quick test for the idea. It would be nice if you could give it a try on BSD. Here is my branch with the experimental change:
https://github.com/janvorli/coreclr/tree/hw-exceptions-change
The PAL test passes with it correctly on Ubuntu, so I hope it will work well on BSD too.

@krytarowski
Contributor Author

I tested your patch on NetBSD and it works!

$ LD_LIBRARY_PATH=./bin/obj/NetBSD.x64.Debug/src/pal/tests/palsuite/exception_handling/pal_sxs/test1 ./bin/obj/NetBSD.x64.Debug/src/pal/tests/palsuite/exception_handling/pal_sxs/test1/paltest_pal_sxs_test1
PAL_SXS test1 SIGSEGV handler 0x0
Starting pal_sxs test1 DllTest2
DLLTest2 PASSED
Starting pal_sxs test1 DllTest1
DLLTest1 PASSED
Starting pal_sxs test1 DllTest2
DLLTest2 PASSED
Starting PAL_SXS test1 signal chaining
pal_sxs test1: signal handler called
Signal chaining PASSED

Please push it to master. I'm now one month behind on porting .NET Core 1.0 to NetBSD... it is still possible to make it happen!

@janvorli
Member

Great! It will still need a little cleanup though, so I guess I'll have a PR ready tomorrow.

@krytarowski
Contributor Author

Fixed by @janvorli

@choikwa
Contributor

choikwa commented Jul 29, 2016

Just for the record, PR dotnet/coreclr#5140 solved this issue. I am observing a similar issue, but with an LTO build of CoreCLR on Ubuntu. I am going to test after that commit to see if it passes.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 30, 2020
@msftgits msftgits added this to the Future milestone Jan 30, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Jan 2, 2021