CRASH using libpthread in a client #956

Open
derekbruening opened this issue Nov 28, 2014 · 8 comments

@derekbruening

From [email protected] on October 19, 2012 06:16:31

For the Summary, please follow the guidelines at https://code.google.com/p/dynamorio/wiki/BugReporting and use one of the CRASH, APP CRASH, HANG, or ASSERT keywords

What version of DynamoRIO are you using? 3.2.0-3

What operating system version are you running on? 64 bit Ubuntu 12.04
Linux kernel 3.2.0-32-generic

What application are you running? a simple systemc example:

#include "systemc.h"

SC_MODULE(first) {
    SC_CTOR(first) {
        SC_THREAD(first_foo);
    }
    void first_foo() {
        cout << "hello from first module\n";
    }
};

SC_MODULE(second) {
    SC_CTOR(second) {
        SC_THREAD(second_foo);
    }
    void second_foo() {
        cout << "hello from second module\n";
    }
};

SC_MODULE(top) {
    first *f;
    second *s;
    SC_CTOR(top) {
        f = new first("1st");
        s = new second("2nd");
    }
};

int sc_main(int argc, char* argv[]) {
    top("top");
    sc_start();
    return 0;
}

Is your application 32-bit or 64-bit? 64-bit

How are you running the application under DynamoRIO? drrun -client /PATH/TO/DR/bin/libwrapperpp.so 0 "" /PATH/TO/APP/a.out

What happens when you run without any client? It gives me the regular application output in addition to the first two lines about basename such as:

basename: missing operand
Try `basename --help' for more information.

             SystemC 2.3.0-ASI --- Sep 20 2012 23:24:20
        Copyright (c) 1996-2012 by all Contributors,
        ALL RIGHTS RESERVED

hello from first module
hello from second module

What happens when you run with debug build ("-debug" flag to drrun/drconfig/drinject)? Same crash with/without -debug flag. This is the output when run with -debug:


<Starting application a.out (6490)>
<Initial options = -client_lib '/PATH/TO/DR/bin/libwrapperpp.so;0;' -code_api -stack_size 56K -max_elide_jmp 0 -max_elide_call 0 -no_inline_ignored_syscalls -no_native_exec -no_indcall2direct >
Segmentation fault (core dumped)

What steps will reproduce the problem? I have prepared a minimal example of the client where I observe the crash:

#include "dr_api.h"
#include "drwrap.h"
#include "drsyms.h"

#include <systemc>  // this is the header of my application library

static void event_exit(void);

DR_EXPORT void
dr_init(client_id_t id)
{
    drwrap_init();
    drsym_init(0);
    dr_printf("tracing started..\n");
    dr_register_exit_event(event_exit);
}

static void
event_exit()
{
    drwrap_exit();
    drsym_exit();
    dr_printf("tracing finished..\n");
}

If I exclude #include <systemc> it works fine. If I include it, I get a segmentation fault.

What is the expected output? What do you see instead? Is this an application crash, a DynamoRIO crash, a DynamoRIO assert, or a hang (see https://code.google.com/p/dynamorio/wiki/BugReporting and set the title appropriately)? I would expect just the output of my systemc program since I haven't really done anything in my client.

Please provide any additional information below. Just to be clear, DynamoRIO works fine with SystemC applications; it just doesn't like it when I try to include the systemc header in my client.

Original issue: http://code.google.com/p/dynamorio/issues/detail?id=956

@derekbruening

From [email protected] on October 19, 2012 03:18:33

forgot to mention, this was first discussed in: https://groups.google.com/forum/?fromgroups=#!topic/DynamoRIO-Users/GkVL9IxxdnQ

@derekbruening

From [email protected] on November 20, 2012 15:48:35

I started to debug this.

The systemc header includes sc_ver.h, which creates a little static global in every file that includes it. If I don't do anything to make sure the client is linking against systemc, it will crash when it tries to call the constructor.

More info:
$ gdb --args /bin/bash ./bin64/drrun -debug -c ./api/samples/bin/libsysc_min.so -- ./suite/tests/bin/common.fib
<WARNING! symbol lookup error: libsysc_min.so undefined symbol _ZN7sc_core20sc_api_version_2_3_0C1Ev>
...
Breakpoint 1, module_undef_symbols () at ../core/linux/module.c:1525
1525 FATAL_USAGE_ERROR(UNDEFINED_SYMBOL_REFERENCE, 0, "");
(gdb) bt
#0 module_undef_symbols () at ../core/linux/module.c:1525
#1 0x0000000072000b78 in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535)
at /usr/local/google/home/rnk/Downloads/systemc-2.3.0/src/sysc/kernel/sc_ver.h:104
#2 0x0000000072000b8d in _GLOBAL__sub_I_sysc_min.cc(void) () at ../api/samples/sysc_min.cc:24
#3 0x00000000712eed00 in privload_call_lib_func (func=0x72000b7a <_GLOBAL__sub_I_sysc_min.cc(void)>) at ../core/linux/loader.c:897
#4 0x00000000712ee121 in privload_call_entry (privmod=0x4001ee20, reason=1) at ../core/linux/loader.c:624
#5 0x00000000712eec5e in privload_call_modules_entry (mod=0x4001ee20, reason=1) at ../core/linux/loader.c:873
#6 0x00000000712ed1fe in os_loader_thread_init_prologue (dcontext=0x40006a40) at ../core/linux/loader.c:222
#7 0x0000000071200541 in loader_thread_init (dcontext=0x40006a40) at ../core/loader_shared.c:178
#8 0x0000000071092040 in dynamo_thread_init (dstack_in=0x0, mc=0x0, client_thread=false) at ../core/dynamo.c:2145
#9 0x000000007108f7f7 in dynamorio_app_init () at ../core/dynamo.c:581
#10 0x00007ffff7bd09d1 in _init (argc=1, argv=0x7fffffffe678, envp=0x7fffffffe688) at ../core/linux/preload.c:189
#11 0x00007ffff7de92bb in call_init (l=0x7ffff7ffab58, argc=1, argv=0x7fffffffe678, env=0x7fffffffe688) at dl-init.c:70
#12 0x00007ffff7de93df in call_init (env=, argv=, argc=, l=) at dl-init.c:52
#13 _dl_init (main_map=0x7ffff7ffe2c8, argc=1, argv=0x7fffffffe678, env=0x7fffffffe688) at dl-init.c:134
#14 0x00007ffff7ddb6ea in _dl_start_user () from /lib64/ld-linux-x86-64.so.2

You may have to put the systemc library on LD_LIBRARY_PATH.
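
To illustrate why merely including the header is enough to trigger this, here is a minimal sketch of the pattern. The names (`demo`, `api_version_check`, `entry_point`) are hypothetical stand-ins, not the real SystemC or DynamoRIO symbols. A static object with a constructor, placed in every translation unit by a header, gets constructed by the loader's static-initialization pass before any entry point runs. In the real case the constructor lives in the systemc library, so if the client is not linked against it the symbol is undefined at that point.

```cpp
#include <cassert>

namespace demo {

// Hypothetical stand-in for the version-check class that sc_ver.h declares.
struct api_version_check {
    api_version_check() { ++instances(); }
    // A function-local static avoids any initialization-order dependency.
    static int &instances() {
        static int count = 0;
        return count;
    }
};

// A header would place an object like this in every translation unit that
// includes it, so its constructor runs during static initialization.
static api_version_check api_version_check_instance;

// Hypothetical stand-in for a client entry point such as dr_init(): by the
// time it runs, the static constructor above has already been called.
int entry_point() {
    assert(api_version_check::instances() == 1);
    return api_version_check::instances();
}

}  // namespace demo
```

In this sketch the constructor is defined inline, so nothing can go missing. The crash above corresponds to the case where the constructor body is in a separate shared library that was never linked in.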

Owner: [email protected]

@derekbruening

From [email protected] on November 20, 2012 16:02:04

Going further and actually linking it in, it looks like systemc wants to use pthreads, which we don't support.

It crashes in libpthread initialization:
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x0000000049561b1d in __pthread_initialize_minimal_internal () at nptl-init.c:441
#2 0x00000000495602b9 in ?? ()
#3 0x00007fffffffdb90 in ?? ()
#4 0x00000000712eed00 in privload_call_lib_func (func=0x495602b0) at ../core/linux/loader.c:897
#5 0x00000000712ee0cb in privload_call_entry (privmod=0x40025260, reason=1) at ../core/linux/loader.c:616
#6 0x00000000712eec5e in privload_call_modules_entry (mod=0x40025260, reason=1) at ../core/linux/loader.c:873
#7 0x00000000712eec3b in privload_call_modules_entry (mod=0x40023d00, reason=1) at ../core/linux/loader.c:871

If systemc doesn't actually use pthreads but just links against it, we could provide a no-op compatibility layer so you can load systemc and use its threading-unrelated components.

If not, supporting pthreads is a much bigger work item that I don't think we'll implement soon.

Status: Started

@derekbruening

From [email protected] on November 21, 2012 06:50:34

The idea behind systemc is that it maps each sc_thread to a pthread
or qthread and never lets them run in parallel; instead it schedules them
itself (e.g. it makes threads that wake up early sleep again).

  1. Does it support qthreads instead by any chance?
  2. Just to be sure, by 'support' do you mean including in the client or
    running the client on pthreads applications? (because we have been using a
    function wrapping client on systemc examples without a problem)

"If systemc doesn't actually use pthreads but just links against it, we ..."

We were planning to cast one of the function parameters that we get as a
void* in our wrap function to its actual class type in systemc and then
try to call one of its methods. Since we were not going to create a
sc_module in our client, I'm guessing we wouldn't be using pthreads. We
have solved that problem using an alternative way already but still it may
be useful to know how.

@derekbruening

From [email protected] on November 21, 2012 15:54:32

Yes, apps that use pthreads (or any other threading) are definitely supported.

Clients, however, can't use pthreads. libpthread integrates tightly with the loader (for TLS support, among other things), and it interferes with the isolation of DR and the client from the app. DR does support launching threads from the client using dr_create_client_thread(): http://dynamorio.org/docs/dr__tools_8h.html#ac6b80b83502ff13d4674b13e7b30b555 In theory, someone could figure out how to map pthreads onto our primitives, but it's pretty tricky.

Anyway, it sounds like you don't actually need pthreads at runtime; you just depend on some code that is linked with code that depends on pthreads. You could try something gross like editing out the DT_NEEDED entry for pthreads in the library's dynamic section.
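
For reference, a library's dependency on libpthread is recorded as a DT_NEEDED entry in its ELF dynamic section. Below is a rough sketch, assuming a 64-bit Linux target with `<elf.h>` available, of how those entries can be listed once the dynamic section and its string table are in memory. The helper name `needed_libs` is hypothetical, and the sketch only inspects entries; actually removing one is better left to an ELF editing tool.

```cpp
#include <cassert>
#include <elf.h>
#include <string>
#include <vector>

// Collects the library names referenced by DT_NEEDED entries.
// `dyn` must point at a dynamic section terminated by a DT_NULL entry and
// `strtab` at the dynamic string table (the one DT_STRTAB refers to).
std::vector<std::string> needed_libs(const Elf64_Dyn *dyn, const char *strtab) {
    assert(dyn != nullptr && strtab != nullptr);
    std::vector<std::string> names;
    for (; dyn->d_tag != DT_NULL; ++dyn) {
        if (dyn->d_tag == DT_NEEDED) {
            // d_val is a byte offset into the string table.
            names.emplace_back(strtab + dyn->d_un.d_val);
        }
    }
    return names;
}
```

If "libpthread.so.0" shows up in that list for the library the client pulls in, the private loader will try to load and initialize it, which is where the crash above happens.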

@derekbruening

From [email protected] on November 27, 2012 10:43:57

Looked into this a bit more, and it's crashing here in __pthread_initialize_minimal_internal():
/* Transfer the old value from the dynamic linker's internal location. */
*__libc_dl_error_tsd () = *(*GL(dl_error_catch_tsd)) ();
GL(dl_error_catch_tsd) = &__libc_dl_error_tsd;

I'm assuming it's that _dl_error_catch_tsd global that's set to zero, and then we do a function call to NULL.
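
A small model of that suspected failure, for anyone reading along. Everything in the `model` namespace is a hypothetical stand-in for the glibc internals quoted above, not the real code. The one deliberate difference is the null check, which the real initialization does not have, so that the sketch returns instead of jumping to address 0.

```cpp
#include <cassert>

namespace model {

using tsd_getter = void **(*)();

// Stand-in for GL(dl_error_catch_tsd). It is left null here, matching the
// assumption that nothing in the private loader ever set it.
tsd_getter dl_error_catch_tsd = nullptr;

static void *loader_slot = nullptr;
static void *libc_slot = nullptr;

// Stand-ins for the dynamic linker's getter and libc's replacement getter.
void **loader_error_tsd() { return &loader_slot; }
void **libc_dl_error_tsd() { return &libc_slot; }

// Mirrors the two quoted statements. Returns false where the real code
// would call through a null function pointer.
bool transfer_error_tsd() {
    if (dl_error_catch_tsd == nullptr)
        return false;
    void **old = (*dl_error_catch_tsd)();
    assert(old != nullptr);
    *libc_dl_error_tsd() = *old;
    dl_error_catch_tsd = &libc_dl_error_tsd;
    return true;
}

}  // namespace model
```

With the pointer left null, `transfer_error_tsd()` returns false. That is the case that would show up as frame #0 at 0x0000000000000000 in the earlier backtrace.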

derekbruening changed the title from "CRASH segmentation fault when including systemc header" to "CRASH using libpthread in a client" on Mar 22, 2018
@derekbruening

While the docs do say we don't support a private libpthread, it seems best to also have the private loader give a warning, to help diagnose problems like https://groups.google.com/forum/#!topic/DynamoRIO-Users/Sk4D0w2LC7s.

derekbruening added a commit that referenced this issue Oct 11, 2021
Adds a new extension "drcallstack" which provides callstack walking
facilities.  This initial implementation adds a libunwind-based
implementation and targets only Linux for now.

Adds an interface to walk one step at a time over callstack frames.
The implementation converts the dr_mcontext_t into libunwind's context
structure and invokes the libunwind step API.

Getting libunwind to work requires several steps:

+ Ignore libpthread exports when importing any symbol that does not start with
  "pthread".  Otherwise libunwind crashes using __errno_location from libpthread
  instead of from libc.  We add a warning when a private libpthread is loaded
  to help diagnose any other potential problems (xref #956).

+ Have dl_iterate_phdr operate on app libraries instead of private libraries.
  This is done with a new flag and logic in the redirection code, with the
  flag set for libraries named "libunwind*".

A statically-linked libunwind is not supported.

Updates drwrap to set the mcontext pc field to simplify usage.

Adds a test and documentation.

Adds a sample client showing how to use this library.

Issue: #2414, #956
derekbruening added a commit that referenced this issue Oct 13, 2021
Adds a new extension "drcallstack" which provides callstack walking
facilities.  This initial implementation adds a libunwind-based
implementation and targets only Linux for now.

Adds an interface to walk one step at a time over callstack frames.
The implementation converts the dr_mcontext_t into libunwind's context
structure and invokes the libunwind step API.

Getting libunwind to work requires several steps:

+ Ignore libpthread exports when importing any symbol that does not start with
  "pthread".  Otherwise libunwind crashes using __errno_location from libpthread
  instead of from libc.  We add a warning when a private libpthread is loaded
  to help diagnose any other potential problems (xref #956).

+ Have dl_iterate_phdr operate on app libraries instead of private libraries.
  This is done with a new flag and logic in the redirection code, with the
  flag set for libraries named "libunwind*".

A statically-linked libunwind is not supported.

Updates drwrap to set the mcontext pc field to simplify usage.

Adds a test and documentation.

Adds a sample client showing how to use this library.

Installs libunwind for cross-compiling and for 32-bit x86 using manual download and unpack steps, as there are no ready-made packages (GA CI uses a microsoft.com repository which has only 64-bit libunwind, and file conflicts are hit when trying to add a new repository for 32-bit only).

Issue: #2414, #956
@derekbruening

In case it's not clear, a warning on libpthread being loaded was added in PR #5154:

core/unix/loader.c-                    SYSLOG_INTERNAL_WARNING(
core/unix/loader.c:                        "private libpthread.so loaded but not fully supported (i#956)");
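
The import rule from the commit message above (ignore libpthread exports unless the symbol starts with "pthread") can be sketched as a simple predicate. This is only an illustration, not the actual loader code. The function name and the prefix match on the library name are assumptions.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical predicate: returns true when an export should be ignored
// while resolving an import for a private library.
// Assumption: `lib_name` is the base name of the exporting library.
bool skip_pthread_export(const char *lib_name, const char *symbol) {
    assert(lib_name != nullptr && symbol != nullptr);
    const bool from_libpthread =
        std::strncmp(lib_name, "libpthread", std::strlen("libpthread")) == 0;
    const bool is_pthread_symbol =
        std::strncmp(symbol, "pthread", std::strlen("pthread")) == 0;
    // Only symbols named pthread* are taken from libpthread. Anything else,
    // such as __errno_location, should be resolved from another library.
    return from_libpthread && !is_pthread_symbol;
}
```

Under this rule `__errno_location` exported by libpthread is skipped, so the lookup falls through to libc, while `pthread_create` is still bound to libpthread.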

derekbruening added a commit that referenced this issue Feb 18, 2022
Adds private loader redirection of open, close, read, and write to
DR's syscall-wrapper versions (plus file descriptor isolation, for
open and close).  The libc write invokes pthread code for cancel
features, and we are not able to create a private libpthread or
isolate pthread resources (#956) which leads to poor interactions with
application pthread uses and observed hangs.

Tested on the AArch64 Jenkins machine where these tests all hung every
5 to 10 runs in release build before and now they succeed 20,000 times
in a row:
--------------------------------------------------
derek@dynamorio:~/dr/build_rel$ for i in sim.threads\$ sim.TLB-threads sim.coherence sim.threads-with; do echo $i; ctest --repeat-until-fail 20000 -R $i > RUN-$i 2>&1; done
sim.threads$
sim.TLB-threads
sim.coherence
sim.threads-with
derek@dynamorio:~/dr/build_rel$ grep -c Passed RUN-*
RUN-sim.coherence:20000
RUN-sim.threads$:20000
RUN-sim.threads-with:20000
RUN-sim.TLB-threads:20000
derek@dynamorio:~/dr/build_rel$ grep failed RUN-*
RUN-sim.coherence:100% tests passed, 0 tests failed out of 1
RUN-sim.threads$:100% tests passed, 0 tests failed out of 1
RUN-sim.threads-with:100% tests passed, 0 tests failed out of 1
RUN-sim.TLB-threads:100% tests passed, 0 tests failed out of 1
--------------------------------------------------

While at it, removes drcachesim.invariants which was tested as well
and has no failures.

Issue: #4928, #4954, #2417, #956
Fixes #4928
Fixes #4954
Fixes #2892
derekbruening added a commit that referenced this issue Feb 18, 2022
Adds private loader redirection of open, close, read, and write to
DR's syscall-wrapper versions (plus file descriptor isolation, for
open and close).  The libc write invokes pthread code for cancel
features, and we are not able to create a private libpthread or
isolate pthread resources (#956) which leads to poor interactions with
application pthread uses and observed hangs.

Tested on the AArch64 Jenkins machine where these tests all hung every
5 to 10 runs in release build before and now they succeed 20,000 times
in a row:
```
--------------------------------------------------
derek@dynamorio:~/dr/build_rel$ for i in sim.threads\$ sim.TLB-threads sim.coherence sim.threads-with; do echo $i; ctest --repeat-until-fail 20000 -R $i > RUN-$i 2>&1; done
sim.threads$
sim.TLB-threads
sim.coherence
sim.threads-with
derek@dynamorio:~/dr/build_rel$ grep -c Passed RUN-*
RUN-sim.coherence:20000
RUN-sim.threads$:20000
RUN-sim.threads-with:20000
RUN-sim.TLB-threads:20000
derek@dynamorio:~/dr/build_rel$ grep failed RUN-*
RUN-sim.coherence:100% tests passed, 0 tests failed out of 1
RUN-sim.threads$:100% tests passed, 0 tests failed out of 1
RUN-sim.threads-with:100% tests passed, 0 tests failed out of 1
RUN-sim.TLB-threads:100% tests passed, 0 tests failed out of 1
--------------------------------------------------
```
While at it, removes drcachesim.invariants which was tested as well
and has no failures, under the theory that the original failures were
these same release-build hangs.  Today, it's a debug-only test.

Issue: #4928, #4954, #2417, #956
Fixes #4928
Fixes #4954
Fixes #2892
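
As a rough picture of what redirecting open, close, read, and write means in the commit message above: when the private loader resolves an import by name, it can consult a table and bind the import to a replacement instead of the libc export. The sketch below is an assumption-laden illustration, not DR's code. The table layout, `lookup_redirect`, and the `redirect_*` functions are all hypothetical, and the replacements simply forward to the POSIX calls as placeholders for DR's syscall wrappers.

```cpp
#include <cassert>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

using generic_fn = void (*)();

// Placeholder replacements. In DR these would be its own syscall wrappers
// (plus file descriptor isolation for open and close).
static int redirect_open(const char *path, int flags, mode_t mode) {
    return ::open(path, flags, mode);
}
static int redirect_close(int fd) { return ::close(fd); }
static ssize_t redirect_read(int fd, void *buf, size_t n) {
    return ::read(fd, buf, n);
}
static ssize_t redirect_write(int fd, const void *buf, size_t n) {
    return ::write(fd, buf, n);
}

struct redirect_entry {
    const char *name;
    generic_fn replacement;
};

static const redirect_entry redirect_table[] = {
    {"open", reinterpret_cast<generic_fn>(&redirect_open)},
    {"close", reinterpret_cast<generic_fn>(&redirect_close)},
    {"read", reinterpret_cast<generic_fn>(&redirect_read)},
    {"write", reinterpret_cast<generic_fn>(&redirect_write)},
};

// Returns the replacement for an imported symbol, or nullptr if the import
// should be resolved normally.
generic_fn lookup_redirect(const char *import_name) {
    assert(import_name != nullptr);
    for (const redirect_entry &entry : redirect_table) {
        if (std::strcmp(entry.name, import_name) == 0)
            return entry.replacement;
    }
    return nullptr;
}
```

The point of the redirection is that a client's write no longer reaches the libc path that invokes pthread cancellation code, which is where the interaction with the application's pthread state came from.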