ustack not symbolicated if traced process exits first #246

Open
ahupp opened this issue Nov 10, 2018 · 4 comments
Labels
bug Something isn't working priority: medium

ahupp commented Nov 10, 2018

I have a process that allocates once and then sleeps:

#include <unistd.h>
#include <stdlib.h>

int main() {
  void* a = malloc(2);
  sleep(1000);
  return 0;
}

And I trace it with this command:

sudo bpftrace -e 'uprobe:/lib64/libc.so.6:malloc /comm=="test_malloc"/ { @[ustack] = count(); }'

If I exit test_malloc before exiting bpftrace, I get unsymbolicated output:

@[
0x7fc578a18800
0x7fc5789b5445
]: 2

But if I exit bpftrace first it is symbolicated:

@[
__libc_malloc+0
__libc_start_main+245
]: 2

So it seems like the symbolication step depends on having the process running. It would be convenient for the things I'm doing to have it symbolicate regardless of whether the process has exited.

ahupp changed the title from "ustack not symbolicated if traced process exists first" to "ustack not symbolicated if traced process exits first" on Nov 10, 2018

tyroguru (Contributor) commented

I've just taken a quick look at this and we don't appear to attempt to resolve the stacks until bpftrace exits in this case (at least that's how it appears under a debugger). If I'm interpreting the code correctly then this would be the wrong thing to do. We should try and resolve symbols as soon as we are handed the array of PCs.

Maybe someone who understands this code better will chip in but I'll take a look next week if nobody does in the meantime.

mmarchini (Contributor) commented

> We should try and resolve symbols as soon as we are handed the array of PCs.

As @tjfontaine said in #286, if we try to resolve symbols too soon we'll introduce extra overhead into the traced application. I guess we could try to resolve symbols in a separate bpftrace thread, but even then there might be some unforeseen overhead.

What we can do is to keep the memory mapped information of traced processes cached in bpftrace, and use this cached information if the process has exited when we're resolving symbols. This might be more laborious than it sounds though, since symbol resolution is implemented in bcc.
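
To make the caching idea concrete, here is a minimal sketch of snapshotting a process's executable mappings while it is still alive. It assumes the standard Linux `/proc/<pid>/maps` line layout; the names `cached_map`, `parse_maps_line`, and `snapshot_maps` are hypothetical and are not part of bpftrace or bcc. Paths containing spaces are not handled.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* One executable, file-backed mapping as listed in /proc/<pid>/maps.
 * Hypothetical type for illustration only. */
struct cached_map {
    uint64_t start;       /* first address of the mapping */
    uint64_t end;         /* one past the last address */
    uint64_t file_offset; /* offset of the mapping within the backing file */
    char path[256];       /* backing file, e.g. /lib64/libc.so.6 */
};

/* Parse one maps line. Returns 1 if the line describes an executable,
 * file-backed mapping (and fills *out), otherwise 0. */
int parse_maps_line(const char *line, struct cached_map *out)
{
    char perms[5] = {0};
    int n;

    assert(line != NULL && out != NULL);
    out->path[0] = '\0';
    /* Layout: start-end perms offset dev inode [path] */
    n = sscanf(line, "%" SCNx64 "-%" SCNx64 " %4s %" SCNx64 " %*s %*s %255s",
               &out->start, &out->end, perms, &out->file_offset, out->path);
    return n == 5 && perms[2] == 'x' && out->path[0] == '/';
}

/* Copy up to `max` executable mappings from an open maps stream into
 * `cache`. Returns the number of entries stored. */
size_t snapshot_maps(FILE *maps, struct cached_map *cache, size_t max)
{
    char line[512];
    size_t n = 0;

    while (n < max && fgets(line, sizeof line, maps) != NULL) {
        if (parse_maps_line(line, &cache[n]))
            n++;
    }
    return n;
}
```

A tracer would call `snapshot_maps` on `fopen("/proc/<pid>/maps", "r")` while the traced process is still running and keep the result around, so the address ranges remain available once `/proc/<pid>` has disappeared.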

aktau commented Dec 6, 2019

> What we can do is to keep the memory mapped information of traced processes cached in bpftrace, and use this cached information if the process has exited when we're resolving symbols. This might be more laborious than it sounds though, since symbol resolution is implemented in bcc.

Could symbol resolution be extracted into a separate library? It seems like a self-contained thing. (I came here because I noticed lots of `Error looking up stack id 0 (pid 0)` messages in offwake.bt output.)

danobi (Member) commented Dec 10, 2019

I'm considering an offline symbol resolution approach. If we store the base address and executable path, we should be able to resolve symbols even after the process exits. However, it'll only work for .text addresses. The advantage over real-time symbolizing is that this avoids races (the exception being when the binary is updated).

I spoke w/ @yonghong-song and he's open to putting this functionality in bcc.
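
The offline approach described above can be sketched roughly as follows: turn a recorded PC into an offset within the backing file using the stored mapping base, then find the nearest preceding symbol. This is only an illustration under simplifying assumptions. The names `pc_to_file_offset`, `find_symbol`, and `struct sym` are hypothetical, and the symbol table is assumed to be already sorted and keyed by file offset. A real implementation would read the table from the ELF file at the stored path and translate between file offsets and symbol values via the program headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry, keyed by offset within the file. */
struct sym {
    uint64_t offset;
    const char *name;
};

/* Translate a runtime PC into an offset within the backing file, using the
 * mapping range and file offset recorded while the process was alive.
 * Returns 1 on success, 0 if the PC lies outside the mapping. */
int pc_to_file_offset(uint64_t pc, uint64_t map_start, uint64_t map_end,
                      uint64_t map_file_offset, uint64_t *out)
{
    assert(out != NULL);
    if (pc < map_start || pc >= map_end)
        return 0;
    *out = pc - map_start + map_file_offset;
    return 1;
}

/* Binary search for the last symbol whose offset is <= `off`.
 * `tab` must be sorted by ascending offset. Returns NULL if `off`
 * precedes every symbol. */
const struct sym *find_symbol(const struct sym *tab, size_t n, uint64_t off)
{
    size_t lo = 0, hi = n; /* search window is [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].offset <= off)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? NULL : &tab[lo - 1];
}
```

The `name+offset` form seen in symbolicated stacks then comes from the matched symbol's name and `off - s->offset`. Nothing here needs the traced process to exist, which is the point of the offline approach.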

mmisono added a commit to mmisono/bpftrace that referenced this issue Dec 26, 2019
`resolve_usym()` caches a `bcc_symbol` object using the executable name
as the key, but on ASLR-enabled platforms symbol addresses change with
each execution. Discard the cache in this case so that symbol names
resolve properly.

Note and known issues:

- A cache is discarded whenever resolve_usym is called even if a pid is
the same as the old one. This is because pid may be reused.
- This does not check whether a binary is PIE ASLR or not. Note that
even if a binary is not PIE ASLR, addresses of shared libraries are
randomized if ASLR is enabled.  (If a binary is not PIE ASLR and
`resolve_usym()` resolves symbol in a binary, we can utilize a cache.)
- If ASLR is disabled on the first execution but enabled on the second
execution, `resolve_usym()` for the second run will use the previous
cache.
- I'm not sure how much performance impact this has. If the impact is
huge, maybe this should be an option.
- As discussed in bpftrace#246, symbolizing will fail after process termination
(this is a separate issue). For example:

```
% bpftrace -e 'u:/lib/x86_64-linux-gnu/libc.so.6:*nanosleep* /comm == "sleep"/ { @[ustack] = count(); }'
Attaching 7 probes...
^C

@[
    0x7ff1917cb990
]: 3
@no such file or directory: /proc/3557/personality
[
    0x7fea4211c990
]: 3
@no such file or directory: /proc/3554/personality
[
    0x7f32bc51a990
]: 3
```

-----

Closes bpftrace#1031 and solves the second part of bpftrace#75.
mmisono added a commit to mmisono/bpftrace that referenced this issue Jan 5, 2020
… given

`resolve_usym()` caches a `bcc_symbol` object using the executable name
as the key, but on ASLR-enabled platforms symbol addresses change with
each execution. Discard the cache in this case so that symbol names
resolve properly.

Introduce the `BPFTRACE_CACHE_USER_SYMBOLS` env variable to force caching.
Caching is fine when tracing only a single program execution.

Note and known issues:

- A cache is discarded whenever resolve_usym is called even if a pid is
the same as the old one. This is because pid may be reused.
- This does not check whether a binary is PIE ASLR or not. Note that
even if a binary is not PIE ASLR, addresses of shared libraries are
randomized if ASLR is enabled.  (If a binary is not PIE ASLR and
`resolve_usym()` resolves symbol in a binary, we can utilize a cache.)
- If ASLR is disabled on the first execution but enabled on the second
execution, `resolve_usym()` for the second run will use the previous
cache.
- As discussed in bpftrace#246, symbolizing will fail after process termination
(this is a separate issue). For example:

```
% bpftrace -e 'u:/lib/x86_64-linux-gnu/libc.so.6:*nanosleep* /comm == "sleep"/ { @[ustack] = count(); }'
Attaching 7 probes...
^C

@[
    0x7ff1917cb990
]: 3
@
[
    0x7fea4211c990
]: 3
@
[
    0x7f32bc51a990
]: 3
```

-----

Closes bpftrace#1031 and solves the second part of bpftrace#75.
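
The trade-off in the commit message above can be illustrated with a small sketch. It is not bpftrace's code: `sym_cache` and `cache_get_base` are hypothetical, the cache is reduced to a single entry holding a load base, and `force_cache` stands in for whatever `BPFTRACE_CACHE_USER_SYMBOLS` enables (how the variable is read is not shown in this thread).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical one-entry cache: remembers which executable it was built
 * for and the load base observed for that run. */
struct sym_cache {
    bool valid;
    char exe[256];
    uint64_t load_base;
    unsigned rebuilds; /* number of times the entry was (re)built */
};

/* Return the load base that symbol lookups for `exe` would use.
 * Without force_cache the entry is rebuilt on every call, so it always
 * reflects `current_base`. With force_cache a matching entry is reused,
 * which is only correct while the same execution is being traced. */
uint64_t cache_get_base(struct sym_cache *c, const char *exe,
                        uint64_t current_base, bool force_cache)
{
    bool hit;

    assert(c != NULL && exe != NULL);
    hit = c->valid && strcmp(c->exe, exe) == 0;
    if (!(hit && force_cache)) {
        /* Discard and rebuild: the key (executable name) says nothing
         * about where ASLR placed this particular execution. */
        strncpy(c->exe, exe, sizeof c->exe - 1);
        c->exe[sizeof c->exe - 1] = '\0';
        c->load_base = current_base;
        c->valid = true;
        c->rebuilds++;
    }
    return c->load_base;
}
```

With forced caching, a second execution of the same executable at a different base gets the first run's base back, so its addresses would be resolved against the wrong layout. That is the failure the default discard policy avoids, at the cost of rebuilding on every lookup.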
mmisono added a commit to mmisono/bpftrace that referenced this issue Jan 21, 2020
mmisono added a commit to mmisono/bpftrace that referenced this issue Jan 23, 2020
fbs pushed a commit that referenced this issue Feb 12, 2020
6 participants