Add support for remote target devices via BPFd #1675

Closed
wants to merge 5 commits into from

Conversation

jcanseco
Contributor

Overview

These patches add features that allow BCC to support cross-development workflows where the development machine and the target machine running the BPF program are different.

This is achieved by integrating the BPFd (BPF Daemon) project into BCC. BPFd is a standalone executable that is loaded onto a remote target device, where it acts as a proxy for BCC whenever BCC needs to perform an operation on the target system (e.g. load BPF programs, read /proc/kallsyms, attach kprobes, etc.).

Through this arrangement, BCC can profile a remote target device while running mostly on a separate host machine. This removes the need to have kernel sources and the LLVM stack on the target machine; they can be kept on the host instead. It therefore reduces the space required on the target for BCC tools to run, which is a key benefit for devices with limited disk space (e.g. embedded devices).

In addition, the above set-up also allows embedded developers who use a cross-compiler in their workflow to run clang on a different architecture than the target's architecture, thus facilitating cross-compilation development.

For more information, please check out this LWN article which explains the purpose of BPFd and how it works in more detail.

Integration of BPFd sources into BCC

These patches add the sources for the BPFd executable into the BCC source tree. This is done to ensure that BCC and BPFd remain compatible with each other.

BPFd depends on existing BCC components such as libbpf.c and bcc_syms.cc to function. However, the converse is also true: BCC makes calls to BPFd via BCC's Python interface. The Python interface is the main way by which communication happens between BCC and BPFd, and so any changes there could break interoperability. As a result, it is not sufficient for BPFd to just use libbcc and remain independent like other projects (e.g. bpftrace).

Therefore, to ensure that BCC and BPFd are always compatible with each other, it is more feasible to keep them in the same tree instead of keeping them separate. This is also why these patches come with smoke tests which ensure that the interoperability between BCC and BPFd isn't broken silently when changes are made to either.

Tools

The tools that currently work for remote devices with these patches are as follows:

  • Biolatency
  • Biosnoop
  • Biotop
  • Cachestat
  • Cachetop
  • Filetop
  • Hardirqs
  • Offcputime
  • Opensnoop
  • Profile
  • Runqlen
  • Stackcount
  • Syscount
  • Trace

Reviewed-by: Joel Fernandes ([email protected])

@yonghong-song
Collaborator

[buildbot, ok to test]

@jcanseco
Contributor Author

Update: applied style changes suggested by the style-check.sh script

@jcanseco jcanseco force-pushed the upstream_submit branch 5 times, most recently from 43fcf37 to 1d2b3f8 Compare April 11, 2018 23:01
@yonghong-song
Collaborator

@drzaeus77 Could you add python module pexpect on our unit test VMs? Looks like tests failed due to this missing module.

@drzaeus77
Collaborator

Looking into it.

@drzaeus77
Collaborator

[buildbot, test this please]

This is just a fc27 test, if that passes I will apply the update to the other vm images.

@drzaeus77
Collaborator

The fc27 test failed, though pexpect was installed:

28: Test command: /home/fedora/jenkins/workspace/bcc-pr/label/fc27/build/tests/wrapper.sh "py_test_tools_on_remote" "sudo" "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/tests/python/test_tools_on_remote.py"
28: Test timeout computed to be: 10000000
28: Traceback (most recent call last):
28:   File "../../tools/biolatency.py", line 114, in <module>
28:     b = BPF(text=bpf_text)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 284, in __init__
28:     BPF._libremote = BPF._open_connection_to_remote_target()
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 348, in _open_connection_to_remote_target
28:     return libremote.LibRemote(os.environ.get("BCC_REMOTE"), os.environ.get("BCC_REMOTE_ARGS"))
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/libremote.py", line 47, in __init__
28:     self.remote = cls(remote_arg)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/shell.py", line 33, in __init__
28:     self.client = pe.spawn(cmd)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 198, in __init__
28:     self._spawn(command, args, preexec_fn, dimensions)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 271, in _spawn
28:     'executable: %s.' % self.command)
28: pexpect.exceptions.ExceptionPexpect: The command was not found or was not executable: bpfd.
28: FTraceback (most recent call last):
28:   File "../../tools/biosnoop.py", line 122, in <module>
28:     """, debug=0)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 284, in __init__
28:     BPF._libremote = BPF._open_connection_to_remote_target()
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 348, in _open_connection_to_remote_target
28:     return libremote.LibRemote(os.environ.get("BCC_REMOTE"), os.environ.get("BCC_REMOTE_ARGS"))
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/libremote.py", line 47, in __init__
28:     self.remote = cls(remote_arg)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/shell.py", line 33, in __init__
28:     self.client = pe.spawn(cmd)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 198, in __init__
28:     self._spawn(command, args, preexec_fn, dimensions)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 271, in _spawn
28:     'executable: %s.' % self.command)
28: pexpect.exceptions.ExceptionPexpect: The command was not found or was not executable: bpfd.
28: FTraceback (most recent call last):
28:   File "../../tools/biotop.py", line 174, in <module>
28:     b = BPF(text=bpf_text)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 284, in __init__
28:     BPF._libremote = BPF._open_connection_to_remote_target()
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 348, in _open_connection_to_remote_target
28:     return libremote.LibRemote(os.environ.get("BCC_REMOTE"), os.environ.get("BCC_REMOTE_ARGS"))
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/libremote.py", line 47, in __init__
28:     self.remote = cls(remote_arg)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/shell.py", line 33, in __init__
28:     self.client = pe.spawn(cmd)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 198, in __init__
28:     self._spawn(command, args, preexec_fn, dimensions)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 271, in _spawn
28:     'executable: %s.' % self.command)
28: pexpect.exceptions.ExceptionPexpect: The command was not found or was not executable: bpfd.
28: FTraceback (most recent call last):
28:   File "../../tools/cachestat.py", line 107, in <module>
28:     b = BPF(text=bpf_text)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 284, in __init__
28:     BPF._libremote = BPF._open_connection_to_remote_target()
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 348, in _open_connection_to_remote_target
28:     return libremote.LibRemote(os.environ.get("BCC_REMOTE"), os.environ.get("BCC_REMOTE_ARGS"))
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/libremote.py", line 47, in __init__
28:     self.remote = cls(remote_arg)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/shell.py", line 33, in __init__
28:     self.client = pe.spawn(cmd)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 198, in __init__
28:     self._spawn(command, args, preexec_fn, dimensions)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 271, in _spawn
28:     'executable: %s.' % self.command)
28: pexpect.exceptions.ExceptionPexpect: The command was not found or was not executable: bpfd.
28: FTraceback (most recent call last):
28:   File "../../tools/filetop.py", line 160, in <module>
28:     b = BPF(text=bpf_text)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 284, in __init__
28:     BPF._libremote = BPF._open_connection_to_remote_target()
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 348, in _open_connection_to_remote_target
28:     return libremote.LibRemote(os.environ.get("BCC_REMOTE"), os.environ.get("BCC_REMOTE_ARGS"))
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/libremote.py", line 47, in __init__
28:     self.remote = cls(remote_arg)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/shell.py", line 33, in __init__
28:     self.client = pe.spawn(cmd)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 198, in __init__
28:     self._spawn(command, args, preexec_fn, dimensions)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 271, in _spawn
28:     'executable: %s.' % self.command)
28: pexpect.exceptions.ExceptionPexpect: The command was not found or was not executable: bpfd.
28: FTraceback (most recent call last):
28:   File "../../tools/hardirqs.py", line 147, in <module>
28:     b = BPF(text=bpf_text)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 284, in __init__
28:     BPF._libremote = BPF._open_connection_to_remote_target()
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 348, in _open_connection_to_remote_target
28:     return libremote.LibRemote(os.environ.get("BCC_REMOTE"), os.environ.get("BCC_REMOTE_ARGS"))
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/libremote.py", line 47, in __init__
28:     self.remote = cls(remote_arg)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/shell.py", line 33, in __init__
28:     self.client = pe.spawn(cmd)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 198, in __init__
28:     self._spawn(command, args, preexec_fn, dimensions)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 271, in _spawn
28:     'executable: %s.' % self.command)
28: pexpect.exceptions.ExceptionPexpect: The command was not found or was not executable: bpfd.
28: FTraceback (most recent call last):
28:   File "../../tools/offcputime.py", line 233, in <module>
28:     b = BPF(text=bpf_text)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 284, in __init__
28:     BPF._libremote = BPF._open_connection_to_remote_target()
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 348, in _open_connection_to_remote_target
28:     return libremote.LibRemote(os.environ.get("BCC_REMOTE"), os.environ.get("BCC_REMOTE_ARGS"))
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/libremote.py", line 47, in __init__
28:     self.remote = cls(remote_arg)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/shell.py", line 33, in __init__
28:     self.client = pe.spawn(cmd)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 198, in __init__
28:     self._spawn(command, args, preexec_fn, dimensions)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 271, in _spawn
28:     'executable: %s.' % self.command)
28: pexpect.exceptions.ExceptionPexpect: The command was not found or was not executable: bpfd.
28: FTraceback (most recent call last):
28:   File "../../tools/opensnoop.py", line 137, in <module>
28:     b = BPF(text=bpf_text)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 284, in __init__
28:     BPF._libremote = BPF._open_connection_to_remote_target()
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 348, in _open_connection_to_remote_target
28:     return libremote.LibRemote(os.environ.get("BCC_REMOTE"), os.environ.get("BCC_REMOTE_ARGS"))
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/libremote.py", line 47, in __init__
28:     self.remote = cls(remote_arg)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/shell.py", line 33, in __init__
28:     self.client = pe.spawn(cmd)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 198, in __init__
28:     self._spawn(command, args, preexec_fn, dimensions)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 271, in _spawn
28:     'executable: %s.' % self.command)
28: pexpect.exceptions.ExceptionPexpect: The command was not found or was not executable: bpfd.
28: FTraceback (most recent call last):
28:   File "../../tools/profile.py", line 219, in <module>
28:     b = BPF(text=bpf_text)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 284, in __init__
28:     BPF._libremote = BPF._open_connection_to_remote_target()
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 348, in _open_connection_to_remote_target
28:     return libremote.LibRemote(os.environ.get("BCC_REMOTE"), os.environ.get("BCC_REMOTE_ARGS"))
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/libremote.py", line 47, in __init__
28:     self.remote = cls(remote_arg)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/shell.py", line 33, in __init__
28:     self.client = pe.spawn(cmd)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 198, in __init__
28:     self._spawn(command, args, preexec_fn, dimensions)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 271, in _spawn
28:     'executable: %s.' % self.command)
28: pexpect.exceptions.ExceptionPexpect: The command was not found or was not executable: bpfd.
28: FTraceback (most recent call last):
28:   File "../../tools/runqlen.py", line 186, in <module>
28:     b = BPF(text=bpf_text)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 284, in __init__
28:     BPF._libremote = BPF._open_connection_to_remote_target()
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 348, in _open_connection_to_remote_target
28:     return libremote.LibRemote(os.environ.get("BCC_REMOTE"), os.environ.get("BCC_REMOTE_ARGS"))
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/libremote.py", line 47, in __init__
28:     self.remote = cls(remote_arg)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/shell.py", line 33, in __init__
28:     self.client = pe.spawn(cmd)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 198, in __init__
28:     self._spawn(command, args, preexec_fn, dimensions)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 271, in _spawn
28:     'executable: %s.' % self.command)
28: pexpect.exceptions.ExceptionPexpect: The command was not found or was not executable: bpfd.
28: FFTraceback (most recent call last):
28:   File "../../tools/syscount.py", line 498, in <module>
28:     bpf = BPF(text=text)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 284, in __init__
28:     BPF._libremote = BPF._open_connection_to_remote_target()
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/__init__.py", line 348, in _open_connection_to_remote_target
28:     return libremote.LibRemote(os.environ.get("BCC_REMOTE"), os.environ.get("BCC_REMOTE_ARGS"))
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/libremote.py", line 47, in __init__
28:     self.remote = cls(remote_arg)
28:   File "/home/fedora/jenkins/workspace/bcc-pr/label/fc27/src/python/bcc/remote/shell.py", line 33, in __init__
28:     self.client = pe.spawn(cmd)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 198, in __init__
28:     self._spawn(command, args, preexec_fn, dimensions)
28:   File "/usr/lib/python2.7/site-packages/pexpect/pty_spawn.py", line 271, in _spawn
28:     'executable: %s.' % self.command)
28: pexpect.exceptions.ExceptionPexpect: The command was not found or was not executable: bpfd.

@jcanseco
Contributor Author

Hi @drzaeus77, thanks for taking the time. Is pexpect still installed for the fc27 machine? And from what I can see, everything else is passing again except test_libbcc and test_tools_on_remote (the one added by this PR).

I'll take a look the moment I get into the office on Monday, but I think I know what the issue may be.

@drzaeus77
Collaborator

Yes, it is still installed. If you push more changes to this PR it will run against the updated fc27 with pexpect.

@jcanseco jcanseco force-pushed the upstream_submit branch 3 times, most recently from 8917d1b to e9e4e65 Compare April 16, 2018 19:40
Libclang uses functions from libbcc, so it does depend on it.
When defining new symbols in bpf_common.cc, linker errors appear
when the same symbols are used in libclang. This is because of
incorrect linker dependency. This patch fixes the issue by making
sure the dependency is correctly tracked.

Signed-off-by: Joel Fernandes <[email protected]>
@jcanseco
Contributor Author

jcanseco commented Apr 16, 2018

Hi @drzaeus77, I think I figured out why test_tools_on_remote was failing. The VMs are not configured to build the bpfd executable, which the test depends on. This shouldn't be a problem if the VMs invoke make all before invoking make test (which was my original assumption), but I believe the VMs are actually configured to build specific targets instead of just the all target.

For instance, I believe that the VMs build the bcc-static, bpf-static, and usdt-static targets (and many others), but not the LLCStat, TCPSendStack, and CPUDistribution targets. I determined this by printing the BCC directory structure on fc27 during test execution and checking for the presence (or absence) of the binaries associated with these targets.

Is my understanding here correct? If so, what am I supposed to do to have the VMs build the bpfd target as well (i.e. are there any config files I am supposed to change, or can this change only be done from your end?)

@drzaeus77
Collaborator

We run make test to build just the test binaries; the rest of the libraries/binaries are built through the rpm/deb build steps and then installed (rpm -i, dpkg -i). So, if you are missing a target, that means it needs to be added to a package spec/debian rules.

@jcanseco
Contributor Author

Thank you @drzaeus77, you were completely right. I updated the rpm and deb build steps. The tests for fc27 are passing now.

Can you please now install the pexpect module for the other VMs as well?

@jcanseco
Contributor Author

Can you try now? I applied a modified variant of your patches and fixed other Python 3 support issues that I identified.

@yonghong-song
Collaborator

[buildbot, ok to test]

@yonghong-song
Collaborator

Much better now. The test test_tools_on_remote.py still fails though (with python3, of course).
I have the following diff to fix a couple of things:

[yhs@localhost remote]$ git diff
diff --git a/src/python/bcc/remote/libremote.py b/src/python/bcc/remote/libremote.py
index 012d50d..3fcfeb2 100644
--- a/src/python/bcc/remote/libremote.py
+++ b/src/python/bcc/remote/libremote.py
@@ -114,12 +114,12 @@ class LibRemote(object):
         return ret[0]
 
     def bpf_attach_kprobe(self, fd, t, evname, fnname):
-        cmd = "BPF_ATTACH_KPROBE {} {} {} {}".format(fd, t, evname, fnname)
+        cmd = "BPF_ATTACH_KPROBE {} {} {} {}".format(fd, t, evname.decode(), fnname.decode())
         ret = self._remote_send_command(cmd)
         return ret[0]
 
     def bpf_detach_kprobe(self, evname):
-        cmd = "BPF_DETACH_KPROBE {}".format(evname)
+        cmd = "BPF_DETACH_KPROBE {}".format(evname.decode())
         ret = self._remote_send_command(cmd)
         return 0
 
@@ -141,7 +141,7 @@ class LibRemote(object):
 
     def bpf_create_map(self, map_type, name, key_size, leaf_size, max_entries,
                        flags):
-        cmd = "BPF_CREATE_MAP {} {} {} {} {} {}".format(map_type, name, key_size,
+        cmd = "BPF_CREATE_MAP {} {} {} {} {} {}".format(map_type, name.decode(), key_size,
                                     leaf_size, max_entries, flags)
         ret = self._remote_send_command(cmd)
 
diff --git a/src/python/bcc/remote/shell.py b/src/python/bcc/remote/shell.py
index e5e4d10..396c3be 100644
--- a/src/python/bcc/remote/shell.py
+++ b/src/python/bcc/remote/shell.py
@@ -45,7 +45,7 @@ class ShellRemote(BccRemote):
         except pe.exceptions.EOF:
             return ['Command not recognized (timeout)']
 
-        ret = c.before.split('\n')
+        ret = c.before.decode().split('\n')
 
         # Sanitize command output
         ret = [r.rstrip() for r in ret if r]
[yhs@localhost remote]$ 

The shell.py change is necessary to avoid a lot of bytes vs. str errors.

The libremote.py change fixes the map-creation and kprobe-attach failures.
Without this change, the name becomes b'abc' instead of just abc, and of course the
name b'abc' causes the test to fail.
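
The b'abc' symptom can be reproduced in isolation. The sketch below is not code from this PR: the map name abc and the shortened command string are placeholders. It only illustrates how str.format renders a bytes argument on Python 3, and why the decode() workaround in the diff above changes the result.

```python
# Illustration only: how str.format treats bytes on Python 3.
# The name and the shortened command string are placeholders.
name = b"abc"

# For bytes, format() falls back to the repr-style text, so the b'' wrapper
# ends up inside the command string.
broken = "BPF_CREATE_MAP {}".format(name)

# Converting to str first yields the bare name.
fixed = "BPF_CREATE_MAP {}".format(name.decode())

print(broken)
print(fixed)
```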

We still have some failures like the ones below, which I suspect are all related to this bytes vs. str issue.

[root@localhost python]# ./test_tools_on_remote.py 
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/bcc/__init__.py", line 1305, in cleanup
    self.detach_kprobe_event(k)
  File "/usr/lib/python3.6/site-packages/bcc/__init__.py", line 673, in detach_kprobe_event
    raise Exception("Failed to close kprobe FD")
Exception: Failed to close kprobe FD
.Traceback (most recent call last):
  File "../../tools/biosnoop.py", line 186, in <module>
    b["events"].open_perf_buffer(print_event, page_cnt=64)
  File "/usr/lib/python3.6/site-packages/bcc/table.py", line 598, in open_perf_buffer
    self._open_perf_buffer(i, callback, page_cnt, lost_cb)
  File "/usr/lib/python3.6/site-packages/bcc/table.py", line 631, in _open_perf_buffer
    self[self.Key(cpu)] = self.Leaf(fd)
  File "/usr/lib/python3.6/site-packages/bcc/table.py", line 502, in __setitem__
    super(ArrayBase, self).__setitem__(key, leaf)
  File "/usr/lib/python3.6/site-packages/bcc/table.py", line 233, in __setitem__
    raise Exception("Could not update table")
Exception: Could not update table
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
...

How do you debug bpfd? I could add code to write debug output to a special file, but I am wondering whether you have a better way to do this. Alternatively, the bpfd logs could go to a special location for easy debugging.

@drzaeus77
Collaborator

Do not add any uses of decode. @yonghong-song please see #1582 for alternatives.

@yonghong-song
Collaborator

Thanks for the reminder. These are just workarounds to pinpoint where the issue is. I will make sure the final code does not use decode().

@drzaeus77
Collaborator

[buildbot, test this please]

@yonghong-song
Collaborator

yonghong-song commented Apr 19, 2018

I did some experiments on the following additional hack. With the following change,

diff --git a/src/python/bcc/__init__.py b/src/python/bcc/__init__.py
index 46bfe85..1b0ad6c 100644
--- a/src/python/bcc/__init__.py
+++ b/src/python/bcc/__init__.py
@@ -1268,7 +1268,7 @@ class BPF(object):
         try:
             if BPF._libremote:
                 fd_callbacks = []
-                for k, v in self.perf_buffers.iteritems():
+                for k, v in self.perf_buffers.items():
                     # Only support polling for per-cpu perf buffers
                     # { (PerfEventArray-Obj, cpu) -> fd }
                     if v and type(k) == tuple:
diff --git a/src/python/bcc/remote/libremote.py b/src/python/bcc/remote/libremote.py
index 3fcfeb2..9969aa3 100644
--- a/src/python/bcc/remote/libremote.py
+++ b/src/python/bcc/remote/libremote.py
@@ -104,12 +104,12 @@ class LibRemote(object):
             return comm
 
     def bpf_attach_tracepoint(self, fd, cat, tp_name):
-        cmd = "BPF_ATTACH_TRACEPOINT {} {} {}".format(fd, cat, tp_name)
+        cmd = "BPF_ATTACH_TRACEPOINT {} {} {}".format(fd, cat.decode(), tp_name.decode())
         ret = self._remote_send_command(cmd)
         return ret[0]
 
     def bpf_detach_tracepoint(self, tp_category, tp_name):
-        cmd = "BPF_DETACH_TRACEPOINT {} {}".format(tp_category, tp_name)
+        cmd = "BPF_DETACH_TRACEPOINT {} {}".format(tp_category.decode(), tp_name.decode())
         ret = self._remote_send_command(cmd)
         return ret[0]
 
@@ -124,18 +124,19 @@ class LibRemote(object):
         return 0
 
     def bpf_attach_uprobe(self, fd, t, evname, binpath, offset, pid):
-        cmd = "BPF_ATTACH_UPROBE {} {} {} {} {} {}".format(fd, t, evname, binpath, offset, pid)
+        cmd = "BPF_ATTACH_UPROBE {} {} {} {} {} {}".format(fd, t, evname.decode(), binpath.decode(), offset, pid)
         ret = self._remote_send_command(cmd)
         return ret[0]
 
     def bpf_detach_uprobe(self, evname):
-        cmd = "BPF_DETACH_UPROBE {}".format(evname)
+        cmd = "BPF_DETACH_UPROBE {}".format(evname.decode())
         ret = self._remote_send_command(cmd)
         return 0
 
     def bpf_prog_load(self, prog_type, name, func_str, license_str, kern_version):
-        cmd = "BPF_PROG_LOAD {} {} {} {} {} {}".format(prog_type, name, len(func_str),
-              license_str, kern_version, base64.b64encode(func_str))
+        func_bstr = base64.b64encode(func_str)
+        cmd = "BPF_PROG_LOAD {} {} {} {} {} {}".format(prog_type, name.decode(), len(func_str),
+              license_str.decode(), kern_version, func_bstr.decode())
         ret = self._remote_send_command(cmd)
         return ret[0]
 
@@ -151,8 +152,8 @@ class LibRemote(object):
         return ret[0]
 
     def bpf_update_elem(self, map_fd, kstr, klen, lstr, llen, flags):
-        cmd = "BPF_UPDATE_ELEM {} {} {} {} {} {}".format(map_fd, kstr, klen,
-                                                         lstr, llen, flags)
+        cmd = "BPF_UPDATE_ELEM {} {} {} {} {} {}".format(map_fd, kstr.decode(), klen,
+                                                         lstr.decode(), llen, flags)
         ret = self._remote_send_command(cmd)
         return ret[0]
 
@@ -167,7 +168,7 @@ class LibRemote(object):
         if map_fd not in self.map_dumped or self.map_dumped[map_fd] == False:
             self.bpf_get_first_key(map_fd, klen, llen, dump_all=True)
 
-        cmd = "BPF_LOOKUP_ELEM {} {} {} {}".format(map_fd, kstr, klen, llen)
+        cmd = "BPF_LOOKUP_ELEM {} {} {} {}".format(map_fd, kstr.decode(), klen, llen)
         ret = self._remote_send_command(cmd)
         return ret
 
@@ -209,12 +210,12 @@ class LibRemote(object):
             if kstr in self.nkey_cache[map_fd]:
                 return (0, [self.nkey_cache[map_fd][kstr]])
 
-        cmd = "BPF_GET_NEXT_KEY {} {} {}".format(map_fd, kstr, klen)
+        cmd = "BPF_GET_NEXT_KEY {} {} {}".format(map_fd, kstr.decode(), klen)
         ret = self._remote_send_command(cmd)
         return ret
 
     def bpf_delete_elem(self, map_fd, kstr, klen):
-        cmd = "BPF_DELETE_ELEM {} {} {}".format(map_fd, kstr, klen)
+        cmd = "BPF_DELETE_ELEM {} {} {}".format(map_fd, kstr.decode(), klen)
         ret = self._remote_send_command(cmd)
         self._invalidate_map_cache(map_fd)
         return ret[0]
@@ -271,10 +272,14 @@ class LibRemote(object):
             return None, addr, None
         else:
             name, offset, module = ret[1][1].split(";")
-            return name, offset, module
+            # the type of name could be "str" or "bytes"
+            if type(name) is str:
+                return bytearray(name, 'utf-8'), offset, module
+            else:
+                return name, offset, module
 
     def ksymname(self, name):
-        cmd = "GET_KSYM_ADDR {}".format(name)
+        cmd = "GET_KSYM_ADDR {}".format(name.decode())
         ret = self._remote_send_command(cmd)
 
         ret_code = ret[0]
@@ -293,7 +298,11 @@ class LibRemote(object):
             return None, addr, None
         else:
             name, offset, module = ret[1][1].split(";")
-            return name, offset, module
+            # the type of name could be "str" or "bytes"
+            if type(name) is str:
+                return bytearray(name, 'utf-8'), offset, module
+            else:
+                return name, offset, module
 
     def usymname(self, pid, name, module):
         cmd = "GET_USYM_ADDR {} {} {}".format(pid, name, module)

python3 test_tools_on_remote can pass. Most of the decode() calls here might be justified, since we need to do the base64 conversion and send the commands to bpfd. I prefer to do the decode() on the Python side
so that the C side of bpfd can just expect strings (not b'strings').
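
To illustrate the base64 point: on Python 3, base64.b64encode returns bytes, so its result hits the same formatting problem unless it is converted to str before being placed in a command. The snippet below is a minimal sketch with placeholder input bytes, not the PR's code.

```python
import base64

# Placeholder bytes standing in for a compiled BPF program.
func_str = b"\x00\x01\x02\x03"

# b64encode returns bytes on Python 3.
encoded = base64.b64encode(func_str)

# The base64 alphabet is ASCII-only, so decoding as ASCII cannot fail.
payload = encoded.decode("ascii")

# A receiver can recover the original bytes from the text form.
restored = base64.b64decode(payload)
```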

One annoying thing is related to ksym and usym: sometimes a string is returned and other times a byte array. I did not investigate further, though.
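
One way to cope with a value that may arrive as either type is to normalize it at a single boundary. The helper below is hypothetical (as_bytes is not a BCC function), and whether callers should receive bytes at all is an assumption taken from the bytearray conversion in the diff above.

```python
def as_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is a str.

    Hypothetical helper for illustration; not part of BCC.
    """
    if isinstance(value, bytes):
        return value
    if isinstance(value, bytearray):
        return bytes(value)
    return value.encode(encoding)
```

With a helper like this, the symbol-lookup paths could return as_bytes(name) regardless of what the remote side produced.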

@jcanseco
Contributor Author

@yonghong-song: thanks for investigating. I've been trying to fix this problem today, and I came up with a patch somewhat similar to the one you just posted. I got the tests passing, but there were other problems that required investigating. We also need to do a thorough test of BCC and BPFd on our end on both Python-2 and Python-3, since this is a non-trivial code change. So please sit tight, this'll probably take some time.

Also, to answer your earlier question about debugging: when the test fails, I usually run the individual tools that failed. You can set the following environment variables to hook up BCC to BPFd. This will make BCC think that there is a remote target device (in the form of a shell process on the same machine):

export ARCH=x86
export BCC_REMOTE=shell

And the following variable to turn on debugging logs:

export BCC_REMOTE_DEBUG=1

This should show you what BCC is sending to BPFd and what BPFd is sending back.
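
The same setup can be scripted when several tools need to be rerun. This is a rough sketch: the variable names and values come from the comment above, while the remote_debug_env helper and the commented-out tool path are made up for illustration.

```python
import os


def remote_debug_env(base=None):
    """Return a copy of the environment with the remote-debug variables set."""
    env = dict(os.environ if base is None else base)
    env["ARCH"] = "x86"
    env["BCC_REMOTE"] = "shell"      # treat a local shell process as the target
    env["BCC_REMOTE_DEBUG"] = "1"    # log the traffic between BCC and BPFd
    return env


# Hypothetical usage; the tool path depends on your checkout, the tool needs
# root privileges, and the bpfd executable must be found on PATH:
# import subprocess
# subprocess.call(["python", "tools/biolatency.py"], env=remote_debug_env())
```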

@yonghong-song
Collaborator

Thanks, @jcanseco. Additional testing is definitely helpful. Thanks for the tip about running on the shell with the environment variables. I did use it this morning after looking at the test. One thing I noticed is that the error log inside bpfd is not visible at the moment, but I see a comment in bpfd.c which says this is left for later work. Indeed, this can be improved after the initial patch.

@nirmoy
Contributor

nirmoy commented Apr 20, 2018

@jcanseco How would I try bpfd in action? Can you point me to a README or other docs?

@jcanseco
Contributor Author

Hello everyone, I talked to Joel recently, and we both agreed that we should close this PR for now. We realize that the patch is probably not yet fully ready. There are a few issues that we need to fix on our end first which then need to be thoroughly tested. We don't wish to take up too much of everyone's time during this process, so we've decided to close this for now and reopen it later.

Thank you @yonghong-song and @drzaeus77 for the time you've spent and all the help you've provided! We really appreciate it, and we hope that you look forward to a more refined version of this patch.

@nirmoy If you still wish to try out BPFd in action, you can read the README and INSTALL docs in the original BPFd repository. If you need help, you can contact Joel ([email protected]). However, I recommend waiting until we resubmit this PR so that you'll be able to try out a more refined version of BPFd.

@jcanseco jcanseco closed this Apr 20, 2018
@yonghong-song
Collaborator

@jcanseco Sure. Just post when you are ready. Thanks!

@joelagnel
Contributor

joelagnel commented Apr 21, 2018

Thanks a lot @yonghong-song @drzaeus77 and @jcanseco .

Currently I am working on improving BCC tools on Android, and I found another way to run the tools without needing BPFd, called androdeb: http://tinyurl.com/androdeb

So I am moving my own focus slightly away from BPFd, since I now use androdeb for running BCC on Android. However, several folks have pinged me about BPFd for non-Android targets, so I am sure that if there's enough interest and volunteers, we can do something here. The BCC-side changes (this PR) need to be fixed by someone who needs BPFd for their use cases. @jcanseco's internship at Android, which was to get the BCC changes completed for BPFd, has come to an end (he's going back to school and said he won't have any time).

The other thing to consider is that this PR only makes BCC's Python front-end work with BPFd, whereas with androdeb, all front-ends and even tools like bpftrace will work on Android. That's why I have been using androdeb more now.

About the BPFd project: Several nice things and side-effects like BCC fixes came out of the BPFd project, and thanks a lot to @jcanseco and @yonghong-song for that - so I would still look at BPFd as a success and a nice experiment - certainly I learned so much about BCC internals from it.

@nirmoy If you want to continue with the BPFd work and have use cases for it, please find the latest work in the following branches:

BCC tree (BCC master with changes to make it work with BPFd):
https://github.com/joelagnel/bcc/tree/bcc-bpfd-upstream

Stand-alone BPFd which you can cross-compile and push as a proxy to targets:
https://github.com/joelagnel/bpfd

Thanks a lot.

@Kullu14

Kullu14 commented Oct 12, 2018

Has this PR been reopened yet? We are really looking to run bcc in a host/target environment. Any leads would be a great help.

@Kullu14

Kullu14 commented Oct 12, 2018

@jcanseco Have you tried using bpfd in a host/target environment (2 VMs, non-Android platforms)? As I understand it, you tried it on the same machine (remote process).

@joelagnel
Contributor

joelagnel commented Oct 12, 2018 via email

@Kullu14

Kullu14 commented Oct 12, 2018

@joelagnel I have already tried some of the bcc examples locally on the same host. I would really like to test it in a host/target environment (non-Android). Is there anything specific we need to do just to try some of the examples in a host/target environment?

@Kullu14

Kullu14 commented Nov 19, 2018

@joelagnel, sorry to bother you again. I was able to run BPFd in a host/target environment (non-Android), but I need root permission on the host to run any of the tools (e.g. filetop.py). Is there any way to run those tools as a normal user?

@yonghong-song
Collaborator

Unfortunately, filetop.py uses kprobes, which require CAP_SYS_ADMIN.

@joelagnel
Contributor

joelagnel commented Nov 19, 2018 via email

@Kullu14

Kullu14 commented Nov 20, 2018

@yonghong-song Thanks for the information.
I just have one question: my target has root permission, and my host doesn't. As per the BPFd implementation, the kprobes will be running on the target. So why do I need root permission for the target?

@yonghong-song
Collaborator

Could you help debug this a little bit since you have a test case here?

@joelagnel
Contributor

The target loads the BPF program. If the program is of type KPROBE, then you need to be running with CAP_SYS_ADMIN. I actually hit this too. Only the networking-related program types like SOCK_FILTER do not need CAP_SYS_ADMIN.

@Kullu14

Kullu14 commented Nov 21, 2018

@joel Sorry, my mistake. My question was: why do I need root permission on the host, not the target?
On the host, I should be able to use any tool with normal user permissions, right?

@Kullu14

Kullu14 commented Jan 29, 2019

@joel I have root permission available on the target, but I don't have root permission on the host where I am running the script. Any idea why I would need root permission on the host?

@joelagnel
Contributor

Your username or shell may not have the cap_sys_admin capability needed for kprobes. Many bcc tools use kprobes under the hood.
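One way to check this from the shell in question is to read the effective capability mask from `/proc/self/status` and test the CAP_SYS_ADMIN bit. This is a minimal, Linux-only sketch and is not part of BCC or BPFd; the function names are hypothetical. The bit index 21 is the value of CAP_SYS_ADMIN in `linux/capability.h`.

```python
CAP_SYS_ADMIN = 21  # bit index of CAP_SYS_ADMIN in linux/capability.h


def parse_cap_eff(status_text):
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The mask is printed as a hexadecimal number.
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("no CapEff line found")


def has_cap_sys_admin(status_text):
    """Return True if the CAP_SYS_ADMIN bit is set in the effective mask."""
    return bool(parse_cap_eff(status_text) & (1 << CAP_SYS_ADMIN))
```

To check the current process, pass it the contents of `/proc/self/status`, for example `has_cap_sys_admin(open("/proc/self/status").read())`.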
