Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle fdinstall events from tcptracer-bpf (aka "accept before kretprobe" issue) #2518

Merged
merged 4 commits into from
May 19, 2017

Conversation

alban
Copy link
Contributor

@alban alban commented May 11, 2017

During the EbpfTracker initialization, find all the processes that are currently blocked on an accept() syscall. feedInitialConnections() will use t.tracer.AddFdInstallWatcher() to subscribe to fd_install events. When a fd_install event is received, synthesise an accept event with the connection tuple and the network namespace (from /proc).

TODO:

  • re-vendor tcptracer-bpf (when add probes on fd_install to catch accept events before kretprobe was installed tcptracer-bpf#39 is merged)
  • tcptracer-bpf event struct should have a field for fd instead of reusing netns
  • handle multi-threaded programs that run blocking accept() syscalls (look in /proc/*/tasks/*/wchan instead of /proc/*/wchan)
  • find the network namespace (currently hard-coded)
  • find the connection tuple (currently hard-coded).
  • handle IPv4-mapped IPv6 addresses
  • fix lint / tests
  • integration test

This is the Scope part of weaveworks/tcptracer-bpf#39
Issue: weaveworks/tcptracer-bpf#10

@@ -166,6 +170,16 @@ func (w *walker) readCmdline(filename string) (cmdline, name string) {
return
}

func IsProcInAccept(procRoot, filename string) (ret bool) {

This comment was marked as abuse.

@2opremio
Copy link
Contributor

2opremio commented May 11, 2017

I thought you were going to extend the proc connection walker to also obtain TCP sockets in LISTEN state. I am a bit concerned of the performance impact of checking pid/wchan.

Also:

find the network namespace (currently hard-coded)

You get this for free out of the proc connection walker.

@2opremio
Copy link
Contributor

handle multi-threaded programs that run blocking accept() syscalls (look in /proc//tasks//wchan instead of /proc/*/wchan)

Won't /proc/pid/wchan also include the content of the tasks?

@alban
Copy link
Contributor Author

alban commented May 11, 2017

I thought you were going to extend the proc connection walker to also obtain TCP sockets in LISTEN state.

That was what I wanted to do initially but then I realised that the set of processes that have sockets in the LISTEN state is bigger than the set of process blocked on the accept syscall. On my laptop, I have only one process blocked on the accept syscall (the test program busybox nc -l 192.168.35.216 -p 8181). But I have more listening sockets (9 just for the initial network namespace sudo netstat -anpel|grep tcp.*LISTEN|wc -l)

I am a bit concerned of the performance impact of checking pid/wchan.

This is done only one time during the initialization of the EbpfTracker when Scope starts.

You get this (network namespace) for free out of the proc connection walker.

This part is not during the initialization but when we receive the fd_install event from tcptracer-bpf. Other events (connect, accept, close) includes the network namespace because the kprobe for them is on a kernel function with the struct sock. But we don't get the network namespace from the fd_install event because we don't have the struct sock, only the struct file.

While it should be possible to get the struct sock from the struct file in general, we don't have any "guess offset" algorithm to find out where to look in the struct file. I thought it was not worth adding complexity in the "guess offset" algorithm because we can get the network namespace from userspace and we need to do that only one time per process that were blocked on an accept syscall at the time of Scope startup.

Won't /proc/pid/wchan also include the content of the tasks?

No, it seems to only report the function of the main thread. For example:

$ for i in  /proc/29241/task/*/wchan  ; do cat $i ; echo ; done
futex_wait_queue_me
futex_wait_queue_me
futex_wait_queue_me
futex_wait_queue_me
futex_wait_queue_me
futex_wait_queue_me
ep_poll
ep_poll
futex_wait_queue_me
futex_wait_queue_me
futex_wait_queue_me
futex_wait_queue_me
futex_wait_queue_me
futex_wait_queue_me
futex_wait_queue_me
futex_wait_queue_me
$ cat /proc/29241/wchan ; echo
futex_wait_queue_me

@2opremio
Copy link
Contributor

2opremio commented May 11, 2017

That was what I wanted to do initially but then I realised that the set of processes that have sockets in the LISTEN state is bigger than the set of process blocked on the accept syscall. On my laptop, I have only one process blocked on the accept syscall (the test program busybox nc -l 192.168.35.216 -p 8181). But I have more listening sockets (9 just for the initial network namespace sudo netstat -anpel|grep tcp.*LISTEN|wc -l)

Would this just impact performance or also accuracy? (If the impact on performance is just an eBPF map lookup with a larger number of entries then I guess it's OK)

This is done only one time during the initialization of the EbpfTracker when Scope starts.

Good point.

@alban
Copy link
Contributor Author

alban commented May 11, 2017

Would this impact performance or also accuracy? (If the impact on performance is just a map lookup with a larger map then I guess it's OK)

Only performances, and the performance with the wchan files should be better since we would receive less false positive "fd_install" events.

@2opremio
Copy link
Contributor

Only performances, and the performance with the wchan files should be better since we would receive less false positive "fd_install" events.

But the false positives won't impact performance so much since it's a map, which has a fast access.

On the other hand, accessing wchan seems to complicate the code.

@2opremio
Copy link
Contributor

2opremio commented May 11, 2017

I would also like to see some integration tests for this.

vendor/manifest Outdated
"revision": "b715a3b635b8d9c4a096bbd6009826b57fe64c38",
"branch": "master",
"revision": "3b09ef1351d865c0b2519e1486f6c88db3905780",
"branch": "alban/fdinstall",

This comment was marked as abuse.

@alban alban force-pushed the alban/fdinstall branch 10 times, most recently from 7b349c5 to 71f25fc Compare May 17, 2017 15:56
@alban
Copy link
Contributor Author

alban commented May 17, 2017

I added an integration test (314_container_accept_before_kretprobe_test.sh) and it passes now.

@2opremio
Copy link
Contributor

But the false positives won't impact performance so much since it's a map, which has a fast access.

On the other hand, accessing wchan seems to complicate the code.

@alban Are you finally discarding the option of obtaining the sockets in LISTEN state from the proc walker?

@alban
Copy link
Contributor Author

alban commented May 18, 2017

But the false positives won't impact performance so much since it's a map, which has a fast access.

@alban Are you finally discarding the option of obtaining the sockets in LISTEN state from the proc walker?

Yes

The process of getting an otherwise-missed accept event should happen only one time for processes that were blocked in the accept() syscall. Once the accept() syscall returns, we don't need that mechanism with fd_install events anymore. So we need a way to know when we can stop the monitoring of fd_install events.

If we monitor the fd_install events for all processes that own a socket in the LISTEN state, we don't know when to stop that monitoring because servers normally keep a socket in the LISTEN state even after a connection has been established (for the purpose of accepting future connections).

So that would be receiving an fd_install event for all servers, during their lifetime. We also receive an install event for file descriptors that are not a socket because the filtering on the file descriptor kind is done in userspace. So using wchan, we minimize that.

@2opremio
Copy link
Contributor

Fair enough


weave_on "$HOST1" launch

# Launch the server before Scope

This comment was marked as abuse.

This comment was marked as abuse.

list_containers "$HOST1"
list_connections "$HOST1"

has_connection containers "$HOST1" client server

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

@alban alban force-pushed the alban/fdinstall branch from 71f25fc to a31e7df Compare May 18, 2017 16:01
@@ -43,14 +43,15 @@ scope_end_suite() {
list_containers() {
local host=$1
echo "Listing containers on ${host}:"
curl -s "http://${host}:4040/api/topology/containers?system=show" | jq -r '.nodes[] | select(has("metadata")) | .metadata[] | select(.id == "docker_image_name") | .value'
curl -s "http://${host}:4040/api/topology/containers?system=show" | jq -r '.nodes[] | select(has("metadata")) | { "image": .metadata[] | select(.id == "docker_image_name") | .value, "label": .label, "id": .id} | .id + " (" + .image + ", " + .label + ")"'

This comment was marked as abuse.

netNamespacePath := fmt.Sprintf("/proc/%d/ns/net", pid)
var statNsFile syscall.Stat_t
err := fs.Stat(netNamespacePath, &statNsFile)
if err != nil {

This comment was marked as abuse.

This comment was marked as abuse.

fdFilename := fmt.Sprintf("/proc/%d/fd/%d", pid, fd)
var statFdFile syscall.Stat_t
err = fs.Stat(fdFilename, &statFdFile)
if err != nil {

This comment was marked as abuse.

This comment was marked as abuse.

@@ -125,6 +133,99 @@ func lostCb(count uint64) {
ebpfTracker.stop()
}

func tupleFromPidFd(pid int, fd int) (tuple fourTuple, netns string, ok bool) {

This comment was marked as abuse.

This comment was marked as abuse.

This comment was marked as abuse.

@alban alban force-pushed the alban/fdinstall branch from a31e7df to 74ae8f2 Compare May 19, 2017 12:37
@alban alban changed the title [RFC][WIP] handle fdinstall events from tcptracer-bpf (aka "accept before kretprobe" issue) handle fdinstall events from tcptracer-bpf (aka "accept before kretprobe" issue) May 19, 2017
@alban
Copy link
Contributor Author

alban commented May 19, 2017

I rebased, re-vendored tcptracer-bpf and factorized the 2 /proc readings.

I also pre-fetch the busybox image in the integration tests on GCE since we have a test using busybox now, following the same pattern as the other images.

Let's see if the tests pass on CircleCI.

alban added 3 commits May 19, 2017 14:49
Since weaveworks/tcptracer-bpf#39, tcptracer-bpf
can generate "fd_install" events when a process installs a new file
descriptor in its fd table. Those events must be requested explicitely
on a per-pid basis with tracer.AddFdInstallWatcher(pid).

This is useful to know about "accept" events that would otherwise be
missed because kretprobes are not triggered for functions that were
called before the installation of the kretprobe.

This patch find all the processes that are currently blocked on an
accept() syscall during the EbpfTracker initialization.
feedInitialConnections() will use tracer.AddFdInstallWatcher() to
subscribe to fd_install  events. When a fd_install event is received,
synthesise an accept event with the connection tuple and the network
namespace (from /proc).
@alban alban force-pushed the alban/fdinstall branch from 74ae8f2 to b761b5e Compare May 19, 2017 12:50
@2opremio
Copy link
Contributor

LGTM (on green)

@alban
Copy link
Contributor Author

alban commented May 19, 2017

green

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants