ebpf connection tracker: perf map fixes #2507

alban · 2017-05-09T09:35:17Z

This adds fixes in the perf map that receives the tcp events from the ebpf probes:

gobpf: use the correct timestamp function instead of assuming the timestamp is at offset zero in the struct
gobpf: fix loop on events that could drop events
gobpf: add a go channel to notify lost events
gobpf: make the size of the ring buffers configurable instead of using 8 pages (32KB)
tcptracer-bpf: change the size of the ring buffer "maps/tcp_event_ipv4" to 256 pages (1MB)
tcptracer-bpf: use the go channel to notify lost events
scope: add a callback on lost events that logs the error and stops the ebpf tracker cleanly, so if events are ever lost, we will know about it.
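For reference, the buffer sizes quoted in the list above follow from simple page arithmetic. This is a minimal sketch, assuming 4KB pages; `ringBufferBytes` is a hypothetical helper used only for illustration and is not part of gobpf or tcptracer-bpf:

```go
package main

import "fmt"

// ringBufferBytes returns the size in bytes of a buffer made of `pages`
// memory pages of `pageSize` bytes each.
// Hypothetical helper for illustration; not a gobpf API.
func ringBufferBytes(pages, pageSize int) int {
	return pages * pageSize
}

func main() {
	const pageSize = 4096 // assumption: 4KB pages, as in the sizes quoted above

	fmt.Println(ringBufferBytes(8, pageSize))   // previous fixed size: 32768 bytes (32KB)
	fmt.Println(ringBufferBytes(256, pageSize)) // new size for maps/tcp_event_ipv4: 1048576 bytes (1MB)
}
```

On systems with a different page size, the same page counts would give different byte sizes.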

This is to address the following failure that happened a few weeks ago:

<probe> ERRO: 2017/04/13 07:34:17.299127 tcp tracer received event with timestamp 677884430530033 even though the last timestamp was 677884430530738. Stopping the eBPF tracker.
<probe> WARN: 2017/04/13 07:34:17.337241 ebpf tracker died, gently falling back to proc scanning
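The kind of ordering check behind that error message can be sketched roughly as follows. This is only an illustration, assuming the tracker remembers the last timestamp it saw and treats any older timestamp as fatal; `orderChecker` and `observe` are hypothetical names, not the actual Scope code:

```go
package main

import (
	"errors"
	"fmt"
)

// errOutOfOrder marks an event whose timestamp is older than the previous one.
var errOutOfOrder = errors.New("event received out of order")

// orderChecker remembers the last timestamp seen.
// Hypothetical type for illustration; not the Scope implementation.
type orderChecker struct {
	last uint64
}

// observe records ts, or returns an error if ts is older than the
// previously observed timestamp.
func (c *orderChecker) observe(ts uint64) error {
	if ts < c.last {
		return fmt.Errorf("%w: timestamp %d after %d", errOutOfOrder, ts, c.last)
	}
	c.last = ts
	return nil
}

func main() {
	chk := &orderChecker{}
	// The two timestamps from the log above, in the order they were received.
	for _, ts := range []uint64{677884430530738, 677884430530033} {
		if err := chk.observe(ts); err != nil {
			fmt.Println("stopping tracker:", err)
			return
		}
	}
}
```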

This is a "work in progress" because this is vendoring unmerged branches in gobpf and tcptracer-bpf.

To test this, I reduced the size of the perf ring buffers to one page (4KB). I then made lots of connections in parallel from 4 terminals (all pinned to the same cpu, to stress a single perf ring buffer):

for i in $(seq 1 10000) ; do echo -n "$i " ; taskset --cpu-list 2 wget -O /dev/null http://172.17.0.2 2>/dev/null; done

Then, I checked that the fallback worked correctly:

<probe> ERRO: 2017/05/09 09:17:40.798823 tcp tracer lost 7 events. Stopping the eBPF tracker
<probe> WARN: 2017/05/09 09:17:41.701317 ebpf tracker died, gently falling back to proc scanning

/cc @iaguis @2opremio

Lost events were previously unnoticed. This patch adds an error in the log and stops the ebpf tracker if an event is lost.
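The behaviour described here, a channel that reports lost events and a callback that logs and stops the tracker, could look roughly like the sketch below. All names (`tracker`, `watchLost`, `lostChan`) are hypothetical and chosen for illustration; they are not the gobpf, tcptracer-bpf or Scope APIs:

```go
package main

import "fmt"

// tracker is a hypothetical stand-in for the eBPF connection tracker.
type tracker struct {
	stopped bool
}

func (t *tracker) stop()           { t.stopped = true }
func (t *tracker) isStopped() bool { return t.stopped }

// watchLost reads lost-event counts from lostChan. On the first non-zero
// count it invokes onLost and returns; it also returns if the channel is closed.
func watchLost(lostChan <-chan uint64, onLost func(count uint64)) {
	for count := range lostChan {
		if count > 0 {
			onLost(count)
			return
		}
	}
}

func main() {
	tr := &tracker{}
	lostChan := make(chan uint64, 1)
	done := make(chan struct{})

	go func() {
		defer close(done)
		watchLost(lostChan, func(count uint64) {
			fmt.Printf("tcp tracer lost %d events. Stopping the eBPF tracker\n", count)
			tr.stop()
		})
	}()

	lostChan <- 7 // simulate the perf map reader reporting 7 lost events
	<-done        // closing done orders the write to tr.stopped before the read below
	fmt.Println("stopped:", tr.isStopped())
}
```

Stopping on the first lost event is a deliberately conservative choice in this sketch: once an event is missing, the tracked connection state can no longer be trusted, so falling back to proc scanning is safer than continuing.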

alban · 2017-05-10T16:40:55Z

I rebased and revendored gobpf + tcptracer-bpf since PRs have been merged.

I'm keeping the "WIP" title because I'd like to test again to make sure nothing was wrong in the rebase/revendor/git-subtrees changes.

alban · 2017-05-11T09:29:40Z

I tried to quickly create plenty of connections to nginx on several cpus with this script (in several terminals):

cpu=0
for i in $(seq 1 10000) ; do
  echo -n "$i "
  taskset --cpu-list $cpu wget -O /dev/null http://172.17.0.2 2>/dev/null
done
echo

And I didn't notice any problems in the logs.

However, Chrome quickly took all the CPU and crashed.

2opremio · 2017-05-11T09:30:38Z

Mind creating a separate ticket for this?

2opremio · 2017-05-11T09:31:12Z

(It's not a real scenario, but it's worth keeping track of it)

alban · 2017-05-11T09:34:49Z

> Mind creating a separate ticket for this?

#2517

alban · 2017-05-15T13:17:09Z

@2opremio PTAL

alban mentioned this pull request May 9, 2017

perf map fixes weaveworks/tcptracer-bpf#37

Merged

4 tasks

alban added 2 commits May 10, 2017 18:37

vendoring: update gobpf and tcptracer-bpf

fc0e449

ebpf tracker: add callback for lost events

9079677

Lost events were previously unnoticed. This patch adds an error in the log and stops the ebpf tracker if an event is lost.

alban force-pushed the alban/perf-map-fixes branch from 86bdeee to 9079677 Compare May 10, 2017 16:37

alban changed the title ~~[WIP] ebpf connection tracker: perf map fixes~~ ebpf connection tracker: perf map fixes May 11, 2017

alban mentioned this pull request May 11, 2017

Browser taking a lot of cpu with lots of connections and then crashes #2517

Open

2opremio approved these changes May 16, 2017


2opremio merged commit 5079c11 into weaveworks:master May 16, 2017

This was referenced Jun 27, 2017

ebpf: fall back to proc parsing when we detect a late event #2334

Closed

probes falling back to proc parsing due to late eBPF events #2650

Closed
