ebpf connection tracker: perf map fixes #2507

Merged
merged 2 commits into from
May 16, 2017

alban
Contributor

@alban alban commented May 9, 2017

This adds fixes in the perf map that receives the tcp events from the ebpf probes:

  • gobpf: use the correct timestamp function instead of assuming the timestamp is at offset zero in the struct
  • gobpf: fix loop on events that could drop events
  • gobpf: add a go channel to notify lost events
  • gobpf: make the size of the ring buffers configurable instead of using 8 pages (32KB)
  • tcptracer-bpf: change the size of the ring buffer "maps/tcp_event_ipv4" to 256 pages (1MB)
  • tcptracer-bpf: use the go channel to notify lost events
  • scope: add a callback on lost events that logs the error and stops the ebpf tracker correctly, so if it ever happens, we will know about it.
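
For reference, the ring buffer sizes quoted in the list are just page counts multiplied by the page size. A minimal sketch of that arithmetic, assuming 4KB pages (the constant `pageSize` and the helper `ringBufferBytes` are made up for illustration and are not part of gobpf or tcptracer-bpf):

```go
package main

import "fmt"

// pageSize is an assumption: 4KB pages, as implied by "8 pages (32KB)".
const pageSize = 4096

// ringBufferBytes is a hypothetical helper that returns the size in bytes
// of a ring buffer made of the given number of pages.
func ringBufferBytes(pages int) int {
	return pages * pageSize
}

func main() {
	// 1 page is the size used for stress testing, 8 pages the old
	// hard-coded size, 256 pages the new size for "maps/tcp_event_ipv4".
	for _, pages := range []int{1, 8, 256} {
		size := ringBufferBytes(pages)
		fmt.Printf("%d pages = %d bytes (%d KB)\n", pages, size, size/1024)
	}
}
```

With these assumptions, 8 pages come out at 32768 bytes and 256 pages at 1048576 bytes, which matches the 32KB and 1MB figures above.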

This is to address the following failure that happened a few weeks ago:

<probe> ERRO: 2017/04/13 07:34:17.299127 tcp tracer received event with timestamp 677884430530033 even though the last timestamp was 677884430530738. Stopping the eBPF tracker.
<probe> WARN: 2017/04/13 07:34:17.337241 ebpf tracker died, gently falling back to proc scanning
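
The kind of ordering check that produces the error above can be sketched roughly as follows. This is only an illustration under assumptions: `orderChecker` and `observe` are hypothetical names, not the actual code in scope or tcptracer-bpf, and the two timestamps are the ones from the log.

```go
package main

import "fmt"

// orderChecker is a hypothetical type that remembers the last
// timestamp seen on the event stream.
type orderChecker struct {
	last uint64
}

// observe returns an error if ts is older than the previously seen
// timestamp; otherwise it records ts as the new last timestamp.
func (c *orderChecker) observe(ts uint64) error {
	if ts < c.last {
		return fmt.Errorf("received event with timestamp %d even though the last timestamp was %d", ts, c.last)
	}
	c.last = ts
	return nil
}

func main() {
	c := &orderChecker{}
	// Timestamps taken from the log above, in the order they were seen.
	for _, ts := range []uint64{677884430530738, 677884430530033} {
		if err := c.observe(ts); err != nil {
			fmt.Println("stopping the tracker:", err)
			return
		}
	}
}
```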

This is a "work in progress" because this is vendoring unmerged branches in gobpf and tcptracer-bpf.


I tried to reduce the size of the perf ring buffers to one page (4KB) in order to test it. I made lots of connections in parallel in 4 terminals (but on the same cpu to stress one perf ring buffer):

for i in $(seq 1 10000) ; do echo -n "$i " ; taskset --cpu-list 2 wget -O /dev/null http://172.17.0.2 2>/dev/null; done

Then, I checked that the fallback worked correctly:

<probe> ERRO: 2017/05/09 09:17:40.798823 tcp tracer lost 7 events. Stopping the eBPF tracker
<probe> WARN: 2017/05/09 09:17:41.701317 ebpf tracker died, gently falling back to proc scanning
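
The behaviour seen in these logs (a lost-events notification arrives on a Go channel, the error is logged, and the tracker stops) can be sketched like this. It is a simplified stand-in, not the real gobpf or scope API: the `tracker` type, its `run` method, and the channel element types are assumptions made for the example.

```go
package main

import "fmt"

// tracker is a hypothetical stand-in for the eBPF connection tracker.
type tracker struct {
	dead bool
}

// run consumes events until the events channel is closed or a
// lost-events notification arrives. It returns the number of events
// handled before stopping.
func (t *tracker) run(events <-chan string, lost <-chan uint64) int {
	handled := 0
	for {
		select {
		case _, ok := <-events:
			if !ok {
				return handled // no more events
			}
			handled++
		case n, ok := <-lost:
			if !ok {
				return handled // notification channel closed
			}
			// Log the loss and mark the tracker as dead so the
			// caller can fall back to another data source.
			fmt.Printf("tcp tracer lost %d events. Stopping the eBPF tracker\n", n)
			t.dead = true
			return handled
		}
	}
}

func main() {
	events := make(chan string)  // no sender in this sketch
	lost := make(chan uint64, 1) // buffered so the send below does not block
	lost <- 7

	t := &tracker{}
	t.run(events, lost)
	if t.dead {
		fmt.Println("ebpf tracker died, falling back to proc scanning")
	}
}
```

Using a separate channel for losses keeps the event path unchanged; the consumer only has to add one `select` case to find out that the ring buffer overflowed.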

/cc @iaguis @2opremio

@alban alban mentioned this pull request May 9, 2017
alban added 2 commits May 10, 2017 18:37
Lost events were previously unnoticed. This patch adds an error in the
log and stops the ebpf tracker if an event is lost.
@alban alban force-pushed the alban/perf-map-fixes branch from 86bdeee to 9079677 Compare May 10, 2017 16:37
@alban
Contributor Author

alban commented May 10, 2017

I rebased and revendored gobpf + tcptracer-bpf since PRs have been merged.

I'm keeping the "WIP" title because I'd like to test again to make sure nothing was wrong in the rebase/revendor/git-subtrees changes.

@alban
Contributor Author

alban commented May 11, 2017

I tried to quickly create plenty of connections to nginx on several cpus with this script (in several terminals):

cpu=0
for i in $(seq 1 10000) ; do
  echo -n "$i "
  taskset --cpu-list $cpu wget -O /dev/null http://172.17.0.2 2>/dev/null
done
echo

And I didn't notice any problems in the logs.

However, chrome quickly took all the cpu and crashed:

[Image: chrome-crash]

@alban alban changed the title [WIP] ebpf connection tracker: perf map fixes ebpf connection tracker: perf map fixes May 11, 2017
@2opremio
Contributor

Mind creating a separate ticket for this?

@2opremio
Contributor

(It's not a real scenario, but it's worth keeping track of it)

@alban
Contributor Author

alban commented May 11, 2017

> Mind creating a separate ticket for this?

#2517

@alban
Contributor Author

alban commented May 15, 2017

@2opremio PTAL
