List of endpoints inside probe can grow over time #3645

Closed
bboreham opened this issue Jul 1, 2019 · 4 comments · Fixed by #3661
Labels
performance Excessive resource usage and latency; usually a bug or chore

Comments

bboreham (Collaborator) commented Jul 1, 2019

Another picture showing report sizes over time, similar to #3576

[image: graph of report sizes over time]

It's not clear what the trigger is, but looking inside the reports, the number of Endpoints grows by a few every time, reaching about 36,000 after 8 days. Restarting the probe brings the size back down to normal levels.

conntrack -L on the node lists around 8,000 connections, mostly in TIME_WAIT.
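For reference, the same per-state breakdown can be produced from a Go program by reading /proc/net/nf_conntrack directly instead of running the conntrack CLI. This is an illustrative sketch (it assumes the nf_conntrack module is loaded so that file exists), not part of the Scope probe:

package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
)

// Counts TCP conntrack entries per state (ESTABLISHED, TIME_WAIT, ...),
// roughly the same information `conntrack -L` prints.
func main() {
    f, err := os.Open("/proc/net/nf_conntrack")
    if err != nil {
        fmt.Fprintln(os.Stderr, "cannot read conntrack table:", err)
        os.Exit(1)
    }
    defer f.Close()

    counts := map[string]int{}
    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        fields := strings.Fields(scanner.Text())
        // TCP lines look like: "ipv4 2 tcp 6 117 TIME_WAIT src=... dst=..."
        if len(fields) > 5 && fields[2] == "tcp" {
            counts[fields[5]]++
        }
    }
    for state, n := range counts {
        fmt.Printf("%-12s %d\n", state, n)
    }
}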

bboreham added the performance label on Jul 1, 2019

bboreham (Collaborator, Author) commented Jul 4, 2019

I suspect all the probes that are affected are using conntrack rather than ebpf. E.g. I see this in the logs:

<probe> ERRO: 2019/07/01 13:39:49.650529 tcp tracer received event with timestamp 726896371146595 even though the last timestamp was 726896371186140. Stopping the eBPF tracker.
<probe> WARN: 2019/07/01 13:39:50.818972 ebpf tracker died, restarting it
<probe> ERRO: 2019/07/01 13:42:41.287671 tcp tracer received event with timestamp 727068008148899 even though the last timestamp was 727068008151558. Stopping the eBPF tracker.
<probe> WARN: 2019/07/01 13:42:41.816997 ebpf tracker died again, gently falling back to proc scanning 
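For context, the fallback in these logs is driven by a monotonicity check on event timestamps: once an event arrives that is older than the previous one, the eBPF tracker stops itself. A minimal sketch of that kind of guard, using hypothetical names rather than the actual Scope/tcptracer-bpf code:

package tracker

import "fmt"

// tcpTracker is a hypothetical stand-in for the eBPF connection tracker.
type tcpTracker struct {
    lastTimestamp uint64
}

// handleEvent rejects out-of-order events; returning this error is what
// produces the "Stopping the eBPF tracker" message seen in the logs above.
func (t *tcpTracker) handleEvent(timestamp uint64) error {
    if timestamp < t.lastTimestamp {
        return fmt.Errorf("received event with timestamp %d even though the last timestamp was %d",
            timestamp, t.lastTimestamp)
    }
    t.lastTimestamp = timestamp
    return nil
}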

bboreham (Collaborator, Author) commented Jul 5, 2019

Still happening after #3648.
Right now I am out of ideas as to how we can be leaking connections.
I wonder if we should do a periodic resync, e.g. once an hour, which would mask whatever is really causing it.
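For illustration, such a resync could be a goroutine that periodically rebuilds the tracked-connection set from a full walk. All names below are hypothetical, not Scope's real API:

package tracker

import (
    "sync"
    "time"
)

// connectionTracker is a hypothetical stand-in for the probe's endpoint state.
type connectionTracker struct {
    mu    sync.Mutex
    conns map[string]struct{}
}

// replaceAll swaps in a freshly scanned connection set, discarding anything
// the incremental tracker may have leaked.
func (t *connectionTracker) replaceAll(fresh map[string]struct{}) {
    t.mu.Lock()
    t.conns = fresh
    t.mu.Unlock()
}

// resyncLoop rebuilds state from a full walk (e.g. conntrack or /proc) once
// per interval, masking a slow leak in the incremental updates.
func resyncLoop(t *connectionTracker, walk func() map[string]struct{}, interval time.Duration, stop <-chan struct{}) {
    ticker := time.NewTicker(interval)
    defer ticker.Stop()
    for {
        select {
        case <-ticker.C:
            t.replaceAll(walk())
        case <-stop:
            return
        }
    }
}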

bboreham (Collaborator, Author) commented Jul 9, 2019

I had an idea to improve the first problem ("ebpf tracker died"): iovisor/gobpf#42 (comment)

bboreham (Collaborator, Author) commented Aug 2, 2019

#3653 made a big improvement, according to my stats, but I am still seeing fall-back to conntrack in a few cases, followed by constant growth over hours.
