connections created during eBPF initialisation can be missed #2689
Comments
btw, there's a small concurrency bug here too: [...]
I am tempted to simply ditch [...] We then need to be careful not to accidentally add back connections which get closed between steps 5 and 7. A cheap way to do that is to not add a procspied connection when its 4-tuple is in [...]
There really should be two variants - with and without eBPF - but the latter is broken due to #2689.
To reproduce the first type of failure - connections missed completely - apply
diff --git a/probe/endpoint/connection_tracker.go b/probe/endpoint/connection_tracker.go
index 7d6f7ea2..9cd90bcc 100644
--- a/probe/endpoint/connection_tracker.go
+++ b/probe/endpoint/connection_tracker.go
@@ -2,6 +2,7 @@ package endpoint
 import (
 	"strconv"
+	"time"
 	log "github.com/Sirupsen/logrus"
 	"github.com/weaveworks/scope/probe/endpoint/procspy"
@@ -170,6 +171,8 @@ func (t *connectionTracker) getInitialState() {
 		}
 	})
+	time.Sleep(10 * time.Second)
+
 	t.ebpfTracker.feedInitialConnections(conns, seenTuples, processesWaitingInAccept, report.MakeHostNodeID(t.conf.HostID))
 }
and then run scope normally and, within 10 seconds, execute
Then inspect the json:
Actually, step 2 obtains the connection information. Step 5 only reads [...] That makes the window for the "connections present but missing pids" type failure quite narrow, moving it inside the proc walking logic.
The "connections present but missing pids" failure can be reproduced by applying
diff --git a/probe/endpoint/procspy/proc_linux.go b/probe/endpoint/procspy/proc_linux.go
index ec676072..0eccabac 100644
--- a/probe/endpoint/procspy/proc_linux.go
+++ b/probe/endpoint/procspy/proc_linux.go
@@ -254,6 +254,8 @@ func (w pidWalker) walk(buf *bytes.Buffer) (map[uint64]*Proc, error) {
 		namespaces[namespaceID] = append(namespaces[namespaceID], &p)
 	})
+	time.Sleep(10 * time.Second)
+
 	for namespaceID, procs := range namespaces {
 		select {
 		case <-w.tickc:
and following the same steps as above. The json inspection reveals:
...when initialising eBPF-based connection tracking.
Previously we were ignoring all eBPF events until we had gathered the existing connections. That means we could a) miss connections created during the gathering, and b) fail to forget connections that got closed during the gathering.
The fix comprises the following changes:
1. pay attention to eBPF events immediately. That way we do not miss anything.
2. remember connections for which we received a Close event during the initialisation phase, and subsequently drop gathered existing connections that match these. That way we do not erroneously consider a gathered connection as open when it got closed since the gathering.
3. drop gathered existing connections which match connections detected through eBPF events. The latter typically have more / current metadata. In particular, PIDs can be missing from the former.
Fixes #2689. Fixes #2700.
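The three changes above can be sketched roughly as follows. This is a minimal illustration and not the code from the actual fix: the names `fourTuple`, `conn`, `tracker`, `handleOpen`, `handleClose` and `feedInitial` are all hypothetical, and the real tracker carries far more state.

```go
package main

import (
	"fmt"
	"sync"
)

// fourTuple identifies a TCP connection (hypothetical type for this sketch).
type fourTuple struct {
	srcAddr, dstAddr string
	srcPort, dstPort uint16
}

// conn is a tracked connection; pid is 0 when the owner is unknown.
type conn struct {
	tuple fourTuple
	pid   int
}

// tracker is a simplified stand-in for an eBPF connection tracker.
type tracker struct {
	mu               sync.Mutex
	open             map[fourTuple]conn
	closedDuringInit map[fourTuple]struct{}
	initialising     bool
}

func newTracker() *tracker {
	return &tracker{
		open:             map[fourTuple]conn{},
		closedDuringInit: map[fourTuple]struct{}{},
		initialising:     true,
	}
}

// handleOpen records a connection as soon as the event arrives (change 1).
func (t *tracker) handleOpen(c conn) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.open[c.tuple] = c
}

// handleClose forgets a connection and, while initialising, remembers
// the tuple so a stale gathered copy can be dropped later (change 2).
func (t *tracker) handleClose(tuple fourTuple) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.open, tuple)
	if t.initialising {
		t.closedDuringInit[tuple] = struct{}{}
	}
}

// feedInitial merges the gathered connections and ends the initialisation phase.
func (t *tracker) feedInitial(gathered []conn) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for _, c := range gathered {
		if _, closed := t.closedDuringInit[c.tuple]; closed {
			continue // closed since the gathering (change 2)
		}
		if _, seen := t.open[c.tuple]; seen {
			continue // the event-derived entry has better metadata (change 3)
		}
		t.open[c.tuple] = c
	}
	t.initialising = false
	t.closedDuringInit = nil // no longer needed once initialisation is over
}

func main() {
	tr := newTracker()
	a := fourTuple{srcAddr: "10.0.0.1", dstAddr: "10.0.0.2", srcPort: 40000, dstPort: 80}
	b := fourTuple{srcAddr: "10.0.0.1", dstAddr: "10.0.0.3", srcPort: 40001, dstPort: 80}

	tr.handleOpen(conn{tuple: a, pid: 1234}) // created during the gathering
	tr.handleClose(b)                        // closed during the gathering

	// The gathered snapshot lacks a PID for a and still lists b.
	tr.feedInitial([]conn{{tuple: a}, {tuple: b}})
	fmt.Println(len(tr.open), tr.open[a].pid)
}
```

The mutex is there because, in a design like the one described, events arrive on goroutines other than the one feeding the gathered connections.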
eBPF-based connection tracking proceeds in the following stages:
1. [...] tracer.NewTracer. This will kick off some go-routines that send events to our EbpfTracker by invoking tcpEventCbV4. However, we ignore these events to start with (this is controlled via the readyToHandleConnections boolean member)
2. [...] /proc, to obtain information about processes and the sockets they own
3. [...] conntrack run
4. [...] /proc walk from (2)
5. [...] /net/tcp{6}, create an iterator that produces connection information from the latter, annotated with pids from the former.
[...] EbpfTracker.
There are several ways in which connections created during this process can be missed, even when they are relatively long-lived, i.e. survive past step 7.
We have seen a number of test failures in the 330 and 340 tests in CircleCI, and @2opremio has done some instrumentation in #2674 to help us track down the cause. Analysis of the scope reports produced in run 7608 strongly suggests we are hitting the 2nd case since the report contains the expected edge but without associated PIDs at the endpoints.
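To make the underlying race concrete, here is a small sketch of a tracker that ignores events until it has been fed the gathered connections, as described for the `readyToHandleConnections` flag. It is a toy model under stated assumptions, not scope's code: `gatedTracker`, `handleEvent`, `feedInitial` and `simulateInit` are hypothetical names, and tuples are reduced to plain strings.

```go
package main

import "fmt"

// gatedTracker drops every event until ready is set (hypothetical model).
type gatedTracker struct {
	ready bool
	open  map[string]bool
}

// handleEvent applies a connect or close event, but only once ready.
func (g *gatedTracker) handleEvent(kind, tuple string) {
	if !g.ready {
		return // event silently lost during initialisation
	}
	switch kind {
	case "connect":
		g.open[tuple] = true
	case "close":
		delete(g.open, tuple)
	}
}

// feedInitial loads the gathered snapshot and starts honouring events.
func (g *gatedTracker) feedInitial(snapshot []string) {
	for _, tuple := range snapshot {
		g.open[tuple] = true
	}
	g.ready = true
}

// simulateInit replays events that arrive between the gathering and the feed.
// It reports whether a new connection was missed and whether a closed
// connection is still tracked afterwards.
func simulateInit() (missed, stale bool) {
	g := &gatedTracker{open: map[string]bool{}}
	snapshot := []string{"old"} // gathered before the events below

	g.handleEvent("connect", "new") // created during initialisation
	g.handleEvent("close", "old")   // closed during initialisation

	g.feedInitial(snapshot)
	return !g.open["new"], g.open["old"]
}

func main() {
	missed, stale := simulateInit()
	fmt.Println("new connection missed:", missed)
	fmt.Println("closed connection still tracked:", stale)
}
```

In this model both problems show up regardless of how long the new connection lives afterwards, because nothing replays the dropped events.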