Keep multiple per-node remoteConns in localSite #11074

espadolini · 2022-03-11T18:48:55Z

This PR makes it so that we keep track of all the open reverse tunnels coming from a node, not just the last one.

This matters during restarts: there is no guarantee that the last connection we accepted comes from the process that will keep running, rather than being a spurious connection from a process that will shut down soon after. In fact, which process persists in a graceful upgrade depends entirely on the user, since the new process may well be terminated, with the old process then meant to continue working as is.

Fixes a connectivity loss that occasionally affects services behind a reverse tunnel during CA rotations (as both the proxy and the nodes restart multiple times), especially noticeable with the kube agent.
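
To illustrate the idea, here is a minimal sketch of a per-node multimap of connections. It is not the code from this PR: the `site`, `connKey`, `addConn`, and `getConn` names, the `invalid` field, and the newest-first lookup order are all assumptions made for the example.

```go
package main

import "fmt"

// remoteConn is a hypothetical stand-in for a reverse tunnel connection.
type remoteConn struct {
	id      string
	invalid bool
}

// connKey is a hypothetical key identifying a node.
type connKey string

// site keeps every open connection per node instead of only the latest one.
type site struct {
	remoteConns map[connKey][]*remoteConn
}

// addConn records a new connection for the node, placing it first so that
// lookups prefer it while older connections remain available as fallbacks.
func (s *site) addConn(key connKey, rc *remoteConn) {
	s.remoteConns[key] = append([]*remoteConn{rc}, s.remoteConns[key]...)
}

// getConn returns the first connection for the node that is still valid.
func (s *site) getConn(key connKey) (*remoteConn, bool) {
	for _, rc := range s.remoteConns[key] {
		if !rc.invalid {
			return rc, true
		}
	}
	return nil, false
}

func main() {
	s := &site{remoteConns: make(map[connKey][]*remoteConn)}
	older := &remoteConn{id: "old-process"}
	newer := &remoteConn{id: "new-process"}
	s.addConn("node-1", older)
	s.addConn("node-1", newer)

	// The new process gets terminated; the old one is meant to keep working.
	newer.invalid = true
	if rc, ok := s.getConn("node-1"); ok {
		fmt.Println(rc.id) // prints "old-process"
	}
}
```

With a map that holds only the last connection per node, the lookup in `main` would have found nothing usable once the new process went away.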

espadolini · 2022-03-11T18:51:26Z

As @fspmarshall pointed out, the single remoteConns map in localSite is a massive contention point, as every reverse tunnel service (including each individual node) will attempt to register itself on every (re)start, so it might be worth sharding it.

edit: some local tests suggest that contention is not enough to cause issues even with 60k connections, so it's just something to keep in mind for later
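
For reference, sharding could look roughly like the sketch below. This is not something the PR implements: the `shardedConns` type, the shard count of 16, the FNV hash, and the use of plain strings as connection IDs are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// shardCount is an arbitrary illustrative value.
const shardCount = 16

// shard guards its own slice of the key space with its own mutex.
type shard struct {
	mu    sync.Mutex
	conns map[string][]string
}

// shardedConns spreads keys over several independently locked maps, so that
// registrations for different nodes rarely contend on the same lock.
type shardedConns struct {
	shards [shardCount]shard
}

func newShardedConns() *shardedConns {
	sc := &shardedConns{}
	for i := range sc.shards {
		sc.shards[i].conns = make(map[string][]string)
	}
	return sc
}

// shardFor picks a shard by hashing the key.
func (sc *shardedConns) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return &sc.shards[h.Sum32()%shardCount]
}

// add registers a connection ID under the given node key.
func (sc *shardedConns) add(key, connID string) {
	s := sc.shardFor(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.conns[key] = append(s.conns[key], connID)
}

// count reports how many connections are registered for the node key.
func (sc *shardedConns) count(key string) int {
	s := sc.shardFor(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.conns[key])
}

func main() {
	sc := newShardedConns()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			sc.add(fmt.Sprintf("node-%d", i%10), fmt.Sprintf("conn-%d", i))
		}(i)
	}
	wg.Wait()
	fmt.Println(sc.count("node-3"))
}
```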

ravicious · 2022-03-16T10:58:02Z

lib/reversetunnel/localsite.go

-	for key := range s.remoteConns {
-		if s.remoteConns[key].isInvalid() {
+	for key, conns := range s.remoteConns {
+		validConns := conns[:0]


Is this generally preferred to any other method of obtaining an empty slice?

No, this is very much a specific trick to filter items out of a slice in place when you know that you hold the only reference to the underlying buffer. validConns is an empty slice at the beginning of conns, so appending to it overwrites the same buffer instead of allocating a new one. This would blow up in a very weird way if we ever returned a slice pointing to the same memory, but here we're the only ones in possession of those buffers and we only return copies of the values, so we're good. Avoiding allocations here is kind of important, as this happens for all existing tunnels on every getRemoteConn.
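
A self-contained sketch of the trick, outside of the Teleport codebase: the `conn` type and the `filterValid` helper are hypothetical names, and clearing the tail of the buffer is an extra step added here, not something taken from the diff above.

```go
package main

import "fmt"

// conn is a hypothetical stand-in for a connection that can become invalid.
type conn struct {
	name    string
	invalid bool
}

// filterValid drops invalid connections from conns in place. It is only safe
// when the caller holds the sole reference to the backing array of conns.
func filterValid(conns []*conn) []*conn {
	// Zero length, same backing array: appends overwrite conns itself.
	valid := conns[:0]
	for _, c := range conns {
		if !c.invalid {
			valid = append(valid, c)
		}
	}
	// Clear the unused tail so dropped connections can be garbage collected.
	for i := len(valid); i < len(conns); i++ {
		conns[i] = nil
	}
	return valid
}

func main() {
	conns := []*conn{{name: "a"}, {name: "b", invalid: true}, {name: "c"}}
	valid := filterValid(conns)

	for _, c := range valid {
		fmt.Println(c.name)
	}
	// Both slices start at the same element, so no new buffer was allocated.
	fmt.Println(&valid[0] == &conns[0])
}
```

The write index never overtakes the read index, so each element is read before its slot can be overwritten. After the call the original `conns` slice should no longer be used, since its contents have been rearranged.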

espadolini added the robustness label Mar 11, 2022

espadolini requested review from fspmarshall and rosstimothy March 11, 2022 18:48

espadolini changed the base branch from master to espadolini/ephemeral-cache March 11, 2022 18:49

espadolini force-pushed the espadolini/localsite-multimap branch 2 times, most recently from c3ecdeb to 08ad76f Compare March 14, 2022 16:15

espadolini changed the base branch from espadolini/ephemeral-cache to master March 14, 2022 17:25

espadolini force-pushed the espadolini/localsite-multimap branch from 08ad76f to e75bf0e Compare March 14, 2022 17:26

espadolini changed the base branch from master to espadolini/ephemeral-cache March 14, 2022 17:28

rosstimothy reviewed Mar 15, 2022

lib/reversetunnel/localsite.go (resolved)

espadolini force-pushed the espadolini/ephemeral-cache branch from 982e4ff to d30ba60 Compare March 15, 2022 17:09

espadolini added 2 commits March 15, 2022 18:13

Keep multiple per-node remoteConns in localSite

4d464c3

Test coverage

1c5a3f8

espadolini force-pushed the espadolini/localsite-multimap branch from e75bf0e to 1c5a3f8 Compare March 15, 2022 17:13

espadolini changed the base branch from espadolini/ephemeral-cache to master March 15, 2022 17:15

espadolini marked this pull request as ready for review March 15, 2022 17:48

github-actions bot requested review from r0mant and ravicious March 15, 2022 17:49

espadolini force-pushed the espadolini/localsite-multimap branch from ac63512 to 1c5a3f8 Compare March 15, 2022 17:50

rosstimothy approved these changes Mar 15, 2022

ravicious approved these changes Mar 16, 2022

probakowski approved these changes Mar 16, 2022

Merge branch 'master' into espadolini/localsite-multimap

ee2e63e

espadolini enabled auto-merge (squash) March 16, 2022 12:09

espadolini merged commit 4d99f1c into master Mar 16, 2022

espadolini deleted the espadolini/localsite-multimap branch March 16, 2022 12:17

espadolini added the backport-required label Mar 16, 2022

espadolini mentioned this pull request Mar 16, 2022

[v9] backport #11074 (localSite multimap) #11184

Merged

espadolini mentioned this pull request Mar 16, 2022

[v8] backport #11074 (localSite multimap) #11185

Merged

espadolini mentioned this pull request Mar 16, 2022

[v7] backport #11074 (localSite multimap) #11186

Merged

espadolini added a commit that referenced this pull request Mar 16, 2022

Keep multiple per-node remoteConns in localSite (#11074)

3afdb79

espadolini removed the backport-required label Mar 16, 2022

espadolini added a commit that referenced this pull request Mar 17, 2022

Keep multiple per-node remoteConns in localSite (#11074)

2a897c7

espadolini added a commit that referenced this pull request Mar 17, 2022

Keep multiple per-node remoteConns in localSite (#11074)

4cd0312

espadolini added a commit that referenced this pull request Mar 18, 2022

Keep multiple per-node remoteConns in localSite (#11074) (#11186)

084455c

espadolini added a commit that referenced this pull request Mar 18, 2022

Keep multiple per-node remoteConns in localSite (#11074) (#11184)

57ba25a

espadolini added a commit that referenced this pull request Mar 18, 2022

Keep multiple per-node remoteConns in localSite (#11074) (#11185)

34b6fd0

This was referenced Mar 21, 2022

SIGHUP of proxy results in TLS errors until original process exits. #6945

Closed

CA rotation does not apply to teleport-kube-agent pods, preventing them from rejoining a cluster #9815

Closed

espadolini mentioned this pull request Apr 1, 2022

sql: database is closed errors after bouncing teleport process on proxy server #5083

Closed

webvictim mentioned this pull request Apr 19, 2022

opened in error #12068

Closed

webvictim mentioned this pull request Jun 8, 2022

Opened in error #13285

Closed

alexatcanva mentioned this pull request Oct 13, 2022

BUGFIX | Fix Teleport ALPN Proxy not being HTTP CONNECT Proxy Aware alexatcanva/teleport#30

Open
