Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep multiple per-node remoteConns in localSite #11074

Merged
merged 3 commits into from
Mar 16, 2022

Conversation

espadolini
Copy link
Contributor

@espadolini espadolini commented Mar 11, 2022

This PR makes it so that we keep track of all the open reverse tunnels coming from a node, not just the last one.

This is important during restarts, as there's no guarantee that the last connection that we accepted is from the process that will ultimately continue running instead of being a spurious connection from a process that will shut down soon after - and, in fact, the decision on which process will persist in a graceful upgrade depends entirely on the user, as it's entirely possible that the new process will get terminated and the old process is then meant to continue working as is.

Fixes a connectivity loss that occasionally happens to services behind reverse tunnel during CA rotations (as both proxy and nodes will restart multiple times), especially noticeable with the kube agent.

@espadolini espadolini added the robustness Resistance to crashes and reliability label Mar 11, 2022
@espadolini espadolini changed the base branch from master to espadolini/ephemeral-cache March 11, 2022 18:49
@espadolini
Copy link
Contributor Author

espadolini commented Mar 11, 2022

As @fspmarshall pointed out, the one localSite's remoteConns map is a massive contention point as every reverse tunnel service (including each individual node) will attempt to register itself on every (re)start, so it might be worth it to shard it.

edit: from some local tests it seems like the contention for even 60k connections is not enough to cause issues, so it's just something to keep in mind for later

@espadolini espadolini force-pushed the espadolini/localsite-multimap branch 2 times, most recently from c3ecdeb to 08ad76f Compare March 14, 2022 16:15
@espadolini espadolini changed the base branch from espadolini/ephemeral-cache to master March 14, 2022 17:25
@espadolini espadolini force-pushed the espadolini/localsite-multimap branch from 08ad76f to e75bf0e Compare March 14, 2022 17:26
@espadolini espadolini changed the base branch from master to espadolini/ephemeral-cache March 14, 2022 17:28
@espadolini espadolini force-pushed the espadolini/ephemeral-cache branch from 982e4ff to d30ba60 Compare March 15, 2022 17:09
@espadolini espadolini force-pushed the espadolini/localsite-multimap branch from e75bf0e to 1c5a3f8 Compare March 15, 2022 17:13
@espadolini espadolini changed the base branch from espadolini/ephemeral-cache to master March 15, 2022 17:15
@espadolini espadolini marked this pull request as ready for review March 15, 2022 17:48
@github-actions github-actions bot requested review from r0mant and ravicious March 15, 2022 17:49
@espadolini espadolini force-pushed the espadolini/localsite-multimap branch from ac63512 to 1c5a3f8 Compare March 15, 2022 17:50
for key := range s.remoteConns {
if s.remoteConns[key].isInvalid() {
for key, conns := range s.remoteConns {
validConns := conns[:0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this generally preferred to any other method of obtaining an empty slice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is very much a specific trick to filter items out of a slice in-place if you know that you're the only reference to the underlying buffer. validConns is an empty slice at the beginning of conns, so appending to it will overwrite the same buffer instead of allocating a new one. This would blow up in a very weird way if we ever returned a slice to the same memory, but here we're the only ones in possession of those buffers and we're only returning copies of the values, so we're good - and avoiding allocations here it's kind of important, as this happens for all existing tunnels on every getRemoteConn.

@espadolini espadolini enabled auto-merge (squash) March 16, 2022 12:09
@espadolini espadolini merged commit 4d99f1c into master Mar 16, 2022
@espadolini espadolini deleted the espadolini/localsite-multimap branch March 16, 2022 12:17
@webvictim webvictim mentioned this pull request Apr 19, 2022
@webvictim webvictim mentioned this pull request Jun 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
robustness Resistance to crashes and reliability
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants