Keep multiple per-node remoteConns in localSite #11074
Conversation
As @fspmarshall pointed out, the one [...]

edit: from some local tests it seems like the contention for even 60k connections is not enough to cause issues, so it's just something to keep in mind for later
-	for key := range s.remoteConns {
-		if s.remoteConns[key].isInvalid() {
+	for key, conns := range s.remoteConns {
+		validConns := conns[:0]
Is this generally preferred to any other method of obtaining an empty slice?
No, this is very much a specific trick to filter items out of a slice in place when you know you hold the only reference to the underlying buffer. validConns is an empty slice aliasing the beginning of conns, so appending to it overwrites the same buffer instead of allocating a new one. This would blow up in a very weird way if we ever returned a slice into that same memory, but here we're the only ones in possession of those buffers and we only return copies of the values, so we're good. Avoiding allocations here is kind of important, as this happens for all existing tunnels on every getRemoteConn.
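For readers less familiar with the idiom, here is a minimal, standalone sketch of the same in-place filtering trick. The conn type and the pruneConns helper are made up for illustration; only the conns[:0] pattern mirrors the code under review.

```go
package main

import "fmt"

type conn struct {
	id      int
	invalid bool
}

func (c *conn) isInvalid() bool { return c.invalid }

// pruneConns drops invalid connections without allocating a new backing
// array: valid := conns[:0] shares the same buffer as conns, so appending to
// it overwrites slots we have already read past. This is only safe if the
// caller holds the sole reference to that buffer.
func pruneConns(conns []*conn) []*conn {
	valid := conns[:0]
	for _, c := range conns {
		if !c.isInvalid() {
			valid = append(valid, c)
		}
	}
	// Zero the tail so the dropped *conn values can be garbage collected.
	for i := len(valid); i < len(conns); i++ {
		conns[i] = nil
	}
	return valid
}

func main() {
	conns := []*conn{{id: 1}, {id: 2, invalid: true}, {id: 3}}
	conns = pruneConns(conns)
	fmt.Println(len(conns)) // 2
}
```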
This PR makes it so that we keep track of all the open reverse tunnels coming from a node, not just the last one.
This is important during restarts: there's no guarantee that the last connection we accepted comes from the process that will ultimately keep running, rather than being a spurious connection from a process that will shut down soon after. In fact, which process persists in a graceful upgrade depends entirely on the user; the new process may well be terminated and the old one left to continue working as is.
Fixes a connectivity loss that occasionally affects services behind a reverse tunnel during CA rotations (both the proxy and the nodes restart multiple times), which is especially noticeable with the kube agent.
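To make the shape of the change concrete, here is a hedged sketch of the data structure described above: a map from node key to a slice of connections rather than a single connection. The localSite, remoteConn, and getRemoteConn names follow the PR's terminology, but the bodies are illustrative assumptions, not the actual Teleport implementation.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type remoteConn struct {
	invalid bool
}

func (c *remoteConn) isInvalid() bool { return c.invalid }

type localSite struct {
	mu sync.Mutex
	// Before: conceptually map[string]*remoteConn, keeping only the most
	// recent tunnel per node. After: every open tunnel from a node is kept.
	remoteConns map[string][]*remoteConn
}

func newLocalSite() *localSite {
	return &localSite{remoteConns: make(map[string][]*remoteConn)}
}

// addConn appends a new tunnel instead of replacing the previous one, so a
// still-healthy connection from an older process is not discarded.
func (s *localSite) addConn(nodeID string, c *remoteConn) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.remoteConns[nodeID] = append(s.remoteConns[nodeID], c)
}

// getRemoteConn prunes dead tunnels in place (the conns[:0] idiom discussed
// in the review) and returns one that is still usable, preferring the most
// recently added.
func (s *localSite) getRemoteConn(nodeID string) (*remoteConn, error) {
	s.mu.Lock()
	defer s.mu.Unlock()

	conns := s.remoteConns[nodeID]
	validConns := conns[:0]
	for _, c := range conns {
		if !c.isInvalid() {
			validConns = append(validConns, c)
		}
	}
	s.remoteConns[nodeID] = validConns

	if len(validConns) == 0 {
		return nil, errors.New("no active tunnels for this node")
	}
	return validConns[len(validConns)-1], nil
}

func main() {
	s := newLocalSite()
	s.addConn("node-1", &remoteConn{})              // old process, still healthy
	s.addConn("node-1", &remoteConn{invalid: true}) // newer process that went away
	if c, err := s.getRemoteConn("node-1"); err == nil {
		fmt.Println(c.isInvalid()) // false: the surviving tunnel is served
	}
}
```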