-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only send server updates to listeners when the opaque protocol changes #11907
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Alex Leong <[email protected]>
zaharidichev
approved these changes
Jan 10, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM good catch!
mateiidavid
approved these changes
Jan 10, 2024
mateiidavid
added a commit
that referenced
this pull request
Jan 12, 2024
This edge release introduces a number of different fixes and improvements. More notably, it introduces a new `cni-repair-controller` binary to the CNI plugin image. The controller will automatically restart pods that have not received their iptables configuration. * Removed shortnames from Tap API resources to avoid colliding with existing Kubernetes resources ([#11816]; fixes [#11784]) * Introduced a new ExternalWorkload CRD to support upcoming mesh expansion feature ([#11805]) * Changed `MeshTLSAuthentication` resource validation to allow SPIFFE URI identities ([#11882]) * Introduced a new `cni-repair-controller` to the `linkerd-cni` DaemonSet to automatically restart misconfigured pods that are missing iptables rules ([#11699]; fixes [#11073]) * Fixed a `"duplicate metrics"` warning in the multicluster service-mirror component ([#11875]; fixes [#11839]) * Added metric labels and weights to `linkerd diagnostics endpoints` json output ([#11889]) * Changed how `Server` updates are handled in the destination service. The change will ensure that during a cluster resync, consumers won't be overloaded by redundant updates ([#11907]) * Changed `linkerd install` error output to add a newline when a Kubernetes client cannot be successfully initialised [#11816]: #11816 [#11784]: #11784 [#11805]: #11805 [#11882]: #11882 [#11699]: #11699 [#11073]: #11073 [#11875]: #11875 [#11839]: #11839 [#11889]: #11889 [#11907]: #11907 [#11917]: #11917 Signed-off-by: Matei David <[email protected]>
Merged
mateiidavid
added a commit
that referenced
this pull request
Jan 12, 2024
This edge release introduces a number of different fixes and improvements. More notably, it introduces a new `cni-repair-controller` binary to the CNI plugin image. The controller will automatically restart pods that have not received their iptables configuration. * Removed shortnames from Tap API resources to avoid colliding with existing Kubernetes resources ([#11816]; fixes [#11784]) * Introduced a new ExternalWorkload CRD to support upcoming mesh expansion feature ([#11805]) * Changed `MeshTLSAuthentication` resource validation to allow SPIFFE URI identities ([#11882]) * Introduced a new `cni-repair-controller` to the `linkerd-cni` DaemonSet to automatically restart misconfigured pods that are missing iptables rules ([#11699]; fixes [#11073]) * Fixed a `"duplicate metrics"` warning in the multicluster service-mirror component ([#11875]; fixes [#11839]) * Added metric labels and weights to `linkerd diagnostics endpoints` json output ([#11889]) * Changed how `Server` updates are handled in the destination service. The change will ensure that during a cluster resync, consumers won't be overloaded by redundant updates ([#11907]) * Changed `linkerd install` error output to add a newline when a Kubernetes client cannot be successfully initialised ([#11917]) [#11816]: #11816 [#11784]: #11784 [#11805]: #11805 [#11882]: #11882 [#11699]: #11699 [#11073]: #11073 [#11875]: #11875 [#11839]: #11839 [#11889]: #11889 [#11907]: #11907 [#11917]: #11917 Signed-off-by: Matei David <[email protected]>
mateiidavid
added a commit
that referenced
this pull request
Jan 12, 2024
This edge release introduces a number of different fixes and improvements. More notably, it introduces a new `cni-repair-controller` binary to the CNI plugin image. The controller will automatically restart pods that have not received their iptables configuration. * Removed shortnames from Tap API resources to avoid colliding with existing Kubernetes resources ([#11816]; fixes [#11784]) * Introduced a new ExternalWorkload CRD to support upcoming mesh expansion feature ([#11805]) * Changed `MeshTLSAuthentication` resource validation to allow SPIFFE URI identities ([#11882]) * Introduced a new `cni-repair-controller` to the `linkerd-cni` DaemonSet to automatically restart misconfigured pods that are missing iptables rules ([#11699]; fixes [#11073]) * Fixed a `"duplicate metrics"` warning in the multicluster service-mirror component ([#11875]; fixes [#11839]) * Added metric labels and weights to `linkerd diagnostics endpoints` json output ([#11889]) * Changed how `Server` updates are handled in the destination service. The change will ensure that during a cluster resync, consumers won't be overloaded by redundant updates ([#11907]) * Changed `linkerd install` error output to add a newline when a Kubernetes client cannot be successfully initialised ([#11917]) [#11816]: #11816 [#11784]: #11784 [#11805]: #11805 [#11882]: #11882 [#11699]: #11699 [#11073]: #11073 [#11875]: #11875 [#11839]: #11839 [#11889]: #11889 [#11907]: #11907 [#11917]: #11917 Signed-off-by: Matei David <[email protected]>
adleong
added a commit
that referenced
this pull request
Jan 18, 2024
#11907) Whenever the destination controller's informer receives an update of a Server resource, it checks every portPublisher in the endpointsWatcher to see if the Server selects any pods in that servicePort and updates those pods' opaque protocol field. Regardless of if any pods were matched or if the opaque protocol changed, an update is sent to each listener. This results in an update to every endpointTranslator each time a Server is updated. During a resync, we get an update for every Server in the cluster which results in N updates to each endpointTranslator where N is the number of Servers in the cluster. If N is greater than 100, it becomes possible that these N updates could overflow the endpointTranslator update queue if the queue is not being drained fast enough. We change this to only send the update for a Server if at least one of the servicePort addresses was selected by that server AND it's opaque protocol field changed. Signed-off-by: Alex Leong <[email protected]>
Merged
adleong
added a commit
that referenced
this pull request
Jan 19, 2024
This stable release adds a cni-repair-controller which fixes the issue of injected pods that cannot acquire proper network config because linkerd-cni and/or the cluster's network CNI haven't fully started ([#11699]). It also fixes a bug in the destination controller where having a large number of Server resources could cause the destination controller to use an excessive amount of CPU ([#11907]). Finally, it fixes a conflict with tap resource shortnames which was causing warnings from kubectl v1.29.0+ ([#11816]). [#11699]: #11699 [#11907]: #11907 [#11816]: #11816
5 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Whenever the destination controller's informer receives an update of a Server resource, it checks every portPublisher in the endpointsWatcher to see if the Server selects any pods in that servicePort and updates those pods' opaque protocol field. Regardless of if any pods were matched or if the opaque protocol changed, an update is sent to each listener. This results in an update to every endpointTranslator each time a Server is updated. During a resync, we get an update for every Server in the cluster which results in N updates to each endpointTranslator where N is the number of Servers in the cluster.
If N is greater than 100, it becomes possible that these N updates could overflow the endpointTranslator update queue if the queue is not being drained fast enough.
We change this to only send the update for a Server if at least one of the servicePort addresses was selected by that server AND it's opaque protocol field changed.